Bioinformatics

showCategory parameter for visualizing compareCluster output

I am using dotplot() to visualize results from enrichGO(), enrichDO(), enricher() and compareCluster() in the clusterProfiler R package. When specifying showCategory, I get the right number of categories except with the results of compareCluster().

In my case, I use compareCluster() on a list of 3 elements:

str(ClusterList)
List of 3
 $ All : chr [1:1450] "89886" "29923" "100132891" "101410536" ...
 $ g1  : chr [1:858] "89886" "29923" "100132891" "101410536" ...
 $ g2  : chr [1:592] "5325" "170691" "29953" "283392" ...
CompareGO_BP <- compareCluster(ClusterList, fun = "enrichGO", pvalueCutoff = 0.01, pAdjustMethod = "BH", OrgDb = org.Hs.eg.db, ont = "BP", readable = TRUE)

dotplot(CompareGO_BP, showCategory = 10, title = "GO - Biological Process")

I ask for 10 categories, but I get 15 categories in All, 8 categories in g1 and 12 categories in g2. Neither any single count nor the sum of the counts is 10…

Is the option showCategory working in the case of comparison? Am I missing something here?

And which categories exactly will it plot: the most significant across all 3 cases, or the most significant within each case?

This question was posted on the Bioconductor support site. The behavior seems quite confusing, so I think I need to write a post to clarify it.
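One way such counts could arise is sketched below in base R. This is an assumption for illustration, not a statement of how dotplot() is implemented: it supposes that the top n terms are picked within each cluster, and that every selected term is then drawn for all clusters in which it is enriched. The table, term names and p-values are made up.

```r
# Toy enrichment table: one row per (cluster, term) pair.
# All term names and adjusted p-values are invented for illustration.
res <- data.frame(
  Cluster  = c("All", "All", "All", "g1", "g1", "g1", "g2", "g2", "g2"),
  Term     = c("t1", "t2", "t3", "t1", "t3", "t4", "t2", "t5", "t6"),
  p.adjust = c(0.001, 0.002, 0.03, 0.001, 0.004, 0.02, 0.001, 0.002, 0.04),
  stringsAsFactors = FALSE
)

n <- 2  # plays the role of showCategory (assumed semantics)

# Step 1: top-n terms within each cluster, ranked by adjusted p-value.
top_per_cluster <- lapply(split(res, res$Cluster), function(d) {
  head(d$Term[order(d$p.adjust)], n)
})

# Step 2: union of the selected terms across clusters.
selected <- unique(unlist(top_per_cluster))

# Step 3: keep every row whose term is in the union, in any cluster.
shown <- res[res$Term %in% selected, ]

# Number of terms that would be drawn for each cluster.
shown_counts <- table(shown$Cluster)
print(shown_counts)
```

Under this assumed rule a cluster can show more than n terms, because it inherits terms selected for other clusters, and a cluster with fewer than n enriched terms would show fewer.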

ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization

My R/Bioconductor package, ReactomePA, has been published in Molecular BioSystems.

ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization

G Yu, QY He. Molecular BioSystems 2016, 12:477-479

Received: 05 Oct 2015 Accepted: 20 Nov 2015 Online: 23 Nov 2015

functional enrichment analysis with NGS data

I found a Bioconductor package, seq2pathway, that can apply functional analysis to NGS data. It consists of two components, seq2gene and gene2pathway. seq2gene converts genomic coordinates to genes, while gene2pathway performs functional analysis at the gene level.

I think it would be interesting to combine seq2gene with clusterProfiler. But it fails to run because it calls Python through an absolute path that exists only on the author’s computer.

parsing BED coordinates

In the supplemental file of the ChIPseeker paper, I compared the distances to TSS reported by several ChIP annotation tools, including ChIPseeker, ChIPpeakAnno, HOMER and PeakAnalyzer.

Although I found that the chromStart positions in the HOMER output are shifted by +1 compared to the other software, I did not recognize this as an issue, because all the other tools were consistent with one another.
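A +1 difference in start positions is what one would see when the same interval is written in two coordinate conventions. The BED format stores chromStart as 0-based and chromEnd as end-exclusive, while many tools report 1-based, closed intervals. The sketch below only illustrates that conversion; the peaks are made up, and it makes no claim about which convention any of the tools above actually uses.

```r
# Hypothetical peaks in BED convention: 0-based start, end-exclusive.
bed <- data.frame(
  chrom      = c("chr1", "chr1"),
  chromStart = c(999L, 4999L),
  chromEnd   = c(1500L, 5200L),
  stringsAsFactors = FALSE
)

# Convert to 1-based, closed intervals: only the start moves by +1.
one_based <- data.frame(
  chrom = bed$chrom,
  start = bed$chromStart + 1L,
  end   = bed$chromEnd,
  stringsAsFactors = FALSE
)

# The interval width is the same under both conventions.
width_bed <- bed$chromEnd - bed$chromStart
width_one <- one_based$end - one_based$start + 1L
print(one_based)
```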

ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization

My R/Bioconductor package, ChIPseeker, has been published in Bioinformatics.

ChIPseeker has been cited by http://www.biomedcentral.com/1471-2164/16/292 and http://www.jbc.org/content/early/2015/06/18/jbc.M115.668558.short, and was used (but not cited) in http://nar.oxfordjournals.org/content/early/2015/06/27/nar.gkv642.abstract and http://emboj.embopress.org/content/early/2014/12/18/embj.201490061.abstract.

BMC favors source code plagiarism

I found source code plagiarism a year ago and reported this case to BMC Systems Biology:

In my email, I listed the source code of many functions that were copied exactly from GOSemSim with only the function names changed. The details of the source code plagiarism can also be found at Proper use of GOSemSim.

use clusterProfiler as a universal enrichment analysis tool

clusterProfiler supports both the hypergeometric test and gene set enrichment analysis. It internally supports Gene Ontology analysis of about 20 species, Kyoto Encyclopedia of Genes and Genomes (KEGG) for all species that have annotation available in the KEGG database, DAVID annotation (only the hypergeometric test is supported), Disease Ontology and Network of Cancer Genes (via DOSE for human) and Reactome Pathway (via ReactomePA for several species). This is still not enough, since users may want to analyze their data with unsupported organisms, a slim version of GO, novel functional annotation (e.g. GO via blastgo and KEGG via KAAS), an unsupported ontology/pathway or customized annotation.

clusterProfiler provides the enricher function for the hypergeometric test and the GSEA function for gene set enrichment analysis, both designed to accept user-defined annotation. They accept two additional parameters, TERM2GENE and TERM2NAME. As the parameter names indicate, TERM2GENE is a data.frame whose first column is the term ID and whose second column is the corresponding mapped gene, and TERM2NAME is a data.frame whose first column is the term ID and whose second column is the corresponding term name. TERM2NAME is optional.
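As a minimal sketch of the two tables described above, the base R code below builds a toy TERM2GENE and TERM2NAME and runs a one-sided hypergeometric test per term with phyper(). The term IDs, gene IDs and the choice of background (all annotated genes) are assumptions made for illustration; this is not clusterProfiler's own code.

```r
# Hypothetical annotation: column 1 = term ID, column 2 = mapped gene.
TERM2GENE <- data.frame(
  term = c("T1", "T1", "T1", "T2", "T2", "T3", "T3", "T3", "T3"),
  gene = c("g1", "g2", "g3", "g3", "g4", "g5", "g6", "g7", "g8"),
  stringsAsFactors = FALSE
)

# Optional table: column 1 = term ID, column 2 = term name.
TERM2NAME <- data.frame(
  term = c("T1", "T2", "T3"),
  name = c("toy term one", "toy term two", "toy term three"),
  stringsAsFactors = FALSE
)

genes    <- c("g1", "g2", "g3", "g5")   # genes of interest (made up)
universe <- unique(TERM2GENE$gene)      # assumed background: all annotated genes

# One-sided hypergeometric test per term: P(X >= k).
pvals <- sapply(split(TERM2GENE$gene, TERM2GENE$term), function(term_genes) {
  k <- length(intersect(genes, term_genes))  # query genes in the term
  M <- length(term_genes)                    # term size
  N <- length(universe)                      # background size
  n <- length(intersect(genes, universe))    # query size
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
})
print(pvals)

# With clusterProfiler installed, the same tables could be passed on, e.g.
# clusterProfiler::enricher(genes, TERM2GENE = TERM2GENE, TERM2NAME = TERM2NAME)
```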

DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis

My R/Bioconductor package, DOSE, has been published in Bioinformatics.

why clusterProfiler fails

@kaji331 compared clusterProfiler with GeneAnswers and found that clusterProfiler gives larger p-values.

It eventually turned out that he had passed the input genes as a numeric vector, although a character vector was expected, and that he had used an old version of clusterProfiler, which didn’t use as.character to coerce the input.
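The coercion itself is a one-liner. The sketch below uses made-up Entrez-style IDs and a made-up annotation vector; it only shows converting a numeric gene vector to character before matching, not the code path of the old clusterProfiler version.

```r
# Hypothetical gene IDs supplied as numbers.
gene_num <- c(89886, 29923, 5325)

# Coerce to character before passing the IDs to an enrichment function.
gene_chr <- as.character(gene_num)

# Annotation tables key genes by character ID (toy example).
annotated <- c("89886", "5325", "170691")
hits <- intersect(gene_chr, annotated)
print(hits)
```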

But his comment forces me to test it.