I am using dotplot() to visualize results from enrichGO(), enrichDO(), enricher() and compareCluster() in the clusterProfiler R package. When specifying showCategory, I get the right number of categories except with the results of compareCluster().
In my case, I use compareCluster() on a list of 3 elements:
    str(ClusterList)
    List of 3
     $ All: chr [1:1450] "89886" "29923" "100132891" "101410536" ...
     $ g1 : chr [1:858] "89886" "29923" "100132891" "101410536" ...
     $ g2 : chr [1:592] "5325" "170691" "29953" "283392" ...

    CompareGO_BP=compareCluster(ClusterList, fun="enrichGO", pvalueCutoff=0.01, pAdjustMethod="BH", OrgDb=org.Hs.eg.db,ont="BP",readable=T)
    dotplot(CompareGO_BP, showCategory=10, title="GO - Biological Process")

I ask for 10 categories, but I get 15 categories in All, 8 categories in g1 and 12 categories in g2. Neither any individual count nor their sum is 10…
Is the option showCategory working in the case of comparison? Am I missing something here?
And which categories precisely will it plot? The most significant ones across all 3 of my cases, or the most significant ones of each case?
The question was posted on the Bioconductor support site. It seems quite confusing, and I think I need to write a post to clarify it.
I found a Bioconductor package, seq2pathway, that can apply functional analysis to NGS data. It consists of two components, seq2gene and gene2pathway. seq2gene converts genomic coordinates to genes, while gene2pathway performs functional analysis at the gene level.
I think it would be interesting to incorporate seq2gene into clusterProfiler. But it fails to run because it calls an absolute path to the Python installation on the author’s computer.
Although I found that the chromStart positions in HOMER output have a +1 shift compared to other software, I did not recognize this as an issue, since all the other software was consistent.
ChIPseeker has been cited by http://www.biomedcentral.com/1471-2164/16/292 and http://www.jbc.org/content/early/2015/06/18/jbc.M115.668558.short, and was used (but not cited) in http://nar.oxfordjournals.org/content/early/2015/06/27/nar.gkv642.abstract and http://emboj.embopress.org/content/early/2014/12/18/embj.201490061.abstract.
clusterProfiler supports two kinds of enrichment analysis: the hypergeometric test and gene set enrichment analysis. It internally supports Gene Ontology analysis of about 20 species, Kyoto Encyclopedia of Genes and Genomes (KEGG) with all species that have annotation available in the KEGG database, DAVID annotation (only the hypergeometric test is supported), Disease Ontology and Network of Cancer Genes (via DOSE for human) and Reactome Pathway (via ReactomePA for several species). This is still not enough, since users may want to analyze their data with unsupported organisms, a slim version of GO, novel functional annotation (e.g. GO via Blast2GO and KEGG via KAAS), an unsupported ontology/pathway or customized annotation.
clusterProfiler provides the enricher function for the hypergeometric test and the GSEA function for gene set enrichment analysis, both designed to accept user-defined annotation. They accept two additional parameters, TERM2GENE and TERM2NAME. As the parameter names indicate, TERM2GENE is a data.frame whose first column is the term ID and whose second column is the corresponding mapped gene, and TERM2NAME is a data.frame whose first column is the term ID and whose second column is the corresponding term name. TERM2NAME is optional.
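To make the two-column layout concrete, here is a minimal sketch in base R. All term IDs, gene IDs and names are made up for illustration, and the hand-rolled `ora()` helper is a hypothetical stand-in that only shows the kind of hypergeometric test involved; it is not clusterProfiler's implementation.

```r
# Toy user-defined annotation; every ID and name here is invented.
# TERM2GENE: first column = term ID, second column = mapped gene.
TERM2GENE <- data.frame(
  term = c("T1", "T1", "T1", "T2", "T2", "T3", "T3", "T3", "T3"),
  gene = c("g1", "g2", "g3", "g3", "g4", "g5", "g6", "g7", "g8"),
  stringsAsFactors = FALSE
)
# TERM2NAME (optional): first column = term ID, second column = term name.
TERM2NAME <- data.frame(
  term = c("T1", "T2", "T3"),
  name = c("toy pathway A", "toy pathway B", "toy pathway C"),
  stringsAsFactors = FALSE
)

gene <- c("g1", "g2", "g3", "g5")        # hypothetical genes of interest
universe <- unique(TERM2GENE$gene)       # assumption: background = all annotated genes

# Hypothetical helper: one-sided hypergeometric test for a single term.
ora <- function(term) {
  set <- TERM2GENE$gene[TERM2GENE$term == term]
  k <- length(intersect(gene, set))      # genes of interest inside the term
  M <- length(set)                       # term size
  N <- length(universe)                  # background size
  n <- length(intersect(gene, universe)) # annotated genes of interest
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)  # P(X >= k)
}

pvals <- vapply(TERM2NAME$term, ora, numeric(1))
toy_result <- data.frame(TERM2NAME, pvalue = pvals, row.names = NULL)

# With clusterProfiler installed, the same two tables would be passed as:
# enricher(gene, TERM2GENE = TERM2GENE, TERM2NAME = TERM2NAME)
```

With gene sets this small, the package's default set-size and cutoff filters would probably need loosening before enricher returned anything, so the commented call is only meant to show where the two data frames go.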
GeneAnswers and found that clusterProfiler gives larger p values. It eventually turned out that he had passed the input genes as a numeric vector, when they were supposed to be character, and that he had used an old version of clusterProfiler which didn’t use as.character to coerce the input. But his comment forces me to test it.
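As an illustration of why the type of the gene IDs matters, the sketch below shows one way numeric IDs can silently behave differently from character IDs in base R. The lookup table and term labels are invented, and this is not a claim about what the old clusterProfiler code actually did; it only shows the general pitfall that as.character guards against.

```r
# Toy lookup keyed by Entrez-like IDs stored as names (names are always character).
gene2term <- c("5325" = "T1", "29953" = "T2", "89886" = "T1")

ids_num <- c(5325, 29953)          # IDs mistakenly kept as numbers
ids_chr <- as.character(ids_num)   # coerced to character before use

# A numeric index is read as a position, so positions 5325 and 29953
# fall outside this 3-element vector and both lookups come back as NA.
by_position <- unname(gene2term[ids_num])

# A character index is matched against the names, so the lookups succeed.
by_name <- unname(gene2term[ids_chr])
```

Here `by_position` contains only missing values while `by_name` returns the expected terms, even though both index vectors look the same when printed.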