showCategory parameter for visualizing compareCluster output

November 3, 2016 in R, Visualization, Bioinformatics

I am using dotplot() to visualize results from enrichGO(), enrichDO(), enricher() and compareCluster() in the clusterProfiler R package. When specifying showCategory, I get the right number of categories, except with the results of compareCluster().

In my case, I use compareCluster() on a list of 3 elements:

    > str(ClusterList)
    List of 3
     $ All: chr [1:1450] "89886" "29923" "100132891" "101410536" ...
     $ g1 : chr [1:858] "89886" "29923" "100132891" "101410536" ...
     $ g2 : chr [1:592] "5325" "170691" "29953" "283392" ...

    library(clusterProfiler)
    library(org.Hs.eg.db)
    CompareGO_BP <- compareCluster(ClusterList, fun = "enrichGO", pvalueCutoff = 0.01, pAdjustMethod = "BH",
                                   OrgDb = org.Hs.eg.db, ont = "BP", readable = TRUE)
    dotplot(CompareGO_BP, showCategory = 10, title = "GO - Biological Process")

I ask for 10 categories, but I get 15 categories in All, 8 in g1 and 12 in g2. Neither the count for any single cluster nor the total is 10.

Does the showCategory option work for compareCluster() results? Am I missing something here?

And which categories exactly will it plot: the most significant ones across my 3 clusters, or the most significant ones within each cluster?

This question was posted on the Bioconductor support site. The behavior is indeed confusing, and I think I need to write a post to clarify it.
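One way to see how the per-cluster counts can differ from showCategory is to assume that the plot first keeps the N most significant terms within each cluster, and then shows, for every cluster, all terms in the union of those selections that are enriched in that cluster. The base R sketch below applies that assumed rule to a small hypothetical result table. The column names Cluster, Description and pvalue mimic compareCluster() output, but the data and the helpers `top_n_per_cluster()` and `terms_to_plot()` are made up for illustration; this is not clusterProfiler's own code.

```r
# Hypothetical toy result in the long format of a compareCluster() result:
# one row per (Cluster, term) pair that is enriched in that cluster.
res <- data.frame(
  Cluster = rep(c("All", "g1", "g2"), times = c(5, 3, 4)),
  Description = c("a", "b", "c", "d", "e",
                  "a", "f", "b",
                  "g", "a", "h", "c"),
  pvalue = c(0.001, 0.002, 0.003, 0.004, 0.005,
             0.001, 0.002, 0.006,
             0.001, 0.002, 0.003, 0.004),
  stringsAsFactors = FALSE
)

# Step 1 (assumed rule): keep the n most significant terms within each cluster.
top_n_per_cluster <- function(df, n) {
  parts <- lapply(split(df, df$Cluster), function(x) head(x[order(x$pvalue), ], n))
  do.call(rbind, parts)
}

# Step 2 (assumed rule): the y-axis is the union of the selected terms.
terms_to_plot <- function(df, n) unique(top_n_per_cluster(df, n)$Description)

n <- 2
shown <- terms_to_plot(res, n)
# Every cluster gets a dot for each term in the union that is enriched in it.
plotted <- res[res$Description %in% shown, ]
dots_per_cluster <- table(plotted$Cluster)
print(shown)
print(dots_per_cluster)
```

With n = 2, cluster g1 ends up with 3 dots because term b, selected from All, is also enriched in g1. Under this rule a cluster can show more than N terms, and it shows fewer than N only when it has fewer than N enriched terms.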

functional enrichment analysis with NGS data

August 21, 2015 in Bioinformatics, Genomics, R

I found a Bioconductor package, seq2pathway, that can apply functional analysis to NGS data. It consists of two components, seq2gene and gene2pathway: seq2gene converts genomic coordinates to genes, while gene2pathway performs functional analysis at the gene level.

I think it would be interesting to integrate seq2gene with clusterProfiler. But it fails to run because it calls Python through an absolute path that only exists on the author's computer.
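A common, simple way to convert genomic coordinates to genes is to assign each region to the gene whose transcription start site (TSS) is closest. The base R sketch below does that for hypothetical peaks and a hypothetical one-TSS-per-gene table. The helper `nearest_gene()` is made up for illustration; it is not seq2gene's algorithm, which may use different rules, and it ignores strand when signing the distance.

```r
# Hypothetical gene annotation: one TSS per gene (1-based positions).
genes <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  chrom = c("chr1", "chr1", "chr2"),
  tss   = c(1000, 5000, 2000),
  stringsAsFactors = FALSE
)

# Hypothetical peaks (1-based, closed intervals).
peaks <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(1200, 4200, 2500),
  end   = c(1300, 4300, 2600),
  stringsAsFactors = FALSE
)

# Assign each peak to the gene with the nearest TSS on the same chromosome.
# distance = peak center - TSS (strand is ignored in this sketch).
nearest_gene <- function(peaks, genes) {
  out <- lapply(seq_len(nrow(peaks)), function(i) {
    cand <- genes[genes$chrom == peaks$chrom[i], ]
    if (nrow(cand) == 0) {
      return(data.frame(gene = NA_character_, distance = NA_real_,
                        stringsAsFactors = FALSE))
    }
    center <- (peaks$start[i] + peaks$end[i]) / 2
    d <- center - cand$tss
    j <- which.min(abs(d))
    data.frame(gene = cand$gene[j], distance = d[j], stringsAsFactors = FALSE)
  })
  cbind(peaks, do.call(rbind, out))
}

annotated <- nearest_gene(peaks, genes)
print(annotated)
```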

parsing BED coordinates

August 7, 2015 in Bioinformatics, R

In the supplemental file of the ChIPseeker paper, I compared the distances to TSS reported by several ChIP peak annotation tools, including ChIPseeker, ChIPpeakAnno, HOMER and PeakAnalyzer.

Although I noticed that the chromStart positions in the HOMER output had a +1 shift compared to the other tools, I did not recognize it as an issue, since all the other tools were consistent with each other.
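A +1 difference in start positions is the classic symptom of mixing coordinate conventions. In the BED format, chromStart is 0-based and chromEnd is exclusive, so a BED interval [chromStart, chromEnd) covers the 1-based, fully closed positions chromStart + 1 to chromEnd. The base R sketch below converts hypothetical BED records to 1-based coordinates. It illustrates the convention only and makes no claim about which tool's output in the comparison is correct.

```r
# Hypothetical BED records (chrom, chromStart, chromEnd, name).
# BED is 0-based and half-open: [chromStart, chromEnd).
bed_text <- "chr1\t999\t1100\tpeak1
chr2\t0\t50\tpeak2"

bed <- read.table(text = bed_text, sep = "\t",
                  col.names = c("chrom", "chromStart", "chromEnd", "name"),
                  stringsAsFactors = FALSE)

# Convert to 1-based, fully closed coordinates (the GFF/GTF and GRanges convention).
bed$start <- bed$chromStart + 1
bed$end   <- bed$chromEnd
# Width is the same in both conventions: chromEnd - chromStart.
bed$width <- bed$end - bed$start + 1
print(bed)
```

Only the start position moves by one; the end coordinate and the interval width are unchanged by the conversion.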

BMC favors source code plagiarism

May 27, 2015 in Bioinformatics, Research, R

A year ago, I found a case of source code plagiarism and reported it to BMC Systems Biology:

![](http://guangchuangyu.github.io/blog_images/2015/plagiarism/Screenshot%202015-05-27%2019.56.58.png)

In my email, I listed the source code of many functions that were copied verbatim from GOSemSim with only the function names changed. The details of the source code plagiarism can also be found at Proper use of GOSemSim.

use clusterProfiler as a universal enrichment analysis tool

May 11, 2015 in Bioinformatics, Systems Biology, R

clusterProfiler supports enrichment analysis using both the hypergeometric test and gene set enrichment analysis. It internally supports Gene Ontology analysis of about 20 species, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of all species that have annotation available in the KEGG database, DAVID annotation (hypergeometric test only), Disease Ontology and Network of Cancer Genes (via DOSE, for human) and Reactome Pathway (via ReactomePA, for several species). This is still not enough for users who may want to analyze their data with unsupported organisms, a slim version of GO, novel functional annotations (e.g. GO via Blast2GO and KEGG via KAAS), unsupported ontologies/pathways or customized annotations.
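Gene set enrichment analysis works on a ranked gene list instead of a cut-off gene list. The base R sketch below computes the running-sum enrichment score of a single gene set using the weighted statistic of the original GSEA method with weight exponent p = 1. It omits the permutation test used to assess significance and is not clusterProfiler's implementation; the function `gsea_es()` and the toy ranked list are hypothetical.

```r
# Running-sum enrichment score (ES) of one gene set.
# gene_list: named numeric vector sorted in decreasing order.
gsea_es <- function(gene_list, gene_set, p = 1) {
  stopifnot(!is.unsorted(rev(gene_list)))  # require decreasing order
  hits <- names(gene_list) %in% gene_set
  if (!any(hits) || all(hits)) {
    stop("gene_set must contain some, but not all, genes in gene_list")
  }
  weights <- abs(gene_list)^p
  # A hit steps up by its share of the total hit weight;
  # a miss steps down by 1 / (number of misses).
  step <- ifelse(hits, weights / sum(weights[hits]), -1 / sum(!hits))
  running <- cumsum(step)
  # ES is the maximum deviation from zero, keeping its sign.
  running[which.max(abs(running))]
}

ranked <- c(A = 3, B = 2, C = 1, D = -1, E = -2)
es_top <- gsea_es(ranked, c("A", "B"))
es_bottom <- gsea_es(ranked, c("D", "E"))
print(c(es_top, es_bottom))
```

A set concentrated at the top of the ranking, such as {A, B}, reaches the maximum score of 1, while {D, E} at the bottom reaches -1.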

clusterProfiler provides the enricher function for the hypergeometric test and the GSEA function for gene set enrichment analysis; both are designed to accept user-defined annotation through two additional parameters, TERM2GENE and TERM2NAME. As the parameter names indicate, TERM2GENE is a data.frame whose first column is the term ID and whose second column is the corresponding mapped gene, and TERM2NAME is a data.frame whose first column is the term ID and whose second column is the corresponding term name. TERM2NAME is optional.
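To make the TERM2GENE layout concrete, the base R sketch below builds a small hypothetical annotation in that two-column form and runs a one-sided hypergeometric test for each term, the kind of over-representation test enricher performs. The helper `hyper_enrich()` is made up for illustration: it assumes the background (universe) is all genes in TERM2GENE, and it skips clusterProfiler's gene set size filters and q-value calculation.

```r
# Hypothetical TERM2GENE: column 1 = term ID, column 2 = gene mapped to the term.
term2gene <- data.frame(
  term = c(rep("T1", 4), rep("T2", 3), rep("T3", 2)),
  gene = c("g1", "g2", "g3", "g4", "g3", "g5", "g6", "g7", "g8"),
  stringsAsFactors = FALSE
)
# Optional TERM2NAME: column 1 = term ID, column 2 = term name.
term2name <- data.frame(
  term = c("T1", "T2", "T3"),
  name = c("term one", "term two", "term three"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one-sided hypergeometric test per term.
hyper_enrich <- function(gene, term2gene, term2name = NULL,
                         universe = unique(term2gene[[2]])) {
  gene <- intersect(unique(as.character(gene)), universe)
  N <- length(universe)  # background size
  n <- length(gene)      # annotated input genes
  sets <- split(term2gene[[2]], term2gene[[1]])
  res <- do.call(rbind, lapply(names(sets), function(id) {
    M <- length(intersect(sets[[id]], universe))  # background genes in the term
    k <- length(intersect(sets[[id]], gene))      # input genes in the term
    # P(X >= k) for X ~ Hypergeometric(M, N - M, n)
    p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
    data.frame(ID = id, k = k, M = M, pvalue = p, stringsAsFactors = FALSE)
  }))
  if (!is.null(term2name)) {
    res$Description <- term2name[[2]][match(res$ID, term2name[[1]])]
  }
  res$p.adjust <- p.adjust(res$pvalue, method = "BH")
  res[order(res$pvalue), ]
}

res <- hyper_enrich(c("g1", "g2", "g3"), term2gene, term2name)
print(res)
```

With clusterProfiler installed, the corresponding call passes the two data.frames directly, for example `enricher(gene, TERM2GENE = term2gene, TERM2NAME = term2name)`; note that its default minGSSize would filter out gene sets as small as the ones in this toy example.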

why clusterProfiler fails

August 7, 2014 in R, Bioinformatics

@kaji331 compared clusterProfiler with GeneAnswers and found that clusterProfiler gave larger p values.

It eventually turned out that he had passed the input genes as a numeric vector instead of the expected character vector, and that he was using an old version of clusterProfiler that did not use as.character() to coerce the input.

But his comment prompted me to test it.
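A defensive way to avoid this kind of problem on the user side is to coerce gene IDs to character before calling any enrichment function. The hypothetical helper `normalize_gene_ids()` below does that in base R. It formats double vectors with sprintf() because a plain as.character() on a double can switch to scientific notation for some values (100000 can become "1e+05"), which would no longer match the annotation. It is a sketch, not the coercion clusterProfiler performs internally, and it is not offered as the cause of the p-value difference in that report.

```r
# Hypothetical helper: return gene IDs as a unique character vector.
normalize_gene_ids <- function(gene) {
  if (is.double(gene)) {
    if (any(gene != round(gene), na.rm = TRUE)) {
      stop("numeric gene IDs must be whole numbers")
    }
    # Fixed-point formatting avoids scientific notation such as "1e+05".
    gene <- sprintf("%.0f", gene)
  }
  # Integers, factors and character vectors convert safely with as.character().
  unique(as.character(gene))
}

ids_chr <- normalize_gene_ids(c(89886, 29923, 100132891, 100000))
print(ids_chr)
```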

Phosphoproteome profile of human lung cancer cell line A549

November 23, 2010 in Bioinformatics, Proteomics, Publication

As an in vitro model of type II human lung cancer, A549 cells resist cytotoxicity via protein phosphorylation, as demonstrated by many studies. However, to date, no large-scale phosphoproteome investigation has been conducted on A549. Here, we performed a systematic analysis of the A549 phosphoproteome using mass spectrometry (MS)-based strategies. This investigation led to the identification of 337 phosphorylation sites on 181 phosphoproteins. Among them, 67 phosphoproteins and 230 phosphorylation sites appear to be novel, with no previous characterization in lung cancer.

Guangchuang Yu