clusterProfiler supports over-representation tests and gene set enrichment analysis of Gene Ontology terms. It accepts GO annotation from an OrgDb object, a GMT file, or the user’s own data.

support many species

In the GitHub version of clusterProfiler, the enrichGO and gseGO functions dropped the organism parameter and gained an OrgDb parameter, so any species with an available OrgDb object can be analyzed in clusterProfiler. Bioconductor already provides OrgDb packages for about 20 species (see http://bioconductor.org/packages/release/BiocViews.html#___OrgDb), and users can build an OrgDb via AnnotationHub.
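As a rough illustration of the OrgDb-based interface, the sketch below runs both analyses on human data. It assumes the org.Hs.eg.db annotation package and the geneList example data from DOSE are installed; the choice of the CC ontology and the cutoff of 2 are only for illustration.

```r
library(clusterProfiler)
library(org.Hs.eg.db)   # human OrgDb (assumed installed); swap in another species' OrgDb

# Example data from DOSE: a named numeric vector keyed by Entrez gene ID
data(geneList, package = "DOSE")
de <- names(geneList)[abs(geneList) > 2]   # illustrative cutoff

# Over-representation test with the OrgDb parameter
ego <- enrichGO(gene = de, OrgDb = org.Hs.eg.db, ont = "CC")

# Gene set enrichment analysis on the full ranked list
gse <- gseGO(geneList = geneList, OrgDb = org.Hs.eg.db, ont = "CC")

head(as.data.frame(ego))
```

For another species, only the OrgDb object should need to change, provided the gene IDs match the key type used in that OrgDb.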

News of ggtree

A new version of ggtree that works with ggplot2 (version >= 2.0.0) is now available.

new layers

Some functions (add_legend, hilight, annotation_clade and annotation_clade2) were removed. Instead, we provide the layer functions geom_treescale, geom_hilight and geom_cladelabel, which you can add to a plot with the + operator.

In addition, we provide geom_point2, geom_text2 and geom_segment2, which work exactly like geom_point, geom_text and geom_segment except that they allow ggtree users to subset the data.
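A minimal sketch of how these layers might be combined is shown here. The random tree from ape::rtree, the node number 12 and the colors are assumptions chosen only for illustration.

```r
library(ggtree)
library(ggplot2)   # for aes()
library(ape)       # for rtree()

set.seed(2015)
tree <- rtree(10)   # random rooted tree: tips are nodes 1-10, internal nodes 11-19

p <- ggtree(tree) +
  geom_treescale() +                                           # scale bar layer
  geom_hilight(node = 12, fill = "steelblue", alpha = 0.5) +   # highlight one clade
  geom_cladelabel(node = 12, label = "a clade") +              # label the same clade
  geom_point2(aes(subset = !isTip), color = "firebrick") +     # points on internal nodes only
  geom_text2(aes(subset = isTip, label = label), hjust = -0.2) # labels on tips only
print(p)
```

The subset aesthetic is what distinguishes geom_point2 and geom_text2 from their ggplot2 counterparts: each layer draws only the rows where the expression is TRUE.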

use emoji font in R

![](http://guangchuangyu.github.io/blog_images/2015/Screenshot 2015-12-16 10.55.49.png)

I have played with emoji in R for a while. My approach differs from the one implemented in emoGG.

emoGG is a good attempt to add emoji to ggplot2. It renders emoji pictures (png) and creates a layer, geom_emoji, to add them.

In my opinion, emoji should be treated as an ordinary font in the user interface, although that may not be true internally.

It would be more flexible if we could use emoji as an ordinary font; that way users don’t need to learn anything extra.

Thanks to @mevers for raising this issue with me and for his efforts in benchmarking clusterProfiler.

He pointed out two issues:

  • outputs from gseGO and GSEA-P overlap poorly.
  • p-values from gseGO are generally smaller and don’t show much variation.

GSEA takes two inputs: a ranked gene list and a gene set collection.

First of all, the gene set collections are very different. The GMT file used in his test is c5.cc.v5.0.symbols.gmt, which is a tiny subset of GO CC, while clusterProfiler used the whole GO CC corpus.
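To make the two inputs concrete, here is a small base-R sketch with toy data. The gene statistics, the set names and the helper parse_gmt are all hypothetical, and the comparison assumes both collections name their sets the same way, which may not hold for real files.

```r
# Input 1: a ranked gene list is a named numeric vector sorted in decreasing order.
stat <- c(TP53 = 1.2, BRCA1 = -0.4, MYC = 2.5, EGFR = 0.3)   # toy statistics
ranked <- sort(stat, decreasing = TRUE)

# Input 2: a gene set collection.
# Each GMT line is: set name <TAB> description <TAB> gene1 <TAB> gene2 ...
parse_gmt <- function(lines) {            # hypothetical helper, not a package function
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])   # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

# Toy stand-ins for a small GMT subset and a larger collection
small <- parse_gmt(c("NUCLEUS\tna\tTP53\tMYC\tBRCA1"))
full  <- parse_gmt(c("NUCLEUS\tna\tTP53\tMYC\tBRCA1",
                     "MEMBRANE\tna\tEGFR",
                     "CYTOSOL\tna\tMYC\tEGFR"))

# Fraction of the larger collection that the smaller one covers
shared <- intersect(names(small), names(full))
coverage <- length(shared) / length(full)
coverage
```

With a real file, the lines would come from readLines() on the GMT path. When the coverage is low, the two tools are testing largely different sets, so poorly overlapping results are expected.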

To simplify an enriched GO result, we can use a slim version of GO and run the analysis with the enricher function.

Another strategy is to use GOSemSim to calculate the similarity of GO terms and remove highly similar terms, keeping one representative term. To make this feature available to clusterProfiler users, I developed a simplify method that reduces redundant GO terms in the output of the enrichGO function.

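library(clusterProfiler)
# Example gene list shipped with DOSE: a named numeric vector keyed by Entrez gene ID
data(geneList, package="DOSE")
# Treat genes with an absolute value above 2 as differentially expressed
de <- names(geneList)[abs(geneList) > 2]
# GO Biological Process over-representation test
# (versions of enrichGO that take OrgDb also need e.g. OrgDb = org.Hs.eg.db)
bp <- enrichGO(de, ont="BP")
# Plot the enriched terms as a network
enrichMap(bp)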
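The idea of keeping one representative among highly similar terms can be sketched in base R as follows. This is an illustration under stated assumptions, not the actual implementation of simplify: the term names, p-values, similarity matrix, the 0.7 cutoff and the helper reduce_terms are all made up for the example.

```r
# Toy enrichment result: four hypothetical GO terms with adjusted p-values
padj <- c(termA = 0.001, termB = 0.004, termC = 0.010, termD = 0.020)

# Toy pairwise semantic similarity (symmetric, 1 on the diagonal);
# in practice such values would come from GOSemSim
sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.8,
                0.1, 0.2, 0.8, 1.0),
              nrow = 4, dimnames = list(names(padj), names(padj)))

# Greedy reduction: visit terms from most to least significant and keep a
# term only if it is not too similar to any term already kept
reduce_terms <- function(padj, sim, cutoff = 0.7) {
  kept <- character(0)
  for (term in names(sort(padj))) {
    if (!any(sim[term, kept] > cutoff)) kept <- c(kept, term)
  }
  kept
}

reduce_terms(padj, sim)   # "termA" "termC"
```

Here termB is dropped because it is too similar to termA, and termD because it is too similar to termC. On a real result, the method would be applied to the enrichGO output directly, for example simplify(bp).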

Guangchuang Yu

Bioinformatics Professor @ SMU

Guangzhou