KEGG enrichment analysis with latest online data using clusterProfiler

February 1, 2015 in R

KEGG.db has not been updated since 2012. The data is now quite old, but many Bioconductor packages still use it for KEGG annotation and enrichment analysis. As pointed out in ‘Are there too many biological databases’, many out-of-date biological databases never go offline. The same issue exists in web servers and software that rely on out-of-date data. For example, the WEGO web server stopped updating its GO annotation data in 2009, yet it is still online and many people still use it. The biological story may change completely when recently updated data is used. Seriously, we should keep an eye on this issue.

The enrichKEGG function now has a new parameter, use_internal_data. It defaults to FALSE, in which case enrichKEGG downloads the latest KEGG data for the enrichment analysis. If use_internal_data is explicitly set to TRUE, the function uses KEGG.db, which is still supported but not recommended. With this new feature, any species is supported as long as KEGG annotations are available for it in the KEGG database. You can access the full list of species supported by KEGG via: http://www.genome.jp/kegg/catalog/org_list.html

The organism parameter in enrichKEGG should now be the KEGG abbreviation of the scientific name, for example ‘hsa’ for human and ‘mmu’ for mouse. It accepts any species listed in http://www.genome.jp/kegg/catalog/org_list.html. In the current release version of clusterProfiler (in Bioconductor 3.0), enrichKEGG supports about 20 species, and the organism parameter accepts the common name of the species, for instance “human” and “mouse”. For these previously supported species, the common name is still accepted, so your scripts will keep working with the new version of clusterProfiler. For other species, the common name is not supported: I don’t want to maintain such a long mapping list when many species have no common name available, and it may also introduce unexpected bugs.
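As a rough illustration of the two modes described above, the following sketch calls enrichKEGG with the default online data. The gene IDs are illustrative Entrez IDs chosen for this example (they are not from the post), and the call assumes that clusterProfiler is installed and that network access is available; with so few genes the result may well be empty.

```r
# Illustrative Entrez gene IDs, passed as character (not numeric);
# replace them with your own gene list.
gene <- c("4312", "8318", "10874", "55143", "55388", "991")

if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  # Default behaviour: download the latest KEGG data (needs network access).
  # organism is the KEGG abbreviation, e.g. "hsa" for human, "mmu" for mouse.
  kk <- clusterProfiler::enrichKEGG(gene = gene,
                                    organism = "hsa",
                                    pvalueCutoff = 0.05,
                                    use_internal_data = FALSE)
  print(kk)

  # Legacy behaviour: use the local KEGG.db annotation instead
  # (supported but not recommended; requires the KEGG.db package).
  # kk_old <- clusterProfiler::enrichKEGG(gene = gene,
  #                                       organism = "hsa",
  #                                       use_internal_data = TRUE)
}
```

Switching organism to another code from the KEGG organism list should be the only change needed for a different species, provided the gene IDs match what KEGG uses for that species.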

why clusterProfiler fails

August 7, 2014 in R, Bioinformatics

@kaji331 compared clusterProfiler with GeneAnswers and found that clusterProfiler gives larger p values.

It eventually turned out that he had passed the input genes as a numeric vector, whereas a character vector was expected, and that he was using an old version of clusterProfiler that did not use as.character to coerce the input.

But his comment prompted me to test it.
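A minimal sketch of the coercion mentioned above, using base R only. The IDs are illustrative, and this is not the package's internal code, just the kind of check a caller can do before passing genes to an enrichment function:

```r
# Illustrative Entrez IDs that were read in as numbers.
gene_numeric <- c(4312, 8318, 10874)

# Coerce to character before passing the vector to an enrichment function.
gene <- as.character(gene_numeric)

# Fail early if the input is not a character vector.
stopifnot(is.character(gene))
print(gene)
```

Reading the ID column as character in the first place avoids the problem entirely.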

enrichment map

August 3, 2014 in Visualization, R

In PLOB’s QQ group, someone asked how to change the color of an enrichment map in Cytoscape. I was very curious how an enrichment map can help to interpret enrichment results. It took me 2 hours to implement it in R, and I was surprised that the enrichment map turned out better than anticipated.

clusterProfiler in Bioconductor 2.8

March 27, 2011 in R

In recent years, high-throughput experimental techniques such as microarray and mass spectrometry have made it possible to identify many lists of genes and gene products. The most widely used strategy for high-throughput data analysis is to identify different gene clusters based on their expression profiles. Another commonly used approach is to annotate these genes with biological knowledge, such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), and identify the statistically significantly enriched categories. These two strategies have been implemented in many Bioconductor packages, such as Mfuzz and BHC for clustering analysis and GOstats for GO enrichment analysis.


Guangchuang Yu