KEGG Module Enrichment Analysis

KEGG MODULE is a collection of manually defined functional units, called KEGG modules and identified by the M numbers, used for annotation and biological interpretation of sequenced genomes. There are four types of KEGG modules:

  • pathway modules – representing tight functional units in KEGG metabolic pathway maps, such as M00002 (Glycolysis, core module involving three-carbon compounds)
  • structural complexes – often forming molecular machineries, such as M00072 (Oligosaccharyltransferase)
  • functional sets – for other types of essential sets, such as M00360 (Aminoacyl-tRNA synthases, prokaryotes)
  • signature modules – as markers of phenotypes, such as M00363 (EHEC pathogenicity signature, Shiga toxin)

In many situations KEGG modules have a much more straightforward interpretation, and a clusterProfiler user requested an enrichment test for them. Both the hypergeometric test and GSEA of KEGG modules are now supported in clusterProfiler. As with KEGG pathway analysis, clusterProfiler queries the latest online data and supports the more than 2000 species listed at http://www.genome.jp/kegg/catalog/org_list.html.

To avoid confusing new users who may not be familiar with KEGG, I created two new functions, enrichMKEGG and gseMKEGG, for enrichment tests of KEGG modules, and kept the original functions, enrichKEGG and gseKEGG, for KEGG pathway analysis only.

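library(clusterProfiler)

# geneList: named numeric vector sorted in decreasing order (names are Entrez gene IDs)
data(geneList)
# take the 100 top-ranked genes as the gene set of interest
de <- names(geneList)[1:100]
# minGSSize=1 lowers the minimum gene set size so that small modules are tested too
xx <- enrichMKEGG(de, organism='hsa', minGSSize=1)
head(summary(xx))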

##            ID
## M00693 M00693
## M00286 M00286
## M00067 M00067
## M00691 M00691
##                                                                                    Description
## M00693                                                            Cell cycle - G2/M transition
## M00286                                                                            GINS complex
## M00067 Sulfoglycolipids biosynthesis, ceramide/1-alkyl-2-acylglycerol => sulfatide/seminolipid
## M00691                                               DNA damage-induced cell cycle checkpoints
##        GeneRatio BgRatio       pvalue     p.adjust       qvalue
## M00693       3/8 10/1528 0.0000111304 5.565199e-05 1.171621e-05
## M00286       2/8  4/1528 0.0001432508 3.581269e-04 7.539514e-05
## M00067       1/8  2/1528 0.0104472034 1.741201e-02 3.665685e-03
## M00691       1/8  7/1528 0.0361484900 4.518561e-02 9.512761e-03
##              geneID Count
## M00693 9133/890/983     3
## M00286   9837/51659     2
## M00067         7368     1
## M00691         1111     1
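
The pvalue and p.adjust columns above can be cross-checked by hand in base R. The sketch below assumes a one-sided hypergeometric test in which GeneRatio (k/n) and BgRatio (M/N) supply the counts, and a Benjamini-Hochberg adjustment over five tested modules. The count of five is inferred from the printed numbers rather than stated by the package, and the variable names are illustrative only.

```r
# Counts read from the table above (GeneRatio = k/n, BgRatio = M/N)
k <- c(3, 2)    # input genes annotated to M00693 and M00286
n <- 8          # input genes annotated to any module
M <- c(10, 4)   # background genes annotated to M00693 and M00286
N <- 1528       # background genes annotated to any module

# One-sided hypergeometric test: P(X >= k) when drawing n genes out of N
pvalue <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(pvalue)

# Assumption: Benjamini-Hochberg correction over 5 tested modules,
# of which only the 4 shown in the table passed the cutoffs
p_all <- c(0.0000111304, 0.0001432508, 0.0104472034, 0.0361484900)
padj <- p.adjust(p_all, method = "BH", n = 5)
print(padj)
```

If these assumptions hold, the printed values should agree with the pvalue and p.adjust columns up to rounding.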

yy <- gseMKEGG(geneList)

## [1] "calculating observed enrichment scores..."
## [1] "calculating permutation scores..."
## [1] "calculating p values..."
## [1] "done..."

head(summary(yy))

##            ID                     Description setSize enrichmentScore
## M00337 M00337                Immunoproteasome      15       0.7583644
## M00340 M00340   Proteasome, 20S core particle      13       0.7935026
## M00354 M00354 Spliceosome, U4/U6.U5 tri-snRNP      29       0.6053503
##             NES      pvalue   p.adjust    qvalues
## M00337 2.063359 0.002298851 0.03675214 0.02968961
## M00340 2.047060 0.002409639 0.03675214 0.02968961
## M00354 1.913834 0.002564103 0.03675214 0.02968961
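
For orientation, the enrichmentScore column is the kind of quantity produced by the running-sum statistic of the standard GSEA method. The sketch below is a minimal base R version on a made-up ranked list. The function name enrichment_score, the toy genes, and the weighting exponent of 1 are assumptions for illustration, not clusterProfiler's own implementation, and the permutation step that yields NES and p values is left out.

```r
# Toy ranked statistics (hypothetical values), named by gene
stats <- c(g1 = 3.0, g2 = 2.5, g3 = 1.0, g4 = -0.5, g5 = -1.5, g6 = -2.0)
gene_set <- c("g1", "g2", "g5")   # hypothetical module members

enrichment_score <- function(stats, gene_set, exponent = 1) {
  stats <- sort(stats, decreasing = TRUE)
  hit <- names(stats) %in% gene_set
  # Hits step up in proportion to |statistic|^exponent; misses step down uniformly
  hit_step <- ifelse(hit, abs(stats)^exponent, 0)
  miss_step <- ifelse(hit, 0, 1)
  running <- cumsum(hit_step / sum(hit_step)) - cumsum(miss_step / sum(miss_step))
  # The score is the running-sum value with the largest absolute deviation from zero
  unname(running[which.max(abs(running))])
}

es <- enrichment_score(stats, gene_set)
print(es)
```

A positive score means the set members concentrate near the top of the ranked list, and a negative score means they concentrate near the bottom.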

Please refer to the clusterProfiler vignette for more details.

Citation

Yu G, Wang L, Han Y and He Q*. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
