KEGG Module Enrichment Analysis
KEGG MODULE is a collection of manually defined functional units, called KEGG modules and identified by the M numbers, used for annotation and biological interpretation of sequenced genomes. There are four types of KEGG modules:
- pathway modules – representing tight functional units in KEGG metabolic pathway maps, such as M00002 (Glycolysis, core module involving three-carbon compounds)
- structural complexes – often forming molecular machineries, such as M00072 (Oligosaccharyltransferase)
- functional sets – for other types of essential sets, such as M00360 (Aminoacyl-tRNA synthases, prokaryotes)
- signature modules – as markers of phenotypes, such as M00363 (EHEC pathogenicity signature, Shiga toxin)
KEGG modules are much more straightforward to interpret in many situations, and a clusterProfiler user requested an enrichment test for them. Both the hypergeometric test and GSEA of KEGG modules are now supported in clusterProfiler. As with KEGG pathway analysis, clusterProfiler accesses the latest online data and supports more than 2000 species listed in http://www.genome.jp/kegg/catalog/org_list.html.
To avoid confusing new users who may not be familiar with KEGG, I created two new functions, enrichMKEGG and gseMKEGG, for enrichment tests of KEGG modules, and kept the original functions, enrichKEGG and gseKEGG, for KEGG pathway analysis only.
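library(clusterProfiler)
# example data from DOSE: a decreasingly sorted, named numeric vector keyed by Entrez Gene IDs
data(geneList, package = "DOSE")
# treat the top 100 genes as the "differentially expressed" set
de <- names(geneList)[1:100]
# hypergeometric test against KEGG modules; minGSSize = 1 keeps very small modules
xx <- enrichMKEGG(de, organism = 'hsa', minGSSize = 1)
# newer clusterProfiler releases deprecate summary(); use as.data.frame(xx) there
head(summary(xx))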
# ID
# M00693 M00693
# M00286 M00286
# M00067 M00067
# M00691 M00691
# Description
# M00693 Cell cycle - G2/M transition
# M00286 GINS complex
# M00067 Sulfoglycolipids biosynthesis, ceramide/1-alkyl-2-acylglycerol => sulfatide/seminolipid
# M00691 DNA damage-induced cell cycle checkpoints
# GeneRatio BgRatio pvalue p.adjust qvalue
# M00693 3/8 10/1528 0.0000111304 5.565199e-05 1.171621e-05
# M00286 2/8 4/1528 0.0001432508 3.581269e-04 7.539514e-05
# M00067 1/8 2/1528 0.0104472034 1.741201e-02 3.665685e-03
# M00691 1/8 7/1528 0.0361484900 4.518561e-02 9.512761e-03
# geneID Count
# M00693 9133/890/983 3
# M00286 9837/51659 2
# M00067 7368 1
# M00691 1111 1
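To make the pvalue and p.adjust columns less of a black box, here is a minimal base-R sketch that recomputes them from the table above. It assumes the p-value is the upper tail of the hypergeometric distribution, with GeneRatio read as k/n and BgRatio as M/N. It also assumes the adjustment is Benjamini-Hochberg over five tested modules in total; the value 5 is inferred from the printed numbers (only four rows are shown) and is not stated in the output.

```r
# Numbers from the M00693 row: GeneRatio 3/8, BgRatio 10/1528
k <- 3     # input genes annotated to M00693
n <- 8     # input genes annotated to any KEGG module
M <- 10    # background genes annotated to M00693
N <- 1528  # background genes annotated to any KEGG module

# One-sided test: probability of drawing k or more module genes by chance
p_M00693 <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(p_M00693)

# Raw p-values of the four modules shown above
p <- c(M00693 = 0.0000111304, M00286 = 0.0001432508,
       M00067 = 0.0104472034, M00691 = 0.0361484900)

# Benjamini-Hochberg adjustment; n = 5 is an ASSUMPTION (total number of tested
# modules inferred from the printed values, one of which is not displayed)
padj <- p.adjust(p, method = "BH", n = 5)
print(padj)
```

Both results should agree with the pvalue and p.adjust columns up to printing precision. The qvalue column comes from a separate q-value estimate and is not reproduced here.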
# GSEA on the full ranked gene list (geneList is already sorted in decreasing order)
yy <- gseMKEGG(geneList)
# [1] "calculating observed enrichment scores..."
# [1] "calculating permutation scores..."
# [1] "calculating p values..."
# [1] "done..."
head(summary(yy))
# ID Description setSize enrichmentScore
# M00337 M00337 Immunoproteasome 15 0.7583644
# M00340 M00340 Proteasome, 20S core particle 13 0.7935026
# M00354 M00354 Spliceosome, U4/U6.U5 tri-snRNP 29 0.6053503
# NES pvalue p.adjust qvalues
# M00337 2.063359 0.002298851 0.03675214 0.02968961
# M00340 2.047060 0.002409639 0.03675214 0.02968961
# M00354 1.913834 0.002564103 0.03675214 0.02968961
Please refer to the clusterProfiler vignette for more details.
Citation
Yu G, Wang L, Han Y and He Q*. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.