clusterProfiler.dplyr：用dplyr操作富集分析结果

出错的原因在于，我根本就没给GSEA分析的结果写barplot方法，所以默认去调用graphics::barplot.default了，于是出错。

但这样的图，对于我们来说，简直简单的不要不要的。

首先跑一下GSEA的分析, 这里用ReactomePA来跑一下通路的GSEA分析：

library(ReactomePA)
data(geneList)
x <- gsePathway(geneList)
结果中有308个通路是显著性的。

> x
#
# Gene Set Enrichment Analysis
#
#...@organism      human 
#...@setType      Reactome 
#...@keytype      ENTREZID 
#...@geneList      Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm      10000 
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...308 enriched terms found
'data.frame':    308 obs. of  11 variables:
 $ ID             : chr  "R-HSA-1474244" "R-HSA-216083" "R-HSA-3000178" "R-HSA-3000171" ...
 $ Description    : chr  "Extracellular matrix organization" "Integrin cell surface interactions" "ECM proteoglycans" "Non-integrin membrane-ECM interactions" ...
 $ setSize        : int  266 80 74 56 53 41 38 36 35 33 ...
 $ enrichmentScore: num  -0.458 -0.512 -0.626 -0.586 -0.592 ...
 $ NES            : num  -1.94 -1.86 -2.24 -1.99 -1.99 ...
 $ pvalue         : num  0.000133 0.000152 0.000154 0.000158 0.000159 ...
 $ p.adjust       : num  0.00295 0.00295 0.00295 0.00295 0.00295 ...
 $ qvalues        : num  0.00215 0.00215 0.00215 0.00215 0.00215 ...
 $ rank           : num  1943 1890 1890 2538 1890 ...
 $ leading_edge   : chr  "tags=33%, list=16%, signal=29%" "tags=39%, list=15%, signal=33%" "tags=46%, list=15%, signal=39%" "tags=45%, list=20%, signal=36%" ...
 $ core_enrichment: chr  "8038/11132/4017/1288/4811/3910/3371/1291/3791/831/1301/4238/7450/3685/80781/1280/1306/4314/3675/8425/977/4054/7"| __truncated__ "3371/1291/3791/7450/3685/80781/1280/3675/1278/4060/1277/1293/1295/58494/1281/83700/50509/1290/3693/1294/3339/12"| __truncated__ "3910/3371/1291/3685/1280/7042/3912/1278/4060/1277/1293/1281/50509/1290/3693/3339/1462/1289/1292/3908/3909/6678/"| __truncated__ "3915/6385/4921/1288/3910/3371/1301/3685/1280/3912/1278/1277/2247/1281/50509/1290/3693/3339/1289/3908/3909/3913/1300/1287" ...
#...Citation
  Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for
  reactome pathway analysis and visualization. Molecular BioSystems
  2016, 12(2):477-479 
clusterProfiler.dplyr神器

用这个包，你可以对输出像data frame一样使用dplyr的函数去操作。

于是这里使用arrange去给NES的绝对值从大到小排序，NES是我们用于计算p值的统计量，正数越大则通路越有可能是激活的，而负数越小则通路越有可能是被抑制的，所以我们也经常看到使用这一统计量进行画图。

然后我们使用group_by把数据分组了，按照NES正负数来分，最后我们把每一组切出来前5个。也就是说分别取激活或抑制最大的5个通路。

library(clusterProfiler.dplyr)
y <- arrange(x, abs(NES)) %>% 
        group_by(sign(NES)) %>% 
        slice(1:5)

最后如我们所愿，取出10个通路。

> y
#
# Gene Set Enrichment Analysis
#
#...@organism      human 
#...@setType      Reactome 
#...@keytype      ENTREZID 
#...@geneList      Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm      10000 
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...10 enriched terms found
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    10 obs. of  12 variables:
 $ ID             : chr  "R-HSA-397014" "R-HSA-8978868" "R-HSA-9006934" "R-HSA-211859" ...
 $ Description    : chr  "Muscle contraction" "Fatty acid metabolism" "Signaling by Receptor Tyrosine Kinases" "Biological oxidations" ...
 $ setSize        : int  183 141 418 165 117 308 282 169 162 155
 $ enrichmentScore: num  -0.357 -0.37 -0.338 -0.373 -0.402 ...
 $ NES            : num  -1.45 -1.46 -1.49 -1.5 -1.54 ...
 $ pvalue         : num  0.004193 0.006929 0.000503 0.001696 0.003381 ...
 $ p.adjust       : num  0.0216 0.03289 0.00381 0.01039 0.01827 ...
 $ qvalues        : num  0.01572 0.02394 0.00277 0.00756 0.0133 ...
 $ rank           : num  2990 1951 2788 2129 2829 ...
 $ leading_edge   : chr  "tags=34%, list=24%, signal=26%" "tags=26%, list=16%, signal=22%" "tags=27%, list=22%, signal=21%" "tags=25%, list=17%, signal=21%" ...
 $ core_enrichment: chr  "59/9992/800/6543/4638/3750/2982/11280/3709/2977/6546/3672/7169/6335/783/10174/4604/6588/7139/27091/3784/1760/72"| __truncated__ "5563/284541/37/224/3248/1543/79966/2181/8644/5095/27306/80724/11001/33/6785/7923/6584/57834/1384/2166/60481/831"| __truncated__ "26052/534/2065/3709/7423/1215/2263/83464/26469/7057/4670/7072/1298/3915/5921/5441/200734/9846/9611/3315/466/111"| __truncated__ "4837/6799/1553/8459/284541/7366/8648/1543/2098/29104/1548/54996/27306/196/2232/2954/57834/6817/22977/1581/7358/"| __truncated__ ...
 $ sign(NES)      : num  -1 -1 -1 -1 -1 1 1 1 1 1
 - attr(*, "groups")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  2 variables:
  ..$ sign(NES): num  -1 1
  ..$ .rows    :List of 2
  .. ..$ : int  1 2 3 4 5
  .. ..$ : int  6 7 8 9 10
  ..- attr(*, ".drop")= logi FALSE
#...Citation
  Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for
  reactome pathway analysis and visualization. Molecular BioSystems
  2016, 12(2):477-479

画图

然后你直接就可以对上面这样的对象使用ggplot进行画图，你没看错，你不需要转为data frame，而是直接对这个S4对象进行ggplot，魔法就在enrichplot包。然后一切都是ggplot语法了。

library(forcats)
library(ggplot2)
library(ggstance)
library(enrichplot)

ggplot(y, aes(NES, fct_reorder(Description, NES), fill=qvalues), showCategory=10) + 
    geom_barh(stat='identity') + 
    scale_fill_continuous(low='red', high='blue') + 
    theme_minimal() + ylab(NULL)