Dear Dr. Guangchuang Yu, I write to you regarding a doubt concerning the enrichGO function from Clustalprofiler package. I have been used this package before, but now I’m using the same R script and I have an error message.

This is the command I use:

go.bp <- enrichGO(gene = gene.df$ENSEMBL, universe = universe.ENSEMBLID, keytype = ‘ENSEMBL’, OrgDb = org.Ce.eg.db, ont = ‘BP’, pAdjustMethod = ‘BH’, pvalueCutoff = 0.01, qvalueCutoff = 0.05, readable=T)

and the error is the following one:

No gene set have size > 10 … –> return NULL…

My input list is attached to this email (101 genes in total). When I use this list in a web resource such as GOrilla it gives to me the proper GO terms.

Thank you very much in advance. Best regards,

María

最近clusterProfiler用户的问题,这个问题还蛮普遍。这个我在《why clusterProfiler fails》中也有谈到,并不是能出结果就是好的。没有结果也是一种结果。

这里首先他用了universe = universe.ENSEMBLID,而报错信息又是No gene set have size > 10 ...,可以看出他来当背景的基因集应该比较小,而enrichGO默认只[10,500]区间的基因集,太小和太大的基因集,通常意义不大,而且也容易给出很小的p值。当然这个参数是可以调的,你可以用全集,或者自己设置大小。

他在信中并没有给出universe.ENSEMBLID,而只是给了gene.df给我,如果没有设定universe的情况下,是可以跑出结果的:

> go.bp <- enrichGO(gene = gene.df$ENSEMBL, 
           keytype = 'ENSEMBL', 
           OrgDb = org.Ce.eg.db, 
           ont = 'BP', 
           pAdjustMethod = 'BH', 
           pvalueCutoff = 0.01, 
           qvalueCutoff = 0.05, 
           readable=T)

>
> go.bp
#
# over-representation test
#
#...@organism      Caenorhabditis elegans
#...@ontology      BP
#...@keytype      ENSEMBL
#...@gene      chr [1:101] "WBGene00013475" "WBGene00002198" "WBGene00004395" ...
#...pvalues adjusted by 'BH' with cutoff <0.01
#...23 enriched terms found
'data.frame':    23 obs. of  9 variables:
 $ ID         : chr  "GO:0018193" "GO:0008360" "GO:0018105" "GO:0018209" ...
 $ Description: chr  "peptidyl-amino acid modification" "regulation of cell shape" "peptidyl-serine phosphorylation" "peptidyl-serine modification" ...
 $ GeneRatio  : chr  "85/101" "55/101" "56/101" "56/101" ...
 $ BgRatio    : chr  "414/11158" "101/11158" "151/11158" "152/11158" ...
 $ pvalue     : num  3.21e-108 5.28e-92 1.57e-81 2.48e-81 1.60e-77 ...
 $ p.adjust   : num  6.87e-106 5.65e-90 1.12e-79 1.33e-79 6.83e-76 ...
 $ qvalue     : num  6.35e-106 5.23e-90 1.04e-79 1.23e-79 6.32e-76 ...
 $ geneID     : chr  "Y69E1A.3/kin-14/rol-3/C35E7.10/kin-5/Y116A8C.24/F59A3.8/F26E4.5/spe-8/C25A8.5/T21G5.1/F57B9.8/T25B9.5/W01B6.5/k"| __truncated__ "ZK507.3/K09C6.8/C09D4.3/R13H9.5/C56C10.6/Y65B4A.9/D2045.5/C39H7.1/B0218.5/ttbk-2/T05A7.6/M04F3.3/Y39G8C.2/K09E4"| __truncated__ "Y39G8B.5/ZK507.3/K09C6.8/C09D4.3/R13H9.5/C56C10.6/Y65B4A.9/D2045.5/C39H7.1/B0218.5/ttbk-2/T05A7.6/M04F3.3/Y39G8"| __truncated__ "Y39G8B.5/ZK507.3/K09C6.8/C09D4.3/R13H9.5/C56C10.6/Y65B4A.9/D2045.5/C39H7.1/B0218.5/ttbk-2/T05A7.6/M04F3.3/Y39G8"| __truncated__ ...
 $ Count      : int  85 55 56 56 55 56 55 55 24 29 ...
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287

这里他还提到了用GOrilla跑的话,有结果出来,而clusterProfiler没结果,这一方面是前面谈到的[10, 500]区间的问题,另一方面,也是《why clusterProfiler fails》里讲到的,因为clusterProfiler给出更保守、更可靠的结果。

我打开GOrilla主页,看了下文档,http://cbl-gorilla.cs.technion.ac.il/help.html,果然在P-value threshold这一块里写得很清楚:

Parameters

P-value threshold - Only GO terms with a p-value better than this threshold are reported.

Note that this p-value does not include the multiple hypothesis correction on the number of tested GO terms. To correct for this the p-value should be multiplied by the number of GO terms used as reported in the results page.

GOrilla并没有multiple testing correction, 而clusterProfiler用了p value adjustment和qvalue来控制multiple comparison。所以啊,clusterProfiler报出来的显著性要保守和可靠得多,出来的结果必须要比GOrilla少。

电梯