Although I don't like DAVID, many users do, so clusterProfiler supports it. Recently someone on GitHub asked for support for a custom background:

Dear Guangchuang,

I am using clusterProfiler for KEGG pathway enrichment analysis; it is useful and nice. I am looking for a function that accepts a background and can handle Ensembl gene IDs.

The function enrichDAVID can take Ensembl gene IDs as input, but does not allow a background to be specified: enrichDAVID(gene = gene, idType = "ENSEMBL_GENE_ID", annotation = "KEGG_PATHWAY", species = "hsa")

The other command, enrichKEGG, has a background input but only takes Entrez gene IDs: enrichKEGG(gene, organism = "hsa", keyType = "kegg", universe)

I have tried to convert my Ensembl gene IDs to Entrez gene IDs, but some Ensembl gene IDs map to more than one Entrez gene ID. I downloaded the KEGG pathway dataset to apply Fisher's exact test. However, the genes are in Entrez IDs and I still don't know how to convert them.


wrapping labels in ggplot2

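Several people have been asking this same question in the backend of the biobabble WeChat official account:

I am posting this screenshot mainly to make one point. If a question can be explained in a sentence or two, feel free to ask it there. If not, don't ask it in the account backend; writing an email or posting to a forum is a better way. As the screenshot shows, the image is marked as expired, and I never saw it at all. Images cannot be viewed on a phone, and I happened to be away from my computer for a few days, so I could not see the images you sent. On top of that, if I don't reply within 24 hours, the official account platform no longer lets me reply, because the question has expired. So let me stress this here: don't send images to ask questions in the backend, and don't ask even moderately complex questions there.

The question itself is actually simple: use str_wrap from the stringr package to wrap the text automatically.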
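A minimal sketch of the idea, assuming the stringr package is installed. The labels and the width of 20 are made up for illustration, and the ggplot2 usage in the final comment assumes an existing plot object `p` with a discrete x axis.

```r
library(stringr)

# Hypothetical long category labels, invented for this example.
labels <- c("regulation of cell cycle phase transition",
            "DNA repair")

# Wrap each label to a target width of 20 characters,
# breaking only at whitespace and inserting "\n" at the breaks.
wrapped <- str_wrap(labels, width = 20)
cat(wrapped, sep = "\n---\n")

# In ggplot2, the same function can be applied to axis labels, e.g.
#   p + scale_x_discrete(labels = function(x) str_wrap(x, width = 20))
```

Because str_wrap only swaps spaces for line breaks, the label text itself is unchanged; only its layout on the axis differs.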


dotplot for GSEA result

For GSEA analysis, we are familiar with the figure above, which shows the running enrichment score. But most software lacks a visualization method that summarizes the whole enrichment result.

In DOSE (and related tools including clusterProfiler, ReactomePA and meshes), we provide enrichMap and cnetplot to summarize GSEA results.


buildGOmap

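Over the weekend a question came up on Bioconductor: the user ran buildGOmap and got a long stream of output in the terminal, but no file was produced. So today let's go through the history of buildGOmap.

When I wrote clusterProfiler I was working at Jinan University, and the work was driven mainly by my own needs. The lab worked on bacteria, such as Streptococcus pneumoniae D39. In the bacterial field, so-called GO analysis basically means running an electronic annotation, counting the terms, making a table and drawing a pie chart. Enrichment analysis is rarely seen, because the vast majority of tools support only a handful of model organisms. Some tools support certain bacteria, certain plants or certain fungi, but a tool that supports plants, for example, still covers only a few plant species. These are all things their authors built for their own use and released only to pad out a publication along the way.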


I am using dotplot() to visualize results from enrichGO(), enrichDO(), enricher() and compareCluster() in the clusterProfiler R package. When specifying showCategory, I get the right number of categories except with the results of compareCluster().

In my case, I use compareCluster() on a list of 3 elements:

str(ClusterList)
List of 3
 $ All: chr [1:1450] "89886" "29923" "100132891" "101410536" ...
 $ g1 : chr [1:858] "89886" "29923" "100132891" "101410536" ...
 $ g2 : chr [1:592] "5325" "170691" "29953" "283392" ...

CompareGO_BP = compareCluster(ClusterList, fun = "enrichGO", pvalueCutoff = 0.01, pAdjustMethod = "BH", OrgDb = org.Hs.eg.db, ont = "BP", readable = T)

dotplot(CompareGO_BP, showCategory = 10, title = "GO - Biological Process")

I ask for 10 categories, but I get 15 categories in All, 8 categories in g1 and 12 categories in g2. Neither any single count nor the sum of the counts is 10...

Is the option showCategory working in the case of comparison? Am I missing something here?

And which categories precisely will it plot: the most significant across my 3 cases, or the most significant in each case?

The question was posted on the Bioconductor support site. It seems quite confusing, and I think I need to write a post to clarify it.
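One reading that would be consistent with the counts in the question is that the top showCategory terms are picked within each cluster, and every cluster then displays all of its own significant terms that fall in the union of those picks. The base R sketch below illustrates that selection rule on invented toy data. It is an assumption made for illustration, not a statement of how dotplot() is implemented, and the data frame, column names and `top_n` are all hypothetical.

```r
# Toy enrichment results (invented): one row per cluster/term pair,
# already restricted to significant terms.
res <- data.frame(
  Cluster = c("A", "A", "A", "B", "B", "B", "C"),
  Term    = c("t1", "t2", "t3", "t3", "t4", "t1", "t5"),
  p       = c(0.001, 0.002, 0.03, 0.001, 0.002, 0.04, 0.01),
  stringsAsFactors = FALSE
)

top_n <- 2  # plays the role of showCategory in this sketch

# Step 1: take the top_n most significant terms within each cluster.
top_terms <- unlist(lapply(split(res, res$Cluster), function(d) {
  d$Term[order(d$p)][seq_len(min(top_n, nrow(d)))]
}))

# Step 2: keep every row whose term is in the union of those picks.
shown <- res[res$Term %in% unique(top_terms), ]

# Number of terms displayed per cluster.
shown_counts <- table(shown$Cluster)
print(shown_counts)
```

Under this rule a cluster can show more than `top_n` terms (A and B each show 3, because each is also significant for a term picked by the other) or fewer (C has only 1 significant term).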


leading edge analysis

leading edge and core enrichment

Leading edge analysis reports Tags to indicate the percentage of genes contributing to the enrichment score, List to indicate where in the list the enrichment score is attained, and Signal for the enrichment signal strength.

It would also be very interesting to get the core enriched genes that contribute to the enrichment.

Now DOSE, clusterProfiler and ReactomePA all support leading edge analysis and report core enriched genes.
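The three statistics and the core enriched genes can be sketched in base R for a single gene set with a positive enrichment score. This is a minimal sketch under stated assumptions, not the packages' own code: the ranked list and gene set are invented, hits are weighted by absolute score, and Signal is taken as Tags × (1 − List) × N / (N − Nh), the definition given in the GSEA user guide.

```r
# Toy ranked gene list (invented), sorted by decreasing score.
gene_list <- c(g1 = 5, g2 = 4, g3 = 3, g4 = 2, g5 = 1,
               g6 = -1, g7 = -2, g8 = -3, g9 = -4, g10 = -5)
gene_set <- c("g1", "g2", "g3", "g9")  # hypothetical gene set

N   <- length(gene_list)
hit <- names(gene_list) %in% gene_set
Nh  <- sum(hit)

# Running score: step up at hits (weighted by |score|), down at misses.
step_up   <- ifelse(hit, abs(gene_list), 0) / sum(abs(gene_list[hit]))
step_down <- ifelse(hit, 0, 1) / (N - Nh)
running   <- cumsum(step_up - step_down)

# Position where the (positive) enrichment score is attained.
peak <- which.max(running)

# Core enriched genes: set members ranked at or before the peak.
core <- names(gene_list)[hit & seq_len(N) <= peak]

tags   <- length(core) / Nh                   # Tags
list_p <- peak / N                            # List
signal <- tags * (1 - list_p) * N / (N - Nh)  # Signal

cat("core:", core, "\n")
cat("tags:", tags, " list:", list_p, " signal:", signal, "\n")
```

For a negative enrichment score the same logic would be applied from the bottom of the ranked list, using the minimum of the running score instead of the maximum.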



Guangchuang Yu

Bioinformatics Professor @ SMU


Guangzhou