leading edge analysis

leading edge and core enrichment

Leading edge analysis reports three statistics: Tags, the percentage of genes contributing to the enrichment score; List, the position in the ranked list at which the enrichment score is attained; and Signal, the strength of the enrichment signal.

It would also be very interesting to get the core enriched genes that contribute to the enrichment.

Now DOSE, clusterProfiler and ReactomePA all support leading edge analysis and report core enriched genes.
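To make the three statistics concrete, here is a minimal base R sketch that follows the usual GSEA definitions. It is not the code used inside DOSE; the helper name `leading_edge_sketch`, the toy ranked list, and the toy gene set are all assumptions made for illustration.

```r
# Toy ranked gene list (hypothetical values): names are gene IDs,
# values are ranking statistics.
gene_list <- c(g1 = 3.0, g2 = 2.5, g3 = 2.0, g4 = 1.0, g5 = 0.5,
               g6 = -0.5, g7 = -1.0, g8 = -2.0, g9 = -2.5, g10 = -3.0)
gene_set <- c("g1", "g2", "g4", "g9")   # hypothetical gene set

# Hypothetical helper: running enrichment score plus leading edge statistics.
leading_edge_sketch <- function(gene_list, gene_set, exponent = 1) {
  gene_list <- sort(gene_list, decreasing = TRUE)
  N   <- length(gene_list)
  hit <- names(gene_list) %in% gene_set
  Nh  <- sum(hit)

  # Weighted running sum: step up at hits, step down at misses.
  w       <- abs(unname(gene_list))^exponent
  p_hit   <- cumsum(ifelse(hit, w, 0)) / sum(w[hit])
  p_miss  <- cumsum(!hit) / (N - Nh)
  running <- p_hit - p_miss

  # The enrichment score is the running sum's largest deviation from zero.
  peak <- which.max(abs(running))
  es   <- running[peak]

  if (es >= 0) {
    # Positive ES: the leading edge is the hits at or before the peak.
    core      <- names(gene_list)[hit & seq_len(N) <= peak]
    list_frac <- peak / N
  } else {
    # Negative ES: the leading edge is the hits at or after the peak.
    core      <- names(gene_list)[hit & seq_len(N) >= peak]
    list_frac <- (N - peak + 1) / N
  }

  tags   <- length(core) / Nh
  signal <- tags * (1 - list_frac) * (N / (N - Nh))

  list(ES = es, tags = tags, list = list_frac, signal = signal,
       core_enrichment = core)
}

res <- leading_edge_sketch(gene_list, gene_set)
str(res)
```

For this toy input the running sum peaks at the second gene, so the core enriched genes are g1 and g2, with Tags of 50% and List of 20%.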

How to bug author

As an author of several Bioconductor packages, I find many questions from users quite annoying. Some users never use Google and are reluctant to read vignettes.

Step 1: make sure you are using the latest release

I find that many people are using out-of-date packages. When they hit an issue in an outdated package, they never check whether the issue still exists in the latest release.
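One way to check before reporting is to compare the installed version against the release you expect. The sketch below uses only base R; the helper names `is_outdated` and `report_versions` are hypothetical, and the version strings are placeholders rather than real release numbers.

```r
# Hypothetical helper: TRUE when the installed version of `pkg`
# is older than the version string `latest`.
is_outdated <- function(pkg, latest) {
  utils::packageVersion(pkg) < package_version(latest)
}

# Hypothetical helper: collect installed versions to paste into a bug report.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
         character(1))
}

# "stats" ships with R, so these calls work on any installation.
is_outdated("stats", "0.0.1")          # placeholder version string
report_versions(c("stats", "utils"))

# On a Bioconductor installation that uses BiocManager (an assumption here),
# BiocManager::valid() lists out-of-date packages, and sessionInfo()
# prints the versions of everything that is loaded.
```

Including the output of `sessionInfo()` in a report lets the author see at a glance whether the issue comes from an old release.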

We are happy to announce that ggtree supports interactive tree annotation and manipulation by implementing an identify method. Users can click on a node to highlight a clade, label it, or rotate it.

Here is an example of highlighting clades using geom_hilight with identify:
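A minimal sketch of that workflow could look like the following. The random tree from `ape::rtree` and the helper `highlight_clicked_clades` are assumptions made for illustration, not part of the original post. The sketch relies on the identify method returning the number of the clicked node, and it only does anything in an interactive session with a graphics device.

```r
# Hypothetical helper: let the user click `n` nodes and highlight each clade.
highlight_clicked_clades <- function(p, n = 3, cols = grDevices::rainbow(n)) {
  for (i in seq_len(n)) {
    print(p)               # draw the current tree so that it can be clicked
    node <- identify(p)    # identify method for the plot: clicked node number
    p <- p + ggtree::geom_hilight(node = node, fill = cols[i])
  }
  p
}

# Clicking needs a live graphics device, so the demo is skipped otherwise.
if (interactive() && requireNamespace("ggtree", quietly = TRUE)) {
  library(ggtree)
  set.seed(1)
  tree <- ape::rtree(30)   # random 30-tip tree, for illustration only
  p <- highlight_clicked_clades(ggtree(tree))
  print(p)
}
```

Each click adds one highlighted clade, and the plot is redrawn before the next click so the previous highlights stay visible.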

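I don't remember when I first heard of Capital of Statistics (统计之都), but I do remember that the first person I knew of there was Taiyun, because I used the corrplot package he wrote. Later, Taiyun was also the first person from Capital of Statistics I was in contact with: he emailed me to ask whether I could help proofread the Chinese translation of "ggplot2: Elegant Graphics for Data Analysis" (《ggplot2:数据分析与图形艺术》), and from then on Taiyun and I became online friends.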

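When I was at Jinan University, Taiyun once invited me to give a talk at the China-R conference, but I felt I had nothing worth sharing. GOSemSim was a package I wrote during my master's, and I did not feel comfortable presenting old work. The other package I had written at the time, clusterProfiler, came about purely because most enrichment analysis tools target model organisms, whereas our lab worked on various bacteria; in addition, some tools set the background incorrectly. Implementing my own package freed me from other people's limitations. The package has since gained some recognition: the debrowser package in BioC 3.3 uses clusterProfiler, a new package in BioC 3.4, bioCancer, also uses it, and at this meeting in Beijing several attendees asked me about clusterProfiler during the tea breaks. Even so, I have always felt that it is just a utility package: the algorithms are other people's and already fairly old, and there are hundreds, if not thousands, of similar tools. So I was too embarrassed to present it. I declined Taiyun's invitation and had never attended a China-R conference.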

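This year's China-R conference is the ninth. It was a large meeting, with 22 sessions, more than 100 speakers, and more than 4,000 attendees. This time there happened to be a Bioconductor session. Matt wrote to me, saying that I had written several Bioconductor packages and that he personally likes my ChIPseeker package, and asked whether I could share my experience with Bioconductor packages at the meeting. This was Bioconductor's debut in China, so I gladly accepted, partly also because in the past two years I have written ChIPseeker and ggtree, which I think are presentable 🙈.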

bitr_kegg

clusterProfiler can convert biological IDs using an OrgDb object via the bitr function. I have now implemented another function, bitr_kegg, for converting IDs through the KEGG API.

library(clusterProfiler)
data(gcSample)
hg <- gcSample[[1]]
head(hg)

## [1] "4597"  "7111"  "5266"  "2175"  "755"   "23046"

eg2np <- bitr_kegg(hg, fromType='kegg', toType='ncbi-proteinid', organism='hsa')

## Warning in bitr_kegg(hg, fromType = "kegg", toType = "ncbi-proteinid",
## organism = "hsa"): 3.7% of input gene IDs are fail to map...

head(eg2np)

##     kegg ncbi-proteinid
## 1   8326      NP_003499
## 2  58487   NP_001034707
## 3 139081      NP_619647
## 4  59272      NP_068576
## 5    993      NP_001780
## 6   2676      NP_001487

np2up <- bitr_kegg(eg2np[,2], fromType='ncbi-proteinid', toType='uniprot', organism='hsa')

head(np2up)

##   ncbi-proteinid uniprot
## 1      NP_005457  O75586
## 2      NP_005792  P41567
## 3      NP_005792  Q6IAV3
## 4      NP_037536  Q13421
## 5      NP_006054  O60662
## 6   NP_001092002  O95398

The ID type (both fromType and toType) should be one of ‘kegg’, ‘ncbi-geneid’, ‘ncbi-proteinid’ or ‘uniprot’. ‘kegg’ is the primary ID used in the KEGG database, whose data come from NCBI. As a rule of thumb, the ‘kegg’ ID is the Entrez gene ID for eukaryotic species and the locus ID for prokaryotes.
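When the warning reports that some input IDs failed to map, it can help to see which ones were dropped. The sketch below does this in base R by comparing the input against the first column of the result. The data frame is a toy stand-in for bitr_kegg output with made-up protein IDs, and the variable names are assumptions.

```r
# Toy stand-ins (hypothetical values, not real KEGG output).
input_ids <- c("4597", "7111", "5266", "2175")
mapping <- data.frame(
  kegg             = c("4597", "7111", "5266"),
  `ncbi-proteinid` = c("NP_A", "NP_B", "NP_C"),
  check.names      = FALSE,   # keep the hyphen in the column name
  stringsAsFactors = FALSE
)

# The four ID types named in the text.
valid_types <- c("kegg", "ncbi-geneid", "ncbi-proteinid", "uniprot")
stopifnot(all(names(mapping) %in% valid_types))

# Input IDs that never appear in the 'from' column were not mapped.
unmapped     <- setdiff(input_ids, mapping$kegg)
pct_unmapped <- 100 * length(unmapped) / length(unique(input_ids))

unmapped
pct_unmapped
```

With a real result, replacing `mapping` with the data frame returned by bitr_kegg and `input_ids` with the vector passed in gives the list of IDs to investigate.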

Guangchuang Yu

Bioinformatics Professor @ SMU

Guangzhou