How to bug an author

As the author of several Bioconductor packages, I find many questions from users quite annoying. Some users never use Google, and they are reluctant to read vignettes.

Step 1: make sure you are using the latest release

I have found that many people are using out-of-date packages. When they run into an issue with an outdated package, they never check whether the issue still exists in the latest release.

We are happy to announce that ggtree supports interactive tree annotation and manipulation by implementing an identify method. Users can click on a node to highlight, label, or rotate the corresponding clade.

Here is an example of highlighting clades using geom_hilight with identify:

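I don't remember when I first learned about Capital of Statistics (统计之都), but I do remember that the first person there I knew of was Taiyun, because I had used the corrplot package he wrote. My first direct contact with Capital of Statistics was also through Taiyun: he emailed me to ask whether I could help proofread the Chinese translation of ggplot2 (《ggplot2:数据分析与图形艺术》), and from then on we became online friends.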

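When I was at Jinan University, Taiyun invited me to give a talk at the China-R conference, but I felt I had nothing worth sharing. GOSemSim was a package I developed during my master's studies, and I didn't want to present old work. The other package I had written at the time, clusterProfiler, existed purely because most enrichment analysis tools targeted model organisms, while our lab worked on many kinds of bacteria; in addition, some existing tools set the background incorrectly. Writing my own package meant I was not constrained by other people's tools. The package has since gained some recognition: the debrowser package in BioC 3.3 uses clusterProfiler, the new bioCancer package in BioC 3.4 uses it too, and at this meeting in Beijing several attendees asked me about clusterProfiler during coffee breaks. Even so, I always felt it was just a utility package. The algorithms were other people's and already fairly old, and there are literally hundreds of similar tools, so I was too embarrassed to present it. That is why I declined Taiyun's invitation and never attended a China-R conference.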

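This year was the 9th China-R conference, and it was a large one: 22 parallel sessions, more than 100 speakers, and over 4,000 attendees. This time there happened to be a Bioconductor session. Matt wrote to me saying that since I had written several Bioconductor packages, and he personally liked my ChIPseeker package, he wondered whether I could share my experience with Bioconductor package development at the conference. This was Bioconductor's debut in China, so I gladly accepted. Of course, it was also because over the past two years I had written ChIPseeker and ggtree, which I felt were good enough to show 🙈.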

bitr_kegg

clusterProfiler can convert biological IDs using an OrgDb object via the bitr function. I have now implemented another function, bitr_kegg, which converts IDs through the KEGG API.

library(clusterProfiler)
data(gcSample)
hg <- gcSample[[1]]
head(hg)

## [1] "4597"  "7111"  "5266"  "2175"  "755"   "23046"

eg2np <- bitr_kegg(hg, fromType='kegg', toType='ncbi-proteinid', organism='hsa')

## Warning in bitr_kegg(hg, fromType = "kegg", toType = "ncbi-proteinid",
## organism = "hsa"): 3.7% of input gene IDs are fail to map...

head(eg2np)

##     kegg ncbi-proteinid
## 1   8326      NP_003499
## 2  58487   NP_001034707
## 3 139081      NP_619647
## 4  59272      NP_068576
## 5    993      NP_001780
## 6   2676      NP_001487

np2up <- bitr_kegg(eg2np[,2], fromType='ncbi-proteinid', toType='uniprot', organism='hsa')

head(np2up)

##   ncbi-proteinid uniprot
## 1      NP_005457  O75586
## 2      NP_005792  P41567
## 3      NP_005792  Q6IAV3
## 4      NP_037536  Q13421
## 5      NP_006054  O60662
## 6   NP_001092002  O95398

Both fromType and toType should be one of ‘kegg’, ‘ncbi-geneid’, ‘ncbi-proteinid’ or ‘uniprot’. ‘kegg’ is the primary ID type used in the KEGG database. KEGG's data are sourced from NCBI, and as a rule of thumb the ‘kegg’ ID is the Entrez Gene ID for eukaryotic species and the Locus ID for prokaryotes.
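The two bitr_kegg calls above go from ‘kegg’ to ‘ncbi-proteinid’ and then from ‘ncbi-proteinid’ to ‘uniprot’. If you want a single ‘kegg’ to ‘uniprot’ table from those two results, a base-R merge on the shared protein ID column is one way to chain them. The sketch below uses small made-up placeholder tables (not real identifiers) shaped like the bitr_kegg output, so it runs without network access. Note that the mapping is not one-to-one: in the real output above, NP_005792 maps to both P41567 and Q6IAV3.

```r
# Toy mapping tables shaped like the bitr_kegg() output above.
# The IDs are made-up placeholders, not real KEGG/NCBI/UniProt identifiers.
eg2np <- data.frame(kegg             = c("g1", "g2", "g3"),
                    `ncbi-proteinid` = c("NP_1", "NP_2", "NP_3"),
                    check.names = FALSE, stringsAsFactors = FALSE)
np2up <- data.frame(`ncbi-proteinid` = c("NP_1", "NP_2", "NP_2"),
                    uniprot          = c("U1", "U2a", "U2b"),
                    check.names = FALSE, stringsAsFactors = FALSE)

# Inner join on the shared protein ID column. kegg IDs whose protein ID has
# no UniProt match (here "g3") are dropped, and a protein ID that maps to
# several UniProt accessions (here "NP_2") produces several rows.
eg2up <- merge(eg2np, np2up, by = "ncbi-proteinid")

# Keep only the two ID columns of interest and drop duplicate pairs.
eg2up <- unique(eg2up[, c("kegg", "uniprot")])
print(eg2up)
```

With the real tables from above, the same step is merge(eg2np, np2up, by = "ncbi-proteinid"). Since ‘kegg’ and ‘uniprot’ are both valid ID types, asking bitr_kegg for toType='uniprot' directly should give a comparable one-step mapping.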

KEGG MODULE is a collection of manually defined functional units, called KEGG modules and identified by M numbers, that are used for the annotation and biological interpretation of sequenced genomes. There are four types of KEGG modules:

  • pathway modules – representing tight functional units in KEGG metabolic pathway maps, such as M00002 (Glycolysis, core module involving three-carbon compounds)
  • structural complexes – often forming molecular machineries, such as M00072 (Oligosaccharyltransferase)
  • functional sets – for other types of essential sets, such as M00360 (Aminoacyl-tRNA synthases, prokaryotes)
  • signature modules – as markers of phenotypes, such as M00363 (EHEC pathogenicity signature, Shiga toxin)

Guangchuang Yu

Bioinformatics Professor @ SMU

Guangzhou