Evolution of my BioC packages

July 13, 2016 in R

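I came across a YouTube video called Evolution of clusterProfiler, which Landon Wilkins made with Gource. So I tried it myself, to look back at how I have been writing code over the past few years.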

How to bug author

July 5, 2016 in R

As the author of several Bioconductor packages, I find many questions from users quite annoying. Some users never search Google first, and they are reluctant to read the vignettes.

Step 1: make sure you are using the latest release

I have found that many people are using out-of-date packages. When they run into an issue with an outdated package, they never check whether the issue still exists in the latest release.
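A quick way to check this from R before reporting an issue is sketched below. It uses only base R; `is_at_least` is a hypothetical helper name, and the package name and version string are placeholders chosen so that the snippet runs on any installation.

```r
# Hypothetical helper: is the installed version of `pkg` at least `minimum`?
is_at_least <- function(pkg, minimum) {
  utils::packageVersion(pkg) >= package_version(minimum)
}

# "stats" ships with R, so this call works everywhere; substitute the
# package you are about to report an issue for.
is_at_least("stats", "3.0.0")

# Versions are compared component by component, not as strings,
# so 3.10.1 is newer than 3.9.2.
package_version("3.10.1") > package_version("3.9.2")
```

If the installed version is older than the current release, updating first and re-running the failing code is usually worth doing before contacting the author.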

identify method for ggtree

June 28, 2016 in R

We are happy to announce that ggtree now supports interactive tree annotation and manipulation by implementing an identify method. Users can click on a node to highlight a clade, label it, rotate it, and so on.

Here is an example of highlighting clades using geom_hilight with identify:

Bioinformatics session of the 9th China-R conference

May 31, 2016 in R

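I don't remember when I first heard of Capital of Statistics (统计之都), but I do remember that the first person I knew of there was Taiyun, because I used the corrplot package he wrote. Taiyun was also my first contact at Capital of Statistics: he emailed me to ask whether I could help proofread 《ggplot2：数据分析与图形艺术》, the Chinese translation of the ggplot2 book, and from then on we became online friends.

When I was at Jinan University, Taiyun once invited me to give a talk at the China-R conference, but I felt I had nothing worth sharing. GOSemSim was something I wrote during my master's, and I did not want to talk about old work. The other package I had written at the time, clusterProfiler, came about purely because most enrichment analysis tools target model organisms, while our lab worked on a variety of bacteria; in addition, some tools set the background incorrectly. Implementing my own package freed me from other people's limitations. The package has since gained some recognition: in BioC 3.3 a package called debrowser uses clusterProfiler, and in BioC 3.4 a new package, bioCancer, uses it as well; and this time in Beijing, several attendees asked me about clusterProfiler during the tea breaks. Even so, I have always felt that it is merely a practical package: the algorithms are other people's and already fairly old, and there are hundreds if not thousands of similar tools. So I was too embarrassed to present it. I declined Taiyun's invitation and had never attended a China-R conference.

This year is the 9th China-R conference. It is a large one, with 22 sessions, more than 100 speakers, and more than 4,000 attendees. This time there happened to be a Bioconductor session. Matt wrote to me, saying that I had written several Bioconductor packages and that he personally likes my ChIPseeker package, and asked whether I could share my experience with Bioconductor packages at the conference. This is Bioconductor's debut in China, so I gladly accepted. Of course, it is also because I wrote ChIPseeker and ggtree in the past two years, and I feel those are presentable 🙈.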

[Bioc 33] NEWS of my BioC packages

May 5, 2016 in R

Today is my birthday, and it happens to be the release day of Bioconductor 3.3. Once again, it is time to reflect on what I have done in the past year.

convert biological ID with KEGG API using clusterProfiler

May 3, 2016 in R

bitr_kegg

clusterProfiler can convert biological IDs using an OrgDb object via the bitr function. I have now implemented another function, bitr_kegg, for converting IDs through the KEGG API.

library(clusterProfiler)
data(gcSample)
hg <- gcSample[[1]]
head(hg)

## [1] "4597"  "7111"  "5266"  "2175"  "755"   "23046"

eg2np <- bitr_kegg(hg, fromType='kegg', toType='ncbi-proteinid', organism='hsa')

## Warning in bitr_kegg(hg, fromType = "kegg", toType = "ncbi-proteinid",
## organism = "hsa"): 3.7% of input gene IDs are fail to map...

head(eg2np)

##     kegg ncbi-proteinid
## 1   8326      NP_003499
## 2  58487   NP_001034707
## 3 139081      NP_619647
## 4  59272      NP_068576
## 5    993      NP_001780
## 6   2676      NP_001487

np2up <- bitr_kegg(eg2np[,2], fromType='ncbi-proteinid', toType='uniprot', organism='hsa')

head(np2up)

##   ncbi-proteinid uniprot
## 1      NP_005457  O75586
## 2      NP_005792  P41567
## 3      NP_005792  Q6IAV3
## 4      NP_037536  Q13421
## 5      NP_006054  O60662
## 6   NP_001092002  O95398

The ID type (both fromType and toType) must be one of ‘kegg’, ‘ncbi-geneid’, ‘ncbi-proteinid’ or ‘uniprot’. ‘kegg’ is the primary ID used in the KEGG database, whose gene data come from NCBI. As a rule of thumb, the ‘kegg’ ID is the Entrez Gene ID for eukaryotic species and the Locus ID for prokaryotes.
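Since only four ID types are accepted, it can help to validate the type before making a network request. The following is a minimal sketch in base R; `kegg_id_types` and `check_id_type` are hypothetical names introduced here for illustration and are not part of clusterProfiler.

```r
# The four ID types listed in the text above.
kegg_id_types <- c("kegg", "ncbi-geneid", "ncbi-proteinid", "uniprot")

# Hypothetical helper (not a clusterProfiler function): stop early with a
# readable message when an unsupported ID type is supplied.
check_id_type <- function(type) {
  if (length(type) != 1 || !type %in% kegg_id_types) {
    stop("ID type should be one of: ",
         paste(kegg_id_types, collapse = ", "))
  }
  invisible(type)
}

check_id_type("uniprot")                                    # passes silently
res <- try(check_id_type("entrezgene"), silent = TRUE)      # not a valid type
inherits(res, "try-error")
```

A check like this could be run on fromType and toType before passing them to bitr_kegg, so that a typo fails immediately rather than after a query to KEGG.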

KEGG Module Enrichment Analysis

April 13, 2016 in R

KEGG MODULE is a collection of manually defined functional units, called KEGG modules and identified by the M numbers, used for annotation and biological interpretation of sequenced genomes. There are four types of KEGG modules:

pathway modules – representing tight functional units in KEGG metabolic pathway maps, such as M00002 (Glycolysis, core module involving three-carbon compounds)

structural complexes – often forming molecular machineries, such as M00072 (Oligosaccharyltransferase)

functional sets – for other types of essential sets, such as M00360 (Aminoacyl-tRNA synthases, prokaryotes)

signature modules – as markers of phenotypes, such as M00363 (EHEC pathogenicity signature, Shiga toxin)

Guangchuang Yu