Guangchuang Yu

a senior-in-age-but-not-senior-in-knowledge bioinformatician

convert biological ID with KEGG API using clusterProfiler

May 3, 2016 - 4 minute read - R

bitr_kegg

clusterProfiler can convert biological IDs using an OrgDb object via the bitr function. I have now implemented another function, bitr_kegg, for converting IDs through the KEGG API.

library(clusterProfiler)
data(gcSample)    # example gene clusters (Entrez gene IDs) shipped with clusterProfiler
hg <- gcSample[[1]]
head(hg)

## [1] "4597"  "7111"  "5266"  "2175"  "755"   "23046"

eg2np <- bitr_kegg(hg, fromType='kegg', toType='ncbi-proteinid', organism='hsa')

## Warning in bitr_kegg(hg, fromType = "kegg", toType = "ncbi-proteinid",
## organism = "hsa"): 3.7% of input gene IDs are fail to map...

head(eg2np)

##     kegg ncbi-proteinid
## 1   8326      NP_003499
## 2  58487   NP_001034707
## 3 139081      NP_619647
## 4  59272      NP_068576
## 5    993      NP_001780
## 6   2676      NP_001487
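The warning above only reports what share of the input failed to map. As a rough sketch, the unmapped IDs themselves can be recovered by comparing the input with the first (fromType) column of the returned table. The helper names below are hypothetical and not part of clusterProfiler, and the percentage is computed on unique IDs, which may not match exactly how bitr_kegg counts them.

```r
# Hypothetical helpers, not part of clusterProfiler.
# Returns the IDs from `input` that are absent from the first (fromType)
# column of a bitr_kegg()-style mapping table.
unmapped_ids <- function(input, mapping) {
  setdiff(unique(input), mapping[[1]])
}

# Percentage of unique input IDs that failed to map, rounded to one decimal.
unmapped_percent <- function(input, mapping) {
  input <- unique(input)
  round(100 * length(unmapped_ids(input, mapping)) / length(input), 1)
}

# Usage with the objects created above (needs network access to KEGG):
# unmapped_ids(hg, eg2np)
# unmapped_percent(hg, eg2np)
```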

np2up <- bitr_kegg(eg2np[,2], fromType='ncbi-proteinid', toType='uniprot', organism='hsa')

head(np2up)

##   ncbi-proteinid uniprot
## 1      NP_005457  O75586
## 2      NP_005792  P41567
## 3      NP_005792  Q6IAV3
## 4      NP_037536  Q13421
## 5      NP_006054  O60662
## 6   NP_001092002  O95398
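Because eg2np and np2up share the ‘ncbi-proteinid’ column, the two tables can be joined to obtain a kegg to uniprot mapping. Below is a minimal sketch using base R's merge(); chain_mappings is a hypothetical helper, not a clusterProfiler function. One-to-many mappings, such as NP_005792 mapping to two UniProt IDs above, simply produce one row per pair.

```r
# Hypothetical helper, not part of clusterProfiler: join two bitr_kegg()-style
# tables on the single column they share, e.g. kegg -> ncbi-proteinid -> uniprot.
chain_mappings <- function(first, second) {
  key <- intersect(names(first), names(second))
  stopifnot(length(key) == 1)
  # inner join: IDs without a match in either table are dropped
  res <- merge(first, second, by = key)
  # order columns as: source ID, shared key, target ID
  res[, c(setdiff(names(first), key), key, setdiff(names(second), key)), drop = FALSE]
}

# Usage with the objects created above (needs network access to KEGG):
# eg2up <- chain_mappings(eg2np, np2up)
```

Since any of the four ID types may be used as fromType or toType, calling bitr_kegg(hg, fromType='kegg', toType='uniprot', organism='hsa') directly should give a comparable table in one step; the join is mainly useful when the intermediate table is needed anyway.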

The ID type (both fromType and toType) should be one of ‘kegg’, ‘ncbi-geneid’, ‘ncbi-proteinid’ or ‘uniprot’. ‘kegg’ is the primary ID used in the KEGG database. The KEGG data are sourced from NCBI. As a rule of thumb, the ‘kegg’ ID is the Entrez gene ID for eukaryotic species and the locus ID for prokaryotes.

KEGG Module Enrichment Analysis

Apr 13, 2016 - 2 minute read - R

KEGG MODULE is a collection of manually defined functional units, called KEGG modules and identified by the M numbers, used for annotation and biological interpretation of sequenced genomes. There are four types of KEGG modules:

  • pathway modules – representing tight functional units in KEGG metabolic pathway maps, such as M00002 (Glycolysis, core module involving three-carbon compounds)
  • structural complexes – often forming molecular machineries, such as M00072 (Oligosaccharyltransferase)
  • functional sets – for other types of essential sets, such as M00360 (Aminoacyl-tRNA synthetases, prokaryotes)
  • signature modules – as markers of phenotypes, such as M00363 (EHEC pathogenicity signature, Shiga toxin)
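Enrichment of KEGG modules can be assessed with the same over-representation test that clusterProfiler uses for pathways, a hypergeometric test. The function below is a minimal base-R sketch of that test for a single module; the name ora_pvalue and its argument names are illustrative assumptions, not clusterProfiler's API.

```r
# Over-representation p-value for one gene set (e.g. one KEGG module),
# using a one-sided hypergeometric test.
#   k: number of input genes annotated to the module
#   n: number of annotated input genes
#   M: number of background genes annotated to the module
#   N: number of annotated background genes
ora_pvalue <- function(k, n, M, N) {
  # P(X >= k) for X ~ Hypergeometric(M "in module", N - M "not in module", n drawn)
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}
```

In a full analysis this p-value would be computed for every module and then corrected for multiple testing, for example with p.adjust(p, method = "BH").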

yet an unofficial BioEdit for OSX

Mar 30, 2016 - 1 minute read - Software

To my knowledge, BioEdit is the most comprehensive biological sequence alignment editor. Most of my labmates run this software using Parallels Desktop. For some of them, BioEdit is the only reason to install Parallels Desktop.

I needed to edit an alignment recently, so I installed BioEdit on my iMac using Wine, a compatibility layer for running Windows applications on POSIX-compliant operating systems. Although Wine has been well known in the Linux community for many years, many OSX users have never heard of it.

Google Drive @ HKU

Mar 24, 2016 - 1 minute read - Software

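Finding a good cloud storage service has long been a headache for me. Dropbox is excellent, but its space is limited, and the cloud drives available in mainland China are all junk. I tried Baidu Cloud, but Baidu really let me down and the experience was very poor. Later I found a fairly good solution: GitLab, which lets you create an unlimited number of projects with 10 GB of space each, far more generous than GitHub. The only drawback is that the .git folder also takes up a lot of space.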

Only after more than two years at HKU did I discover that HKU email accounts come with unlimited Google Drive storage.

embed images in ggplot2 via subview and annotate a phylogenetic tree with images using inset function

Mar 20, 2016 - 1 minute read - R, Visualization

I extended the subview function to support embedding an image file in a ggplot object.

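library(ggplot2)
library(ggtree)   # subview() is provided by ggtree

set.seed(123)
d <- data.frame(x = rnorm(10), y = rnorm(10))

# download an example image to a temporary PNG file (binary mode avoids corruption on Windows)
imgfile <- tempfile(fileext = ".png")
download.file("https://avatars1.githubusercontent.com/u/626539?v=3&u=e731426406dd3f45a73d96dd604bc45ae2e7c36f&s=140",
              destfile = imgfile, mode = 'wb')

# embed the image at the position of the first point, then draw the points
p <- ggplot(d, aes(x, y))
subview(p, imgfile, x = d$x[1], y = d$y[1]) + geom_point(size = 5)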