Guangchuang Yu

a senior-in-age-but-not-senior-in-knowledge bioinformatician

convert biological ID with KEGG API using clusterProfiler

May 3, 2016 - 4 minute read - Comments R


clusterProfiler can convert biological IDs using OrgDb object via the bitr function. Now I implemented another function, bitr_kegg for converting IDs through KEGG API.

hg <- gcSample[[1]]

## [1] "4597"  "7111"  "5266"  "2175"  "755"   "23046"

eg2np <- bitr_kegg(hg, fromType='kegg', toType='ncbi-proteinid', organism='hsa')

## Warning in bitr_kegg(hg, fromType = "kegg", toType = "ncbi-proteinid",
## organism = "hsa"): 3.7% of input gene IDs are fail to map...


##     kegg ncbi-proteinid
## 1   8326      NP_003499
## 2  58487   NP_001034707
## 3 139081      NP_619647
## 4  59272      NP_068576
## 5    993      NP_001780
## 6   2676      NP_001487

np2up <- bitr_kegg(eg2np[,2], fromType='ncbi-proteinid', toType='uniprot', organism='hsa')


##   ncbi-proteinid uniprot
## 1      NP_005457  O75586
## 2      NP_005792  P41567
## 3      NP_005792  Q6IAV3
## 4      NP_037536  Q13421
## 5      NP_006054  O60662
## 6   NP_001092002  O95398

The ID type (both fromType & toType) should be one of ‘kegg’, ‘ncbi-geneid’, ‘ncbi-proteinid’ or ‘uniprot’. The ‘kegg’ is the primary ID used in KEGG database. The data source of KEGG was from NCBI. A rule of thumb for the ‘kegg’ ID is entrezgene ID for eukaryote species and Locus ID for prokaryotes.

KEGG Module Enrichment Analysis

Apr 13, 2016 - 2 minute read - Comments R

KEGG MODULE is a collection of manually defined functional units, called KEGG modules and identified by the M numbers, used for annotation and biological interpretation of sequenced genomes. There are four types of KEGG modules:

  • pathway modules – representing tight functional units in KEGG metabolic pathway maps, such as M00002 (Glycolysis, core module involving three-carbon compounds)
  • structural complexes – often forming molecular machineries, such as M00072 (Oligosaccharyltransferase)
  • functional sets – for other types of essential sets, such as M00360 (Aminoacyl-tRNA synthases, prokaryotes)
  • signature modules – as markers of phenotypes, such as M00363 (EHEC pathogenicity signature, Shiga toxin)

yet an unofficial BioEdit for OSX

Mar 30, 2016 - 1 minute read - Comments Software

To my knowledge, BioEdit is the most comprehensive biological sequence alignment editor. Most of my labmates run this software using Parallels Desktop. For some of them, BioEdit is the only reason to install Parallels Desktop.

I need to edit my alignment recently, and install it in my iMac using Wine, which is a compatibility layer for running Windows applications on POSIX-compliant OS. Although it is famous in Linux community for many years, many OSX users never heard of it.

Google Drive @ HKU

Mar 24, 2016 - 1 minute read - Comments Software


到HKU两年多,才发现HKU的邮箱自带无限量的google drive网盘。

embed images in ggplot2 via subview and annotate a phylogenetic tree with images using inset function

Mar 20, 2016 - 1 minute read - Comments RVisualization

I extended the subview function to support embed image file in a ggplot object.

d = data.frame(x=rnorm(10), y=rnorm(10))

imgfile <- tempfile(, fileext=".png")
              destfile=imgfile, mode='wb')

p = ggplot(d, aes(x, y))
subview(p, imgfile, x=d$x[1], y=d$y[1]) + geom_point(size=5)