I am Guangchuang YU, a PhD candidate studying the evolution of influenza A virus at The University of Hong Kong.
PhD in Evolution Study of Influenza A Virus, 2017
The University of Hong Kong
Master in Biochemistry and Molecular Biology, 2009
Anhui Medical University
BSc in Biotechnology, 2005
South China Agricultural University
Today is my birthday, and it happens to be the release day of Bioconductor 3.3. It's time again to reflect on what I've done in the past year.
clusterProfiler can convert biological IDs using an OrgDb object via the bitr function. I have now implemented another function, bitr_kegg, for converting IDs through the KEGG API.
library(clusterProfiler)
data(gcSample)
hg <- gcSample[[1]]
head(hg)
## [1] "4597"  "7111"  "5266"  "2175"  "755"   "23046"

eg2np <- bitr_kegg(hg, fromType='kegg', toType='ncbi-proteinid', organism='hsa')
## Warning in bitr_kegg(hg, fromType = "kegg", toType = "ncbi-proteinid",
## organism = "hsa"): 3.7% of input gene IDs are fail to map...
head(eg2np)
##     kegg ncbi-proteinid
## 1   8326      NP_003499
## 2  58487   NP_001034707
## 3 139081      NP_619647
## 4  59272      NP_068576
## 5    993      NP_001780
## 6   2676      NP_001487

np2up <- bitr_kegg(eg2np[,2], fromType='ncbi-proteinid', toType='uniprot', organism='hsa')
head(np2up)
##   ncbi-proteinid uniprot
## 1      NP_005457  O75586
## 2      NP_005792  P41567
## 3      NP_005792  Q6IAV3
## 4      NP_037536  Q13421
## 5      NP_006054  O60662
## 6   NP_001092002  O95398
The ID type (both fromType and toType) should be one of ‘kegg’, ‘ncbi-geneid’, ‘ncbi-proteinid’ or ‘uniprot’. ‘kegg’ is the primary ID used in the KEGG database. KEGG's gene data are sourced from NCBI. As a rule of thumb, the ‘kegg’ ID is the entrezgene ID for eukaryotic species and the Locus ID for prokaryotes.
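Since only four ID types are accepted, it can help to check the type before sending a request to KEGG. Below is a minimal sketch in base R. The helper check_kegg_id_type and the vector kegg_id_types are hypothetical names used for illustration; they are not part of clusterProfiler, and bitr_kegg may do its own validation internally.

```r
# Hypothetical helper (not part of clusterProfiler): check an ID type
# against the four values listed above before calling bitr_kegg().
kegg_id_types <- c("kegg", "ncbi-geneid", "ncbi-proteinid", "uniprot")

check_kegg_id_type <- function(type) {
  # Accept exactly one character string that is in the allowed set
  if (!is.character(type) || length(type) != 1L || !(type %in% kegg_id_types)) {
    stop("ID type should be one of: ", paste(kegg_id_types, collapse = ", "))
  }
  invisible(type)
}

# A supported type passes silently
check_kegg_id_type("ncbi-proteinid")

# An unsupported type raises an error, caught here for demonstration
ok <- tryCatch({
  check_kegg_id_type("ENTREZID")
  TRUE
}, error = function(e) FALSE)
print(ok)
```

Here 'ENTREZID' is rejected because it is an OrgDb keytype meant for bitr, not one of the ID types bitr_kegg understands.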
KEGG MODULE is a collection of manually defined functional units, called KEGG modules and identified by the M numbers, used for annotation and biological interpretation of sequenced genomes. There are four types of KEGG modules:
- pathway modules – representing tight functional units in KEGG metabolic pathway maps, such as M00002 (Glycolysis, core module involving three-carbon compounds)
- structural complexes – often forming molecular machineries, such as M00072 (Oligosaccharyltransferase)
- functional sets – for other types of essential sets, such as M00360 (Aminoacyl-tRNA synthases, prokaryotes)
- signature modules – as markers of phenotypes, such as M00363 (EHEC pathogenicity signature, Shiga toxin)
To my knowledge, BioEdit is the most comprehensive biological sequence alignment editor. Most of my labmates run this software using
Parallels Desktop. For some of them, BioEdit is the only reason to install
I needed to edit an alignment recently and installed BioEdit on my iMac using Wine, a compatibility layer for running Windows applications on POSIX-compliant operating systems. Although it has been well known in the Linux community for many years, many OS X users have never heard of it.
I extended the subview function to support embedding an image file in a
set.seed(123)
d = data.frame(x=rnorm(10), y=rnorm(10))
imgfile <- tempfile(, fileext=".png")
download.file("https://avatars1.githubusercontent.com/u/626539?v=3&u=e731426406dd3f45a73d96dd604bc45ae2e7c36f&s=140",
              destfile=imgfile, mode='wb')
p = ggplot(d, aes(x, y))
subview(p, imgfile, x=d$x, y=d$y) + geom_point(size=5)