Selected Publications

1. We present an R package, GGTREE, which provides programmable visualization and annotation of phylogenetic trees.
2. GGTREE can read more tree file formats than other software, including newick, nexus, NHX, phylip and jplace formats, and supports visualization of phylo, multiphylo, phylo4, phylo4d, obkdata and phyloseq tree objects defined in other R packages. It can also extract tree/branch/node-specific and other data from the analysis outputs of BEAST, EPA, HYPHY, PAML, PHYLODOG, PPLACER, R8S, RAXML and REVBAYES, and allows these data to be used to annotate the tree.
3. The package allows colouring and annotation of a tree by numerical/categorical node attributes, manipulating a tree by rotating, collapsing and zooming out clades, highlighting user-selected clades or operational taxonomic units, and exploration of a large tree by zooming into a selected portion.
4. A two-dimensional tree can be drawn by scaling the tree width based on an attribute of the nodes. A tree can be annotated with an associated numerical matrix (as a heat map), multiple sequence alignment, subplots or silhouette images.
5. The package GGTREE is released under the ARTISTIC-2.0 LICENSE. The source code and documents are freely available through BIOCONDUCTOR (

Summary: ChIPseeker is an R package for annotating ChIP-seq data. It supports annotating ChIP peaks and provides functions to visualize ChIP peak coverage over chromosomes and profiles of peaks binding to TSS regions. Comparison of ChIP peak profiles and annotations is also supported. Moreover, it supports evaluating the significance of overlap among ChIP-seq datasets. Currently, ChIPseeker contains information on 15 000 BED files from the GEO database. These datasets can be downloaded and compared with the user's own data to find datasets with significant overlap, from which co-regulation or transcription factor complexes can be inferred for further investigation.

Recent Publications

Recent & Upcoming Talks

  • Example Talk

    Sun, Jan 1, 2017, Hugo Academic Theme Conference

Recent Posts

I don’t know whether ‘rename taxa’ is a common task or not. It does not seem a good idea to rename taxa in the Newick tree text, since doing so may introduce problems when mapping the original sequence alignment to the tree. If you just want to show different or additional information when plotting the tree, it is fine and easy to do with ggtree:

require(treeio)
## Loading required package: treeio
require(ggtree)
## Loading required package: ggtree
## ggtree v1.
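A minimal sketch of the idea in this post, showing different text at the tips while leaving the tree's own labels untouched. The tree, the lookup table and the column name `display` are hypothetical and chosen only for illustration; the sketch assumes the ape, ggplot2 and ggtree packages are installed.

```r
library(ape)      # read.tree()
library(ggplot2)  # aes(), xlim()
library(ggtree)   # ggtree(), %<+%, geom_tiplab()

# Hypothetical four-taxon tree; its tip labels are never modified.
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# Hypothetical lookup table: the first column matches the tip labels,
# the `display` column holds the text to show instead.
d <- data.frame(
    label   = c("A", "B", "C", "D"),
    display = c("Homo sapiens", "Pan troglodytes",
                "Gorilla gorilla", "Pongo abelii"),
    stringsAsFactors = FALSE
)

# %<+% attaches the table to the tree view; geom_tiplab() then maps
# the new column to the label aesthetic.
p <- ggtree(tree) %<+% d +
    geom_tiplab(aes(label = display)) +
    xlim(NA, 4)  # leave room for the longer labels
```

Because only the plotted label changes, the original tip labels remain available for matching against a sequence alignment.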


ggimage 0.1.4 is available on CRAN. This release introduces a new function, ggbackground, for setting an image as the background of a ggplot canvas.

require(ggplot2)
p <- ggplot(iris) +
    aes(x = Sepal.Length, y = Sepal.Width, color=Species) +
    geom_point(size=5) + theme_classic()

Suppose we have the above ggplot object, p. The only thing we need to do is pass p, together with an image file name (local or remote), to ggbackground, as demonstrated below:

require(ggimage)
img = "https://assets.


With ggimage, we are able to plot images using the grammar of graphics. The layers defined in ggimage can be applied directly to ggtree to annotate phylogenetic trees using local or online image files. ggtree works seamlessly with ggimage. geom_tiplab and geom_nodelab accept the parameter geom="image" to parse taxa labels as image files and use them to “label” the taxa with images instead of text strings. Here are some examples for demonstration.


Reassortment is an important strategy by which influenza A viruses introduce an HA subtype that is new to human populations, which creates the possibility of a pandemic.

The diagram shown above (Figure 2 of doi:10.1038/srep25549) is of a kind widely used to illustrate reassortment events. Such diagrams are mostly drawn and edited by hand, since no software tool generates them automatically. Here, I implemented the hybrid_plot function for producing publication-quality figures of reassortment events.


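library(tibble)  # provides tibble()
# hybrid_plot() is the function described above; load the package that provides it first.

n <- 8  # number of segment colours per virus

# One row per virus: id, plot position (x = year, y = vertical slot)
# and a list-column holding the colour of each segment.
virus_info <- tibble(
    id = 1:7,
    x = c(rep(1990, 4), rep(2000, 2), 2009),
    y = c(1,2,3,5, 1.5, 3, 4),
    segment_color = list(
        rep('purple', n),
        rep('red', n),
        rep('darkgreen', n),
        rep('lightgreen', n),
        c('darkgreen', 'darkgreen', 'red', 'darkgreen', 'red', 'purple', 'red', 'purple'),
        c('darkgreen', 'darkgreen', 'red', 'darkgreen', 'darkgreen', 'purple', 'red', 'purple'),
        c('darkgreen', 'lightgreen', 'lightgreen', 'darkgreen', 'darkgreen', 'purple', 'red', 'purple'))
)

# Each row is an arrow from a parental virus id to a descendant virus id.
flow_info <- tibble(from = c(1,2,3,3,4,5,6),
                    to = c(5,5,5,6,7,6,7))
hybrid_plot(virus_info, flow_info)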


A friend who is doing his PhD at Johns Hopkins just sent me a link about a case of plagiarism in an SR paper. I have had a very similar experience with a paper published in BMC Systems Biology, which plagiarized my work; the editor merely decided to publish an erratum.

Deng et al. published an R package, ppiPre, that copied source code from my package, GOSemSim, and pretended in their paper that they had developed these algorithms themselves.

Here is the screenshot of the source code (left: ppiPre, right: GOSemSim).

You can find out more on my blog post.

As a developer of several open source software packages, I am glad when someone finds my source code useful, and happy if someone uses it to make something better. But I am not happy if someone copies my source code, removes the author information and changes the function names to pretend the code was developed by himself. The situation is even worse in academia. Taking someone else’s work and passing it off as one’s own is definitely plagiarism and is not allowed in academia.




This package provides functions for pathway analysis based on the REACTOME pathway database. It implements enrichment analysis, gene set enrichment analysis and several functions for visualization.


emojifont implements the use of emoji fonts in both base and ‘ggplot2’ graphics.


The ggtree package extends the ggplot2 package. It is based on the grammar of graphics and takes all the good parts of ggplot2. ggtree is designed not only for viewing phylogenetic trees but also for displaying annotation data on them.


This package implements functions to retrieve the nearest genes around a peak and to annotate the genomic region of the peak, statistical methods for estimating the significance of overlap among ChIP peak data sets, and access to the GEO database so that users can compare their own datasets with those deposited there. The comparison can be used to infer cooperative regulation and thus to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, the average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.
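To make "nearest gene" and "distance to TSS" concrete, here is a toy sketch in base R. It is not the package's implementation: the coordinates, the gene and peak names, and the choice of the peak midpoint are all assumptions made for illustration, and strand and chromosome are ignored.

```r
# Hypothetical TSS positions on a single chromosome.
tss <- data.frame(gene = c("g1", "g2", "g3"),
                  pos  = c(1000, 5000, 9000),
                  stringsAsFactors = FALSE)

# Hypothetical ChIP peaks on the same chromosome.
peaks <- data.frame(peak  = c("p1", "p2"),
                    start = c(1200, 7600),
                    end   = c(1400, 7800),
                    stringsAsFactors = FALSE)

# Represent each peak by its midpoint (an assumption of this sketch).
mid <- (peaks$start + peaks$end) / 2

# For every peak, find the index of the TSS with the smallest absolute distance.
idx <- vapply(mid, function(m) which.min(abs(tss$pos - m)), integer(1))

peaks$nearest_gene    <- tss$gene[idx]
peaks$distance_to_tss <- mid - tss$pos[idx]  # signed: negative means upstream on the + strand
```

A real analysis would work per chromosome and take gene strand into account when signing the distance.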


This package implements five methods, proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively, for measuring semantic similarities among Disease Ontology (DO) terms and gene products. Enrichment analyses, including a hypergeometric model and gene set enrichment analysis, are also implemented for discovering disease associations in high-throughput biological data.
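The hypergeometric model mentioned above can be illustrated with base R alone. This is a sketch of the standard over-representation test, not the package's own code, and all four counts are hypothetical.

```r
# Hypothetical counts:
N <- 10000  # genes in the background
M <- 200    # background genes annotated to the term of interest
n <- 100    # genes in the user's list
k <- 8      # genes in the list that are annotated to the term

# P(X >= k) when drawing n genes without replacement from the background.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same test expressed as a one-sided Fisher's exact test on a 2x2 table.
tab <- matrix(c(k,     n - k,
                M - k, N - M - (n - k)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

With these numbers about two annotated genes would be expected by chance, so observing eight gives a small p-value. In practice the p-values for many terms are then adjusted for multiple testing, for example with `p.adjust`.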


Pathway analysis based on the REACTOME pathway database. It implements enrichment analysis, gene set enrichment analysis and several functions for visualization.


The clusterProfiler package implements methods to analyze and visualize functional profiles of genomic coordinates (supported by ChIPseeker), genes and gene clusters.


Semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have become an important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. GOSemSim implements five methods, proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively.
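As a rough illustration of two of the information-content-based measures named above, the sketch below computes Resnik and Lin similarity on a toy ontology in base R. It follows the usual textbook definitions rather than the package's code; the ontology, the term names and the annotation probabilities are hypothetical, and Wang's graph-based method is not shown.

```r
# Hypothetical toy ontology: each term maps to its ancestors (itself included).
ancestors <- list(
    root = "root",
    A    = c("A", "root"),
    B    = c("B", "A", "root"),
    C    = c("C", "A", "root"),
    D    = c("D", "root")
)

# Hypothetical probability that a gene is annotated to each term
# (or to one of its descendants).
p <- c(root = 1, A = 0.5, B = 0.1, C = 0.2, D = 0.5)

# Information content: rarer terms are more informative.
ic <- -log(p)

# IC of the most informative common ancestor (MICA) of two terms.
mica_ic <- function(t1, t2) {
    common <- intersect(ancestors[[t1]], ancestors[[t2]])
    max(ic[common])
}

# Resnik: the IC of the MICA.
sim_resnik <- function(t1, t2) mica_ic(t1, t2)

# Lin: the MICA's IC normalised by the ICs of the two terms.
sim_lin <- function(t1, t2) 2 * mica_ic(t1, t2) / (ic[[t1]] + ic[[t2]])

sim_resnik("B", "C")  # shared ancestor A, so this equals -log(0.5)
sim_lin("B", "B")     # a term compared with itself gives 1
sim_resnik("B", "D")  # only the root is shared, so this is 0
```

Note that `sim_lin("root", "root")` would divide zero by zero in this sketch; a real implementation has to handle terms with zero information content.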


This is an example of using the custom widget to create your own homepage section.

I am a teaching instructor for the following courses at University X:

  • xx
  • yy