ggtree provides many helper functions for manupulating phylogenetic trees and make it easy to explore tree structure visually.

Here, as examples, I used ggtree to draw capital character G and C, which are first letter of my name :-).

To draw a tree in such shape, we need fan layout (circular layout with open angle) and then rotating the tree to let the open space on the correct position. Here are the source codes to produce the G and C shapes of tree. I am thinking about using the G shaped tree as ggtree logo. Have fun with ggtree :-)

Continue reading

OutbreakTools implements basic tools for the analysis of Disease Outbreaks.

It defines S4 class obkData to store case-base outbreak data. It also provides a function, plotggphy, to visualize such data on the phylogenetic tree.

library(OutbreakTools)
data(FluH1N1pdm2009)
attach(FluH1N1pdm2009)


x <- new("obkData", individuals = individuals, dna = FluH1N1pdm2009$dna,
         dna.individualID = samples$individualID, dna.date = samples$date,
         trees = FluH1N1pdm2009$trees)

plotggphy(x, ladderize = TRUE, branch.unit = "year",
          tip.color = "location", tip.size = 3, tip.alpha = 0.75)

Continue reading

ggtree can parse many software outputs and the evolution evidences inferred by these software can be used directly for tree annotation. ggtree not only works as an infrastructure that enables evolutionary data that inferred by commonly used software packages to be used in R, but also serves as a general tree visualization and annotation tool for the R community as it supports many S3/S4 objects defined by other R packages.

phyloseq for microbiome data

phyloseq class defined in the phyloseq package was designed for microbiome data. phyloseq package implemented plot_tree function using ggplot2. Although the function was implemented by ggplot2 and we can use theme, scale_color_manual etc for customization, the most valuable part of ggplot2, adding layer, is missing. plot_tree only provides limited parameters to control the output graph and it is hard to add layer unless user has expertise in both phyloseq and ggplot2.

Continue reading

I extended the subview function to support embed image file in a ggplot object.

set.seed(123)
d = data.frame(x=rnorm(10), y=rnorm(10))

imgfile <- tempfile(, fileext=".png")
download.file("https://avatars1.githubusercontent.com/u/626539?v=3&u=e731426406dd3f45a73d96dd604bc45ae2e7c36f&s=140",
	          destfile=imgfile, mode='wb')

p = ggplot(d, aes(x, y))
subview(p, imgfile, x=d$x[1], y=d$y[1]) + geom_point(size=5)

Continue reading

本文受魏太云(@cloud_wei)的邀请,最初在2015年发表于统计之都

进化树看起来和层次聚类很像。有必要解释一下两者的一些区别。

层次聚类的侧重点在于分类,把距离近的聚在一起。而进化树的构建可以说也是一个聚类过程,但侧重点在于推测进化关系和进化距离(evolutionary distance)。

层次聚类的输入是距离,比如euclidean或manhattan距离。把距离近的聚在一起。而进化树推断是从生物序列(DNA或氨基酸)的比对开始。最简单的方法是计算一下序列中不匹配的数目,称之为hamming distance(通常用序列长度做归一化),使用距离当然也可以应用层次聚类的方法。进化树的构建最简单的方法是非加权配对平均法(Unweighted Pair Group Method with Arithmetic Mean, UPGMA),这其实是使用average linkage的层次聚类。这种方法在进化树推断上现在基本没人用。更为常用的是邻接法(neighbor joining),两个节点距离其它节点都比较远,而这两个节点又比较近,它们就是neighbor,可以看出neighbor不一定是距离最近的两个节点。真正做进化的人,这个方法也基本不用。现在主流的方法是最大似然法(Maximum likelihood, ML),通过进化模型(evolutionary model)估计拓朴结构和分支长度,估计的结果具有最高的概率能够产生观测数据(多序列比对)。另外还有最大简约法和贝叶斯推断等方法用于构建进化树。

Continue reading

To answer the issue, I extend the covplot function to support viewing coverage of a list of GRanges objects or bed files.

library(ChIPseeker)
files <- getSampleFiles()
peak=GenomicRanges::GRangesList(CBX6=readPeakFile(files[[4]]),
                                CBX7=readPeakFile(files[[5]]))

p <- covplot(peak)
print(p)

Continue reading

Author's picture

Guangchuang Yu

Bioinformatics Professor @ SMU

Bioinformatics Professor

Guangzhou