This section demonstrates how to use treeio to parse a tree and its associated data into a single object in R.
This lesson assumes a basic familiarity with R and data frames.
This lesson does not cover methods and software for generating phylogenetic trees, nor does it cover interpreting phylogenies. Here’s a quick primer on how to read a phylogeny that you should definitely review prior to this lesson, but it is by no means exhaustive. Genome-wide sequencing allows for examination of the entire genome, and many methods and software tools exist for comparative genomics using SNP- and gene-based phylogenetic analysis, starting from unassembled sequencing reads, draft assemblies/contigs, or complete genome sequences. These methods are beyond the scope of this lesson.
treeio
treeio is an R package designed for phylogenetic tree data input and output. It is released as part of the Bioconductor and rOpenSci projects.
Just like R packages from CRAN, you only need to install Bioconductor packages once (instructions here), then load them every time you start a new R session.
library(treeio)
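If treeio is not installed yet, the one-time installation can be sketched as follows. This assumes the standard Bioconductor route through the BiocManager package from CRAN; the linked instructions remain the authoritative source for your R version.

```r
# One-time setup: install BiocManager from CRAN if it is missing,
# then use it to install treeio from Bioconductor.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("treeio")

# Every new R session: load the package.
library(treeio)
```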
Most tree viewer software (including R packages) focuses on the Newick and NEXUS file formats. Other evolutionary analysis software may also store supporting evidence and/or analysis findings within its output files; these can be further analyzed in R or interpreted in a phylogenetic context to help identify evolutionary patterns.
treeio supports several file formats, including:
and software output from:
Let’s first import a phylogenetic tree into R. Phylogenetic trees are mainly stored in the Newick or NEXUS formats, which contain only the tree structure, with (phylogram) or without (cladogram) branch lengths.
Download the tree.nwk data by clicking here or using the link above. Let’s load the libraries you’ll need if you haven’t already, and then import the tree using read.tree(). Displaying the object itself tells you a little about the tree, e.g. the number of tips and nodes, and a glimpse of the tip (and, if available, node) labels.
library(treeio)
tree <- read.tree("data/tree.nwk")
tree
##
## Phylogenetic tree with 28 tips and 26 internal nodes.
##
## Tip labels:
## Phy000G05U_EMENI, Phy000GDP6_ASPNG, Phy003AMS0_602072, Phy000FJDH_ASPFL, Phy000FCLK_ASPCL, Phy000FQ5O_ASPFU, ...
## Node labels:
## , 0.99985, 0.99985, 0.72129, 0.991353, 0.99985, ...
##
## Unrooted; includes branch lengths.
The only way for a Newick tree to store information beyond the tree structure is to encode it in the labels. This tree uses node labels to store support values (e.g. bootstrap values).
Look at the help page ?ape::read.tree, specifically the Details and Value sections, to explore the components of the phylo object.
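To make the components of a phylo object concrete, here is a minimal sketch on a small inline Newick string. The toy tree and its support values are invented for illustration and are not part of the lesson data.

```r
library(ape)

# Hypothetical four-tip tree; the internal node labels hold support values.
toy <- read.tree(text = "((A:1,B:2)0.95:1,(C:1,D:3)0.80:2);")

toy$tip.label     # character vector of tip names
toy$Nnode         # number of internal nodes
toy$edge          # two-column matrix: parent node, child node
toy$edge.length   # branch lengths, one per row of toy$edge
toy$node.label    # node labels, stored as character strings

# Node labels are text, so convert them before treating them as numbers.
# The unlabelled root becomes NA.
support <- as.numeric(toy$node.label)
support
```

Because Newick has no dedicated field for support values, it is up to you to decide whether a node label is a name or a number.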
The treeio package implements several parser functions.
Parser function | Description |
---|---|
read.beast | parsing output of BEAST |
read.codeml | parsing output of CodeML (rst and mlc files) |
read.codeml_mlc | parsing mlc file (output of CodeML) |
read.hyphy | parsing output of HYPHY |
read.jplace | parsing jplace file including output of EPA and pplacer |
read.mrbayes | parsing output of MrBayes |
read.newick | parsing newick string, with ability to parse node label as support values |
read.nhx | parsing NHX file including output of PHYLDOG and RevBayes |
read.paml_rst | parsing rst file (output of BaseML or CodeML) |
read.phylip | parsing phylip file (phylip alignment + newick string) |
read.r8s | parsing output of r8s |
read.raxml | parsing output of RAxML |
After parsing, the tree structure and its associated data are stored in an S4 class, treedata, defined in the treeio package. The parsed data are mapped to the tree's branches and nodes inside the treedata object, so that they can be used efficiently to annotate the tree visually with the ggtree package.
Here, we use BEAST output as an example of using these parser functions to import a tree with associated data. The details of BEAST output can be found at http://beast.community/nexus_metacomments.html. In summary, it introduces ‘metacomments’ to annotate elements in the standard NEXUS format. The additional information is placed in comments so that existing programs can still read the tree by ignoring them. treeio is able to read the tree together with all the inserted information stored in BEAST output.
beast_tree <- read.beast("data/MCC_FluA_H3.tree")
beast_tree
## 'treedata' S4 object that stored information of
## 'data/MCC_FluA_H3.tree'.
##
## ...@ phylo:
## Phylogenetic tree with 76 tips and 75 internal nodes.
##
## Tip labels:
## A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
## 'height', 'height_0.95_HPD', 'height_median', 'height_range', 'length',
## 'length_0.95_HPD', 'length_median', 'length_range', 'posterior', 'rate',
## 'rate_0.95_HPD', 'rate_median', 'rate_range'.
Because many R packages work on the phylo object, treeio provides an as.phylo method to convert a treedata object to a phylo object, and a get.data method to extract the associated data stored in a treedata object.
The tidytree package provides an as_tibble method to convert a treedata object to tidy data, which makes it possible to manipulate tree data through a tidy interface.
Convert beast_tree to a phylo object.
Extract the associated data from beast_tree.
Convert beast_tree to tidy data.
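The three conversions described above can be sketched as follows. To stay self-contained, this sketch reads the example BEAST file that ships with treeio (an assumption about your installed version) rather than the lesson's data/MCC_FluA_H3.tree; the same calls apply to beast_tree.

```r
library(treeio)
library(tidytree)

# Example BEAST output bundled with treeio (not the lesson's file).
beast_file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast <- read.beast(beast_file)

phy <- as.phylo(beast)    # tree structure only, usable by ape and friends
dat <- get.data(beast)    # associated data, one row per node
tbl <- as_tibble(beast)   # tree structure and data combined in one tidy table

class(phy)
head(dat)
head(tbl)
```

The tidy table keeps the parent and node columns alongside the features, so it can be filtered or summarised with ordinary data frame tools.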
There is a wide range of heterogeneous data, such as traits, geographic distribution, and experimental and clinical data, that need to be integrated and linked to a phylogeny. For example, in the study of viral evolution, tree nodes may be associated with epidemiological information, such as location, age and subtype. Functional annotations may need to be mapped onto gene trees for comparative genomic studies.
To facilitate data integration, treeio provides a full_join method that links external data to a phylogeny and stores the result in a treedata object.
library(readr)
x <- read.tree("data/tree_boots.nwk")
info <- read_csv("data/tip_data.csv")
names(info)[1] <- 'label'
print(info)
## # A tibble: 7 x 10
## label vernacularName imageURL imageLicense imageAuthor infoURL mass_in_kg
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 Rang~ Reindeer http://~ CC-BY-SA Alexandre ~ http:/~ 109.
## 2 Cerv~ Red deer http://~ CC-BY-SA Sciadopitys http:/~ 241.
## 3 Bos_~ Cattle https:/~ CC-BY-SA Cynthia Si~ http:/~ 619.
## 4 Ovis~ Asiatic moufl~ http://~ CC-BY-SA J<U+0161>rg Hempel http:/~ 39.1
## 5 Suri~ Meerkat http://~ CC-BY-SA Sara&Joach~ http:/~ 0.73
## 6 Cyst~ Hooded seal http://~ CC-BY-SA Ecomare, S~ http:/~ 279.
## 7 Meph~ Striped skunk http://~ CC-BY Kevin Bowm~ http:/~ 2.4
## # ... with 3 more variables: trophic_habit <chr>, ncbi_taxid <dbl>,
## # rank <chr>
d2 <- read_csv("data/inode_data.csv")
names(d2)[1] <- 'label'
tree2 <- full_join(as.treedata(x), info, by='label')
tree2
## 'treedata' S4 object'.
##
## ...@ phylo:
## Phylogenetic tree with 7 tips and 6 internal nodes.
##
## Tip labels:
## Rangifer_tarandus, Cervus_elaphus, Bos_taurus, Ovis_orientalis, Suricata_suricatta, Cystophora_cristata, ...
## Node labels:
## [1] "Mammalia" "Artiodactyla" "Cervidae" "Bovidae"
## [5] "Carnivora" "Caniformia"
##
## Rooted; includes branch lengths.
##
## with the following features available:
## '', 'vernacularName', 'imageURL', 'imageLicense', 'imageAuthor',
## 'infoURL', 'mass_in_kg', 'trophic_habit', 'ncbi_taxid', 'rank'.
full_join(tree2, d2, by='label')
## 'treedata' S4 object'.
##
## ...@ phylo:
## Phylogenetic tree with 7 tips and 6 internal nodes.
##
## Tip labels:
## Rangifer_tarandus, Cervus_elaphus, Bos_taurus, Ovis_orientalis, Suricata_suricatta, Cystophora_cristata, ...
## Node labels:
## [1] "Mammalia" "Artiodactyla" "Cervidae" "Bovidae"
## [5] "Carnivora" "Caniformia"
##
## Rooted; includes branch lengths.
##
## with the following features available:
## '', 'vernacularName.x', 'imageURL', 'imageLicense', 'imageAuthor',
## 'infoURL.x', 'mass_in_kg', 'trophic_habit', 'ncbi_taxid', 'rank.x',
## 'vernacularName.y', 'infoURL.y', 'rank.y', 'bootstrap', 'posterior'.
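The .x and .y suffixes in the output above appear because both tables share column names other than label. One way to avoid them, sketched here on a hypothetical toy tree and two invented tables, is to stack the tip and internal-node tables first and join once. This assumes the dplyr package is available; it is a workaround, not a treeio feature.

```r
library(treeio)   # provides read.tree(), as.treedata(), full_join(), get.fields()
library(tibble)
library(dplyr)

# Hypothetical toy tree with labelled internal nodes.
tr <- read.tree(text = "((A:1,B:1)AB:1,C:2)root;")

# Hypothetical tables: both carry a `rank` column.
tips  <- tibble(label = c("A", "B", "C"), rank = "species", mass = c(1.5, 2, 10))
nodes <- tibble(label = c("AB", "root"), rank = c("genus", "family"),
                support = c(0.9, 1))

# Stack the tables so shared columns line up, then join once on `label`.
combined <- bind_rows(tips, nodes)
td <- full_join(as.treedata(tr), combined, by = "label")

get.fields(td)
```

Columns that exist in only one table, such as mass and support here, are filled with NA for the rows of the other table.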
The call above links the second data set (d2) to the tree (tree2).
treeio provides write.beast and write.jtree to export a tree with its data to a single file. write.beast outputs a BEAST-compatible file that can be opened with FigTree and many other programs; most of them read the tree and simply ignore the data.
write.beast(tree2)
## #NEXUS
## [R-package treeio, Thu Jan 17 18:22:25 2019]
##
## BEGIN TAXA;
## DIMENSIONS NTAX = 7;
## TAXLABELS
## Rangifer_tarandus
## Cervus_elaphus
## Bos_taurus
## Ovis_orientalis
## Suricata_suricatta
## Cystophora_cristata
## Mephitis_mephitis
## ;
## END;
## BEGIN TREES;
## TRANSLATE
## 1 Rangifer_tarandus,
## 2 Cervus_elaphus,
## 3 Bos_taurus,
## 4 Ovis_orientalis,
## 5 Suricata_suricatta,
## 6 Cystophora_cristata,
## 7 Mephitis_mephitis
## ;
## TREE * UNTITLED = [&R] (((1[&vernacularName=Reindeer,imageURL=http://media.eol.org/content/2012/06/13/00/48543_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Alexandre Buisse (Nattfodd),infoURL=http://eol.org/pages/328653/overview,mass_in_kg=109.09,trophic_habit=herbivore,ncbi_taxid=9870,rank=species]:1,2[&vernacularName=Red deer,imageURL=http://media.eol.org/content/2014/09/16/00/20239_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Sciadopitys,infoURL=http://eol.org/pages/328649/overview,mass_in_kg=240.87,trophic_habit=herbivore,ncbi_taxid=9860,rank=species]:1)Cervidae:1,(3[&vernacularName=Cattle,imageURL=https://media.eol.org/content/2014/09/29/06/46535_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Cynthia Sims Parr,infoURL=http://eol.org/pages/328699/overview,mass_in_kg=618.64,trophic_habit=herbivore,ncbi_taxid=9913,rank=species]:1,4[&vernacularName=Asiatic mouflon,imageURL=http://media.eol.org/content/2015/05/20/03/80720_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=J<U+0161>rg Hempel,infoURL=http://eol.org/pages/311906/overview,mass_in_kg=39.1,trophic_habit=herbivore,ncbi_taxid=469796,rank=species]:1)Bovidae:1)Artiodactyla:1,(5[&vernacularName=Meerkat,imageURL=http://media.eol.org/content/2016/08/16/05/67138_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Sara&Joachim,infoURL=http://eol.org/pages/311580/overview,mass_in_kg=0.73,trophic_habit=carnivore,ncbi_taxid=37032,rank=species]:2,(6[&vernacularName=Hooded seal,imageURL=http://media.eol.org/content/2013/06/18/07/63362_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Ecomare, Salko de Wolf,infoURL=http://eol.org/pages/328632/overview,mass_in_kg=278.9,trophic_habit=omnivore,ncbi_taxid=39293,rank=species]:1,7[&vernacularName=Striped skunk,imageURL=http://media.eol.org/content/2012/06/15/06/75234_orig.jpg,imageLicense=CC-BY,imageAuthor=Kevin Bowman,infoURL=http://eol.org/pages/328593/overview,mass_in_kg=2.4,trophic_habit=omnivore,ncbi_taxid=30548,rank=species]:1)Caniformia:1)Carnivora:1)Mammalia;
## END;
write.jtree outputs a JSON (JavaScript Object Notation) file. JSON is a lightweight data-interchange format that is supported in almost all modern programming languages.
write.jtree(tree2)
## {
## "tree": "(((Rangifer_tarandus:1{1},Cervus_elaphus:1{2})Cervidae:1{10},(Bos_taurus:1{3},Ovis_orientalis:1{4})Bovidae:1{11})Artiodactyla:1{9},(Suricata_suricatta:2{5},(Cystophora_cristata:1{6},Mephitis_mephitis:1{7})Caniformia:1{13})Carnivora:1{12})Mammalia{8};",
## "data":[
## {
## "edge_num": 1,
## "vernacularName": "Reindeer",
## "imageURL": "http://media.eol.org/content/2012/06/13/00/48543_orig.jpg",
## "imageLicense": "CC-BY-SA",
## "imageAuthor": "Alexandre Buisse (Nattfodd)",
## "infoURL": "http://eol.org/pages/328653/overview",
## "mass_in_kg": 109.09,
## "trophic_habit": "herbivore",
## "ncbi_taxid": 9870,
## "rank": "species"
## },
## {
## "edge_num": 2,
## "vernacularName": "Red deer",
## "imageURL": "http://media.eol.org/content/2014/09/16/00/20239_orig.jpg",
## "imageLicense": "CC-BY-SA",
## "imageAuthor": "Sciadopitys",
## "infoURL": "http://eol.org/pages/328649/overview",
## "mass_in_kg": 240.87,
## "trophic_habit": "herbivore",
## "ncbi_taxid": 9860,
## "rank": "species"
## },
## {
## "edge_num": 3,
## "vernacularName": "Cattle",
## "imageURL": "https://media.eol.org/content/2014/09/29/06/46535_orig.jpg",
## "imageLicense": "CC-BY-SA",
## "imageAuthor": "Cynthia Sims Parr",
## "infoURL": "http://eol.org/pages/328699/overview",
## "mass_in_kg": 618.64,
## "trophic_habit": "herbivore",
## "ncbi_taxid": 9913,
## "rank": "species"
## },
## {
## "edge_num": 4,
## "vernacularName": "Asiatic mouflon",
## "imageURL": "http://media.eol.org/content/2015/05/20/03/80720_orig.jpg",
## "imageLicense": "CC-BY-SA",
## "imageAuthor": "J<U+0161>rg Hempel",
## "infoURL": "http://eol.org/pages/311906/overview",
## "mass_in_kg": 39.1,
## "trophic_habit": "herbivore",
## "ncbi_taxid": 469796,
## "rank": "species"
## },
## {
## "edge_num": 5,
## "vernacularName": "Meerkat",
## "imageURL": "http://media.eol.org/content/2016/08/16/05/67138_orig.jpg",
## "imageLicense": "CC-BY-SA",
## "imageAuthor": "Sara&Joachim",
## "infoURL": "http://eol.org/pages/311580/overview",
## "mass_in_kg": 0.73,
## "trophic_habit": "carnivore",
## "ncbi_taxid": 37032,
## "rank": "species"
## },
## {
## "edge_num": 6,
## "vernacularName": "Hooded seal",
## "imageURL": "http://media.eol.org/content/2013/06/18/07/63362_orig.jpg",
## "imageLicense": "CC-BY-SA",
## "imageAuthor": "Ecomare, Salko de Wolf",
## "infoURL": "http://eol.org/pages/328632/overview",
## "mass_in_kg": 278.9,
## "trophic_habit": "omnivore",
## "ncbi_taxid": 39293,
## "rank": "species"
## },
## {
## "edge_num": 7,
## "vernacularName": "Striped skunk",
## "imageURL": "http://media.eol.org/content/2012/06/15/06/75234_orig.jpg",
## "imageLicense": "CC-BY",
## "imageAuthor": "Kevin Bowman",
## "infoURL": "http://eol.org/pages/328593/overview",
## "mass_in_kg": 2.4,
## "trophic_habit": "omnivore",
## "ncbi_taxid": 30548,
## "rank": "species"
## },
## {
## "edge_num": 8
## },
## {
## "edge_num": 9
## },
## {
## "edge_num": 10
## },
## {
## "edge_num": 11
## },
## {
## "edge_num": 12
## },
## {
## "edge_num": 13
## }
## ],
## "metadata": {"info": "R-package treeio", "data": "Thu Jan 17 18:22:25 2019"}
## }
Export the tree to a file by passing file = "your_file_name" to write.beast or write.jtree.
Read the file back in using read.beast or read.jtree.
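A round trip through both export formats can be sketched as follows. The toy tree, the mass column and the temporary file names are invented for illustration; with the lesson data you would pass tree2 and a file name of your choice.

```r
library(treeio)
library(tibble)

# Hypothetical three-tip tree with one invented trait per tip.
tr <- read.tree(text = "((A:1,B:1):1,C:2);")
traits <- tibble(label = c("A", "B", "C"), mass = c(1.5, 2, 10))
td <- full_join(as.treedata(tr), traits, by = "label")

# BEAST-style NEXUS: write to a temporary file, then parse it again.
beast_file <- tempfile(fileext = ".tree")
write.beast(td, file = beast_file)
td_beast <- read.beast(beast_file)

# jtree (JSON): same idea.
jtree_file <- tempfile(fileext = ".jtree")
write.jtree(td, file = jtree_file)
td_jtree <- read.jtree(jtree_file)

get.fields(td_beast)
get.fields(td_jtree)
```

Both files keep the tree and the data together, so either one can be shared as a single self-describing file.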