This section demonstrates how to use treeio to parse a tree and its associated data into a single object in R.

This lesson assumes a basic familiarity with R and data frames.

This lesson does not cover methods and software for generating phylogenetic trees, nor does it cover interpreting phylogenies. Here’s a quick primer on how to read a phylogeny that you should definitely review prior to this lesson, but it is by no means extensive. Genome-wide sequencing allows for examination of the entire genome, and many methods and software tools exist for comparative genomics using SNP- and gene-based phylogenetic analysis, starting from unassembled sequencing reads, draft assemblies/contigs, or complete genome sequences. These methods are beyond the scope of this lesson.

The treeio Package

treeio is an R package designed for phylogenetic tree data input and output. It is released as part of the Bioconductor and rOpenSci projects.

  1. treeio Bioconductor page: https://www.bioconductor.org/packages/treeio.
  2. treeio homepage: https://guangchuangyu.github.io/treeio.

Just like R packages from CRAN, you only need to install Bioconductor packages once (instructions here), then load them every time you start a new R session.

library(treeio)

Most tree viewer software (including R packages) focuses on the Newick and NEXUS file formats. Output from other evolutionary analysis software may also contain supporting evidence and/or analysis findings within the file, which can be further analyzed in R or interpreted in a phylogenetic context to help identify evolutionary patterns.

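treeio supports several file formats, including Newick, NEXUS, New Hampshire eXtended (NHX), jplace and Phylip, and software output from BEAST, EPA, HyPhy, MrBayes, PAML (BaseML and CodeML), PHYLDOG, pplacer, r8s, RAxML and RevBayes. The corresponding parser functions are listed later in this lesson.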


Basic trees

Let’s first import a phylogenetic tree into R. Phylogenetic trees are mainly stored in Newick or NEXUS format, which contains only the tree structure, with (phylogram) or without (cladogram) branch lengths.

Download the tree.nwk data by clicking here or using the link above. Let’s load the libraries you’ll need if you haven’t already, and then import the tree using read.tree(). Displaying the object itself tells you a little bit about the tree, e.g. the number of tips and nodes, and a glimpse of the tip (and node, if available) labels.

library(treeio)
tree <- read.tree("data/tree.nwk")
tree
## 
## Phylogenetic tree with 28 tips and 26 internal nodes.
## 
## Tip labels:
##  Phy000G05U_EMENI, Phy000GDP6_ASPNG, Phy003AMS0_602072, Phy000FJDH_ASPFL, Phy000FCLK_ASPCL, Phy000FQ5O_ASPFU, ...
## Node labels:
##  , 0.99985, 0.99985, 0.72129, 0.991353, 0.99985, ...
## 
## Unrooted; includes branch lengths.

The only possible way for a Newick tree to store additional information other than the tree structure is to encode the information in the taxa labels. This tree uses node labels to store support values (e.g. bootstrap values).
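As a minimal sketch of how node labels carry support values, the snippet below parses a small, made-up Newick string (not the lesson's tree.nwk) with read.tree() from the ape package. The taxon names and support values are assumptions chosen only for illustration.

```r
library(ape)

# Hypothetical four-taxon tree; the numbers after ")" are made-up support values
nwk <- "((A:0.1,B:0.2)0.95:0.3,(C:0.4,D:0.5)0.80:0.6);"
toy <- read.tree(text = nwk)

# Node labels come back as character strings, one per internal node;
# the root has no label here, so its entry is an empty string
print(toy$node.label)

# Convert to numbers before treating them as support values
# (the empty root label becomes NA)
support <- as.numeric(toy$node.label)
print(support)
```

Because the labels are stored as text, the conversion step is needed before the values can be filtered or plotted as numbers.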

Exercise 1

Look at the help page for ape::read.tree (?ape::read.tree), specifically the Details and Value sections, to explore the components of the phylo object.

  1. Summarize the branch length distribution with a histogram.

Getting tree data from evolutionary analysis result

The treeio package implements several parser functions.

Parser functions defined in treeio

Parser function    Description
read.beast         parsing output of BEAST
read.codeml        parsing output of CodeML (rst and mlc files)
read.codeml_mlc    parsing mlc file (output of CodeML)
read.hyphy         parsing output of HYPHY
read.jplace        parsing jplace file including output of EPA and pplacer
read.mrbayes       parsing output of MrBayes
read.newick        parsing newick string, with ability to parse node label as support values
read.nhx           parsing NHX file including output of PHYLDOG and RevBayes
read.paml_rst      parsing rst file (output of BaseML or CodeML)
read.phylip        parsing phylip file (phylip alignment + newick string)
read.r8s           parsing output of r8s
read.raxml         parsing output of RAxML

After parsing, the tree structure and its associated data are stored in an S4 class, treedata, defined in the treeio package. The parsed data are mapped to the tree branches and nodes inside the treedata object, so that they can be efficiently used to visually annotate the tree using the ggtree package.

Here, we use BEAST output as an example of using these parser functions to import a tree with associated data. The details of BEAST output can be found at http://beast.community/nexus_metacomments.html. In summary, it introduces ‘metacomments’ to annotate elements in the standard NEXUS format. The additional information is put in comments so that existing programs can read the tree by ignoring them. treeio is able to read the tree together with all of the inserted information stored in the BEAST output.

beast_tree <- read.beast("data/MCC_FluA_H3.tree")
beast_tree
## 'treedata' S4 object that stored information of
##  'data/MCC_FluA_H3.tree'.
## 
## ...@ phylo: 
## Phylogenetic tree with 76 tips and 75 internal nodes.
## 
## Tip labels:
##  A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_median',    'height_range', 'length',
##  'length_0.95_HPD',  'length_median',    'length_range', 'posterior',    'rate',
##  'rate_0.95_HPD',    'rate_median',  'rate_range'.
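If you do not have data/MCC_FluA_H3.tree at hand, the same parsing step can be sketched with an example BEAST file that, to our knowledge, ships with treeio under extdata/BEAST. The file path and the object name beast_example are assumptions for illustration and are not part of the lesson's data.

```r
library(treeio)

# Example MCC tree bundled with treeio (assumed path; not the lesson's file)
beast_file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast_example <- read.beast(beast_file)

# The tree topology is kept in the phylo slot of the treedata object
print(class(beast_example@phylo))

# The values parsed from the metacomments are kept in the data slot
print(head(beast_example@data))
```

Keeping the topology and the parsed values in separate slots of one object is what lets later steps convert, join or export them together.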

Because many R packages work with phylo objects, treeio provides an as.phylo method to convert a treedata object to a phylo object, and a get.data method to extract the associated data stored in a treedata object.

The tidytree package provides an as_tibble method to convert a treedata object to tidy data, which makes it possible to manipulate tree data using a tidy interface.
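The two conversions can be sketched as follows. This example again uses the BEAST file assumed to be bundled with treeio rather than the lesson's beast_tree, so the object names here are illustrative only.

```r
library(treeio)
library(tidytree)

# Example MCC tree bundled with treeio (assumed path; not the lesson's file)
beast_file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast_example <- read.beast(beast_file)

# Keep only the tree structure, dropping the associated data
phylo_only <- as.phylo(beast_example)
print(class(phylo_only))

# One row per node, with tree structure and associated data side by side
tidy_tree <- as_tibble(beast_example)
print(head(tidy_tree))
```

The tidy form has one row for every tip and internal node, which makes it convenient to filter or plot with data frame tools.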

Exercise 2

  1. convert the beast_tree to phylo object.
  2. extract tree associated data from beast_tree.
  3. convert beast_tree to tidy data.
  4. create a scatter plot of substitution rate vs branch length of the beast_tree

Linking external data to phylogeny

There is a wide range of heterogeneous data, such as traits, geographic distribution, and experimental and clinical data, that need to be integrated and linked to a phylogeny. For example, in the study of viral evolution, tree nodes may be associated with epidemiological information, such as location, age and subtype. Functional annotations may need to be mapped onto gene trees for comparative genomic studies.

To facilitate data integration, treeio provides a full_join method that links external data to a phylogeny and stores the result in a treedata object.

library(readr)

x <- read.tree("data/tree_boots.nwk")

info <- read_csv("data/tip_data.csv")
names(info)[1] <- 'label'
print(info)
## # A tibble: 7 x 10
##   label vernacularName imageURL imageLicense imageAuthor infoURL mass_in_kg
##   <chr> <chr>          <chr>    <chr>        <chr>       <chr>        <dbl>
## 1 Rang~ Reindeer       http://~ CC-BY-SA     Alexandre ~ http:/~     109.  
## 2 Cerv~ Red deer       http://~ CC-BY-SA     Sciadopitys http:/~     241.  
## 3 Bos_~ Cattle         https:/~ CC-BY-SA     Cynthia Si~ http:/~     619.  
## 4 Ovis~ Asiatic moufl~ http://~ CC-BY-SA     J<U+0161>rg Hempel http:/~      39.1 
## 5 Suri~ Meerkat        http://~ CC-BY-SA     Sara&Joach~ http:/~       0.73
## 6 Cyst~ Hooded seal    http://~ CC-BY-SA     Ecomare, S~ http:/~     279.  
## 7 Meph~ Striped skunk  http://~ CC-BY        Kevin Bowm~ http:/~       2.4 
## # ... with 3 more variables: trophic_habit <chr>, ncbi_taxid <dbl>,
## #   rank <chr>
d2 <- read_csv("data/inode_data.csv")
names(d2)[1] <- 'label'

tree2 <- full_join(as.treedata(x), info, by='label')
tree2
## 'treedata' S4 object'.
## 
## ...@ phylo: 
## Phylogenetic tree with 7 tips and 6 internal nodes.
## 
## Tip labels:
##  Rangifer_tarandus, Cervus_elaphus, Bos_taurus, Ovis_orientalis, Suricata_suricatta, Cystophora_cristata, ...
## Node labels:
## [1] "Mammalia"     "Artiodactyla" "Cervidae"     "Bovidae"     
## [5] "Carnivora"    "Caniformia"  
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  '', 'vernacularName',   'imageURL', 'imageLicense', 'imageAuthor',
##  'infoURL',  'mass_in_kg',   'trophic_habit',    'ncbi_taxid',   'rank'.
full_join(tree2, d2, by='label')
## 'treedata' S4 object'.
## 
## ...@ phylo: 
## Phylogenetic tree with 7 tips and 6 internal nodes.
## 
## Tip labels:
##  Rangifer_tarandus, Cervus_elaphus, Bos_taurus, Ovis_orientalis, Suricata_suricatta, Cystophora_cristata, ...
## Node labels:
## [1] "Mammalia"     "Artiodactyla" "Cervidae"     "Bovidae"     
## [5] "Carnivora"    "Caniformia"  
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  '', 'vernacularName.x', 'imageURL', 'imageLicense', 'imageAuthor',
##  'infoURL.x',    'mass_in_kg',   'trophic_habit',    'ncbi_taxid',   'rank.x',
##  'vernacularName.y', 'infoURL.y',    'rank.y',   'bootstrap',    'posterior'.
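The same join can be tried without the CSV files. The sketch below builds a made-up three-taxon tree and a small trait table in memory; the taxon names and the mass column are hypothetical and only serve to show how rows are matched to nodes through the label column.

```r
library(treeio)
library(tidytree)

# Hypothetical three-taxon tree and trait table, made up for illustration
toy <- read.tree(text = "((A:1,B:1):1,C:2);")
traits <- data.frame(label = c("A", "B", "C"),
                     mass  = c(1.5, 2.5, 10),
                     stringsAsFactors = FALSE)

# Rows of the table are matched to nodes through the shared 'label' column
toy_data <- full_join(as.treedata(toy), traits, by = "label")

# Internal nodes have no matching row, so their 'mass' is NA
joined <- as_tibble(toy_data)
print(joined)
```

Renaming the first column of the external table to label, as done with names(info)[1] above, is what makes this match possible.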

Exercise 3

  1. link internal node data (d2) to tree (tree2)
  2. check whether you have mapped the internal node data successfully (e.g. bootstrap and posterior)
  3. create a scatter plot of posterior vs bootstrap values.

Exporting trees with data

treeio provides write.beast and write.jtree to export a tree with its data to a single file. write.beast outputs a BEAST-compatible file that can be opened in FigTree and many other programs; most of them can read the tree by ignoring the data.

write.beast(tree2)
## #NEXUS
## [R-package treeio, Thu Jan 17 18:22:25 2019]
## 
## BEGIN TAXA;
##  DIMENSIONS NTAX = 7;
##  TAXLABELS
##      Rangifer_tarandus
##      Cervus_elaphus
##      Bos_taurus
##      Ovis_orientalis
##      Suricata_suricatta
##      Cystophora_cristata
##      Mephitis_mephitis
##  ;
## END;
## BEGIN TREES;
##  TRANSLATE
##      1   Rangifer_tarandus,
##      2   Cervus_elaphus,
##      3   Bos_taurus,
##      4   Ovis_orientalis,
##      5   Suricata_suricatta,
##      6   Cystophora_cristata,
##      7   Mephitis_mephitis
##  ;
##  TREE * UNTITLED = [&R] (((1[&vernacularName=Reindeer,imageURL=http://media.eol.org/content/2012/06/13/00/48543_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Alexandre Buisse (Nattfodd),infoURL=http://eol.org/pages/328653/overview,mass_in_kg=109.09,trophic_habit=herbivore,ncbi_taxid=9870,rank=species]:1,2[&vernacularName=Red deer,imageURL=http://media.eol.org/content/2014/09/16/00/20239_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Sciadopitys,infoURL=http://eol.org/pages/328649/overview,mass_in_kg=240.87,trophic_habit=herbivore,ncbi_taxid=9860,rank=species]:1)Cervidae:1,(3[&vernacularName=Cattle,imageURL=https://media.eol.org/content/2014/09/29/06/46535_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Cynthia Sims Parr,infoURL=http://eol.org/pages/328699/overview,mass_in_kg=618.64,trophic_habit=herbivore,ncbi_taxid=9913,rank=species]:1,4[&vernacularName=Asiatic mouflon,imageURL=http://media.eol.org/content/2015/05/20/03/80720_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=J<U+0161>rg Hempel,infoURL=http://eol.org/pages/311906/overview,mass_in_kg=39.1,trophic_habit=herbivore,ncbi_taxid=469796,rank=species]:1)Bovidae:1)Artiodactyla:1,(5[&vernacularName=Meerkat,imageURL=http://media.eol.org/content/2016/08/16/05/67138_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Sara&Joachim,infoURL=http://eol.org/pages/311580/overview,mass_in_kg=0.73,trophic_habit=carnivore,ncbi_taxid=37032,rank=species]:2,(6[&vernacularName=Hooded seal,imageURL=http://media.eol.org/content/2013/06/18/07/63362_orig.jpg,imageLicense=CC-BY-SA,imageAuthor=Ecomare, Salko de Wolf,infoURL=http://eol.org/pages/328632/overview,mass_in_kg=278.9,trophic_habit=omnivore,ncbi_taxid=39293,rank=species]:1,7[&vernacularName=Striped skunk,imageURL=http://media.eol.org/content/2012/06/15/06/75234_orig.jpg,imageLicense=CC-BY,imageAuthor=Kevin Bowman,infoURL=http://eol.org/pages/328593/overview,mass_in_kg=2.4,trophic_habit=omnivore,ncbi_taxid=30548,rank=species]:1)Caniformia:1)Carnivora:1)Mammalia;
## END;
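To see that the exported file is self-contained, a toy tree can be written to a temporary file and inspected. This is a sketch under stated assumptions: the tree, the mass values and the temporary file name are all made up, and only the file argument of write.beast is used.

```r
library(treeio)

# Hypothetical tree with one made-up numeric trait on the tips
toy <- read.tree(text = "((A:1,B:1):1,C:2);")
traits <- data.frame(label = c("A", "B", "C"),
                     mass  = c(1.5, 2.5, 10),
                     stringsAsFactors = FALSE)
toy_data <- full_join(as.treedata(toy), traits, by = "label")

# Write to a temporary file instead of printing to the console
out_file <- tempfile(fileext = ".tree")
write.beast(toy_data, file = out_file)

# The file is plain text: a NEXUS tree with the trait stored in metacomments
nexus_lines <- readLines(out_file)
cat(nexus_lines, sep = "\n")

# Parse the exported file again with the BEAST parser
reread <- read.beast(out_file)
print(reread)
```

A numeric trait is used here on purpose, since values containing characters such as colons or commas may be harder for other software to parse back.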

write.jtree outputs a JSON (JavaScript Object Notation) file. JSON is a lightweight data-interchange format that is widely supported in almost all modern programming languages.

write.jtree(tree2)
## {
##  "tree": "(((Rangifer_tarandus:1{1},Cervus_elaphus:1{2})Cervidae:1{10},(Bos_taurus:1{3},Ovis_orientalis:1{4})Bovidae:1{11})Artiodactyla:1{9},(Suricata_suricatta:2{5},(Cystophora_cristata:1{6},Mephitis_mephitis:1{7})Caniformia:1{13})Carnivora:1{12})Mammalia{8};",
##  "data":[
##   {
##     "edge_num": 1,
##     "vernacularName": "Reindeer",
##     "imageURL": "http://media.eol.org/content/2012/06/13/00/48543_orig.jpg",
##     "imageLicense": "CC-BY-SA",
##     "imageAuthor": "Alexandre Buisse (Nattfodd)",
##     "infoURL": "http://eol.org/pages/328653/overview",
##     "mass_in_kg": 109.09,
##     "trophic_habit": "herbivore",
##     "ncbi_taxid": 9870,
##     "rank": "species"
##   },
##   {
##     "edge_num": 2,
##     "vernacularName": "Red deer",
##     "imageURL": "http://media.eol.org/content/2014/09/16/00/20239_orig.jpg",
##     "imageLicense": "CC-BY-SA",
##     "imageAuthor": "Sciadopitys",
##     "infoURL": "http://eol.org/pages/328649/overview",
##     "mass_in_kg": 240.87,
##     "trophic_habit": "herbivore",
##     "ncbi_taxid": 9860,
##     "rank": "species"
##   },
##   {
##     "edge_num": 3,
##     "vernacularName": "Cattle",
##     "imageURL": "https://media.eol.org/content/2014/09/29/06/46535_orig.jpg",
##     "imageLicense": "CC-BY-SA",
##     "imageAuthor": "Cynthia Sims Parr",
##     "infoURL": "http://eol.org/pages/328699/overview",
##     "mass_in_kg": 618.64,
##     "trophic_habit": "herbivore",
##     "ncbi_taxid": 9913,
##     "rank": "species"
##   },
##   {
##     "edge_num": 4,
##     "vernacularName": "Asiatic mouflon",
##     "imageURL": "http://media.eol.org/content/2015/05/20/03/80720_orig.jpg",
##     "imageLicense": "CC-BY-SA",
##     "imageAuthor": "J<U+0161>rg Hempel",
##     "infoURL": "http://eol.org/pages/311906/overview",
##     "mass_in_kg": 39.1,
##     "trophic_habit": "herbivore",
##     "ncbi_taxid": 469796,
##     "rank": "species"
##   },
##   {
##     "edge_num": 5,
##     "vernacularName": "Meerkat",
##     "imageURL": "http://media.eol.org/content/2016/08/16/05/67138_orig.jpg",
##     "imageLicense": "CC-BY-SA",
##     "imageAuthor": "Sara&Joachim",
##     "infoURL": "http://eol.org/pages/311580/overview",
##     "mass_in_kg": 0.73,
##     "trophic_habit": "carnivore",
##     "ncbi_taxid": 37032,
##     "rank": "species"
##   },
##   {
##     "edge_num": 6,
##     "vernacularName": "Hooded seal",
##     "imageURL": "http://media.eol.org/content/2013/06/18/07/63362_orig.jpg",
##     "imageLicense": "CC-BY-SA",
##     "imageAuthor": "Ecomare, Salko de Wolf",
##     "infoURL": "http://eol.org/pages/328632/overview",
##     "mass_in_kg": 278.9,
##     "trophic_habit": "omnivore",
##     "ncbi_taxid": 39293,
##     "rank": "species"
##   },
##   {
##     "edge_num": 7,
##     "vernacularName": "Striped skunk",
##     "imageURL": "http://media.eol.org/content/2012/06/15/06/75234_orig.jpg",
##     "imageLicense": "CC-BY",
##     "imageAuthor": "Kevin Bowman",
##     "infoURL": "http://eol.org/pages/328593/overview",
##     "mass_in_kg": 2.4,
##     "trophic_habit": "omnivore",
##     "ncbi_taxid": 30548,
##     "rank": "species"
##   },
##   {
##     "edge_num": 8
##   },
##   {
##     "edge_num": 9
##   },
##   {
##     "edge_num": 10
##   },
##   {
##     "edge_num": 11
##   },
##   {
##     "edge_num": 12
##   },
##   {
##     "edge_num": 13
##   }
## ],
##  "metadata": {"info": "R-package treeio", "data": "Thu Jan 17 18:22:25 2019"}
## }
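Because the jtree file is ordinary JSON, it can be read by a generic JSON parser as well as by treeio. The sketch below assumes the jsonlite package is installed and uses a made-up toy tree; the object and file names are illustrative.

```r
library(treeio)

# Hypothetical tree with one made-up numeric trait on the tips
toy <- read.tree(text = "((A:1,B:1):1,C:2);")
traits <- data.frame(label = c("A", "B", "C"),
                     mass  = c(1.5, 2.5, 10),
                     stringsAsFactors = FALSE)
toy_data <- full_join(as.treedata(toy), traits, by = "label")

# Write the tree and its data to a temporary JSON file
json_file <- tempfile(fileext = ".json")
write.jtree(toy_data, file = json_file)

# A generic JSON parser sees the top-level elements of the file
parsed <- jsonlite::fromJSON(json_file)
print(names(parsed))

# treeio can parse the same file back into a treedata object
reread <- read.jtree(json_file)
print(reread)
```

The tree element holds a Newick string with edge numbers in braces, and the data element holds one record per edge number, as in the output above.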

Exercise 4

  1. write the tree to a single file by passing file = "your_file_name" to write.beast or write.jtree.
  2. re-read the output file into R using read.beast or read.jtree.