ggtree in Bioconductor 3.1

January 19, 2015 in R

I am very glad that ggtree is now available via Bioconductor. This is my 6th Bioconductor package. ggtree now supports parsing output files from BEAST, PAML, HYPHY, EPA and PPLACER and can annotate phylogenetic tree directly using plot methods.

January 5, 2015 in R

I like running fortune every time when the terminal was started. A screenshot is shown below:

wmaker

viewing and annotating phylogenetic tree with ggtree

December 21, 2014 in Evolution, Visualization, R

When I need to annotate nucleotide substitutions in the phylogenetic tree, I found that all the software are designed to display the tree but not annotating it. Some of them may support annotating the tree with specific data such as bootstrap values, but they are restricted to a few supported data types. It is hard/impossible to inject user specific data.

proper use of GOSemSim

November 23, 2014 in Research, R

One day, I am looking for R packages that can analyze PPI and after searching, I found the ppiPre package in CRAN.

SIR Model of Epidemics

October 13, 2014 in R, Visualization, Epidemics

The SIR model divides the population to three compartments: Susceptible, Infected and Recovered. If the disease dynamic fits the SIR model, then the flow of individuals is one direction from the susceptible group to infected group and then to the recovered group. All individuals are assumed to be identical in terms of their susceptibility to infection, infectiousness if infected and mixing behaviour associated with disease transmission.

We defined: $S_t$ = the number of susceptible individuals at time t

$ I_t $ = the number of infected individuals at time t

$R_t$ = the number of recovered individuals at time t

Suppose on average every infected individual will contact $\gamma$ person, and $\kappa$ percent of these $\gamma$ person will be infected. Then on average there are $\beta = \gamma \times \kappa$ person will be infected an infected individual.

multiple annotation in ChIPseeker

October 1, 2014 in Genomics, Visualization, R

Nearest gene annotation

Almost all annotation software calculate the distance of a peak to the nearest TSS and assign the peak to that gene. This can be misleading, as binding sites might be located between two start sites of different genes or hit different genes which have the same TSS location in the genome.

The function annotatePeak provides option to assign genes with a max distance cutoff and all genes within this distance were reported for each peak.

The spread of new mutations

August 13, 2014 in R, Biology, Evolution

Genetic drift is the term used in population genetics to refer to the statistical drift over time of gene frequencies in a population due to random sampling effects in the formation of successive generations. In a narrower sense, genetic drift refers to the expected population dynamics of neutral alleles (those defined as having no positive or negative impact on reproductive fitness), which are predicted to eventually become fixed at zero or 100% frequency in the absence of other mechanisms affecting allele distributions.

The most important keyword in the definition of genetic drift is random sampling effects. The figure belowed illustrates this idea. The surviving individuals do not necessarily have selection advantage. They are randomly selected.

ggtree in Bioconductor 3.1

viewing and annotating phylogenetic tree with ggtree

proper use of GOSemSim

SIR Model of Epidemics

multiple annotation in ChIPseeker

Nearest gene annotation

The spread of new mutations

Guangchuang Yu