[Bioc 32] NEWS of my BioC packages

October 15, 2015 in R

In the BioC 3.2 release, all my packages, including GOSemSim, clusterProfiler, DOSE, ReactomePA, and ChIPseeker, switched from Sweave to R Markdown for their vignettes.

GOSemSim

For consistency between GOSemSim and clusterProfiler, ‘worm’ was deprecated; use ‘celegans’ instead. As usual, the information content data was updated.

functional enrichment analysis with NGS data

August 21, 2015 in Bioinformatics, Genomics, R

I found a Bioconductor package, seq2pathway, that can apply functional analysis to NGS data. It consists of two components, seq2gene and gene2pathway. seq2gene converts genomic coordinates to genes, while gene2pathway performs functional analysis at the gene level.

I think it would be interesting to combine seq2gene with clusterProfiler. However, it fails to run because it calls Python through an absolute path that only exists on the author’s computer.

functional enrichment for GTEx paper

August 13, 2015 in R

The ENCODE consortium has recently published a great paper on Gene Expression from the GTEx dataset. A criticism raised on PubPeer is that the gene ontology enrichment analysis was done with DAVID, which has not been updated in the last five years.

The result is shown below:

dotplot for enrichment result

June 23, 2015 in Visualization, R

This is a feature request from a clusterProfiler user. It’s similar to what I implemented in clusterProfiler for comparing biological themes. When comparing different enrichment results, the x-axis represents the different gene clusters, while for a single enrichment result the x-axis can be gene count or gene ratio. This is similar to a traditional barplot, with dot position standing in for bar height and dot color for bar color. But a dotplot can represent one more feature through dot size, which makes it a good alternative to a barplot.
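The mapping described above can be sketched in base R. This is only an illustration of the idea, not the dotplot code in clusterProfiler: the `res` data frame is typed in by hand from the DAVID KEGG result shown further down this page, and the scaling of dot size and the red-to-blue color ramp are arbitrary choices of mine.

```r
# Values copied by hand from the DAVID KEGG result further down this page.
res <- data.frame(
  Description = c("Cell cycle", "Oocyte meiosis", "PPAR signaling pathway"),
  GeneRatio   = c("11/68", "10/68", "7/68"),
  p.adjust    = c(0.0003998379, 0.0005261534, 0.0081354974),
  Count       = c(11, 10, 7),
  stringsAsFactors = FALSE
)

# Turn the "k/n" strings into numeric ratios for the x-axis.
ratio <- sapply(strsplit(res$GeneRatio, "/"),
                function(p) as.numeric(p[1]) / as.numeric(p[2]))

# Map adjusted p-values onto a red (small) to blue (large) gradient.
scaled <- (res$p.adjust - min(res$p.adjust)) / diff(range(res$p.adjust))
cols <- rgb(colorRamp(c("red", "blue"))(scaled), maxColorValue = 255)

# Dot position = gene ratio, dot color = adjusted p-value, dot size = count.
par(mar = c(4, 12, 1, 1))
plot(ratio, seq_along(ratio), pch = 19, cex = res$Count / 4, col = cols,
     xlab = "GeneRatio", ylab = "", yaxt = "n")
axis(2, at = seq_along(ratio), labels = res$Description, las = 1)
```

Compared with a barplot of the same table, the only extra ingredient is the `cex` argument, which carries the gene count as a third feature.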

use clusterProfiler as a universal enrichment analysis tool

May 11, 2015 in Bioinformatics, Systems Biology, R

clusterProfiler supports both the hypergeometric test and gene set enrichment analysis (GSEA). It internally supports Gene Ontology analysis of about 20 species, Kyoto Encyclopedia of Genes and Genomes (KEGG) with all species that have annotation available in the KEGG database, DAVID annotation (only the hypergeometric test is supported), Disease Ontology and Network of Cancer Genes (via DOSE, for human) and Reactome Pathway (via ReactomePA, for several species). This is still not enough, since users may want to analyze their data with unsupported organisms, a slim version of GO, novel functional annotation (e.g. GO via blastgo and KEGG via KAAS), an unsupported ontology/pathway or customized annotation.

clusterProfiler provides the enricher function for the hypergeometric test and the GSEA function for gene set enrichment analysis, both designed to accept user-defined annotation. They accept two additional parameters, TERM2GENE and TERM2NAME. As the parameter names indicate, TERM2GENE is a data.frame whose first column is the term ID and whose second column is the corresponding mapped gene, and TERM2NAME is a data.frame whose first column is the term ID and whose second column is the corresponding term name. TERM2NAME is optional.
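A minimal sketch of the two data frames, using a made-up annotation: the term IDs, gene IDs and term names below are all hypothetical. The hand-rolled `phyper` loop only illustrates the hypergeometric test on such a table and is not claimed to reproduce clusterProfiler's internals. The `enricher` call is guarded so the snippet still runs when clusterProfiler is not installed, and the relaxed `minGSSize` and cutoffs are there only because the toy sets are tiny.

```r
# TERM2GENE: first column term ID, second column mapped gene (toy data).
term2gene <- data.frame(
  term = c("T1", "T1", "T1", "T2", "T2", "T3", "T3", "T3", "T3"),
  gene = c("g1", "g2", "g3", "g3", "g4", "g5", "g6", "g7", "g8"),
  stringsAsFactors = FALSE
)
# TERM2NAME: first column term ID, second column term name (optional).
term2name <- data.frame(
  term = c("T1", "T2", "T3"),
  name = c("toy process A", "toy process B", "toy process C"),
  stringsAsFactors = FALSE
)

genes_of_interest <- c("g1", "g2", "g3", "g9")

# Hypergeometric test by hand, with all annotated genes as background.
universe  <- unique(term2gene$gene)
hits      <- intersect(genes_of_interest, universe)
term_sets <- split(term2gene$gene, term2gene$term)
pvals <- sapply(term_sets, function(set) {
  k <- length(intersect(set, hits))
  # P(X >= k) when drawing length(hits) genes from the universe
  phyper(k - 1, length(set), length(universe) - length(set),
         length(hits), lower.tail = FALSE)
})
print(pvals)

# The same tables passed to enricher, if clusterProfiler is available.
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  res <- clusterProfiler::enricher(genes_of_interest,
                                   TERM2GENE = term2gene,
                                   TERM2NAME = term2name,
                                   minGSSize = 1,
                                   pvalueCutoff = 1, qvalueCutoff = 1)
}
```

The GSEA function takes the same TERM2GENE and TERM2NAME tables, but expects a ranked, named numeric vector (such as geneList from DOSE) instead of a plain vector of gene IDs.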

[Bioc 31] NEWS of my BioC packages

April 20, 2015 in R

GOSemSim

GOSemSim was first implemented in 2008 and published in Bioinformatics in 2010. It’s now a mature package with no bugs found in the past half year. Only the vignette and the information content data were updated.

DAVID functional analysis with clusterProfiler

March 16, 2015 in R

clusterProfiler was used to visualize DAVID results in a paper published in BMC Genomics.

Some users told me that they may want to use DAVID in some circumstances. I think it may be a good idea to make clusterProfiler support DAVID, so that DAVID users can use the visualization functions provided by clusterProfiler.

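library(DOSE)             # provides the geneList example data
library(clusterProfiler)
data(geneList)
# Keep genes whose absolute value in geneList exceeds 2
gene <- names(geneList)[abs(geneList) > 2]
# Query DAVID for KEGG pathway enrichment
david <- enrichDAVID(gene = gene, idType = "ENTREZ_GENE_ID",
                     listType = "Gene", annotation = "KEGG_PATHWAY")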



> summary(david)
               ID            Description GeneRatio  BgRatio       pvalue
hsa04110 hsa04110             Cell cycle     11/68 125/5085 4.254437e-06
hsa04114 hsa04114         Oocyte meiosis     10/68 110/5085 1.119764e-05
hsa03320 hsa03320 PPAR signaling pathway      7/68  69/5085 2.606715e-04
             p.adjust qvalue                                             geneID
hsa04110 0.0003998379     NA 9133/4174/890/991/1111/891/7272/8318/4085/983/9232
hsa04114 0.0005261534     NA    9133/5241/51806/3708/991/891/4085/983/9232/6790
hsa03320 0.0081354974     NA                 4312/2167/5346/5105/3158/9370/9415
         Count
hsa04110    11
hsa04114    10
hsa03320     7

Only 5085 human genes are annotated by KEGG here; this is due to the out-of-date DAVID data.

Guangchuang Yu