R package DOSE released

Disease Ontology (DO) provides an open source ontology for the integration of biomedical data that is associated with human disease. DO analysis can lead to interesting discoveries that deserve further clinical investigation.

DOSE is designed for semantic similarity measurement and enrichment analysis.

Four information content (IC)-based methods, proposed by Resnik, Jiang, Lin and Schlicker, and one graph structure-based method, proposed by Wang, are implemented. For the calculation details, refer to the vignette of the R package GOSemSim.
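To make the IC-based measures concrete, here is a minimal sketch on a toy ontology. It does not use DOSE or GOSemSim; the term names, ancestor sets and annotation probabilities are made up for illustration, and the Jiang and Schlicker forms shown are commonly used similarity versions that may differ in detail from what the packages compute.

```r
# Toy ontology (hypothetical): each term lists its ancestors, itself included.
ancestors <- list(
  root = "root",
  A    = c("A", "root"),
  B    = c("B", "A", "root"),
  C    = c("C", "A", "root")
)

# Hypothetical annotation probabilities p(t); the root covers everything.
p  <- c(root = 1, A = 0.5, B = 0.2, C = 0.1)
ic <- -log(p)  # information content: IC(t) = -log p(t)

# Most informative common ancestor (MICA) of two terms.
mica <- function(t1, t2) {
  common <- intersect(ancestors[[t1]], ancestors[[t2]])
  common[which.max(ic[common])]
}

# Resnik: IC of the MICA.
sim_resnik <- function(t1, t2) unname(ic[mica(t1, t2)])

# Lin: Resnik normalized by the IC of both terms.
sim_lin <- function(t1, t2) {
  2 * sim_resnik(t1, t2) / unname(ic[t1] + ic[t2])
}

# Jiang: one minus the (capped) Jiang-Conrath distance.
sim_jiang <- function(t1, t2) {
  1 - min(1, unname(ic[t1] + ic[t2]) - 2 * sim_resnik(t1, t2))
}

# Schlicker (Rel): Lin weighted by how specific the MICA is.
sim_rel <- function(t1, t2) {
  sim_lin(t1, t2) * (1 - unname(p[mica(t1, t2)]))
}

sim_resnik("B", "C")
sim_lin("B", "C")
sim_jiang("B", "C")
sim_rel("B", "C")
```

All four measures depend on the ontology only through the MICA and the term probabilities, which is why they are grouped as IC-based; the Wang method instead works on the graph structure itself.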

In recent years, high-throughput experimental techniques such as microarrays and mass spectrometry have made it possible to identify many lists of genes and gene products. The most widely used strategy for high-throughput data analysis is to identify different gene clusters based on their expression profiles. Another commonly used approach is to annotate these genes with biological knowledge, such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), and identify the statistically significantly enriched categories. These two strategies are implemented in many Bioconductor packages, such as Mfuzz and BHC for clustering analysis and GOstats for GO enrichment analysis.
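As a rough illustration of what "statistically significantly enriched" means, enrichment of a single category is often assessed with a hypergeometric test. The sketch below uses only base R; all counts are hypothetical, and real packages add steps such as multiple-testing correction.

```r
# Hypothetical counts for one annotation category.
N <- 20000  # genes in the background
M <- 200    # background genes annotated to the category
n <- 100    # genes in the list of interest
k <- 10     # genes in the list that are annotated to the category

# P(X >= k) when drawing n genes at random from the background.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on a 2x2 table.
tab <- matrix(c(k, n - k, M - k, N - M - (n - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

p_value
```

With these numbers only about one annotated gene would be expected in the list by chance, so an overlap of ten gives a very small p-value.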

The S3 OOP system

R currently has two built-in OOP systems (S3 and S4), and several others are available as add-on packages, such as R.oo and OOP.

S3 is easy to use but not reliable enough for large software projects. The S3 system emphasizes generic functions and polymorphism. It is a function-centric system, unlike class-centric systems such as Java.
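A minimal sketch of the function-centric style: methods belong to a generic function, and the generic dispatches on the class attribute of its argument. The classes circle and square and the generic area are hypothetical names chosen for illustration.

```r
# Constructors: an S3 object is just a value with a class attribute.
new_circle <- function(r) structure(list(r = r), class = "circle")
new_square <- function(side) structure(list(side = side), class = "square")

# The generic function; UseMethod() dispatches on class(shape).
area <- function(shape, ...) UseMethod("area")

# Methods are ordinary functions named <generic>.<class>.
area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

# Fallback for classes without a dedicated method.
area.default <- function(shape, ...) {
  stop("no area method for class ", class(shape)[1])
}

# Polymorphism: the same call works on objects of different classes.
shapes <- list(new_circle(1), new_square(2))
areas <- vapply(shapes, area, numeric(1))
areas
```

Nothing checks that a circle actually has an r field, which is one reason S3 can feel fragile in larger projects.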

Abstract

SUMMARY: The semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have become an important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. Four information content (IC)-based methods and one graph-based method are implemented in the GOSemSim package, and multiple species, including human, rat, mouse, fly and yeast, are supported. The functions provided by GOSemSim offer flexibility for applications, and can be easily integrated into high-throughput analysis pipelines. AVAILABILITY: GOSemSim is released under the GNU General Public License within the Bioconductor project, and is freely available at http://bioconductor.org/packages/2.6/bioc/html/GOSemSim.html.

Anyone who has studied calculus should know Taylor's formula; for a refresher, see the wiki page: https://zh.wikipedia.org/wiki/泰勒公式.

A simple implementation in R:
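For reference, the expansion of a sufficiently smooth function $f$ around a point $a$, truncated at order $n$, can be written as follows, where $R_n(x)$ denotes the remainder term:

```latex
f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x - a)^{k} + R_n(x)
```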

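 efv <- function(f, value, variable="x", a=0, eps=0.001, max.order=50) {
     # Estimate f(value) from the Taylor expansion of f around a.
     # f is an unevaluated expression, e.g. quote(exp(x)).
     assign(variable, a)          # bind the expansion point to the variable
     fv <- eval(f)                # zeroth-order term f(a)
     for (k in seq_len(max.order)) {
         f <- D(f, variable)      # k-th symbolic derivative
         if (identical(f, 0))
             break                # all higher-order terms vanish
         term <- eval(f)*(value-a)^k/factorial(k)
         fv <- fv + term
         # a term that is exactly zero (e.g. even orders of sin at 0)
         # says nothing about convergence, so keep going
         if (term != 0 && abs(term) < eps)
             break
     }
     return (fv)
 }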
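The function above relies on two base R tools: D() for symbolic differentiation of an unevaluated expression, and eval() for evaluating that expression once the variable has a value. A small sketch of how they behave, using sin(x) as an arbitrary example:

```r
# An unevaluated expression in the variable x.
f <- quote(sin(x))

# Symbolic first and second derivatives with respect to x.
df  <- D(f, "x")   # the call cos(x)
d2f <- D(df, "x")  # the call -sin(x)

# Expressions are evaluated only once x has a value.
x <- 0
f_at_0   <- eval(f)
df_at_0  <- eval(df)
d2f_at_0 <- eval(d2f)

# Differentiating a constant yields the number 0, which marks the point
# where a polynomial's Taylor series terminates.
d_const <- D(quote(3), "x")
```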

When I called the switch function again, it always returned the first element, no matter what the argument was.

r-29-dev

When organism was changed to “yeast” and the switch function was called, species was supposed to change to “Sc”, but it kept its original value.
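For comparison, this is roughly how such a mapping is expected to behave with switch. The helper organism_to_species and the "human" to "Hs" pair are assumptions for illustration; only the "yeast" to "Sc" pair comes from the description above.

```r
# Hypothetical helper: map an organism name to a species code.
organism_to_species <- function(organism) {
  switch(organism,
         human = "Hs",  # assumed mapping, for illustration only
         yeast = "Sc",
         stop("unsupported organism: ", organism))  # unnamed last arg = default
}

organism <- "human"
species <- organism_to_species(organism)

# Changing organism and calling switch again should pick the matching branch.
organism <- "yeast"
species <- organism_to_species(organism)
species
```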

Creating an R package

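> source("code.R") # load the code you have written
> package.skeleton(name="pkgname", list=c("function_name_list"))
# generate the directory structure of the R source package

Go to the man directory and edit the *.Rd files. They use a LaTeX-like format and serve as the help documentation for the package and its functions.

If you need a vignette, create inst/doc under the package directory and write a pkgname.Rnw file there. It is essentially LaTeX, but it lets you embed R code. At build time the code is run first, the result is automatically converted to LaTeX, and that is then compiled to PDF.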

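$ R CMD check pkgname 
# check the code and documentation. This step is important: most minor problems show up here.
$ R CMD build pkgname 
# build a source package
$ R CMD build --binary pkgname
# compile and build a binary package (recent R versions dropped this option; use R CMD INSTALL --build instead)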
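The skeleton step can be tried without touching the working directory. The sketch below writes the package into a temporary directory; the function hello is a hypothetical stand-in for your own code.

```r
# A hypothetical function to be packaged.
hello <- function(name) paste("Hello,", name)

# Generate the skeleton in a temporary directory.
out_dir <- tempfile("pkgdemo")
dir.create(out_dir)
package.skeleton(name = "pkgname", list = "hello",
                 environment = environment(), path = out_dir)

# Inspect what was generated: DESCRIPTION, NAMESPACE, R/ and man/.
pkg_files <- list.files(file.path(out_dir, "pkgname"), recursive = TRUE)
pkg_files
```

The generated man/hello.Rd is the file to edit before running R CMD check.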

Guangchuang Yu

Bioinformatics Professor @ SMU

Guangzhou