R package DOSE released

Disease Ontology (DO) provides an open source ontology for the integration of biomedical data that is associated with human disease. DO analysis can lead to interesting discoveries that deserve further clinical investigation.

DOSE was designed for semantic similarity measurement and enrichment analysis.

Four information content (IC)-based methods, proposed by Resnik, Jiang, Lin and Schlicker, and one graph structure-based method, proposed by Wang, were implemented. For the calculation details, see the vignette of the R package GOSemSim.
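To make the IC-based measures more concrete, here is a minimal sketch in Python (not R, and not DOSE's or GOSemSim's actual code). The toy ontology, the term names and the annotation probabilities are all hypothetical, and the Jiang similarity uses one common normalisation, 1 - min(1, distance), which is an assumption on my part.

```python
import math

# Hypothetical toy ontology: each term maps to its ancestors (itself included).
ANCESTORS = {
    "root": {"root"},
    "A": {"A", "root"},
    "B": {"B", "A", "root"},
    "C": {"C", "A", "root"},
}

# Hypothetical probabilities p(t): fraction of annotations at or below term t.
P = {"root": 1.0, "A": 0.5, "B": 0.1, "C": 0.2}


def ic(term):
    """Information content: IC(t) = -log p(t)."""
    return -math.log(P[term])


def mica(t1, t2):
    """Most informative common ancestor of two terms."""
    common = ANCESTORS[t1] & ANCESTORS[t2]
    return max(common, key=ic)


def resnik(t1, t2):
    """Resnik: IC of the most informative common ancestor."""
    return ic(mica(t1, t2))


def lin(t1, t2):
    """Lin: 2 * IC(MICA) / (IC(t1) + IC(t2))."""
    denom = ic(t1) + ic(t2)
    return 2 * ic(mica(t1, t2)) / denom if denom > 0 else 0.0


def jiang(t1, t2):
    """Jiang distance turned into a similarity (assumed normalisation)."""
    distance = ic(t1) + ic(t2) - 2 * ic(mica(t1, t2))
    return 1 - min(1.0, distance)


def rel(t1, t2):
    """Schlicker's relevance measure: Lin weighted by 1 - p(MICA)."""
    return lin(t1, t2) * (1 - P[mica(t1, t2)])


for name, fn in [("Resnik", resnik), ("Lin", lin), ("Jiang", jiang), ("Rel", rel)]:
    print(name, round(fn("B", "C"), 4))
```

In practice p(t) would be estimated from real annotation data for the whole ontology, rather than typed in by hand as it is here.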

Continue reading

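Back in 2008, I was combining mRNA microarray and miRNA microarray data to do target prediction. One night when I couldn't sleep, it occurred to me that the similarity between miRNAs could be computed through their targets, and that GOSemSim, the package I had written at the time, was just the right tool for the calculation.

In 2009 I saw a paper in BMC Systems Biology that clustered miRNAs using the protein interaction networks they regulate. I felt then that if I didn't publish the idea of computing similarity from targets, it would no longer be fresh.

I remember Watson saying that research has three major motivations: the first is to get there before your competitors, the second is to pick up girls, and the third, I forgot -,-

The third doesn't match anything in Watson's life, which is why I can't remember it. The first refers to the DNA double helix: whoever saw that X-ray diffraction result would take the Nobel Prize, and Linus Pauling was their biggest competitor at the time. The second is that Watson, at 39, married his own student, who was 19.

In 2008 I had finished computing the similarity matrix, and then the project was simply left sitting. The main reason was that my advisor was holding back my first paper, which had not been published; of course, I was also busy with applications at the time, and that had an effect too.

After graduating, I wanted to throw it away.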

Continue reading

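Hello, April Fools' Day

On April 1 last year, I published my first SCI paper. Yesterday I discovered by chance that the lab that turned me down when I interviewed at the Chinese Academy of Sciences in 2006 had, in that same year, cited my paper. I took the entrance exam for the Chinese Academy of Sciences twice and still didn't get in, which I brooded over for a long time. It is all just passing clouds now.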

Continue reading

In recent years, high-throughput experimental techniques such as microarrays and mass spectrometry have made it possible to identify many lists of genes and gene products. The most widely used strategy for high-throughput data analysis is to identify different gene clusters based on their expression profiles. Another commonly used approach is to annotate these genes with biological knowledge, such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), and to identify the categories that are statistically significantly enriched. These two strategies are implemented in many Bioconductor packages, such as Mfuzz and BHC for clustering analysis and GOstats for GO enrichment analysis.
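As a rough illustration of what "statistically significantly enriched" can mean, the sketch below computes an over-representation p-value with a hypergeometric test, a common choice for this kind of analysis. It is written in Python rather than R, the function name and parameters are hypothetical, and it is not the code of any of the packages named above.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) under the hypergeometric distribution.

    n_background: all genes considered
    n_annotated:  background genes annotated to the category
    n_selected:   genes in the list of interest
    n_overlap:    selected genes annotated to the category
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 in the category, 3 selected, 2 overlapping.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

A real analysis would repeat this for every category and then correct the p-values for multiple testing.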

Continue reading

The fundamental idea of numerical integration is to estimate the area of the region in the xy-plane bounded by the graph of the function f(x). The integral is estimated by dividing the x range into small intervals and then adding up the small approximations to give a total approximation. Numerical integration can be done with the trapezoidal rule, Simpson's rule and quadrature rules. R has a built-in function, integrate, which performs adaptive quadrature.
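The interval-splitting idea can be sketched with the composite trapezoidal rule. This is a minimal Python illustration, not the R code from the post; the function name trapezoid and the default of 100 intervals are my own choices.

```python
import math


def trapezoid(f, a, b, n=100):
    """Approximate the integral of f over [a, b] using n equal-width trapezoids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # The two end points get weight 1/2, every interior point gets weight 1.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# The integral of sin(x) from 0 to pi is exactly 2.
print(trapezoid(math.sin, 0.0, math.pi, n=1000))
```

Increasing n shrinks the error roughly in proportion to 1/n^2 for smooth functions. An adaptive routine such as R's integrate chooses where to refine on its own.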

Continue reading

Root finding

Numerical root finding methods use iteration, producing a sequence of numbers that hopefully converges towards a limit which is a root. This post focuses on four basic root finding algorithms: the bisection method, the fixed point method, the Newton-Raphson method, and the secant method.

The simplest root finding algorithm is the bisection method. It works when f is a continuous function, and it requires two initial guesses, u and v, such that f(u) and f(v) have opposite signs. This method is reliable, but converges slowly. For details, see https://guangchuangyu.github.io/cn/2008/11/bisect-to-solve-equation/ .

Root finding can be reduced to the problem of finding fixed points of the function g(x) = c*f(x) + x, where c is a non-zero constant. Clearly, f(a) = 0 if and only if g(a) = a. This is the so-called fixed point algorithm.
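A minimal sketch of the bisection method, in Python rather than the R used in the linked post. The function name, tolerance and iteration cap are assumptions made for illustration.

```python
def bisect(f, u, v, tol=1e-10, max_iter=200):
    """Find a root of a continuous f in [u, v], where f(u) and f(v) differ in sign."""
    fu, fv = f(u), f(v)
    if fu == 0:
        return u
    if fv == 0:
        return v
    if fu * fv > 0:
        raise ValueError("f(u) and f(v) must have opposite signs")
    for _ in range(max_iter):
        mid = (u + v) / 2
        fmid = f(mid)
        # Stop on an exact root or once the bracket is small enough.
        if fmid == 0 or abs(v - u) / 2 < tol:
            return mid
        # Keep the half of the interval where the sign change occurs.
        if fu * fmid < 0:
            v = mid
        else:
            u, fu = mid, fmid
    return (u + v) / 2


# The positive root of x^2 - 2 lies between 1 and 2.
print(bisect(lambda x: x * x - 2, 1.0, 2.0))
```

Each step halves the bracket, so the method gains about one binary digit of accuracy per iteration. That is why it is reliable but slow.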
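The iteration x_new = g(x) = c*f(x) + x can be sketched as follows. This is Python, not the post's R, and the function name, stopping rule and the value c = -0.25 in the example are hypothetical choices. The iteration only converges when c is chosen so that |g'(x)| < 1 near the root.

```python
def fixed_point(f, x0, c, tol=1e-10, max_iter=1000):
    """Iterate x <- c*f(x) + x until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_new = c * f(x) + x
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed point iteration did not converge")


# For f(x) = x^2 - 2 and c = -0.25, g'(x) = 1 - 0.5*x is about 0.29 at sqrt(2),
# so the iteration contracts towards the root.
print(fixed_point(lambda x: x * x - 2, x0=1.0, c=-0.25))
```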

Continue reading

Guangchuang Yu

Bioinformatics Professor @ SMU

Guangzhou