IBW2011

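The IBW meeting is over. It felt rather student-oriented. The Dragon Star course held before the conference was not bad, though; I went because of its title. What I wanted to hear about was probabilistic graphical models and systems biology, but those only came up on the last day: a bit of Bayesian networks in the morning and some GSEA in the afternoon. The BN part covered only the basic concepts, and the GSEA part fell short. It only got as far as using Fisher's test for enrichment analysis, and did not cover how to turn expression values into a statistic before running the enrichment analysis. Without taking expression into account it cannot be called GSEA; the PNAS paper that proposed GSEA is precisely about expression + prior knowledge (gene set).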
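To make the distinction concrete, here is a minimal sketch of the kind of Fisher's-test enrichment the course stopped at. It is a one-sided test on counts only (the hypergeometric upper tail) and never looks at expression values, which is exactly why it is not GSEA. The function name and arguments are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_selected, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= n_overlap), where X is the number of gene-set members
    found when n_selected genes are drawn without replacement from a
    universe of n_universe genes, n_set of which belong to the gene set.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_set, n_selected)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For example, `hypergeom_enrichment_p(20000, 150, 300, 12)` would test whether 12 of 300 selected genes falling into a 150-gene set is more than expected by chance. The 300 genes have to be picked by some cutoff beforehand; GSEA instead works on the whole list ranked by an expression-derived statistic.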

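Most of the days were spent on Hidden Markov Models for sequence alignment. Prof Tang used to be a post-doc with Waterman, so he has presumably worked on sequences for a long time. I am not very interested in this area.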

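After several days of classes (http://ibw2011.fmmu.edu.cn/schedule.htm), the course finished today. I only completed project 1. I wanted to write the Gibbs sampling one, but could not figure it out. Embarrassing.

This is purely for practice and has no practical value.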

Course Projects:

Project 1: Implementation of a simple gene finder

GOAL

Build a simple codon-usage-based gene finder for finding genes in E. coli.

Procedure

Collect 100 gene sequences from the bacterium E. coli in GenBank (http://www.ncbi.nlm.nih.gov).
Compute the codon usage table based on these genes (and the translated protein sequences from them).
Build a probabilistic model based on the codon usages.
Implement a random sequence model in which the nucleotide frequency is computed from the 100 E. coli genes.
For a given DNA sequence (and one selected reading frame), compare your model with the random sequence model.

Results that you should submit:

Two FASTA files for the collected 100 genes and 100 translated protein sequences.
The printed codon usage table.
A program named ECgnfinder, run with the syntax: ECgnfinder -i inputfile

Here inputfile stands for the name of the input file, which should contain one DNA sequence in FASTA format; the program should report an error message if the input file is in the wrong format.

The output should be printed to the standard output as (xxx stands for the likelihood)

ORF1: xxx
ORF2: xxx
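The core of the procedure above can be sketched roughly as follows. This is not the submitted ECgnfinder, just a minimal illustration under my own assumptions: the function names are hypothetical, pseudocounts are added so unseen codons do not give zero probability, and the comparison with the random model is expressed as a log-odds score. FASTA parsing and the -i option are left out.

```python
from collections import Counter
from math import log

BASES = "ACGT"
ALL_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def codons(seq, frame=0):
    """Split seq into complete codons, starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def train(genes, pseudocount=1.0):
    """Estimate codon and single-nucleotide frequencies from training genes."""
    codon_counts = Counter()
    base_counts = Counter()
    for gene in genes:
        gene = gene.upper()
        # Ignore codons and bases containing ambiguity codes such as N.
        codon_counts.update(c for c in codons(gene) if set(c) <= set(BASES))
        base_counts.update(b for b in gene if b in BASES)
    codon_total = sum(codon_counts.values()) + pseudocount * len(ALL_CODONS)
    codon_p = {c: (codon_counts[c] + pseudocount) / codon_total
               for c in ALL_CODONS}
    base_total = sum(base_counts.values()) + pseudocount * len(BASES)
    base_p = {b: (base_counts[b] + pseudocount) / base_total for b in BASES}
    return codon_p, base_p


def log_odds(seq, codon_p, base_p, frame=0):
    """Log-likelihood ratio of the codon model vs. the random sequence model.

    The random model treats each nucleotide as independent, so a codon's
    probability under it is the product of its three base frequencies.
    """
    score = 0.0
    for c in codons(seq, frame):
        if c not in codon_p:  # skip codons with ambiguous bases
            continue
        background = base_p[c[0]] * base_p[c[1]] * base_p[c[2]]
        score += log(codon_p[c] / background)
    return score
```

A positive score means the chosen reading frame looks more like the training genes than like random sequence with the same base composition; a negative score means the opposite. Calling `log_odds` with `frame` set to 0, 1 and 2 gives one score per reading frame of the given strand.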

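I bought a Mac and it has been sitting at home unused. I plan to take it along to the meeting, so I brought it over to install a few pieces of software.

Ruijie authentication on the campus network is rather annoying. Back at Huanong (华农, my old agricultural university), just getting this authentication to work took a long time.

There are plenty of open-source authentication clients now, which makes things easier. The xmuruijie I currently use on Linux works better than mystar did back then.

xmuruijie can probably run on a Mac as well, but it depends on Python and might need some extra modules installed, and I could not be bothered. After a quick search I found mentohust. Of course the MAC address needs to be changed, which the almighty ifconfig takes care of.

To compile software, Xcode has to be installed first. It is huge, over 4 GB; I started the download at noon and it only finished by dinner time. After that, all kinds of open-source software can be installed.

On the Mac, fink provides a Debian-like apt-get, there is the native MacPorts, and Gentoo's portage is supported too. I chose NetBSD's pkgsrc to install open-source software.

I used NetBSD for quite a long time, so it feels familiar.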

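A recent blog post by Ewan Birney (Five statistical things I wished I had been taught 20 years ago) explains how important statistics is to biology.

He starts with RA Fisher and says that biology is basically statistics. Fisher was an agricultural scientist, and the statistical methods he established all started from biological problems.

The five areas Ewan discusses are: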

1. Non parametric statistics. These are statistical tests which make a bare minimum of assumptions of underlying distributions; in biology we are rarely confident that we know the underlying distribution, and hand waving about central limit theorem can only get you so far. Wherever possible you should use a non parametric test. This is Mann-Whitney (or Wilcoxon if you prefer) for testing “medians” (Medians is in quotes because this is not quite true. They test something which is closely related to the median) of two distributions, Spearman’s Rho (rather than Pearson’s r2) for correlation, and the Kruskal test rather than ANOVAs (though if I get this right, you can’t in Kruskal do the more sophisticated nested models you can do with ANOVA). Finally, don’t forget the rather wonderful Kolmogorov-Smirnov (I always think it sounds like really good vodka) test of whether two sets of observations come from the same distribution. All of these methods have a basic theme of doing things on the rank of items in a distribution, not the actual level. So - if in doubt, do things on the rank of metric, rather than the metric itself.
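The "do things on the rank" idea is easy to see in code. Below is a small sketch (my own illustration, not from Ewan's post) that computes Spearman's rho as nothing more than Pearson's correlation applied to ranks, with tied values sharing their average rank. The function names are arbitrary.

```python
def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

With `x = [1, 2, 3, 4, 5]` and `y = [1, 4, 9, 16, 1000]`, the relationship is perfectly monotonic, so `spearman(x, y)` is 1, while `pearson(x, y)` is pulled well below 1 by the single extreme value. That is the robustness the rank buys you.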

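In 2008 I was combining mRNA and miRNA microarray data to do target prediction. One night I could not sleep, and it occurred to me that the similarity between miRNAs could be computed through their targets, conveniently using GOSemSim, the software package I had written at the time.

In 2009 I saw a paper in BMC Systems Biology that clustered miRNAs using the protein interaction networks they regulate. I felt then that if I did not publish the idea of computing similarity from targets, it would no longer be fresh.

I remember Watson saying that research has three main motivations: first, beating your competitors; second, picking up girls; third, I forget -,-

I cannot remember the third because I cannot match it to Watson himself. The first is the DNA double helix: whoever saw that X-ray diffraction result would take the Nobel Prize, and Linus Pauling was their biggest competitor at the time. The second is that Watson, at 39, married his own student, who was 19.

In 2008 I finished computing the similarity matrix, and then the whole thing was left sitting. The main reason was that my boss was holding back my first paper; I was also busy with applications at the time, which did not help.

After graduating, I wanted to throw it away.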

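Hello, April Fools' Day

On April 1 last year I published my first SCI paper. Yesterday I discovered by accident that the lab that turned me down when I interviewed at the Chinese Academy of Sciences in 2006 had cited my paper within that same year. I took the Chinese Academy of Sciences entrance exam twice and still did not get in, which I brooded over for a long time. It is all just passing clouds now.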

As an in vitro model for type II human lung cancer, A549 cells resist cytotoxicity via phosphorylation of proteins, as demonstrated by many studies. However, to date, no large-scale phosphoproteome investigation has been conducted on A549. Here, we performed a systematic analysis of the phosphoproteome of A549 by using mass spectrometry (MS)-based strategies. This investigation led to the identification of 337 phosphorylation sites on 181 phosphoproteins. Among them, 67 phosphoproteins and 230 phosphorylation sites appeared to be novel, with no previous characterization in lung cancer.

Guangchuang Yu

a senior-in-age-but-not-senior-in-knowledge bioinformatician

Postdoc researcher

Hong Kong