IBW2011

July 18, 2011 in conference

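IBW is over. The conference itself felt aimed mostly at students, but the Dragon Star course held before it was quite good. I went because of its title: I wanted to hear about probabilistic graphical models and systems biology. In the end only the last day covered them, with a little on Bayesian networks in the morning and some GSEA in the afternoon. The BN part covered only the basic concepts, and the GSEA part fell short: it got only as far as using Fisher's test for enrichment analysis and never explained how to turn expression values into a statistic before running the enrichment analysis. An analysis that ignores expression cannot be called GSEA; the PNAS paper that proposed GSEA is precisely about expression + prior knowledge (gene set).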
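To make the distinction concrete, here is a minimal sketch of the Fisher-style enrichment test that the course did cover. It is written as a one-sided hypergeometric tail, which is equivalent to the one-sided Fisher's exact test on the 2x2 table. The function name and parameter names are my own, not from the course. Note that the inputs are only gene counts: no expression value appears anywhere, which is exactly why this is not GSEA.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided enrichment p-value P(X >= n_overlap).

    n_universe: number of genes in the background
    n_in_set:   number of background genes annotated to the gene set
    n_selected: number of genes in the list of interest (e.g. DE genes)
    n_overlap:  number of selected genes that fall in the gene set
    """
    total = comb(n_universe, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    p = hypergeom_enrichment_p(n_universe=1000, n_in_set=50,
                               n_selected=100, n_overlap=12)
    print(f"enrichment p-value: {p:.4g}")
```

GSEA, by contrast, would start from a ranking of all genes by some expression-derived statistic and ask whether the gene set sits toward one end of that ranking.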

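The rest of the days were mostly about Hidden Markov Models for sequence alignment. Prof Tang did his post-doc with Waterman, so he has presumably worked on sequences for a long time. I am not very interested in this area.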

a simple gene finder

July 10, 2011 in conference, R

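After several days of classes (http://ibw2011.fmmu.edu.cn/schedule.htm), the course ended today. I only finished project 1. I wanted to write the Gibbs sampling one, but I could not figure it out, which is embarrassing.

This is purely an exercise and has no practical value.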

Course Projects:

Project 1: Implementation of a simple gene finder

GOAL

Build a simple codon-usage based gene finder for finding genes in E.coli.

Procedure

1. Collect 100 gene sequences from the bacterium E. coli in GenBank (http://www.ncbi.nlm.nih.gov).
2. Compute the codon usage table based on these genes (and the translated protein sequences from them).
3. Build a probabilistic model based on the codon usages.
4. Implement a random sequence model in which the nucleotide frequency is computed from the 100 E. coli genes.
5. For a given DNA sequence (and one selected reading frame), compare your model with a random sequence model.

Results that you should submit:

1. Two FASTA files for the collected 100 genes and 100 translated protein sequences.
2. The printed codon usage table.
3. A program named ECgnfinder, running with the syntax ECgnfinder -i inputfile

Inputfile stands for the name of input file, which should contain one DNA sequence in FASTA file format; the program should be able to report an error message if the input file is in the wrong format.

The output should be printed to the standard output as (xxx stands for the likelihood)

ORF1: xxx
ORF2: xxx
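The core of the project can be sketched roughly as follows. This is not my submitted ECgnfinder, just a minimal Python illustration of the idea: score a reading frame by the log-likelihood ratio between a codon-usage model and a random (independent nucleotide) model, both estimated from the training genes. The function names, the pseudocount, and the toy training sequences are all assumptions made for the example; the real exercise uses 100 E. coli genes and FASTA input.

```python
from collections import Counter
from itertools import product
from math import log

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
CODON_SET = set(CODONS)


def split_codons(seq, frame=0):
    """Cut seq into complete codons starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def codon_frequencies(genes, pseudocount=1.0):
    """Codon usage table; the pseudocount (an assumption) avoids log(0)."""
    counts = Counter()
    for gene in genes:
        counts.update(c for c in split_codons(gene) if c in CODON_SET)
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def nucleotide_frequencies(genes, pseudocount=1.0):
    """Single-nucleotide frequencies for the random sequence model."""
    counts = Counter(b for gene in genes for b in gene.upper() if b in BASES)
    total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def log_likelihood_ratio(seq, codon_freq, nt_freq, frame=0):
    """log P(seq | codon model) - log P(seq | random model) in one frame."""
    score = 0.0
    for codon in split_codons(seq, frame):
        if codon not in codon_freq:
            continue  # skip codons containing ambiguous bases such as N
        background = nt_freq[codon[0]] * nt_freq[codon[1]] * nt_freq[codon[2]]
        score += log(codon_freq[codon]) - log(background)
    return score


if __name__ == "__main__":
    # Toy training set, NOT real E. coli genes.
    training = ["ATGGCTAAAGCTTAA", "ATGAAAGCTGCTAAATAA"]
    codon_freq = codon_frequencies(training)
    nt_freq = nucleotide_frequencies(training)
    query = "ATGGCTGCTAAATAA"
    for frame in range(3):
        llr = log_likelihood_ratio(query, codon_freq, nt_freq, frame)
        print(f"frame {frame + 1}: {llr:.3f}")
```

A positive score means the frame looks more like the training genes than like random sequence with the same base composition. With a training set this small the pseudocounts dominate, so the toy numbers mean little.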

screen shot

June 28, 2011 in BSD, Mac OS

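I bought a Mac and had left it unused at home. I plan to take it to the conference, so I brought it in to install some software.

Ruijie authentication on campus networks is rather annoying. Back when I was at Huanong (华农), just getting the authentication to work took me a long time.

There are more open-source authentication clients now, and setup has become easier. The xmuruijie I currently use on Linux works better than mystar did back then.

xmuruijie would probably work on the Mac too, but it depends on Python and may need some extra modules, and I could not be bothered. A quick search turned up mentohust. I did have to change the MAC address, which the all-purpose ifconfig took care of.

To compile software you first have to install Xcode. It is a huge download, over 4 GB; I started it at noon and it finished only around dinner time. After that, all kinds of open-source software can be installed.

On the Mac, fink provides a Debian-style apt-get, there is the native MacPorts, and Gentoo's portage is supported as well. I chose NetBSD's pkgsrc for installing open-source software.

I used NetBSD for quite a long time, so it feels familiar.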

Five things biologists should know about statistics

June 24, 2011 in statistics

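Ewan Birney's recent blog post (Five statistical things I wished I had been taught 20 years ago) discusses the importance of statistics to biology.

He starts with RA Fisher and says that biology is, at bottom, statistics. Fisher was an agricultural scientist, and the statistical methods he established all started from biological problems.

The five areas Ewan discusses are: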

1. Nonparametric statistics. These are statistical tests which make a bare minimum of assumptions of underlying distributions; in biology we are rarely confident that we know the underlying distribution, and hand waving about central limit theorem can only get you so far. Wherever possible you should use a nonparametric test. This is Mann-Whitney (or Wilcoxon if you prefer) for testing “medians” (Medians is in quotes because this is not quite true. They test something which is closely related to the median) of two distributions, Spearman’s Rho (rather than Pearson’s r2) for correlation, and the Kruskal test rather than ANOVAs (though if I get this right, you can’t in Kruskal do the more sophisticated nested models you can do with ANOVA). Finally, don’t forget the rather wonderful Kolmogorov-Smirnov (I always think it sounds like really good vodka) test of whether two sets of observations come from the same distribution. All of these methods have a basic theme of doing things on the rank of items in a distribution, not the actual level. So - if in doubt, do things on the rank of metric, rather than the metric itself.
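The "do things on the rank" theme is easy to see in code. Below is a small sketch, not from Ewan's post, that computes Spearman's rho as nothing more than Pearson's correlation applied to ranks (ties get their average rank). The helper names and the toy data are my own.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy data: monotone relationship with one extreme value.
    x = [1, 2, 3, 4, 5]
    y = [1, 4, 9, 16, 1000]
    print("Pearson :", round(pearson(x, y), 3))
    print("Spearman:", round(spearman(x, y), 3))
```

Because y increases whenever x does, the ranks agree perfectly and Spearman's rho is 1 up to floating-point error, while the single extreme value drags Pearson's r well below 1.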

Another year older

May 5, 2011 in Paper

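In 2008 I was combining mRNA microarray and miRNA microarray data to predict targets. One night when I could not sleep, it occurred to me that miRNA similarity could be computed through the targets, and that GOSemSim, the package I had written at the time, was just right for the job.

In 2009 I saw a paper in BMC Systems Biology that clustered miRNAs using the protein interaction networks they regulate. I felt then that if I did not publish the idea of computing similarity from targets, it would no longer be fresh.

I remember Watson saying that research has three main motivations: first, to beat your competitors; second, to pick up girls; third, I forgot -,-

The third does not match anything in Watson's own story, which is why I cannot remember it. The first is the DNA double helix: whoever saw that X-ray diffraction result would get the Nobel Prize, and Linus Pauling was their biggest competitor at the time. The second is that Watson, at 39, married his own student, who was 19.

In 2008 I had the similarity matrix computed, and then the whole thing was left sitting. The main reason was that my advisor was holding back my first paper; I was also busy with applications at the time, which had an effect too.

After graduating, I wanted to throw it away.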

Hello, April Fools' Day

April 1, 2011 in Personal

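On April 1 last year I published my first SCI paper. Yesterday I discovered by chance that the lab that turned me down when I interviewed at the Chinese Academy of Sciences in 2006 had cited my paper that same year. I took the CAS entrance exam twice and still did not get in, which bothered me for a long time. It is all just passing clouds now.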

Phosphoproteome profile of human lung cancer cell line A549

November 23, 2010 in Bioinformatics, Proteomics, Publication

As an in vitro model for type II human lung cancer, A549 cells resist cytotoxicity via phosphorylation of proteins as demonstrated by many studies. However, to date, no large-scale phosphoproteome investigation has been conducted on A549. Here, we performed a systematic analysis of the phosphoproteome of A549 by using mass spectrometry (MS)-based strategies. This investigation led to the identification of 337 phosphorylation sites on 181 phosphoproteins. Among them, 67 phosphoproteins and 230 phosphorylation sites identified appeared to be novel with no previous characterization in lung cancer.


Guangchuang Yu
