IBW2011

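The IBW meeting is over, and it felt rather student-oriented. The Dragon Star course held before the conference was decent, though, and I went mainly because of its title: I wanted to hear about probabilistic graphical models and systems biology. As it turned out, only the last day covered them, with some Bayesian networks in the morning and a bit of GSEA in the afternoon. The BN part covered only the basic concepts, and the GSEA part did not go far enough: it stopped at using Fisher's test for enrichment analysis and never explained how to turn expression values into a statistic and then run the enrichment analysis on that statistic. An analysis that ignores expression values should not be called GSEA; the PNAS paper that introduced GSEA is precisely about expression + prior knowledge (gene set).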
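To make the distinction concrete, here is a minimal Python sketch contrasting the two ideas. The function names are hypothetical and nothing here comes from the course material. The first function is a one-sided Fisher (hypergeometric) test on a gene list that was already selected by a cutoff, so it never sees expression values. The second is a simplified version of the weighted running-sum statistic used in GSEA, which walks down a list ranked by an expression-derived score. It omits the permutation step that a real GSEA run uses to assess significance.

```python
from math import comb


def fisher_enrichment_p(n_universe, n_set, n_selected, n_overlap):
    """One-sided p-value P(X >= n_overlap) under the hypergeometric model.

    n_universe: all genes considered, n_set: genes in the gene set,
    n_selected: genes picked by some cutoff, n_overlap: genes in both.
    """
    upper = min(n_set, n_selected)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


def running_sum_es(ranked_genes, scores, gene_set, p=1.0):
    """Simplified GSEA-style enrichment score (no permutation test).

    ranked_genes must already be sorted by the expression-derived score;
    scores holds that statistic for each gene, in the same order.
    """
    in_set = [g in gene_set for g in ranked_genes]
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running, best = 0.0, 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up at genes in the set (weighted by score), down otherwise.
        running += weight / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

The point of the contrast is that `running_sum_es` changes when the expression-derived scores change, while `fisher_enrichment_p` depends only on set sizes.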

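Most of those few days were spent on Hidden Markov Models for sequence alignment. Prof Tang used to be a post-doc with Waterman, so he has presumably worked on sequences for a long time. I am not very interested in this area.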

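After several days of classes (http://ibw2011.fmmu.edu.cn/schedule.htm), the course finished today. I only completed project 1. I wanted to write the Gibbs sampling one, but I could not figure it out, which is embarrassing.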

This is purely an exercise and has no practical value.

Course Projects:

Project 1: Implementation of a simple gene finder

GOAL

Build a simple codon-usage based gene finder for finding genes in E.coli.

Procedure

Collect 100 gene sequences from the bacterium E. coli in GenBank (http://www.ncbi.nlm.nih.gov).

Compute the codon usage table based on these genes (and the translated protein sequences from them).

Build a probabilistic model based on the codon usages.

Implement a random sequence model in which the nucleotide frequency is computed from the 100 E. coli genes.

For a given DNA sequence (and one selected reading frame), compare your model with the random sequence model.

Results that you should submit:

Two FASTA files for the collected 100 genes and 100 translated protein sequences.

The printed codon usage table.

A program named ECgnfinder, run with the syntax: ECgnfinder -i inputfile

Here inputfile stands for the name of the input file, which should contain one DNA sequence in FASTA format. The program should report an error message if the input file is in the wrong format.
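The input check described above could be sketched as follows. This is only an illustration: the function name `parse_single_fasta` is hypothetical, and accepting N alongside A, C, G and T is my assumption, since the handout does not say which symbols count as valid DNA.

```python
# Assumption: N is tolerated as an ambiguous base; the handout does not say.
VALID_BASES = set("ACGTN")


def parse_single_fasta(text):
    """Return (header, sequence) for FASTA text holding exactly one record.

    Raises ValueError with a readable message when the format is wrong.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA: the first line must start with '>'")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one sequence in the input file")
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the FASTA record contains no sequence")
    unexpected = set(sequence) - VALID_BASES
    if unexpected:
        raise ValueError("non-DNA characters: " + "".join(sorted(unexpected)))
    return lines[0][1:].strip(), sequence
```

A command-line wrapper would read the file named after `-i`, call this function, and print the `ValueError` message to standard error before exiting with a non-zero status.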

The output should be printed to the standard output as (xxx stands for the likelihood)

ORF1: xxx
ORF2: xxx
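A minimal Python sketch of the modelling steps in this project, under several assumptions of mine: the function names are hypothetical, pseudocounts are added so that unseen codons do not produce a zero probability, and the score is a log-likelihood ratio (codon model versus random model) rather than a raw likelihood. The handout does not define how ORF1, ORF2 and so on are chosen, so the sketch only scores a given reading frame.

```python
from collections import Counter
from itertools import product
from math import log

BASES = "ACGT"
ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_SET = set(ALL_CODONS)


def codons(seq, frame=0):
    """Yield consecutive triplets of seq starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        yield seq[i:i + 3]


def codon_usage(genes, pseudocount=1.0):
    """Codon frequencies estimated from in-frame training genes."""
    counts = Counter()
    for gene in genes:
        counts.update(c for c in codons(gene) if c in CODON_SET)
    total = sum(counts.values()) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def base_frequencies(genes, pseudocount=1.0):
    """Nucleotide frequencies for the random sequence model."""
    counts = Counter(b for gene in genes for b in gene.upper() if b in BASES)
    total = sum(counts.values()) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def log_odds(seq, frame, codon_freq, base_freq):
    """Sum over codons of log(P(codon | codon model) / P(codon | random model))."""
    score = 0.0
    for c in codons(seq, frame):
        if c not in codon_freq:
            continue  # skip triplets with ambiguous bases
        background = base_freq[c[0]] * base_freq[c[1]] * base_freq[c[2]]
        score += log(codon_freq[c] / background)
    return score
```

With the two tables built from the 100 training genes, a positive `log_odds` value means the chosen frame looks more like coding sequence than like random sequence with the same base composition. Each value could then be printed in the `ORFn: xxx` format required above.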

Guangchuang Yu

a senior-in-age-but-not-senior-in-knowledge bioinformatician

Postdoc researcher

Hong Kong