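I spent the last few days taking a course: http://ibw2011.fmmu.edu.cn/schedule.htm
It wrapped up today, and I only finished Project 1. I wanted to write the Gibbs sampling one too, but I couldn't figure it out. Oops.
This is purely a practice exercise and has no real practical value.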
Course Projects:
Project 1: Implementation of a simple gene finder
GOAL
Build a simple codon-usage-based gene finder for finding genes in
E. coli.
Procedure
Collect 100 gene sequences from the bacterium E. coli in GenBank
(http://www.ncbi.nlm.nih.gov).
Compute the codon usage table based on these genes (and the translated
protein sequences from them).
Build a probabilistic model based on the codon usages.
Implement a random sequence model in which the nucleotide frequency is
computed from the 100 E. coli genes.
For a given DNA sequence (and one selected reading frame), compare your
model with the random sequence model.
Results that you should submit:
Two FASTA files for the collected 100 genes and 100 translated protein
sequences;
The printed codon usage table;
A program named ECgnfinder, run with the syntax ECgnfinder -i inputfile
inputfile is the name of the input file, which should contain one
DNA sequence in FASTA format; the program should report an error
message if the input file is in the wrong format.
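The input check described above could be sketched roughly like this in Python. This is only a minimal sketch, not the course's reference solution: the function name read_single_fasta, the VALID_BASES set, and the decision to accept N as an unknown base are all my own assumptions.

```python
# Assumption: N is accepted as an "unknown base"; the assignment does not say.
VALID_BASES = set("ACGTN")


def read_single_fasta(path):
    """Return (header, sequence) for a file holding exactly one DNA record.

    Raises ValueError with a short message if the file is not in the
    expected format. The name and behaviour are illustrative only.
    """
    with open(path) as handle:
        # Drop blank lines and surrounding whitespace.
        lines = [line.strip() for line in handle if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA: first non-empty line must start with '>'")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one sequence in the file")
    seq = "".join(lines[1:]).upper()
    if not seq:
        raise ValueError("FASTA record has no sequence")
    bad = set(seq) - VALID_BASES
    if bad:
        raise ValueError("non-DNA characters: " + "".join(sorted(bad)))
    return lines[0][1:], seq
```

A command-line wrapper could parse the -i option (for example with argparse), call this function, and print the ValueError message as the required error report.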
The output should be printed to standard output in the following form
(xxx stands for the likelihood):
ORF1: xxx ORF2: xxx
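To make the model comparison concrete, here is a rough Python sketch of how the codon usage model and the random sequence model might be scored against each other for one reading frame. Everything here is an assumption on my part: the function names, the pseudocounts (added so no probability is zero), and the choice to report a log-likelihood ratio rather than a raw likelihood. The assignment itself does not fix any of these.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # all 64 codons


def codon_frequencies(genes, pseudocount=1.0):
    """Relative frequency of each codon in in-frame coding sequences."""
    counts = dict.fromkeys(CODONS, pseudocount)
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            codon = gene[i:i + 3]
            if codon in counts:  # skip codons containing N or other symbols
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def base_frequencies(genes, pseudocount=1.0):
    """Nucleotide frequencies over the same genes (the random model)."""
    counts = Counter(b for gene in genes for b in gene.upper() if b in BASES)
    total = sum(counts[b] + pseudocount for b in BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def log_likelihood_ratio(seq, frame, codon_freq, base_freq):
    """log P(seq | codon model) - log P(seq | random model).

    frame is 0, 1 or 2; a trailing partial codon is ignored.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon not in codon_freq:
            continue  # ambiguous codon, contributes nothing
        random_p = base_freq[codon[0]] * base_freq[codon[1]] * base_freq[codon[2]]
        score += math.log(codon_freq[codon]) - math.log(random_p)
    return score
```

A positive score means the codon model explains the chosen frame better than the random model. The per-frame scores could then be printed in the "ORFn: xxx" form above, though how ORFs map to reading frames is not spelled out in the assignment.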