When I run BLAST, I usually write the output in ASN format and then use blast_formatter to extract the information I need into a table.

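The description lines of the sequences are of course all unique, but the IDs (the characters before the first whitespace) are not, so qseqid alone cannot tell me which sequence a hit belongs to. I therefore wanted blast_formatter to output the query title (the whole description line), but after reading blastn -help carefully I found that this is simply not supported! It does support stitle and salltitle, so the title is available for the subject but not for the query. This is a serious pitfall! _| ̄|○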
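One possible workaround, sketched here under assumptions rather than taken from the original post: give every query a unique ID before running BLAST, keep a map from that ID to the full description line, and join the title back onto the table afterwards. The function names `uniquify_fasta_ids` and `add_query_titles` and the `q1`, `q2`, ... naming scheme are hypothetical, and the sketch assumes a tab-separated table whose first column is qseqid.

```python
def uniquify_fasta_ids(lines, prefix="q"):
    """Give every FASTA record a unique ID and remember its original title.

    Returns (new_lines, id_to_title). The new header keeps the original
    description line after the new ID, so no information is lost.
    """
    new_lines = []
    id_to_title = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}{count}"          # hypothetical naming scheme
            id_to_title[new_id] = line[1:]       # full original description line
            new_lines.append(f">{new_id} {line[1:]}")
        else:
            new_lines.append(line)
    return new_lines, id_to_title


def add_query_titles(table_lines, id_to_title):
    """Append the query title as a last column to each tab-separated row.

    Assumes the first column of every row is qseqid.
    """
    out = []
    for row in table_lines:
        fields = row.rstrip("\n").split("\t")
        fields.append(id_to_title.get(fields[0], ""))
        out.append("\t".join(fields))
    return out


# Two records share the ID "seq1" but have different description lines.
fasta = [">seq1 sample A", "ACGT", ">seq1 sample B", "GGCC"]
new_fasta, titles = uniquify_fasta_ids(fasta)
rows = add_query_titles(["q2\tsubj9\t98.5"], titles)
print(rows[0])
```

The renamed FASTA would be used as the BLAST query, and the mapping applied to the table that blast_formatter produces.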

Continue reading

To my knowledge, BioEdit is the most comprehensive biological sequence alignment editor. Most of my labmates run this software using Parallels Desktop. For some of them, BioEdit is the only reason to install Parallels Desktop.

I recently needed to edit an alignment, so I installed BioEdit on my iMac using Wine, a compatibility layer for running Windows applications on POSIX-compliant operating systems. Although Wine has been well known in the Linux community for many years, many OS X users have never heard of it.

Continue reading

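ppiPre plagiarized code from GOSemSim. The evidence is more than sufficient: just compare the code. I made some comparisons in my post Proper use of GOSemSim, and the GitHub page is also worth a look, since GitHub recorded the changes made to ppiPre after the plagiarism was exposed.

From the moment I reported this to the editor of BMC Systems Biology, the editor stalled for a whole year in the face of ironclad facts, and after that year the editorial office still had not dealt with ppiPre. When I first raised the issue, the editor assured me solemnly that they take such matters very seriously; later, my emails were simply ignored. I am willing to believe that the editorial office needs time to handle these things and has its own rules, but a year of not answering emails, of giving the matter the cold shoulder so that it fades away, should not be part of any rules.

While the editor kept ignoring me, I wrote Proper use of GOSemSim, listing some of the identical code, and informed CRAN. When ppiPre was removed from CRAN, I wrote to the editor, who told me that they were about to contact the authors. By then half a year had already passed. Yes, you read that right: half a year had gone by, and the editor said they had not yet contacted the authors! I do not believe it. They must have contacted them, and for some undisclosed reason the editor's attitude turned strange: on plagiarism, something an editor should be fired up about, they kept stalling.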

Two months later, Deng Yue (邓岳), the author of ppiPre, wrote to me:

Continue reading

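For sequencing, the DNA has to be fragmented to build a library, and adaptors are ligated to the fragments so that they can be amplified. Illumina sequencing comes in two modes, single-end and paired-end, which sequence a fragment from one end or from both ends, respectively.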

fragment                  ========================================
fragment + adaptors    ~~~========================================~~~
SE read                   --------->
PE reads                R1--------->                    <---------R2
unknown gap                         ....................
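The diagram can be mimicked with a small sketch. It assumes an error-free toy model in which R1 is the first `read_len` bases of the fragment and R2 is the reverse complement of the last `read_len` bases; the function names and the example fragment are made up for illustration, and adaptors are ignored.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def single_end_read(fragment, read_len):
    """SE: read read_len bases from one end of the fragment."""
    return fragment[:read_len]


def paired_end_reads(fragment, read_len):
    """PE: read read_len bases inward from each end of the fragment.

    R2 comes off the opposite strand, so it is the reverse complement
    of the fragment's last read_len bases. The gap is the number of
    unsequenced bases between the two reads (0 if the reads overlap).
    """
    r1 = fragment[:read_len]
    r2 = reverse_complement(fragment[-read_len:])
    gap = max(len(fragment) - 2 * read_len, 0)
    return r1, r2, gap


fragment = "ACGTTGCAAGGCTTAACCGG"  # made-up 20 nt fragment
se = single_end_read(fragment, 5)
r1, r2, gap = paired_end_reads(fragment, 5)
print(se, r1, r2, gap)
```

In this toy case the two 5 nt reads leave 10 bases in the middle unsequenced, which corresponds to the "unknown gap" in the diagram.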

Continue reading

Genetic drift is the term used in population genetics to refer to the statistical drift over time of gene frequencies in a population due to random sampling effects in the formation of successive generations. In a narrower sense, genetic drift refers to the expected population dynamics of neutral alleles (those defined as having no positive or negative impact on reproductive fitness), which are predicted to eventually become fixed at zero or 100% frequency in the absence of other mechanisms affecting allele distributions.

The most important phrase in the definition of genetic drift is "random sampling effects". The figure below illustrates this idea. The surviving individuals do not necessarily have a selective advantage. They are randomly selected.
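Random sampling alone is enough to fix or eliminate a neutral allele, which a short simulation can make concrete. This is a minimal sketch assuming a Wright-Fisher style model with a constant number of gene copies, non-overlapping generations, and no selection or mutation; the function name `wright_fisher` and its parameters are chosen for illustration only.

```python
import random


def wright_fisher(pop_size, start_freq, generations, seed=None):
    """Simulate drift of a neutral allele by random sampling.

    Each generation, every one of the pop_size gene copies is drawn
    independently from the previous generation, so the allele is picked
    with probability equal to its current frequency. The simulation
    stops early once the allele is lost (0.0) or fixed (1.0).
    Returns the list of allele frequencies, one per generation.
    """
    rng = random.Random(seed)
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        count = sum(1 for i in range(pop_size) if rng.random() < freq)
        freq = count / pop_size
        trajectory.append(freq)
        if freq in (0.0, 1.0):
            break
    return trajectory


traj = wright_fisher(pop_size=20, start_freq=0.5, generations=10000, seed=1)
print(len(traj) - 1, "generations simulated, final frequency", traj[-1])
```

No individual carries any fitness advantage in this model; the frequency changes only because each generation is a finite random sample of the previous one.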

Continue reading

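Back in January 2011 I sent a group email to the lab's QQ group saying that IPI was closing. It has now been closed for three years, and its home page has stayed frozen at the moment it shut down.

I keep stressing in emails and at lab meetings that we should switch to UniProt for database searches, yet to this day many people are still using IPI. It is scary to think about: the lab apparently never updates its data.

Another thing I really dislike is the GI number. It is not a proper ID at all, but they have never been willing to try changing.

Take the protein FASTA file above: the annotation line carries a lot of information, for example:

>gi|16128001|ref|NP_414548.1| putative transporter [Escherichia coli str. K-12 substr. MG1655]

Obviously NP_414548.1 can be used as the ID for database searches. I have said this countless times, but they insist on using their habitual gi|16128001 as the ID.

The problem is obvious:

GI number (sometimes written in lower case, “gi”) is simply a series of digits that are assigned consecutively to each sequence record processed by NCBI. The GI number bears no resemblance to the Accession number of the sequence record. The GI number has been used for many years by NCBI to track sequence histories in GenBank and the other sequence databases it maintains.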
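Pulling the accession out of such a header takes only a few lines. The sketch below assumes the identifier consists of `tag|value` pairs, as in the example header; other NCBI identifier layouts with more fields per database are not handled, and the function names `parse_ncbi_defline` and `preferred_id` are hypothetical.

```python
def parse_ncbi_defline(defline):
    """Split a FASTA header into ({tag: value}, description).

    Assumes the identifier is made of tag|value pairs such as
    gi|16128001|ref|NP_414548.1|.
    """
    header = defline.lstrip(">")
    identifier, _, description = header.partition(" ")
    fields = identifier.split("|")
    # Pair every tag (even positions) with the value that follows it.
    ids = dict(zip(fields[0::2], fields[1::2]))
    return ids, description


def preferred_id(defline):
    """Return an accession in preference to the GI number."""
    identifier = defline.lstrip(">").partition(" ")[0]
    if "|" not in identifier:
        return identifier            # plain ID, nothing to choose from
    ids, _ = parse_ncbi_defline(defline)
    for tag, value in ids.items():
        if tag != "gi":
            return value             # first non-GI identifier, e.g. ref
    return ids.get("gi", "")


header = (">gi|16128001|ref|NP_414548.1| putative transporter "
          "[Escherichia coli str. K-12 substr. MG1655]")
ids, description = parse_ncbi_defline(header)
print(preferred_id(header))
```

For the example header this selects NP_414548.1 rather than the GI number.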

Continue reading


Guangchuang Yu

Bioinformatics Professor @ SMU


Guangzhou