从2007年写了第一篇文章之后,我发现管理文献真不是人干的,一直以来使用Zotero来管理文献,自从有了dropbox之后,就想把文献库放在dropbox上,一来有个云备份,不怕硬盘坏,二来嘛,实验室电脑和个人电脑可以实时同步化,无奈dropbox空间太小,而我的zotero早已超过10G,因为电子书也放在里面,只能做罢。

度娘出手还是很大方的,我现在的网盘已经有3T,包括之前在推广阶段用1元买的1T。

Continue reading

After two weeks developed, I have added/updated some plot functions in ChIPseeker (version >=1.0.1).

ChIP peaks over Chromosomes

> files=getSampleFiles()
> peak=readPeakFile(files[[4]])
> peak
GRanges object with 1331 ranges and 2 metadata columns:
         seqnames                 ranges strand   |             V4        V5
                               |        
     [1]     chr1     [ 815092,  817883]      *   |    MACS_peak_1    295.76
     [2]     chr1     [1243287, 1244338]      *   |    MACS_peak_2     63.19
     [3]     chr1     [2979976, 2981228]      *   |    MACS_peak_3    100.16
     [4]     chr1     [3566181, 3567876]      *   |    MACS_peak_4    558.89
     [5]     chr1     [3816545, 3818111]      *   |    MACS_peak_5     57.57
     ...      ...                    ...    ... ...            ...       ...
  [1327]     chrX [135244782, 135245821]      *   | MACS_peak_1327     55.54
  [1328]     chrX [139171963, 139173506]      *   | MACS_peak_1328    270.19
  [1329]     chrX [139583953, 139586126]      *   | MACS_peak_1329    918.73
  [1330]     chrX [139592001, 139593238]      *   | MACS_peak_1330    210.88
  [1331]     chrY [ 13845133,  13845777]      *   | MACS_peak_1331     58.39
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
      NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
> covplot(peak, weightCol="V5")

Continue reading

boxplot

生物坑很多人画图只会直方图,统计只会T检验,在暨大见过太多的学生连T检验都不会,分不清SEM和SD的差别,也不清楚T检验那几个简单参数的含义。我写统计笔记也是因为不想重复性地跟学生讲解T检验。

Barplot和T test一样普遍而流行,barplot适合于表示计数数据和比例,显示比例也可以用pie plot,但直方图比饼图要好,因为人类的眼睛适合于比较高度,而不是弧度。

多半时候生物学数据并非简单的计数数据,对于测量数据,在展示数据分布时,很多人会使用他们熟悉的barplot,用高度来表示mean,然后再加上errorbar,这样展示数据,信息量是非常低的,使用boxplot能够提供更多的数据分布信息,能更好地展现数据,但可能很多人只会在excel里画barplot,Nature Methods 2013年的文章中有100个barplot图,而只有20个boxplot图,从这里就可以看出来,用boxplot的人远远没有barplot多,于是NPG怒了,写了两篇专栏文章Points of View: Bar charts and box plotsPoints of Significance: Visualizing samples with box plots并且发表了一篇BoxPlotR: a web tool for generation of box plots方便大家画boxplot,如此简单的web tool能够发Nature Methods,实在是让人羡慕妒忌恨啊。

Continue reading

从2011年1月我就在实验室的QQ群里发群邮件说IPI关门,时至今日,已经关门3年了,主页上一直停留在关门大吉的那一刻。 我不断在邮件里, lab meeting上强调要换成uniprot来搜库,然而时至今日,依然还是有很多的人在使用IPI,想想真可怕,实验室真是100年不更新一下数据啊。 另外一个我非常讨厌的就是GI号,它压根就不是正儿八经的ID号,但他们从来就不愿意尝试改变。 比如上面这个蛋白质序列的FASTA文件,注释行有很多信息,比如: >gi|16128001|ref|NP_414548.1| putative transporter [Escherichia coli str. K-12 substr. MG1655] 显然搜库时可以使用NP_414548.1做为ID,这个问题我说过N多遍,但他们一定会用他们惯用的gi|16128001来做ID。 问题是很明显的: GI number (sometimes written in lower case, “gi”) is simply a series of digits that are assigned consecutively to each sequence record processed by NCBI. The GI number bears no resemblance to the Accession number of the sequence record. The GI number has been used for many years by NCBI to track sequence histories in GenBank and the other sequence databases it maintains.

Continue reading

I used R package ChIPpeakAnno for annotating peaks, and found that it handle the DNA strand in the wrong way. Maybe the developers were from the computer science but not biology background.

> require(ChIPpeakAnno)
> packageVersion("ChIPpeakAnno")
[1] '2.10.0'
> peak <- RangedData(space="chr1", IRanges(24736757, 24737528))
> data(TSS.human.GRCh37)
> ap <- annotatePeakInBatch(peak, Annotation=TSS.human.GRCh37)
> ap
RangedData with 1 row and 9 value columns across 1 space
                     space               ranges |        peak      strand
                               |  
1 ENSG00000001461        1 [24736757, 24737528] |           1           +
                          feature start_position end_position insideFeature
                                   
1 ENSG00000001461 ENSG00000001461       24742284     24799466      upstream
                  distancetoFeature shortestDistance fromOverlappingOrNearest
                                                
1 ENSG00000001461             -5527             4756             NearestStart

Continue reading

local blast

I was asked to set up a local blast for the lab. Blast can be installed directly using apt in debian and it turns out to be easy. root@jz:/ssd/genomes# apt-get install ncbi-blast+ Reading package lists... Done Building dependency tree Reading state information... Done The following NEW packages will be installed: ncbi-blast+ 0 upgraded, 1 newly installed, 0 to remove and 26 not upgraded. Need to get 11.2 MB of archives. After this operation, 32.

Continue reading

Author's picture

Guangchuang Yu

Bioinformatics Professor @ SMU

Bioinformatics Professor

Guangzhou