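Ever since I wrote my first paper in 2007, I have found that managing references by hand is no job for a human being, so I have been using Zotero all along. Once Dropbox came along, I wanted to keep my reference library there: first, for a cloud backup, so that a dead hard drive would no longer matter; second, so that my lab computer and my personal computer would stay in sync in real time. Unfortunately, Dropbox's space was too small, and my Zotero library had long since passed 10 GB because I also keep e-books in it, so I had to give up on the idea.
Baidu, on the other hand, is quite generous: my Baidu network disk now has 3 TB, including the 1 TB I bought for 1 yuan during the promotional period.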
After two weeks of development, I have added/updated some plot functions in ChIPseeker (version >= 1.0.1).
> files=getSampleFiles()
> peak=readPeakFile(files[[4]])
> peak
GRanges object with 1331 ranges and 2 metadata columns:
seqnames ranges strand | V4 V5
|
[1] chr1 [ 815092, 817883] * | MACS_peak_1 295.76
[2] chr1 [1243287, 1244338] * | MACS_peak_2 63.19
[3] chr1 [2979976, 2981228] * | MACS_peak_3 100.16
[4] chr1 [3566181, 3567876] * | MACS_peak_4 558.89
[5] chr1 [3816545, 3818111] * | MACS_peak_5 57.57
... ... ... ... ... ... ...
[1327] chrX [135244782, 135245821] * | MACS_peak_1327 55.54
[1328] chrX [139171963, 139173506] * | MACS_peak_1328 270.19
[1329] chrX [139583953, 139586126] * | MACS_peak_1329 918.73
[1330] chrX [139592001, 139593238] * | MACS_peak_1330 210.88
[1331] chrY [ 13845133, 13845777] * | MACS_peak_1331 58.39
---
seqlengths:
chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX chrY
NA NA NA NA NA NA ... NA NA NA NA NA NA
> covplot(peak, weightCol="V5")
ChIPpeakAnno WAS the only R package for ChIP peak annotation. I used it to annotate peaks in my recent study.
I found that it does not consider the strand information of genes. I reported the bug to the authors, but they were reluctant to change it.
So I decided to develop my own package, ChIPseeker, and it’s now available in Bioconductor.
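In biology, many people can only draw bar charts when plotting and can only run a t-test when doing statistics. At Jinan University I have seen too many students who cannot even do a t-test, cannot tell SEM from SD, and do not understand what the few simple parameters of a t-test mean. Part of the reason I write statistics notes is that I do not want to explain the t-test to students over and over again.
The barplot is as common and popular as the t-test. A barplot is suited to count data and proportions. Proportions can also be shown with a pie plot, but a bar chart is better than a pie chart, because the human eye is good at comparing heights, not arcs.
Most of the time, biological data are not simple counts. For measurement data, many people show the distribution with the barplot they are familiar with, using the bar height for the mean and then adding an error bar. Presented this way, the data convey very little information. A boxplot gives more information about the distribution and presents the data better, but many people probably only know how to draw barplots in Excel. The articles published in Nature Methods in 2013 contained 100 barplots but only 20 boxplots, which shows that far fewer people use boxplots than barplots. So NPG lost its patience: it wrote two columns, Points of View: Bar charts and box plots and Points of Significance: Visualizing samples with box plots, and published a paper, BoxPlotR: a web tool for generation of box plots, to make it easy for everyone to draw boxplots. That such a simple web tool can get into Nature Methods really makes one envious.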
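To make the SD/SEM distinction and the barplot-versus-boxplot point concrete, here is a minimal base-R sketch. The vector `x` is made-up data for illustration only, not taken from any study; the variable names are likewise arbitrary.

```r
# Made-up measurements, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)
m <- mean(x)

# SD describes the spread of the individual observations
# (sd() uses the n - 1 denominator)
s <- sd(x)

# SEM describes the uncertainty of the estimated mean; it shrinks as n grows
sem <- s / sqrt(n)

# A "mean + error bar" barplot encodes only two numbers per group ...
bar_summary <- c(mean = m, sem = sem)

# ... whereas a boxplot is built on Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)

print(bar_summary)
print(five)

# boxplot(x) draws the same summary; uncomment when a graphics device is available
# boxplot(x, main = "made-up data")
```

Because SEM is SD divided by the square root of n, an SEM error bar is always at least as short as an SD error bar for the same data, which is why a figure legend needs to say which of the two is shown.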
I used the R package ChIPpeakAnno for annotating peaks, and found that it handles the DNA strand in the wrong way. Maybe the developers come from a computer science background rather than a biology one.
> require(ChIPpeakAnno)
> packageVersion("ChIPpeakAnno")
[1] '2.10.0'
> peak <- RangedData(space="chr1", IRanges(24736757, 24737528))
> data(TSS.human.GRCh37)
> ap <- annotatePeakInBatch(peak, Annotation=TSS.human.GRCh37)
> ap
RangedData with 1 row and 9 value columns across 1 space
space ranges | peak strand
|
1 ENSG00000001461 1 [24736757, 24737528] | 1 +
feature start_position end_position insideFeature
1 ENSG00000001461 ENSG00000001461 24742284 24799466 upstream
distancetoFeature shortestDistance fromOverlappingOrNearest
1 ENSG00000001461 -5527 4756 NearestStart
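For reference, here is a minimal base-R sketch of what strand-aware annotation means: the TSS is the gene's start coordinate on the + strand and its end coordinate on the - strand, and "upstream" is judged in the direction of transcription. The helper name `distance_to_tss`, the nearest-edge distance convention, and the minus-strand call are assumptions made for illustration. This is not the code of ChIPpeakAnno or ChIPseeker.

```r
# Hypothetical helper (not part of any package): signed distance from a peak
# to a gene's TSS, measured from the nearest peak edge.
# Negative = upstream of the TSS, positive = downstream, 0 = peak covers the TSS.
distance_to_tss <- function(peak_start, peak_end,
                            gene_start, gene_end, gene_strand) {
  stopifnot(gene_strand %in% c("+", "-"))

  # The TSS depends on the strand of the gene
  tss <- if (gene_strand == "+") gene_start else gene_end

  # Peak overlaps the TSS
  if (peak_start <= tss && tss <= peak_end) {
    return(0)
  }

  # Offset in genomic coordinates, using the peak edge closest to the TSS
  if (peak_end < tss) {
    offset <- peak_end - tss    # peak lies at smaller coordinates
  } else {
    offset <- peak_start - tss  # peak lies at larger coordinates
  }

  # On the minus strand, transcription runs toward smaller coordinates,
  # so the sign of the genomic offset has to be flipped
  if (gene_strand == "+") offset else -offset
}

# Coordinates from the transcript above, treating the gene as "+" strand
d_plus <- distance_to_tss(24736757, 24737528, 24742284, 24799466, "+")

# Same coordinates with a hypothetical "-" strand gene, to show the difference
d_minus <- distance_to_tss(24736757, 24737528, 24742284, 24799466, "-")

print(c(plus = d_plus, minus = d_minus))
```

With the gene on the + strand, the peak is 4756 bp upstream of the TSS, which agrees in magnitude with the `shortestDistance` column above. With the same coordinates on the - strand, the TSS moves to the other end of the gene and the peak becomes downstream of it, so ignoring strand changes both the distance and the upstream/downstream call.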