ChIPpeakAnno WAS the only R package for ChIP peak annotation. I used it for annotating peak in my recent study.

I found it does not consider the strand information of genes. I reported the bug to the authors, but they are reluctant to change.

So I decided to develop my own package, ChIPseeker, and it’s now available in Bioconductor.

> require(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> require(ChIPseeker)
> peakfile = getSampleFiles()[[4]]
> peakfile
[1] "/usr/local/Cellar/r/3.1.0/R.framework/Versions/3.1/Resources/library/ChIPseeker/extdata//GEO_sample_data/GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz"
> peakAnno <- annotatePeak(peakfile, tssRegion=c(-3000, 1000), TxDb=txdb, annoDb="org.Hs.eg.db")
>> loading peak file...              2014-04-13 12:03:45 PM 
>> preparing features information...         2014-04-13 12:03:45 PM 
>> identifying nearest features...       2014-04-13 12:03:45 PM 
>> calculating distance from peak to TSS...  2014-04-13 12:03:46 PM 
>> assigning genomic annotation...       2014-04-13 12:03:46 PM 
>> adding gene annotation...             2014-04-13 12:03:50 PM 
>> assigning chromosome lengths          2014-04-13 12:03:50 PM 
>> done...                   2014-04-13 12:03:50 PM 
> head(as.GRanges(peakAnno))
GRanges object with 6 ranges and 14 metadata columns:
      seqnames             ranges strand |          V4        V5
                      |     
  [1]     chr1 [ 815092,  817883]      * | MACS_peak_1    295.76
  [2]     chr1 [1243287, 1244338]      * | MACS_peak_2     63.19
  [3]     chr1 [2979976, 2981228]      * | MACS_peak_3    100.16
  [4]     chr1 [3566181, 3567876]      * | MACS_peak_4    558.89
  [5]     chr1 [3816545, 3818111]      * | MACS_peak_5     57.57
  [6]     chr1 [6304864, 6305704]      * | MACS_peak_6     54.57
             annotation  geneChr geneStart   geneEnd geneLength geneStrand
                    
  [1] Distal Intergenic     chr1    803451    812182       8732          -
  [2]  Promoter (<=1kb)     chr1   1243994   1247057       3064          +
  [3]  Promoter (<=1kb)     chr1   2976181   2980350       4170          -
  [4]  Promoter (<=1kb)     chr1   3547331   3566671      19341          -
  [5]  Promoter (<=1kb)     chr1   3816968   3832011      15044          +
  [6]            3  UTR     chr1   6304252   6305638       1387          +
           geneId transcriptId distanceToTSS         ENSEMBL      SYMBOL
                   
  [1]      284593   uc001abt.4         -5701 ENSG00000230368      FAM41C
  [2]      126789   uc001aed.3             0 ENSG00000169972       PUSL1
  [3]      440556   uc001aka.3             0 ENSG00000177133   LINC00982
  [4]       49856   uc001ako.3             0 ENSG00000116213      WRAP73
  [5]   100133612   uc001alg.3             0 ENSG00000236423   LINC01134
  [6]      390992   uc009vly.2          1452 ENSG00000173673        HES3
                                          GENENAME
                                       
  [1] family with sequence similarity 41, member C
  [2]              pseudouridylate synthase-like 1
  [3]   long intergenic non-protein coding RNA 982
  [4]      WD repeat containing, antisense to TP73
  [5]  long intergenic non-protein coding RNA 1134
  [6]       hes family bHLH transcription factor 3
  ---
  seqlengths:
        chr1     chr10     chr11     chr12 ...      chr8      chr9      chrX
   249250621 135534747 135006516 133851895 ... 146364022 141213431 155270560

annotatePeak function can accept peak file or GRanges object that contained peak information. The sample peakfile shown above is a sample output from MACS. All the information contained in peakfile will be kept in the output of annotatePeak.

The annotation column annotates genomic features of the peak, that is whether peak is located in Promoter, Exon, UTR, Intron or Intergenic. If the peak is annotated by Exon or Intron, more detail information will be provided. For instance, Exon (38885 exon 3 of 11) indicates that the peak is located at the 3th exon of gene 38885 (entrezgene ID) which contain 11 exons in total.

All the names of column that contains information of nearest gene annotated will start with gene in camelCaps style.

If parameter annoDb=“org.Hs.eg.db” is provided, extra columns of ENSEMBL, SYMBOL and GENENAME will be provided in the final output.

For more detail, please refer to the package vignette. Comments or suggestions are welcome.

——- below was added at April 30, 2014 ———–

After two weeks of ChIPseeker release in Bioconductor, I am very glad to receive an email from Cambridge Systems Biology Centre that they are using ChIPseeker in annotating Drosophila ChIP data.

Although I only illustrate examples in human, ChIPseeker do supports all species only if genomic information was available. :mrgreen: :mrgreen:

> require(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
> require(ChIPseeker)
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
>
> chr <- c("chrX", "chrX", "chrX")
> start <- c(71349, 77124, 86364)
> end <- c(72835, 77836, 88378)
> peak <- GRanges(seqnames = chr, ranges = IRanges(start, end))
> peakAnno <- annotatePeak(peak,
+                          tssRegion=c(-3000,100),
+                          TxDb = txdb,
+                          annoDb="org.Dm.eg.db")
>> preparing features information...         2014-04-30 21:28:27
>> identifying nearest features...       2014-04-30 21:28:30
>> calculating distance from peak to TSS...  2014-04-30 21:28:31
>> assigning genomic annotation...       2014-04-30 21:28:31
>> adding gene annotation...             2014-04-30 21:28:42
Loading required package: org.Dm.eg.db
Loading required package: DBI

>> assigning chromosome lengths           2014-04-30 21:28:43
>> done...                   2014-04-30 21:28:43
> as.GRanges(peakAnno)
GRanges object with 3 ranges and 12 metadata columns:
      seqnames         ranges strand |
                  |
  [1]     chrX [71349, 72835]      * |
  [2]     chrX [77124, 77836]      * |
  [3]     chrX [86364, 88378]      * |
                                           annotation  geneChr geneStart
                                            
  [1]                                Promoter (<=1kb)     chrX     72458
  [2] Intron (FBtr0100135/FBgn0029128, intron 1 of 7)     chrX     72458
  [3] Intron (FBtr0100135/FBgn0029128, intron 2 of 7)     chrX     72458
        geneEnd geneLength geneStrand      geneId transcriptId distanceToTSS
                   
  [1]     96056      23599          + FBgn0029128  FBtr0100135             0
  [2]     96056      23599          + FBgn0029128  FBtr0100135          5378
  [3]     96056      23599          + FBgn0029128  FBtr0100135         15920
         ENTREZID      SYMBOL    GENENAME
        
  [1]     3354987         tyn     trynity
  [2]     3354987         tyn     trynity
  [3]     3354987         tyn     trynity
  ---
  seqlengths:
       chrX
   22422827
>

Citation

Yu G, Wang LG and He QY*. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 2015, 31(14):2382-2383.