showCategory parameter for visualizing compareCluster output
I am using dotplot() to visualize results from enrichGO(), enrichDO(), enricher() and compareCluster() in the clusterProfiler R package. When specifying showCategory, I get the right number of categories, except with the results of compareCluster().
In my case, I use compareCluster() on a list of 3 elements:
```r
> str(ClusterList)
List of 3
 $ All: chr [1:1450] "89886" "29923" "100132891" "101410536" ...
 $ g1 : chr [1:858] "89886" "29923" "100132891" "101410536" ...
 $ g2 : chr [1:592] "5325" "170691" "29953" "283392" ...
```
```r
library(clusterProfiler)
library(org.Hs.eg.db)

CompareGO_BP <- compareCluster(ClusterList, fun = "enrichGO", pvalueCutoff = 0.01,
                               pAdjustMethod = "BH", OrgDb = org.Hs.eg.db,
                               ont = "BP", readable = TRUE)
dotplot(CompareGO_BP, showCategory = 10, title = "GO - Biological Process")
```
I ask for 10 categories, but I get 15 categories in All, 8 categories in g1 and 12 categories in g2. None of these counts, nor their sum, is 10.
Does the showCategory option work for comparisons? Am I missing something here?
And which categories exactly will it plot: the most significant ones across all 3 cases, or the most significant ones within each case?
This question was posted on the Bioconductor support site. The behavior is confusing, so I am writing this post to clarify it.
In this case, we expect to get at most 10 of the most significant categories for each cluster, because the user specified showCategory=10. This is exactly how the dotplot and barplot methods behave for the results of enrichGO(), enrichDO() and enricher().
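The per-cluster cap described above can be sketched in base R. This is a minimal illustration and not clusterProfiler's code. The data frame `res`, its `Cluster`, `ID` and `pvalue` columns, the toy values and the helper `top_n_per_cluster()` are all hypothetical. Ranking by p-value is also an assumption. Taking the first `n` rows of each cluster never returns more than `n` terms. It returns fewer when a cluster has fewer enriched terms, which is what happens to g1.

```r
# Hypothetical enrichment table: cluster "A" has 4 enriched terms, "B" only 2.
res <- data.frame(
  Cluster = c("A", "A", "A", "A", "B", "B"),
  ID      = c("t1", "t2", "t3", "t4", "t2", "t5"),
  pvalue  = c(0.001, 0.002, 0.004, 0.008, 0.003, 0.006),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep at most n of the most significant terms per cluster.
top_n_per_cluster <- function(df, n) {
  parts <- lapply(split(df, df$Cluster), function(d) {
    d <- d[order(d$pvalue), , drop = FALSE]  # most significant first
    head(d, n)                               # fewer than n rows if the cluster is small
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

top <- top_n_per_cluster(res, n = 3)
table(top$Cluster)  # A is capped at 3 terms, B keeps its 2
```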
The 8 categories for g1 are easy to explain: only 8 enriched terms were found for g1. But why are there 15 categories in All and 12 in g2?
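The dotplot and barplot methods implemented in clusterProfiler try to make the comparison among clusters more informative and reasonable. After extracting, e.g., the 10 most significant categories for each cluster, clusterProfiler collects the overlap of these categories among clusters: a category selected for one cluster is also shown for every other cluster in which it is enriched.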
For example, suppose term A is enriched in all the clusters (All, g1 and g2) and is among the 10 most significant categories for g1 but not for All or g2. clusterProfiler captures this information and includes term A in the All and g2 clusters as well, which makes the comparison in dotplot and barplot more reasonable.
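The overlap-collecting step can be sketched in the same way. Again, this is a hypothetical base-R illustration and not the package's implementation. The function `select_categories()` and its `include_all` flag are assumptions; the flag stands in for `dotplot(..., includeAll = ...)`. The toy table is also an assumption, and the real code may match terms on a different column than `ID`. With `n = 2`, term A ranks in the top 2 only for g1. When `include_all = TRUE` it is kept for All and g2 as well, so those clusters end up with more than `n` terms. This is the same way All and g2 exceed 10 in the question.

```r
# Hypothetical table: term "A" is enriched in every cluster but ranks in the
# top 2 only for g1.
res <- data.frame(
  Cluster = c("All", "All", "All", "g1", "g1", "g2", "g2", "g2"),
  ID      = c("X", "Y", "A", "A", "Y", "Z", "W", "A"),
  pvalue  = c(0.001, 0.002, 0.030, 0.001, 0.004, 0.001, 0.002, 0.040),
  stringsAsFactors = FALSE
)

select_categories <- function(df, n, include_all = TRUE) {
  # Step 1: the n most significant terms within each cluster.
  top <- do.call(rbind, lapply(split(df, df$Cluster), function(d) {
    head(d[order(d$pvalue), , drop = FALSE], n)
  }))
  if (!include_all) {
    out <- top
  } else {
    # Step 2: keep every row whose term was selected for at least one cluster,
    # so that term is shown in all clusters where it is enriched.
    out <- df[df$ID %in% top$ID, , drop = FALSE]
  }
  rownames(out) <- NULL
  out
}

table(select_categories(res, n = 2, include_all = FALSE)$Cluster)  # 2 per cluster
table(select_categories(res, n = 2, include_all = TRUE)$Cluster)   # All and g2 gain term A
```

Setting `include_all = FALSE` caps every cluster at `n` here, but it hides the fact that term A is also enriched in All and g2.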
If users want to ignore this information, they can use dotplot(..., includeAll=FALSE), although this is not recommended.
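See the difference in the following example:
```r
library(clusterProfiler)
library(DOSE)  # provides enrichDO()
data(gcSample)
x <- compareCluster(gcSample, fun = "enrichDO")
dotplot(x, showCategory = 5, includeAll = FALSE)  # only the top 5 per cluster
dotplot(x, showCategory = 5)                      # also shows overlapping terms
```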
In the first figure, the clusters appear to be completely different, with no overlapping enriched terms. The second figure demonstrates that this is not true.
Yu G, Wang L, Han Y and He Q*. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.