So you have RNAseq data but don't know how to run GSEA
Dear GuangChuangyu,
I’m trying to use the clusterProfiler package for GSE analysis on DGE data obtained from RNAseq. While I can run enrichKEGG, I’m unable to run gseKEGG basically because I don’t know how to obtain an order ranked gene list.
I work on R. I have a dataframe or matrix with gene names, log2 fold change values, pvalues and adjusted pvalues among others.
How can I get the order ranked gene list to feed in gseKEGG?
Moreover what is the more reliable way to obtain functional insight about each sample? enrichKEGG or gseKEGG?
Thank you in advance for your help.
best regards
bruno saubaméa
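Today I received this request for help from Université Paris Descartes. I have been asked this question many times, so clearly a lot of newcomers run into it: they have no idea how to run GSEA because they are not sure what GSEA takes as input.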
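The geneList that GSEA takes as input has three properties:
+ Numeric vector: the values can be fold changes or any other numeric variable
+ Named: every number has a name, which is the corresponding gene ID
+ Sorted: the numbers are in decreasing order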
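To make the three properties concrete, here is a minimal toy sketch. The fold-change values are made up for illustration, and the IDs are only there to show the shape of the object; nothing here comes from a real experiment.

```r
# Toy data: made-up fold changes paired with example gene IDs
fc  = c(1.2, -0.8, 3.1, 0.4, -2.5)
ids = c("1017", "7157", "4312", "672", "2305")

geneList = fc                                  # property 1: numeric vector
names(geneList) = ids                          # property 2: named by gene ID
geneList = sort(geneList, decreasing = TRUE)   # property 3: decreasing order

# Sanity checks on the three properties
stopifnot(is.numeric(geneList))
stopifnot(!is.null(names(geneList)), !anyDuplicated(names(geneList)))
stopifnot(!is.unsorted(rev(geneList)))
```

Note that `sort()` keeps the names attached to their values, so the ID-to-value pairing survives the reordering.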
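Suppose your data is in a csv file. The file should have at least two columns: the first holds the gene IDs (no duplicates allowed), and the second holds the corresponding numeric values, such as expression level or fold change.
Then you can build your geneList like this: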
d = read.csv(your_csv_file)
## assume 1st column is ID
## 2nd column is FC
## feature 1: numeric vector
geneList = d[,2]
## feature 2: named vector
names(geneList) = as.character(d[,1])
## feature 3: decreasing order
geneList = sort(geneList, decreasing = TRUE)
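The snippet above assumes clean input. Real differential expression tables often contain missing values or repeated IDs, so here is a slightly more defensive sketch. The column names `id` and `fc`, the inline data frame, and the rule of keeping the entry with the largest absolute value for a duplicated ID are all assumptions made for illustration; other ways of resolving duplicates (averaging, for instance) may suit your data better.

```r
# Hypothetical input; in practice this would come from read.csv()
d = data.frame(
  id = c("1017", "7157", "7157", "4312", "672"),
  fc = c(1.2, -0.8, -1.5, NA, 0.4),
  stringsAsFactors = FALSE
)

# Drop rows with a missing ID or a missing value
d = d[!is.na(d$id) & !is.na(d$fc), ]

# Resolve duplicated IDs: order by |fc| (largest first),
# then keep only the first row seen for each ID
d = d[order(abs(d$fc), decreasing = TRUE), ]
d = d[!duplicated(d$id), ]

# Build the named, decreasingly sorted numeric vector
geneList = d$fc
names(geneList) = as.character(d$id)
geneList = sort(geneList, decreasing = TRUE)
```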
With geneList in hand, you can happily run GSEA with clusterProfiler.
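As a rough sketch of that last step, the call below runs gseKEGG on the example geneList that ships with the DOSE package, which is named by Entrez gene ID. It assumes clusterProfiler and DOSE are installed and that KEGG can be reached over the network. The organism code and cutoff are illustrative choices; swap in your own geneList and settings.

```r
library(clusterProfiler)

# Example ranked gene list from the DOSE package (names are Entrez gene IDs)
data(geneList, package = "DOSE")

kk = gseKEGG(geneList     = geneList,
             organism     = "hsa",    # KEGG organism code for human
             pvalueCutoff = 0.05)

# Inspect the top enriched pathways
head(as.data.frame(kk))
```

If your own IDs are gene symbols or Ensembl IDs, they will likely need converting first (clusterProfiler's `bitr()` is one option), because for human data gseKEGG expects Entrez gene IDs by default.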