ebrowser {EnrichmentBrowser} | R Documentation |
This is the all-in-one wrapper function to perform the standard enrichment analysis pipeline implemented in the EnrichmentBrowser package.
Given flat gene expression data, the data is read in and subsequently subjected to chosen enrichment analysis methods.
The results from different methods can be combined and investigated in detail in the default browser.
ebrowser(
    meth, exprs, pdat, fdat, org,
    data.type = c(NA, "ma", "rseq"),
    norm.method = "quantile",
    de.method = "limma",
    gs, grn = NULL,
    perm = 1000, alpha = 0.05, beta = 1,
    comb = FALSE, browse = TRUE, nr.show = -1
)
meth |
Enrichment analysis method.
See |
exprs |
Expression matrix.
A tab separated text file containing *normalized* expression values
on a *log* scale.
Columns = samples/subjects; rows = features/probes/genes;
NO headers, row or column names.
Supported data types are log2 counts (microarray single-channel),
log2 ratios (microarray two-color), and log2-counts per million
(RNA-seq logCPMs).
See limma's user guide for definition and normalization of the
different data types.
Alternatively, this can be a |
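The file layout described above can be sketched with base R alone. The matrix values and the temporary file name below are invented for illustration; only the layout (tab separated, no header, no row names) follows the description.

```r
# Toy log2-scale expression matrix: 4 features (rows) x 3 samples (columns).
# Values are made up for illustration only.
expr.mat <- matrix(
    c(7.1,  7.4, 6.9,
      5.2,  5.0, 5.5,
      9.8, 10.1, 9.6,
      3.3,  3.1, 3.6),
    nrow = 4, byrow = TRUE)

# Write tab separated, without header line or row names.
exprs.file <- tempfile(fileext = ".tab")
write.table(expr.mat, file = exprs.file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm that the layout round-trips.
check <- as.matrix(read.delim(exprs.file, header = FALSE))
dim(check)
```

The resulting path in `exprs.file` is the kind of value that would be passed as the `exprs` argument.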
pdat |
Phenotype data.
A tab separated text file containing annotation information
for the samples in either *two or three* columns.
NO headers, row or column names.
The number of rows/samples in this file should match the number of
columns/samples of the expression matrix.
The 1st column is reserved for the sample IDs;
the 2nd column is reserved for a *BINARY* group assignment.
Use '0' for unaffected samples (controls) and '1' for affected
samples (cases).
For paired samples or sample blocks a third column is expected that
defines the blocks.
If 'exprs' is a |
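A minimal sketch of a three-column phenotype file in this layout, again using only base R. The sample IDs, the group and block assignments, and the variable `n.samples` are hypothetical.

```r
# Hypothetical number of columns/samples in the expression matrix.
n.samples <- 4

# Column 1: sample IDs; column 2: binary group (0 = control, 1 = case);
# column 3: block IDs for paired samples (optional).
pdat <- data.frame(
    sample = c("S1", "S2", "S3", "S4"),
    group  = c(0, 1, 0, 1),
    block  = c(1, 1, 2, 2))

# One row per sample of the expression matrix.
stopifnot(nrow(pdat) == n.samples)

# Write tab separated, without header line or row names.
pdat.file <- tempfile(fileext = ".tab")
write.table(pdat, file = pdat.file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read back and verify that the group column is binary.
pdat.check <- read.delim(pdat.file, header = FALSE)
all(pdat.check[[2]] %in% c(0, 1))
```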
fdat |
Feature data.
A tab separated text file containing annotation information
for the features.
Exactly *TWO* columns; 1st col = feature IDs;
2nd col = corresponding KEGG gene ID
for each feature ID in 1st col; NO headers, row or column names.
The number of rows/features in this file should match the number of
rows/features of the expression matrix.
If 'exprs' is a |
org |
Organism under investigation in KEGG three letter code,
e.g. ‘hsa’ for ‘Homo sapiens’.
See also |
data.type |
Expression data type. Use 'ma' for microarray and 'rseq' for RNA-seq data. If NA, data.type is automatically guessed: if the expression values in 'exprs' are decimal numbers, they are assumed to be microarray intensities; whole numbers are assumed to be RNA-seq read counts. Defaults to NA. |
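The guessing rule can be illustrated with a small stand-alone function. `guess.data.type` is a hypothetical helper written for this sketch, not a function of the package, and the package's actual check may differ in detail.

```r
# Hypothetical helper (not part of EnrichmentBrowser) that mimics the rule:
# whole numbers -> RNA-seq read counts, decimal numbers -> microarray.
guess.data.type <- function(x) {
    x <- x[!is.na(x)]
    if (all(x == round(x))) "rseq" else "ma"
}

guess.data.type(c(7.13, 5.2, 9.84))   # decimal values
guess.data.type(c(12, 0, 250, 31))    # whole numbers
```

Setting `data.type` explicitly avoids relying on such a heuristic, for example when counts have already been transformed.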
norm.method |
Determines whether and how the expression data should be normalized.
For available microarray normalization methods see the man page of the
limma function |
de.method |
Determines which method is used for per-gene differential expression
analysis. See the man page of |
gs |
Gene sets. Either a list of gene sets (vectors of KEGG gene IDs) or a text file in GMT format storing all gene sets under investigation. |
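As a rough sketch, a list of gene sets can be written to and read back from a GMT file with base R. In GMT, each line holds the set name, a description field, and then the member gene IDs, all tab separated. The set names, the placeholder gene IDs, and the "NA" description are invented for illustration.

```r
# Gene sets as a named list of ID vectors (placeholder IDs).
gs <- list(
    setA = c("g1", "g2", "g3"),
    setB = c("g2", "g4"))

# One GMT line per set: name, description, members.
gmt.lines <- vapply(names(gs), function(n)
    paste(c(n, "NA", gs[[n]]), collapse = "\t"), character(1))
gmt.file <- tempfile(fileext = ".gmt")
writeLines(gmt.lines, gmt.file)

# Read back: drop the first two fields, use field 1 as the set name.
fields <- strsplit(readLines(gmt.file), "\t", fixed = TRUE)
gs.back <- lapply(fields, function(f) f[-(1:2)])
names(gs.back) <- vapply(fields, `[`, character(1), 1)
identical(gs.back, gs)
```

Either the list `gs` or the path `gmt.file` has the form expected by the `gs` argument.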
grn |
Gene regulatory network. Either an absolute file path to a tabular file or a character matrix with exactly *THREE* cols; 1st col = IDs of regulating genes; 2nd col = corresponding regulated genes; 3rd col = regulation effect; Use '+' and '-' for activation/inhibition. |
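A minimal sketch of a network in the character-matrix form, with a basic validity check. The gene IDs are placeholders and the regulation effects are invented.

```r
# Three columns: regulator, regulated gene, effect ('+' or '-').
grn <- rbind(
    c("g1", "g2", "+"),
    c("g1", "g3", "-"),
    c("g4", "g2", "+"))

# Check the structure described above.
stopifnot(
    is.character(grn),
    is.matrix(grn),
    ncol(grn) == 3,
    all(grn[, 3] %in% c("+", "-")))
nrow(grn)
```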
perm |
Number of permutations of the expression matrix to estimate the null distribution. Defaults to 1000. Can also be an integer vector matching the length of 'meth' to assign different numbers of permutations for different methods. |
alpha |
Statistical significance level. Defaults to 0.05. |
beta |
Log2 fold change significance level. Defaults to 1 (2-fold). |
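To illustrate what the two thresholds mean, the sketch below applies them to a toy table of per-gene results. The data, the column names, and the way both cutoffs are combined are assumptions made for this example; the selected enrichment method decides how `alpha` and `beta` are actually used.

```r
alpha <- 0.05   # significance level for (adjusted) p-values
beta  <- 1      # log2 fold change cutoff

# A log2 fold change of 1 corresponds to a 2-fold change.
2^beta

# Toy per-gene results (invented values).
de <- data.frame(
    gene   = c("g1", "g2", "g3", "g4"),
    log2FC = c(1.8, -0.4, -1.2, 2.5),
    adj.p  = c(0.01, 0.03, 0.20, 0.001))

# Genes passing both thresholds in this sketch.
is.de <- de$adj.p < alpha & abs(de$log2FC) > beta
de$gene[is.de]
```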
comb |
Logical. Should results be combined if more than one enrichment method is selected? Defaults to FALSE. |
browse |
Logical. Should results be displayed in the browser for interactive exploration? Defaults to TRUE. |
nr.show |
Number of gene sets to show. By default, all statistically significant gene sets are displayed. Note that this only influences the number of gene sets for which additional visualization is provided (typically only of interest for the top / significant gene sets). Selected enrichment methods and resulting flat gene set rankings still include all gene sets under study. |
None, opens the browser to explore results.
Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>
Limma User's guide: http://www.bioconductor.org/packages/limma
read.eset
to read expression data from file;
probe.2.gene.eset
to transform probe to gene level expression;
kegg.species.code
maps species name to KEGG code.
get.kegg.genesets
to retrieve gene set definitions from KEGG;
compile.grn.from.kegg
to construct a GRN from KEGG pathways;
sbea
to perform set-based enrichment analysis;
nbea
to perform network-based enrichment analysis;
comb.ea.results
to combine results from different methods;
ea.browse
for exploration of resulting gene sets
# expression data from file
exprs.file <- system.file("extdata/exprs.tab", package="EnrichmentBrowser")
pdat.file <- system.file("extdata/colData.tab", package="EnrichmentBrowser")
fdat.file <- system.file("extdata/rowData.tab", package="EnrichmentBrowser")

# getting all human KEGG gene sets
# hsa.gs <- get.kegg.genesets("hsa")
gs.file <- system.file("extdata/hsa_kegg_gs.gmt", package="EnrichmentBrowser")
hsa.gs <- parse.genesets.from.GMT(gs.file)

# set-based enrichment analysis
ebrowser( meth="ora",
    exprs=exprs.file, pdat=pdat.file, fdat=fdat.file,
    gs=hsa.gs, org="hsa", nr.show=3 )

# compile a gene regulatory network from KEGG pathways
# hsa.grn <- compile.grn.from.kegg("hsa")
pwys <- system.file("extdata/hsa_kegg_pwys.zip", package="EnrichmentBrowser")
hsa.grn <- compile.grn.from.kegg(pwys)

# network-based enrichment analysis
ebrowser( meth="ggea",
    exprs=exprs.file, pdat=pdat.file, fdat=fdat.file,
    gs=hsa.gs, grn=hsa.grn, org="hsa", nr.show=3 )

# combining results
ebrowser( meth=c("ora", "ggea"), comb=TRUE,
    exprs=exprs.file, pdat=pdat.file, fdat=fdat.file,
    gs=hsa.gs, grn=hsa.grn, org="hsa", nr.show=3 )