This is an introductory guide to integration analysis between exposome and omics data with R package omicRexposome. The document illustrates two types of analysis: 1) Association analysis, that are performed between exposome and a single omic data-set; and 2) Integration analysis where multiple data-sets, including exposome data, are analysed at the same time.
omicRexposome 1.28.0
omicRexposome
is an R package designed to work join with rexposome
. The aim of omicRexposome
is to perform analysis joining exposome and omic data with the goal to find the relationship between a single or set of exposures (external exposome) and the behavior of a gene, a group of CpGs, the level of a protein, etc. Also to provide a series of tools to analyse exposome and omic data using standard methods from Biocondcutor.
omicRexposome
is currently in development and not available from CRAN nor Bioconductor. Anyway, the package can be installed by using devtools R package and taking the source from Bioinformatic Research Group in Epidemiology’s GitHub repository.
This can be done by opening an R session and typing the following code:
devtools::install_github("isglobal-brge/omicRexposome")
User must take into account that this sentence do not install the packages’ dependencies.
Two different types of analyses can be done with omicRexposome
:
Analysis | omicRexposome function |
---|---|
Association Study | association |
Integration Study | crossomics |
Both association and integration studies are based in objects of class MultiDataSet
. A MultiDataSet
object is a contained for multiple layers of sample information. Once the exposome data and the omics data are encapsulated in a MultiDataSet
the object can be used for both association and integration studies.
The method association
requires a MultiDataSet
object having to types of information: the exposome data from an ExposomeSet
object and omic information from objects of class ExpressionSet
, MethylationSet
, SummarizedExperiment
or others. ExposomeSet
objects are created with functions read_exposome
and load_exposome
from rexposome
R package (see next section Loading Exposome Data) and encapsulates exposome data. The method crossomics
expects a MultiDataSet
with any number of different data-sets (at last two). Compared with association
method, crossomics
do not requires an ExposomeSet
.
In order to illustrate the capabilities of omicRexposome
and the exposome-omic analysis pipeline, we will use the data from BRGdata
package. This package includes different omic-sets including methylation, transcriptome and proteome data-sets and an exposome-data set.
omicRexposome
and MultiDataSet
R packages are loaded using the standard library command:
library(omicRexposome)
library(MultiDataSet)
The association studies are performed using the method association
. This method requires, at last four, augments:
object
should be filled with a MultiDataSet
object.formula
should be filled with an expression containing the covariates used to adjust the model.expset
should be filled with the name that the exposome-set receives in the MultiDataSet
object.omicset
should be filled with the name that the omic-set receives in the MultiDataSet
object.The argument formula
should follow the pattern: ~sex+age
. The method association
will fill the formula placing the exposures in the ExposomeSet
m between ~
and the covariates sex+age
.
association
implements the limma
pipeline using lmFit
and eBayes
in the extraction methods from MultiDataSet
. The method takes care of the missing data in exposures, outcomes and omics data and locating and is subsets both data-sets, exposome data and omic data, by common samples. The argument method
allows to select the fitting method used in lmFit
. By default it takes the value "ls"
for least squares but it can also takes "robust"
for robust regression.
The following subsections illustrates the usage of association
with different types of omics data: methylome, transcriptome and proteome.
First we get the exposome data from BRGdata
package that we will use in the whole section.
data("brge_expo", package = "brgedata")
class(brge_expo)
## [1] "ExposomeSet"
## attr(,"package")
## [1] "rexposome"
The aim of this analysis is to perform an association test between the gene expression levels and the exposures. So the first point is to obtain the transcriptome data from the brgedata
package.
data("brge_gexp", package = "brgedata")
The association studies between exposures and transcriptome are done in the same way that the ones with methylome. The method used is association
, that takes as input an object of MultiDataSet
class with both exposome and expression data.
mds <- createMultiDataSet()
mds <- add_genexp(mds, brge_gexp)
mds <- add_exp(mds, brge_expo)
gexp <- association(mds, formula=~Sex+Age,
expset = "exposures", omicset = "expression")
We can have a look to the number of hits and the lambda score of each analysis with the methods tableHits
and tableLambda
, seen in the previous section.
hit <- tableHits(gexp, th=0.001)
lab <- tableLambda(gexp)
merge(hit, lab, by="exposure")
## exposure hits lambda
## 1 BPA_p 19 0.9072377
## 2 BPA_t1 27 0.8807316
## 3 BPA_t3 56 0.9391129
## 4 Ben_p 19 0.8013466
## 5 Ben_t1 12 0.8234104
## 6 Ben_t2 9 0.8393350
## 7 Ben_t3 21 0.8301203
## 8 NO2_p 32 1.0281960
## 9 NO2_t1 16 0.7942881
## 10 NO2_t2 35 1.1482314
## 11 NO2_t3 31 0.8770931
## 12 PCB118 59 0.9308472
## 13 PCB138 38 1.0726221
## 14 PCB153 51 1.1743989
## 15 PCB180 17 0.9790750
Since most of all models have a lambda under one, we should consider use Surrogate Variable Analysis. This can be done using the same association
method but by setting the argument sva
to "fast"
so the pipeline of isva
and SmartSVA
R packages is applied. If sva
is set to "slow"
the applied. pipeline is the one from sva
R package.
gexp <- association(mds, formula=~Sex+Age,
expset = "exposures", omicset = "expression", sva = "fast")
We can re-check the results creating the same table than before:
hit <- tableHits(gexp, th=0.001)
lab <- tableLambda(gexp)
merge(hit, lab, by="exposure")
## exposure hits lambda
## 1 BPA_p 50 0.9874143
## 2 BPA_t1 51 0.9433845
## 3 BPA_t3 61 0.9842234
## 4 Ben_p 76 1.0117743
## 5 Ben_t1 64 1.0115556
## 6 Ben_t2 71 1.0089886
## 7 Ben_t3 59 0.9968889
## 8 NO2_p 78 1.0116970
## 9 NO2_t1 67 1.0056894
## 10 NO2_t2 69 1.0209991
## 11 NO2_t3 49 0.9802463
## 12 PCB118 129 1.0518532
## 13 PCB138 67 1.0094398
## 14 PCB153 58 0.9925023
## 15 PCB180 67 0.9974135
The objects of class ResultSet
have a method called plotAssociation
that allows to create QQ Plots (that are another useful way to see if there are some inflation/deflation in the P-Values).
gridExtra::grid.arrange(
plotAssociation(gexp, rid="Ben_p", type="qq") +
ggplot2::ggtitle("Transcriptome - Pb Association"),
plotAssociation(gexp, rid="BPA_p", type="qq") +
ggplot2::ggtitle("Transcriptome - THM Association"),
ncol=2
)
## Warning in ggplot2::geom_text(ggplot2::aes(x = -Inf, y = Inf, hjust = 0, : All aesthetics have length 1, but the data has 67528 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
## a single row.
## All aesthetics have length 1, but the data has 67528 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
## a single row.
Following this line, the same method plotAssociation
can be used to create volcano plots.
gridExtra::grid.arrange(
plotAssociation(gexp, rid="Ben_p", type="volcano", tPV=-log10(1e-04)) +
ggplot2::ggtitle("Transcriptome - Pb Association"),
plotAssociation(gexp, rid="BPA_p", type="volcano", tPV=-log10(1e-04)) +
ggplot2::ggtitle("Transcriptome - THM Association"),
ncol=2
)
The proteome data-set included in brgedata
has 47 proteins for 90 samples.
data("brge_prot", package="brgedata")
brge_prot
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 47 features, 90 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: x0001 x0002 ... x0090 (90 total)
## varLabels: age sex
## varMetadata: labelDescription
## featureData
## featureNames: Adiponectin_ok Alpha1AntitrypsinAAT_ok ...
## VitaminDBindingProte_ok (47 total)
## fvarLabels: chr start end
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
The association analysis between exposures and proteome is also done using association
.
mds <- createMultiDataSet()
mds <- add_eset(mds, brge_prot, dataset.type ="proteome")
mds <- add_exp(mds, brge_expo)
prot <- association(mds, formula=~Sex+Age,
expset = "exposures", omicset = "proteome")
The tableHits
indicates that no association was found between the 47 proteins and the exposures.
tableHits(prot, th=0.001)
## exposure hits
## Ben_p Ben_p 0
## Ben_t1 Ben_t1 0
## Ben_t2 Ben_t2 0
## Ben_t3 Ben_t3 0
## BPA_p BPA_p 0
## BPA_t1 BPA_t1 0
## BPA_t3 BPA_t3 0
## NO2_p NO2_p 0
## NO2_t1 NO2_t1 1
## NO2_t2 NO2_t2 0
## NO2_t3 NO2_t3 0
## PCB118 PCB118 0
## PCB138 PCB138 0
## PCB153 PCB153 0
## PCB180 PCB180 0
This is also seen in the Manhattan plot for proteins that can be obtained from plotAssociation
.
gridExtra::grid.arrange(
plotAssociation(prot, rid="Ben_p", type="protein") +
ggplot2::ggtitle("Proteome - Cd Association") +
ggplot2::geom_hline(yintercept = 1, color = "LightPink"),
plotAssociation(prot, rid="NO2_p", type="protein") +
ggplot2::ggtitle("Proteome - Cotinine Association") +
ggplot2::geom_hline(yintercept = 1, color = "LightPink"),
ncol=2
)