ropls 1.36.0
The ropls R package implements the PCA, PLS(-DA) and OPLS(-DA) approaches with the original, NIPALS-based versions of the algorithms (Wold, Sjostrom, and Eriksson 2001; Trygg and Wold 2002). It includes the R2 and Q2 quality metrics (Eriksson et al. 2001; Tenenhaus 1998), the permutation diagnostics (Szymanska et al. 2012), the computation of the VIP values (Wold, Sjostrom, and Eriksson 2001), the score and orthogonal distances to detect outliers (Hubert, Rousseeuw, and Vanden Branden 2005), as well as many graphics (scores, loadings, predictions, diagnostics, outliers, etc.).
Partial Least-Squares (PLS), which is a latent variable regression method based on covariance between the predictors and the response, has been shown to efficiently handle datasets with multi-collinear predictors, as in the case of spectrometry measurements (Wold, Sjostrom, and Eriksson 2001). More recently, Trygg and Wold (2002) introduced the Orthogonal Partial Least-Squares (OPLS) algorithm to separately model the variations of the predictors that are correlated with, and orthogonal to, the response.
OPLS has a predictive capacity similar to that of PLS, and improves the interpretation of the predictive components and of the systematic variation (Pinto, Trygg, and Gottfries 2012). In particular, OPLS modeling of a single response requires only one predictive component.
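To make the idea of a covariance-based latent variable concrete, the following base R sketch extracts PLS components for a single response with a NIPALS-style loop. It is only an illustration under stated assumptions: `nipals_pls1` is a hypothetical helper, not a function from ropls, the data are simulated, and the code makes no claim to reproduce the package's internal implementation (no scaling options, cross-validation, or missing-value handling).

```r
# Hypothetical helper (not part of ropls): NIPALS-style PLS for one response.
nipals_pls1 <- function(X, y, ncomp = 2) {
  X <- scale(X, center = TRUE, scale = FALSE)  # mean-center the predictors
  y <- y - mean(y)                             # mean-center the response
  weights  <- matrix(0, ncol(X), ncomp)
  loadings <- matrix(0, ncol(X), ncomp)
  scores   <- matrix(0, nrow(X), ncomp)
  coefs    <- numeric(ncomp)
  for (h in seq_len(ncomp)) {
    w  <- drop(crossprod(X, y))                # direction of maximal covariance with y
    w  <- w / sqrt(sum(w^2))                   # normalize the weights to unit length
    sc <- drop(X %*% w)                        # component scores
    ld <- drop(crossprod(X, sc)) / sum(sc^2)   # predictor loadings
    cf <- sum(y * sc) / sum(sc^2)              # regression of the response on the scores
    X  <- X - outer(sc, ld)                    # deflate the predictors
    y  <- y - sc * cf                          # deflate the response
    weights[, h]  <- w
    loadings[, h] <- ld
    scores[, h]   <- sc
    coefs[h]      <- cf
  }
  list(weights = weights, loadings = loadings, scores = scores, coefs = coefs)
}

# Simulated example (assumption: 30 samples, 5 predictors, 2 of them informative)
set.seed(1)
X <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)
y <- X[, 1] - 2 * X[, 2] + rnorm(30, sd = 0.1)
fit <- nipals_pls1(X, y, ncomp = 3)

# The score vectors of successive components are mutually orthogonal
round(crossprod(fit$scores), 8)
```

Each component is found by taking the direction in predictor space that has maximal covariance with the current response residual, after which both blocks are deflated, so later components only model what earlier ones left unexplained.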
Diagnostics such as the Q2Y metrics and permutation testing are of high importance to avoid overfitting and assess the statistical significance of the model. The Variable Importance in Projection (VIP), which reflects both the loading weights for each component and the variability of the response explained by this component (Pinto, Trygg, and Gottfries 2012; Mehmood et al. 2012), can be used for feature selection (Trygg and Wold 2002; Pinto, Trygg, and Gottfries 2012).
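For reference, one common textbook formulation of these two quantities is given below. This is a sketch of the general definitions and not necessarily the exact computation performed by ropls. The notation is an assumption introduced here: p is the number of predictors, H the number of components, w_{jh} the loading weight of variable j on component h (with each weight vector normalized to unit length), SSY_h the sum of squares of the response explained by component h, PRESS the predictive residual sum of squares obtained by cross-validation, and SS_Y the total sum of squares of the centered response.

```latex
\mathrm{VIP}_j = \sqrt{\, p \, \frac{\sum_{h=1}^{H} \mathrm{SSY}_h \, w_{jh}^{2}}{\sum_{h=1}^{H} \mathrm{SSY}_h} \,},
\qquad \lVert \mathbf{w}_h \rVert = 1

Q^{2}Y = 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}_Y}
```

With this normalization the squared VIP values average to 1 over the p variables, which is why variables with a VIP above 1 are often regarded as the more influential ones.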
OPLS is available in the SIMCA-P commercial software (Umetrics, Umea, Sweden; Eriksson et al. (2001)). In addition, the kernel-based version of OPLS (Bylesjo et al. 2008) is available in the open-source R statistical environment (R Development Core Team 2008), and a single implementation of the linear algorithm in R has been described (Gaude et al. 2013).
The sacurine metabolomics dataset will be used as a case study to describe the features of the ropls package.
The objective was to study the influence of age, body mass index (bmi), and gender on metabolite concentrations in urine, by analysing 183 samples from a cohort of adults with liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS; Thevenot et al. (2015)).
Urine samples were analyzed by using an LTQ Orbitrap in the negative ionization mode. A total of 109 metabolites were identified or annotated at the MSI level 1 or 2. After retention time alignment with XCMS, peaks were integrated with Quan Browser. Signal drift and batch effect were corrected, and each urine profile was normalized to the osmolality of the sample. Finally, the data were log10 transformed (Thevenot et al. 2015).
The volunteers’ age, body mass index (bmi), and gender were recorded.
We first load the ropls package:
library(ropls)
We then load the sacurine dataset, which contains:
The dataMatrix matrix of numeric type containing the intensity profiles (log10 transformed),
The sampleMetadata data frame containing sample metadata,
The variableMetadata data frame containing variable metadata
data(sacurine)
names(sacurine)
## [1] "dataMatrix" "sampleMetadata" "variableMetadata" "se"
## [5] "eset"
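As a side note, the elements of the sacurine list can also be inspected directly with the `$` operator and base R functions, without modifying the search path. The sketch below assumes only that ropls is installed and that the list has the element names printed above. The short names `dm` and `sm` are introduced here purely for illustration.

```r
library(ropls)
data(sacurine)

dm <- sacurine$dataMatrix       # numeric matrix of intensities (samples x variables)
sm <- sacurine$sampleMetadata   # data frame with the age, bmi and gender columns

dim(dm)            # number of samples and of variables
str(sm)            # column types of the sample metadata
summary(sm$age)    # distribution of the volunteers' age
table(sm$gender)   # number of samples per gender
```

Accessing the elements explicitly keeps the origin of each object visible and avoids name clashes with other objects in the workspace.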
We attach sacurine to the search path and display a summary of the content of the dataMatrix, sampleMetadata and variableMetadata with the view method from the ropls package:
attach(sacurine)
view(dataMatrix)
## dim class mode typeof size NAs min mean median max
## 183 x 109 matrix numeric double 0.2 Mb 0 -0.3 4.2 4.3 6
## (2-methoxyethoxy)propanoic acid isomer (gamma)Glu-Leu/Ile ...
## HU_011 3.019766011 3.888479324 ...
## HU_014 3.81433889 4.277148905 ...
## ... ... ... ...
## HU_208 3.748127215 4.523763202 ...
## HU_209 4.208859398 4.675880567 ...
## Valerylglycine isomer 2 Xanthosine
## HU_011 3.889078716 4.075879575
## HU_014 4.181765852 4.195761901
## ... ... ...
## HU_208 4.634338821 4.487781609
## HU_209 4.47194762 4.222953354
view(sampleMetadata)
## age bmi gender
## numeric numeric factor
## nRow nCol size NAs
## 183 3 0 Mb 0
## age bmi gender
## HU_011 29 19.75 M
## HU_014 59 22.64 F
## ... ... ... ...
## HU_208 27 18.61 F
## HU_209 17.5 21.48 F
## 1 data.frame 'factor' column(s) converted to 'numeric' for plotting.