1 Introduction
2 Exploratory analysis
3 Image segmentation
- 3.1 Spatial shrunken centroids clustering
- 3.2 Spatial Dirichlet Gaussian mixture modeling
4 Classification and cross-validation
- 4.1 Projection to latent structures (PLS)
- 4.2 Spatial shrunken centroids classification
5 Class comparison
- 5.1 Sample-based means testing
- 5.2 Segment-based means testing
6 Session information

1 Introduction

Cardinal 3 provides statistical methods for both supervised and unsupervised analysis of mass spectrometry (MS) imaging experiments. Class comparison can also be performed, provided an appropriate experimental design and sample size.

Before statistical analysis, it is important to identify the statistical goal of the experiment:

Unsupervised analysis. The data has no class labels or conditions, and we are interested in exploratory analysis to discover regions of interest in the data.
Supervised analysis. The data has class labels and we want to train a statistical or machine learning model to predict the class labels of new data.
Class comparison. The data has class labels or conditions, and we want to test whether the abundance of the mass features is different between conditions.

CardinalWorkflows provides real experimental data and more detailed discussion of the statistical methods than will be covered in this brief overview.

2 Exploratory analysis

Suppose we are exploring an unlabeled dataset, and wish to understand the structure of the data.

set.seed(2020)
mse <- simulateImage(preset=2, dim=c(32,32), sdnoise=0.5,
    peakheight=c(2,4), representation="centroid")

mse$design <- makeFactor(circle=mse$circle,
    square=mse$square, bg=!(mse$circle | mse$square))

image(mse, "design")

image(mse, i=c(5, 13, 21), layout=c(1,3))

2.1 Principal components analysis (PCA)

Principal components analysis is an unsupervised dimension reduction technique. It reduces the data to some number of “principal components” that are a linear combination of the original mass features, where each component is orthogonal to the last, and explains as much of the variance in the data as possible.

Use PCA() to perform PCA on a MSImagingExperiment.

pca <- PCA(mse, ncomp=3)
pca

## SpatialPCA on 30 variables and 1024 observations
## names(5): sdev, rotation, center, scale, x
## coord(2): x = 1...32, y = 1...32
## runNames(1): run0
## modelData(): Principal components (k=3)
## 
## Standard deviations (1, .., k=3):
##       PC1      PC2      PC3
##  7.497502 4.390060 0.792943
## 
## Rotation (n x k) = (30 x 3):
##               PC1          PC2          PC3
## [1,]  0.003714758 -0.063713680  0.130430317
## [2,]  0.005986094 -0.149225312  0.176614331
## [3,]  0.010723800 -0.211046155  0.123815086
## [4,]  0.014332766 -0.241875395  0.062812424
## [5,]  0.014096766 -0.302596953  0.108328509
## [6,]  0.013484325 -0.205415665  0.087055142
## ...           ...          ...          ...

We can see that the first 2 principal components explain most of the variation in the data.

image(pca, type="x", superpose=FALSE, layout=c(1,3), scale=TRUE)

The loadings of the components show how each feature contributes to each component.

plot(pca, type="rotation", superpose=FALSE, layout=c(1,3), linewidth=2)

Plotting the principal component scores against each other is a useful way of visualization the separation between data classes.

plot(pca, type="x", groups=mse$design, linewidth=2)

2.2 Non-negative matrix factorization (NMF)

Non-negative matrix factorization is a popular alternative to PCA when the data is naturally non-negative. The main difference between PCA and NMF is that, for NMF, all of the loadings are required to be non-negative.

Use NMF() to perform NMF on a MSImagingExperiment.

nmf <- NMF(mse, ncomp=3)
nmf

## SpatialNMF on 30 variables and 1024 observations
## names(4): activation, x, iter, transpose
## coord(2): x = 1...32, y = 1...32
## runNames(1): run0
## modelData(): Non-negative matrix factorization (k=3)
## 
## Activation (n x k) = (30 x 3):
##              C1         C2         C3
## [1,]  0.0000000  0.1818780 58.8678807
## [2,]  0.0000000  0.8338271 71.9368731
## [3,]  0.0000000  1.5642950 55.7437560
## [4,]  0.0000000  1.9371212 47.9554669
## [5,]  0.0000000  2.3660891 62.2639494
## [6,]  0.0000000  1.5707428 50.1336302
## ...         ...        ...        ...

We can see that NMF can pick up the variation somewhat better when the data is non-negative, as is the case for mass spectra. As before, we still only need 2 components.

image(nmf, type="x", superpose=FALSE, layout=c(1,3), scale=TRUE)

As with PCA, the loadings of the NMF components show how each feature contributes to each component. The NMF components can be easier to interpret as they must be non-negative.

plot(nmf, type="activation", superpose=FALSE, layout=c(1,3), linewidth=2)

Plotting the principal component scores against each other is a useful way of visualization the separation between data classes.

plot(nmf, type="x", groups=mse$design, linewidth=2)

2.3 Feature colocalization

Finding other mass features colocalized with a particular image is a common task in analysis of MS imaging experiments.

Use colocalize() to find mass features that are colocalized with another image.

coloc <- colocalized(mse, mz=1116)
coloc

## DataFrame with 30 rows and 7 columns
##             i        mz       cor       MOC        M1        M2      Dice
##     <integer> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## 1          12   1115.98  1.000000  1.000000  1.000000  1.000000  1.000000
## 2          18   1227.94  0.945897  0.969254  0.967442  0.961947  0.878906
## 3          13   1133.57  0.944454  0.968846  0.965190  0.960511  0.869141
## 4          17   1206.48  0.935775  0.963161  0.953382  0.960074  0.865234
## 5          14   1135.93  0.934478  0.961983  0.956605  0.958773  0.867188
## ...       ...       ...       ...       ...       ...       ...       ...
## 26          5   788.863  0.392391  0.602264  0.867068  0.623138  0.642578
## 27          3   551.867  0.390606  0.604024  0.840873  0.639406  0.652344
## 28          2   473.921  0.335301  0.567899  0.779907  0.620612  0.625000
## 29          9  1049.186  0.325053  0.559666  0.765659  0.651090  0.638672
## 30          1   440.703  0.221729  0.483280  0.676830  0.598486  0.587891

By default, Pearson correlation is used to rank the colocalized features. Manders overlap coefficient (MOC), colocalization coefficients (M1 and M2), and Dice scores are also provided.

image(mse, mz=coloc$mz[1:3], layout=c(1,3))

3 Image segmentation

Segmentation (clustering) a dataset is a useful way to summarize an MS imaging experiment and discover regions of interest within the sample.

3.1 Spatial shrunken centroids clustering

Spatially-aware nearest shrunken centroids clustering allows simultaneous image segmentation and feature selection.

A smoothing radius r, initial number of clusters k, and sparsity parameters s must be provided.

The larger the sparsity parameter s, the fewer mass features will contribute to the segmentation.

Spatial shrunken centroids may result in fewer clusters than the initial number of clusters k, so it is recommended to use a value for k that is larger than the expected number of clusters, and allow the method to automatically choose the number of clusters.

set.seed(2020)
ssc <- spatialShrunkenCentroids(mse, r=1, k=3, s=c(0,6,12,18))
ssc

## ResultsList of length 4
## names(4): r=1,k=3,s=0 r=1,k=3,s=6 r=1,k=3,s=12 r=1,k=3,s=18
## model: SpatialShrunkenCentroids 
##              r k  s  weights clusters sparsity      AIC      BIC
## r=1,k=3,s=0  1 3  0 gaussian        3     0.00 248.3840 840.1606
## r=1,k=3,s=6  1 3  6 gaussian        3     0.10 225.0908 772.4842
## r=1,k=3,s=12 1 3 12 gaussian        3     0.31 190.3702 644.0656
## r=1,k=3,s=18 1 3 18 gaussian        3     0.46 167.5922 557.1785

Plotting the predicted cluster probabilities shows a clear segmentation into the ground truth image.

image(ssc, i=2:4, type="probability", layout=c(1,3))

Spatial shrunken centroids calculates t-statistics for each segment and each mass feature. These t-statistics a measure of the difference between the cluster center and the global mean.

plot(ssc, i=2:4, type="statistic", layout=c(1,3),
    linewidth=2, annPeaks="circle")

Mass features with t-statistics of zero do not contribute to the segmentation. The sign of the t-statistic indicates whether the mass feature is over- or under-expressed in the given cluster relative to the global mean.

Use topFeatures() to rank mass features by t-statistic.

ssc_top <- topFeatures(ssc[[4L]])
ssc_top

## DataFrame with 90 rows and 6 columns
##             i        mz       class statistic   centers        sd
##     <integer> <numeric> <character> <numeric> <numeric> <numeric>
## 1          22  1452.462           2   39.3260   5.45157  0.830981
## 2          23  1453.510           2   35.1185   4.90823  0.815796
## 3           7   980.953           3   28.0706   3.29758  0.607250
## 4          10  1056.822           3   27.7432   3.38048  0.643196
## 5          29  1881.854           2   25.7379   3.35161  0.665932
## ...       ...       ...         ...       ...       ...       ...
## 86         17   1206.48           1  -19.3573  0.937857  0.730484
## 87         14   1135.93           1  -22.0903  1.003166  0.822290
## 88         12   1115.98           1  -26.1672  0.966941  0.770708
## 89         18   1227.94           1  -30.3760  1.030219  0.824671
## 90         13   1133.57           1  -30.5304  1.104705  0.914357

ssc_top_cl3 <- subset(ssc_top, class==3)
image(mse, mz=ssc_top_cl3$mz[1:3], layout=c(1,3))

3.2 Spatial Dirichlet Gaussian mixture modeling

Spatially-aware Dirichlet Gaussian mixture models (spatial-DGMM) is a method of image segmentation applied to each mass feature individually, rather than the dataset as a whole.

This is useful for summarizing molecular ion images, and for discovering structures that clustering using all mass features together may miss.

set.seed(2020)
dgmm <- spatialDGMM(mse, r=1, k=3, weights="adaptive")
dgmm

## SpatialDGMM on 30 variables and 1024 observations
## names(10): class, mu, sigma, ..., weights, r, k
## coord(2): x = 1...32, y = 1...32
## runNames(1): run0
## modelData(): Spatial Gaussian mixture models (k=3, n=30)
## 
## Parameter estimates:
## $mu
## , , 1 
##               1          2          3
## run0 1.80940992 0.73728403 0.08320872
## , , ... 
## 
## $sigma
## , , 1 
##              1         2         3
## run0 0.5549472 0.2345581 0.1010398
## , , ...

A different segmentation is fit for each mass feature.

image(dgmm, i=c(5, 13, 21), layout=c(1,3))

Each image is modeled as a mixture of Gaussian distributions.

plot(dgmm, i=c(5, 13, 21), layout=c(1,3), linewidth=2)

Spatial-DGMM segmentations can be especially useful for finding mass features colocalized with a region-of-interest.

When applied to a SpatialDGMM object, colocalize() is able to use match scores that can have a higher specificity than using Pearson correlation on the raw ion images.

coloc2 <- colocalized(dgmm, mse$square)
coloc2

## DataFrame with 30 rows and 6 columns
##             i        mz       MOC        M1        M2      Dice
##     <integer> <numeric> <numeric> <numeric> <numeric> <numeric>
## 1          21   1440.55  0.904051  1.000000  0.817308  0.899471
## 2          29   1881.85  0.893352  1.000000  0.798077  0.887701
## 3          28   1858.90  0.890657  1.000000  0.793269  0.884718
## 4          26   1618.66  0.888018  0.988095  0.798077  0.882979
## 5          27   1828.40  0.882779  0.976471  0.798077  0.878307
## ...       ...       ...       ...       ...       ...       ...
## 26          2   473.921  0.469387  0.254597  0.865385  0.393443
## 27          4   781.237  0.468692  0.252441  0.870192  0.391351
## 28          8  1023.708  0.466487  0.254286  0.855769  0.392070
## 29          1   440.703  0.446867  0.242898  0.822115  0.375000
## 30          9  1049.186  0.430754  0.238235  0.778846  0.364865

image(mse, mz=coloc2$mz[1:3], layout=c(1,3))

4 Classification and cross-validation

Classification of pixels into different known classes (e.g., cancer vs normal) based on the mass spectra is a common application for MS imaging.

set.seed(2020)
mse2 <- simulateImage(preset=7, dim=c(32,32), sdnoise=0.3,
    nrun=3, peakdiff=2, representation="centroid")

mse2$class <- makeFactor(A=mse2$circleA, B=mse2$circleB)

image(mse2, "class", layout=c(1,3))

image(mse2, i=1, layout=c(1,3))

When performing classification, it is important to use cross-validation so that reported accuracies are not overly optimistic.

We strongly recomend making sure that all spectra from the same experiment run belong to the same fold, to reduce predictive bias due to run effects.

4.1 Projection to latent structures (PLS)

Projection to latent structures (PLS), also called partial least squares, is a supervised dimension reduction technique. It can be thought of as being similar to PCA, but for classification or regression.

cv_pls <- crossValidate(PLS, x=mse2, y=mse2$class, ncomp=1:15, folds=run(mse2))

## Warning in pls_nipals(x, y = y, k = max(ncomp), center = center, scale. =
## scale, : NIPALS did not converge in 100 iterations for component 9

cv_pls

## SpatialCV on 30 variables and 3072 observations
## names(4): average, scores, folds, fitted.values
## coord(2): x = 1...32, y = 1...32
## runNames(3): run0, run1, run2
## modelData(): Cross validation with 3 folds
## 
## Average accuracy:
##          MacroRecall MacroPrecision
## ncomp=1    0.5125770            NaN
## ncomp=2    0.6974224      0.8233768
## ncomp=3    0.8419269      0.9166265
## ncomp=4    0.8496846      0.9201001
## ncomp=5    0.9245283      0.9484979
## ncomp=6    0.8993711      0.9377432
## ncomp=7    0.9025157      0.9389764
## ncomp=8    0.9056604      0.9402390
## ncomp=9    0.9067086      0.9406667
## ncomp=10   0.9119497      0.9428571
## ncomp=11   0.9140461      0.9437586
## ncomp=12   0.9140461      0.9437586
## ncomp=13   0.9140461      0.9437586
## ncomp=14   0.9140461      0.9437586
## ncomp=15   0.9140461      0.9437586

We can see that the accuracy maxes out after 11 PLS components.

pls <- PLS(mse2, y=mse2$class, ncomp=11)
pls

## SpatialPLS on 30 variables and 3072 observations
## names(16): coefficients, projection, residuals, ..., y.center, y.scale, algorithm
## coord(2): x = 1...32, y = 1...32
## runNames(3): run0, run1, run2
## modelData(): Partial least squares (k=11)
## 
## Covariances (1, .., k=11):
##          C1         C2         C3         C4         C5         C6 ...
##  80966.7467  8312.9372  5795.9393  7777.9091   568.1463  2261.7837 ...
## 
## Coefficients:
##          [,1]        [,2]        [,3]        [,4]        [,5]        [,6] ...
## A -0.20769417 -0.15218086  0.01513389 -0.07913578  0.07232778 -0.03078143 ...
## B -0.01609232  0.05232824  0.09390828  0.05638310  0.06500572  0.07204951 ...

We can plot the fitted response values to visualize the prediction.

image(pls, type="response", layout=c(1,3), scale=TRUE)

The PLS regression coefficients can be used to find influential features.

plot(pls, type="coefficients", linewidth=2, annPeaks="circle")

Like PCA or NMF, it can be useful to plot the PLS scores against each other to visualize the separation between classes.

plot(pls, type="scores", groups=mse2$class, linewidth=2)

Note that orthgonal PLS (O-PLS) is also available via method="opls" or by using the separate OPLS() method. Typically, both methods perform similarly, although O-PLS can sometimes produce more easily interpretable regression coefficients.

4.2 Spatial shrunken centroids classification

Spatially-aware nearest shrunken centroids classification is an extension of nearest shrunken centroids that incorporates spatial information into the model.

Like in the clustering case of spatial shrunken centroids, a smoothing radius r must be provided along with sparsity parameters s.

cv_ssc <- crossValidate(spatialShrunkenCentroids, x=mse2, y=mse2$class,
    r=2, s=c(0,3,6,9,12,15,18), folds=run(mse2))
cv_ssc

## SpatialCV on 30 variables and 3072 observations
## names(4): average, scores, folds, fitted.values
## coord(2): x = 1...32, y = 1...32
## runNames(3): run0, run1, run2
## modelData(): Cross validation with 3 folds
## 
## Average accuracy:
##          MacroRecall MacroPrecision
## r=2,s=0    0.6954971      0.6995690
## r=2,s=3    0.7010637      0.6998403
## r=2,s=6    0.7353019      0.7252008
## r=2,s=9    0.7838019      0.7726697
## r=2,s=12   0.7774291      0.7677136
## r=2,s=15   0.7218415      0.7222145
## r=2,s=18   0.6511533      0.6884699

We can see that in this case, the model with s=9 has the best cross-validated accuracy for the data. However, it does not perform as well as the PLS model.

ssc2 <- spatialShrunkenCentroids(mse2, y=mse2$class, r=2, s=9)
ssc2

## SpatialShrunkenCentroids on 30 variables and 3072 observations
## names(12): class, probability, scores, ..., transpose, weights, r
## coord(2): x = 1...32, y = 1...32
## runNames(3): run0, run1, run2
## modelData(): Nearest shrunken centroids (s=9.00) with 2 classes
## 
## Priors (1, .., k=2):
##          A         B
##  0.4767331 0.5232669
## 
## Statistics:
##              A         B
## [1,]  2.398037 30.739976
## [2,]  6.761464 37.485007
## [3,] 14.895771 45.388782
## [4,]  7.863448 38.670873
## [5,] 18.944757 46.169418
## [6,] 10.743564 41.063311
## ...        ...       ...

Again, we can plot the predicted class probabilities to visualize the prediction.

image(ssc2, type="probability", layout=c(1,3),
    subset=mse2$circleA | mse2$circleB)

Plotting t-statistics shows the peaks with lower m/z values have a higher abundance in class “B”.

plot(ssc2, type="statistic", linewidth=2, annPeaks="circle")

ssc2_top <- topFeatures(ssc2)

subset(ssc2_top, class == "B")

## DataFrame with 30 rows and 6 columns
##             i        mz       class statistic   centers        sd
##     <integer> <numeric> <character> <numeric> <numeric> <numeric>
## 1           5   788.863           B   46.1694   3.96014  0.862679
## 2           3   551.867           B   45.3888   3.17161  0.601005
## 3          10  1056.822           B   45.3650   3.23093  0.623250
## 4           8  1023.708           B   42.3114   3.88898  0.959834
## 5           6   849.003           B   41.0633   2.85568  0.596431
## ...       ...       ...         ...       ...       ...       ...
## 26         26   1618.66           B         0  1.404354  0.540876
## 27         27   1828.40           B         0  1.486190  0.564330
## 28         28   1858.90           B         0  1.202897  0.742032
## 29         29   1881.85           B         0  0.511865  0.340700
## 30         30   2105.50           B         0  2.014417  0.659438

5 Class comparison

Statistical hypothesis testing is used to determine whether the abundance of a feature is different between two or more conditions.

In order to account for additional factors like the effect of experimental runs, subject-to-subject variability, etc., this is often done most appropriately using linear models.

This example uses a simple experiment with two conditions “A” and “B”, with three replicates in each condition.

set.seed(2020)
mse3 <- simulateImage(preset=4, npeaks=10, dim=c(32,32), sdnoise=0.3,
    nrun=3, peakdiff=1, representation="centroid")

mse3$trt <- makeFactor(A=mse3$circleA, B=mse3$circleB)

image(mse3, "trt", layout=c(2,3))

image(mse3, i=1, layout=c(2,3))

We know from the design of the simulation that the first 5 (of 10) m/z values differ between the conditions.

featureData(mse3)

## MassDataFrame with 10 rows and 4 columns
##           mz   circleA   circleB      diff
##    <numeric> <numeric> <numeric> <logical>
## 1    473.921  0.970513   1.97051      TRUE
## 2    781.237  1.022086   2.02209      TRUE
## 3    788.863  0.837499   1.83750      TRUE
## 4   1023.708  0.851260   1.85126      TRUE
## 5   1135.933  1.219069   2.21907      TRUE
## 6   1200.465  1.487075   1.48707     FALSE
## 7   1227.938  1.077624   1.07762     FALSE
## 8   1361.268  1.058126   1.05813     FALSE
## 9   1453.510  0.942880   0.94288     FALSE
## 10  1858.899  1.015203   1.01520     FALSE
## mz(1): mz

5.1 Sample-based means testing

Use meansTest() to fit linear models with the most basic summarization. The samples must be specified with samples. Each sample is summarized by its mean, and then a linear model is fit to the summaries. In this case, each sample is a separate run.

Here, we specify condition as the sole fixed effect. Internally, the model will call either lm() or lme() depending on whether any random effects are provided.

mtest <- meansTest(mse3, ~ condition, samples=run(mse3))
mtest

## MeansTest of length 10
## model: lm 
##     i        mz                 fixed    statistic       pvalue
## 1   1  473.9206 intensity ~ condition  1.573820455 2.096531e-01
## 2   2  781.2367 intensity ~ condition  8.751387862 3.093665e-03
## 3   3  788.8633 intensity ~ condition  8.615081741 3.333908e-03
## 4   4 1023.7081 intensity ~ condition 15.349625903 8.933862e-05
## 5   5 1135.9335 intensity ~ condition  7.885528664 4.983191e-03
## 6   6 1200.4653 intensity ~ condition  0.627528828 4.282632e-01
## 7   7 1227.9380 intensity ~ condition  0.926973324 3.356506e-01
## 8   8 1361.2682 intensity ~ condition  0.757910715 3.839832e-01
## 9   9 1453.5096 intensity ~ condition  1.113398853 2.913443e-01
## 10 10 1858.8985 intensity ~ condition  0.005187237 9.425840e-01

By default, the models are summarized by performing likelihood ratio tests against the null model (with no fixed effects, retaining any random effects).

Box-and-whisker plots can be used to visualize the differences (if any) between the conditions.

plot(mtest, i=1:10, layout=c(2,5), ylab="Intensity", fill=TRUE)

Use topFeatures() to rank the results.

mtest_top <- topFeatures(mtest)

subset(mtest_top, fdr < 0.05)

## DataFrame with 4 rows and 5 columns
##           i        mz statistic      pvalue         fdr
##   <integer> <numeric> <numeric>   <numeric>   <numeric>
## 1         4  1023.708  15.34963 8.93386e-05 0.000893386
## 2         2   781.237   8.75139 3.09367e-03 0.011113028
## 3         3   788.863   8.61508 3.33391e-03 0.011113028
## 4         5  1135.933   7.88553 4.98319e-03 0.012457977

We find 4 significant features.

5.2 Segment-based means testing

Testing of SpatialDGMM objects is also supported by meansTest(). The key idea here is that spatial-DGMM segmentation captures within-sample heterogeneity, so testing between spatial-DGMM segments is more sensitive that simply summarizing a whole sample by its mean.

First, we must segment the data with spatialDGMM(), while making sure that each sample is segmented independently (by specifying the samples as groups).

dgmm2 <- spatialDGMM(mse3, r=2, k=3, groups=run(mse3))

Now use segmentationTest() to fit the models.

stest <- meansTest(dgmm2, ~ condition)

stest

## MeansTest of length 10
## model: lm 
##     i        mz                 fixed  statistic       pvalue
## 1   1  473.9206 intensity ~ condition  2.0695067 0.1502701229
## 2   2  781.2367 intensity ~ condition 12.4904117 0.0004090460
## 3   3  788.8633 intensity ~ condition 11.9296651 0.0005524711
## 4   4 1023.7081 intensity ~ condition  7.9452789 0.0048212988
## 5   5 1135.9335 intensity ~ condition  8.9061085 0.0028421830
## 6   6 1200.4653 intensity ~ condition  1.1878387 0.2757659600
## 7   7 1227.9380 intensity ~ condition  1.2993476 0.2543324348
## 8   8 1361.2682 intensity ~ condition  0.5614654 0.4536703943
## 9   9 1453.5096 intensity ~ condition  0.4079379 0.5230179955
## 10 10 1858.8985 intensity ~ condition  0.1233448 0.7254347196

As with meansTest(), the models are summarized by performing likelihood ratio tests against the null model (with no fixed effects, retaining any random effects).

Box-and-whisker plots can be used to visually compare the conditions.

plot(stest, i=1:10, layout=c(2,5), ylab="Intensity", fill=TRUE)