shinyDSP 0.99.13
The purpose of this vignette is to look under the hood and explain what a few underlying functions do to make this app work.
The human kidney data from Nanostring is loaded from ExperimentHub()
. Two files are required: a count matrix and matching annotation file. Let’s read those files.
library(standR)
library(SummarizedExperiment)
library(ExperimentHub)
library(readr)
library(dplyr)
library(stats)
eh <- ExperimentHub()
AnnotationHub::query(eh, "standR")
## ExperimentHub with 3 records
## # snapshotDate(): 2025-04-03
## # $dataprovider: Nanostring
## # $species: NA
## # $rdataclass: data.frame
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH7364"]]'
##
## title
## EH7364 | GeomxDKDdata_count
## EH7365 | GeomxDKDdata_sampleAnno
## EH7366 | GeomxDKDdata_featureAnno
countFilePath <- eh[["EH7364"]]
sampleAnnoFilePath <- eh[["EH7365"]]
countFile <- readr::read_delim(unname(countFilePath), na = character())
sampleAnnoFile <- readr::read_delim(unname(sampleAnnoFilePath), na = character())
A key step in analysis is to select a biological group(s) of interest. Let’s inspect sampleAnnoFile
.
colnames(sampleAnnoFile)
## [1] "SlideName" "ScanName" "ROILabel"
## [4] "SegmentLabel" "SegmentDisplayName" "Sample_ID"
## [7] "AOISurfaceArea" "AOINucleiCount" "ROICoordinateX"
## [10] "ROICoordinateY" "RawReads" "TrimmedReads"
## [13] "StitchedReads" "AlignedReads" "DeduplicatedReads"
## [16] "SequencingSaturation" "UMIQ30" "RTSQ30"
## [19] "disease_status" "pathology" "region"
## [22] "LOQ" "NormalizationFactor" "RoiReportX"
## [25] "RoiReportY"
sampleAnnoFile$disease_status %>% unique()
## [1] "DKD" "normal"
sampleAnnoFile$region %>% unique()
## [1] "glomerulus" "tubule"
“disease_status” and “region” look like interesting variables. For example, we might be interested in comparing “normal_glomerulus” to “DKD_glomerulus”. The app can create a new sampleAnnoFile
where two or more variables of interest can be combined into one grouped variable (under “Variable(s) of interest” in the main side bar
).
new_sampleAnnoFile <- sampleAnnoFile %>%
tidyr::unite(
"disease_region", # newly created grouped variable
c("disease_status","region"), # variables to combine
sep = "_"
)
new_sampleAnnoFile$disease_region %>% unique()
## [1] "DKD_glomerulus" "DKD_tubule" "normal_tubule"
## [4] "normal_glomerulus"
Finally a spatial experiment object can be created.
spe <- standR::readGeoMx(as.data.frame(countFile),
as.data.frame(new_sampleAnnoFile))
We merged “disease_status” and “region” to create a new group called, “disease_region”. The app allows users to pick any subset of groups of interest. In this example, the four possible groups are “DKD_glomerulus”, “DKD_tubule”,“normal_tubule”, and “normal_glomerulus”. The following script can subset spe
to keep certain groups.
selectedTypes <- c("DKD_glomerulus", "DKD_tubule", "normal_tubule", "normal_glomerulus")
toKeep <- colData(spe) %>%
tibble::as_tibble() %>%
pull(disease_region)
spe <- spe[, grepl(paste(selectedTypes, collapse = "|"), toKeep)]
Regions of interest(ROIs) can be filtered out based on any quantitative variable in colData(spe)
(same as colnames(new_sampleAnnoFile)
). These options can be found under the “QC” nav panel
’s side bar
. Let’s say I want to keep ROIs whose “SequencingSaturation” is at least 85.
filter <- lapply("SequencingSaturation", function(column) {
cutoff_value <- 85
if(!is.na(cutoff_value)) {
return(colData(spe)[[column]] > cutoff_value)
} else {
return(NULL)
}
})
filtered_spe <- spe[,unlist(filter)]
colData(spe) %>% dim()
## [1] 231 24
colData(filtered_spe) %>% dim()
## [1] 217 24
We can see that 14 ROIs have been filtered out.
Two PCA plots (colour-coded by biological groups or batch) for three normalization schemes are automatically created in the “PCA” nav panel
. Let’s take Q3 normalization as an example.
speQ3 <- standR::geomxNorm(filtered_spe, method = "upperquartile")
speQ3 <- scater::runPCA(filtered_spe)
speQ3_compute <- SingleCellExperiment::reducedDim(speQ3, "PCA")
Then, .PCAFunction()
is called to generate the plot.
.PCAFunction(speQ3, speQ3_compute, "disease_region", selectedTypes, c(21,22,23,24), c("blue","blue","black","black"))
Let’s proceed with Q3 normalization. A design matrix is created next.
design <- model.matrix(~0+disease_region, data = colData(speQ3))
# Clean up column name
colnames(design) <- gsub("disease_region", "", colnames(design))
If confounding variables are chosen in the main side bar
, those would be added to model.matrix
as ~0 + disease_region + confounder1 + confounder2)
.
edgeR
is used to convert SpatialExperiment
into DGEList
, filter and estimate dispersion.
library(edgeR)
dge <- SE2DGEList(speQ3)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- estimateDisp(dge, design = design, robust = TRUE)
Recall our “selectedTypes” from above: “DKD_glomerulus”, “DKD_tubule”, “normal_tubule”, “normal_glomerulus”.
The following code creates all pairwise comparisons between them.
# In case there are spaces
selectedTypes_underscore <- gsub(" ", "_", selectedTypes)
comparisons <- list()
comparisons <- lapply(
seq_len(choose(length(selectedTypes_underscore), 2)),
function(i) {
noquote(
paste0(
utils::combn(selectedTypes_underscore, 2,
simplify = FALSE
)[[i]][1],
"-",
utils::combn(selectedTypes_underscore, 2,
simplify = FALSE
)[[i]][2]
)
)
}
)
con <- makeContrasts(
# Must use as.character()
contrasts = as.character(unlist(comparisons)),
levels = colnames(design)
)
colnames(con) <- sub("-", "_vs_", colnames(con))
con
## Contrasts
## Levels DKD_glomerulus_vs_DKD_tubule
## DKD_glomerulus 1
## DKD_tubule -1
## normal_glomerulus 0
## normal_tubule 0
## Contrasts
## Levels DKD_glomerulus_vs_normal_tubule
## DKD_glomerulus 1
## DKD_tubule 0
## normal_glomerulus 0
## normal_tubule -1
## Contrasts
## Levels DKD_glomerulus_vs_normal_glomerulus
## DKD_glomerulus 1
## DKD_tubule 0
## normal_glomerulus -1
## normal_tubule 0
## Contrasts
## Levels DKD_tubule_vs_normal_tubule DKD_tubule_vs_normal_glomerulus
## DKD_glomerulus 0 0
## DKD_tubule 1 1
## normal_glomerulus 0 -1
## normal_tubule -1 0
## Contrasts
## Levels normal_tubule_vs_normal_glomerulus
## DKD_glomerulus 0
## DKD_tubule 0
## normal_glomerulus -1
## normal_tubule 1
limma
The app uses duplicateCorrelation()
“[s]ince we need to make comparisons both within and between subjects, it is necessary to treat Patient as a random effect” limma user’s guide (Ritchie et al. 2015). limma-voom
method is used as standR
package recommends (Liu et al. 2023).
library(limma)
block_by <- colData(speQ3)[["SlideName"]]
v <- voom(dge, design)
corfit <- duplicateCorrelation(v, design,
block = block_by
)
v2 <- voom(dge, design,
block = block_by,
correlation = corfit$consensus
)
corfit2 <- duplicateCorrelation(v, design,
block = block_by
)
fit <- lmFit(v, design,
block = block_by,
correlation = corfit2$consensus
)
fit_contrast <- contrasts.fit(fit,
contrasts = con
)
efit <- eBayes(fit_contrast, robust = TRUE)
For each contrast (a column in con
), the app creates a table of top differentially expressed genes sorted by their adjusted P value.
# Keep track of how many comparisons there are
numeric_vector <- seq_len(ncol(con))
new_list <- as.list(numeric_vector)
# This adds n+1th index to new_list where n = ncol(con)
# This index contains seq_len(ncol(con))
# ex. new_list[[7]] = 1 2 3 4 5 6
# This coefficient allows ANOVA-like comparison in toptable()
if (length(selectedTypes) > 2) {
new_list[[length(new_list) + 1]] <- numeric_vector
}
topTabDF <- lapply(new_list, function(i) {
limma::topTable(efit,
coef = i, number = Inf, p.value = 0.05,
adjust.method = "BH", lfc = 1
) %>%
tibble::rownames_to_column(var = "Gene")
})
# Adds names to topTabDF
if (length(selectedTypes) > 2) {
names(topTabDF) <- c(
colnames(con),
colnames(con) %>% stringr::str_split(., "_vs_") %>%
unlist() %>% unique() %>% paste(., collapse = "_vs_")
)
} else {
names(topTabDF) <- colnames(con)
}
topTabDF
is now a list of tables where the list index corresponds to that of columns in con
.
colnames(con)
## [1] "DKD_glomerulus_vs_DKD_tubule" "DKD_glomerulus_vs_normal_tubule"
## [3] "DKD_glomerulus_vs_normal_glomerulus" "DKD_tubule_vs_normal_tubule"
## [5] "DKD_tubule_vs_normal_glomerulus" "normal_tubule_vs_normal_glomerulus"
head(topTabDF[[1]])
## Gene Type logFC AveExpr t P.Value adj.P.Val B
## 1 SPOCK1 gene 2.773094 7.073508 30.12186 1.201902e-89 2.159337e-85 193.6072
## 2 PLAT gene 3.953830 8.269033 34.17176 7.387340e-89 6.636048e-85 191.7033
## 3 ARMH4 gene 2.545175 7.168846 28.85332 1.067403e-85 6.392320e-82 184.6393
## 4 TIMP3 gene 2.898280 9.560524 29.27997 3.602942e-82 1.618262e-78 176.5764
## 5 FGF1 gene 3.486781 7.645384 30.66180 5.580483e-81 2.005179e-77 173.7881
## 6 PODXL gene 4.218256 8.531335 30.20394 5.207716e-79 1.559364e-75 169.3251
These tables are then shown to the user in the “Tables” nav panel
.
No further serious data processing is performed at this point. The numbers in topTabDF
is now parsed to create volcano plots and heatmaps in their respective nav panel
s.
Liu, Ning, Dharmesh D Bhuva, Ahmed Mohamed, Micah Bokelund, Arutha Kulasinghe, Chin Wee Tan, and Melissa J Davis. 2023. “standR: Spatial Transcriptomic Analysis for GeoMx DSP Data.” Nucleic Acids Research, November, gkad1026. https://doi.org/10.1093/nar/gkad1026.
Ritchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi, and Gordon K. Smyth. 2015. “Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.