1 Introduction

This vignette demonstrates the use of plotCellTypeMDS(), plotCellTypePCA(), boxplotPCA() and calculateDiscriminantSpace(), which are designed to help assess and compare cell types between query and reference datasets using Multidimensional Scaling (MDS), Principal Component Analysis (PCA) and Fisher Discriminant Analysis (FDA).

This vignette also demonstrates how to visualize gene expression data from single-cell RNA sequencing (scRNA-seq) experiments using two functions: plotGeneExpressionDimred() and plotMarkerExpression(). These functions allow researchers to explore gene expression patterns across various dimensions and compare expression distributions between different datasets or cell types.

Lastly, this vignette demonstrates how to visualize quality control (QC) scores and annotation scores, which can help identify relationships between QC statistics (e.g., total library size or percentage of mitochondrial genes) and cell type annotation scores.

2 Preliminaries

In the context of the scDiagnostics package, this vignette illustrates how to evaluate cell type annotations using three datasets:

  • reference_data: This dataset consists of meticulously curated cell type annotations assigned by experts, providing a robust benchmark for assessing the quality of annotations. It serves as a standard reference to evaluate and detect anomalies or inconsistencies within the cell type annotations.

  • query_data: This dataset includes annotations both from expert assessments and those generated by the SingleR package. By comparing these annotations, you can identify discrepancies between manual and automated results, thereby pinpointing potential inconsistencies or areas that may need further scrutiny.

  • qc_data: This dataset includes data from haematopoietic stem and progenitor cells, including QC metrics like library size and mitochondrial content, SingleR-predicted cell types and annotation scores indicating prediction confidence.
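
The comparison of manual and automated annotations described for query_data can be sketched with a simple cross-tabulation in base R. The label vectors below are invented purely for illustration and are not taken from the package datasets; the variable names are hypothetical.

```r
# Toy labels, invented for illustration only (not taken from query_data)
expert_labels  <- c("CD4", "CD4", "CD8", "CD8", "Myeloid", "B_and_plasma")
singler_labels <- c("CD4", "CD8", "CD8", "CD8", "Myeloid", "B_and_plasma")

# Cross-tabulate manual versus automated labels;
# off-diagonal counts are cells where the two annotations disagree
confusion <- table(expert = expert_labels, SingleR = singler_labels)
print(confusion)

# Fraction of cells on which the two annotations agree
agreement <- mean(expert_labels == singler_labels)

# Indices of cells where the two annotations disagree
discordant_cells <- which(expert_labels != singler_labels)
```

With real data, the same idea would apply to the annotation columns of the query dataset once it has been loaded.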

# Load library
library(scDiagnostics)

# Load datasets
data("reference_data")
data("query_data")
data("qc_data")

# Set seed for reproducibility
set.seed(0)

3 Visualization of Query vs. Reference Dataset

3.1 Plot Reference and Query Cell Types Using MDS

The plotCellTypeMDS() function creates a scatter plot using Multidimensional Scaling (MDS) to visualize the similarity between cell types in query and reference datasets. The MDS plot is built from a dissimilarity matrix calculated from a specified gene set, and cells are colored according to their types.

# Generate the MDS scatter plot with cell type coloring
plotCellTypeMDS(query_data = query_data, 
                reference_data = reference_data, 
                cell_types = c("CD4", "CD8", "B_and_plasma", "Myeloid"),
                query_cell_type_col = "SingleR_annotation", 
                ref_cell_type_col = "expert_annotation")
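
To make the idea of an MDS embedding from a dissimilarity matrix more concrete, here is a minimal base R sketch on simulated data. It assumes a dissimilarity of one minus the Spearman correlation between cells; that choice is an assumption made for illustration and is not confirmed here as what plotCellTypeMDS() computes internally. All object names are hypothetical.

```r
set.seed(0)
n_genes <- 50
n_per_group <- 20  # cells per simulated cell type

# Simulated expression matrix (genes in rows, cells in columns)
expr <- matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes)

# Shift the first 10 genes in the second group so the two toy types differ
group_b <- (n_per_group + 1):(2 * n_per_group)
expr[1:10, group_b] <- expr[1:10, group_b] + 3
cell_type <- rep(c("type_A", "type_B"), each = n_per_group)

# Assumed dissimilarity: 1 - Spearman correlation between cells (columns)
dissim <- as.dist(1 - cor(expr, method = "spearman"))

# Classical MDS into two dimensions; one row of coordinates per cell
mds_coords <- cmdscale(dissim, k = 2)
head(mds_coords)
```

The resulting coordinates could then be passed to plot() with points colored by cell_type to obtain a scatter plot comparable in spirit to the one above.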

3.2 Plot Principal Components for Different Cell Types

The plotCellTypePCA() function projects the query dataset onto the PCA space of the reference dataset. It then plots specified principal components for different cell types, allowing comparison of PCA results between query and reference datasets.

# Plot the selected principal components for each cell type
plotCellTypePCA(query_data = query_data, 
                reference_data = reference_data,
                cell_types = c("CD4", "CD8", "B_and_plasma", "Myeloid"),
                query_cell_type_col = "SingleR_annotation", 
                ref_cell_type_col = "expert_annotation", 
                pc_subset = 1:3)
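
Projecting a query dataset onto a reference PCA space can be sketched with base R as follows. This is a minimal illustration on simulated matrices using prcomp() and predict(), not the package's own implementation; the matrix names and dimensions are hypothetical.

```r
set.seed(0)
n_genes <- 30
gene_names <- paste0("gene", seq_len(n_genes))

# Simulated log-expression matrices (cells in rows, genes in columns)
ref_mat <- matrix(rnorm(100 * n_genes), nrow = 100,
                  dimnames = list(NULL, gene_names))
query_mat <- matrix(rnorm(40 * n_genes), nrow = 40,
                    dimnames = list(NULL, gene_names))

# Fit the PCA on the reference cells only
ref_pca <- prcomp(ref_mat, center = TRUE, scale. = FALSE)

# Project the query cells: predict() subtracts the reference gene means
# and multiplies by the reference rotation (loading) matrix
query_scores <- predict(ref_pca, newdata = query_mat)

# Keep the first three principal components for both datasets
pc_subset <- 1:3
ref_scores_sub <- ref_pca$x[, pc_subset]
query_scores_sub <- query_scores[, pc_subset]
```

Because the query cells are centered with the reference means and rotated with the reference loadings, both sets of scores live in the same coordinate system and can be compared directly.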