1 Installation

From Bioconductor:

if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("SingleCellAlleleExperiment")

2 Introduction to the workflow

2.1 Biological background and motivation

Immune molecules such as B and T cell receptors, human leukocyte antigens (HLAs) or killer Ig-like receptors (KIRs) are encoded in the genetically most diverse loci of the human genome. Many of these immune genes are hyperpolymorphic, showing high allelic diversity across human populations. In addition, typical immune molecules are polygenic, which means that multiple functionally similar genes encode the same protein subunit.

However, interactive single-cell methods commonly used to analyze immune cells in large patient cohorts do not consider this. This leads to erroneous quantification of important immune mediators and impaired inter-donor comparability.

2.2 Workflow for unraveling the immunogenetic diversity in scData

We have developed a workflow that allows quantification of expression and interactive exploration of donor-specific alleles of different immune genes. The workflow is divided into two software packages and one additional data package:

  1. The scIGD software package consist of a Snakemake workflow designed to automate and streamline the genotyping process for immune genes, focusing on key targets such as HLAs and KIRs, and enabling allele-specific quantification from single-cell RNA-sequencing (scRNA-seq) data using donor-specific references. For detailed information of the performed steps and how to utilize this workflow, please refer to its documentation here: scIGD.

  2. To harness the full analytical potential of the results, we’ve developed a dedicated R package, SingleCellAlleleExperiment presented in this repository. This package provides a comprehensive multi-layer data structure, enabling the representation of immune genes at specific levels, including alleles, genes, and groups of functionally similar genes and thus, allows data analysis across these immunologically relevant, different layers of annotation.

  3. The scaeData is an R/ExperimentHub data package providing datasets generated and processed by the scIGD software package which can be used to explore the data and potential downstream analysis workflows using the here presented novel SingleCellAlleleExperiment data structure. Refer to scaeData for more information regarding the available datasets and source of raw data.

This workflow is designed to support both 10x and BD Rhapsody data, encompassing amplicon/targeted sequencing as well as whole-transcriptome-based data, providing flexibility to users working with different experimental setups.