CATALYST 1.10.3
assignPrelim
: Assignment of preliminary IDsestCutoffs
: Estimation of separation cutoffsapplyCutoffs
: Applying deconvolution parametersoutFCS
: Output population-wise FCS filesplotYields
: Selecting barcode separation cutoffsplotEvents
: Normalized intensitiesplotMahal
: All barcode biaxial plotraw_data
is a flowSet
with 2 experiments, each containing 2’500 raw measurements with a variation of signal over time. Samples were mixed with DVS beads captured by mass channels 140, 151, 153, 165 and 175.sample_ff
which follows a 6-choose-3 barcoding scheme where mass channels 102, 104, 105, 106, 108, and 110 were used for labeling such that each of the 20 individual barcodes are positive for exactly 3 out of the 6 barcode channels. Accompanying this, sample_key
contains a binary code of length 6 for each sample, e.g. 111000, as its unique identifier.mp_cells
, the package contains 36 single-antibody stained controls in ss_exp
where beads were stained with antibodies captured by mass channels 139, 141 through 156, and 158 through 176, respectively, and pooled together. Note that, to decrease running time, we downsampled to a total of 10’000 events. Lastly, isotope_list
contains a named list of isotopic compositions for all elements within 75 through 209 u corresponding to the CyTOF mass range at the time of writing (1).CATALYST provides an implementation of bead-based normalization as described by Finck et al. (2). Here, identification of bead-singlets (used for normalization), as well as of bead-bead and cell-bead doublets (to be removed) is automated as follows:
concatFCS
: Concatination of FCS filesMultiple flowFrame
s or FCS files can be concatenated via concatFCS
, which takes either a flowSet
, a list of flowFrame
s, a character specifying the location of the FCS files to be concatinated, or a vector of FCS file names as input. If out_path=NULL
(the default), the function will return a single flowFrame
containing the measurement data of all files. Otherwise, an FCS 3.0 standard file of the concatenated data will be written to the specified location.
library(CATALYST)
data(raw_data)
ff <- concatFCS(raw_data)
normCytof
: Normalization using bead standardsSince bead gating is automated here, normalization comes down to a single function that takes a flowFrame
as input and only requires specification of the beads
to be used for normalization. Valid options are:
"dvs"
for bead masses 140, 151, 153, 165, 175"beta"
for bead masses 139, 141, 159, 169, 175By default, we apply a \(median\;\pm5\;mad\) rule to remove low- and high-signal events from the bead population used for estimating normalization factors. The extent to which bead populations are trimmed can be adjusted via trim
. The population will become increasingly narrow and bead-bead doublets will be exluded as the trim
value decreases. Notably, slight over-trimming will not affect normalization. It is therefore recommended to choose a trim
value that is small enough to assure removal of doublets at the cost of a small bead population to normalize to.
normCytof(x=ff, y="dvs", k=80, plot=FALSE)
CATALYST provides an implementation of the single-cell deconvolution algorithm described by Zunder et al. (3). The package contains three functions for debarcoding and three visualizations that guide selection of thresholds and give a sense of barcode assignment quality.
In summary, events are assigned to a sample when i) their positive and negative barcode populations are separated by a distance larger than a threshold value and ii) the combination of their positive barcode channels appears in the barcoding scheme. Depending on the supplied scheme, there are two possible ways of arriving at preliminary event assignments:
All data required for debarcoding are held in objects of class dbFrame
(see Appendix), allowing for the following easy-to-use work-flow:
assignPrelim
will return a dbFrame
containing the input measurement data, barcoding scheme, and preliminary event assignments.applyCutoffs
. It is recommended to estimate, and possibly adjust, population-specific separation cutoffs by running estCutoffs
prior to this.plotYields
, plotEvents
and plotMahal
aim to guide selection of devoncolution parameters and to give a sense of the resulting barcode assignment quality.dbFrame
with outFCS
.assignPrelim
: Assignment of preliminary IDsThe debarcoding process commences by assigning each event a preliminary barcode ID. assignPrelim
thereby takes either a binary barcoding scheme or a vector of numeric masses as input, and accordingly assigns each event the appropirate row name or mass as ID. FCS files are read into R with read.FCS
of the flowCore package, and are represented as an object of class flowFrame
:
data(sample_ff)
sample_ff
## flowFrame object 'anonymous'
## with 20000 cells and 6 observables:
## name desc range minRange maxRange
## 1 (Pd102)Di BC102 9745.799 -0.9999121 9745.799
## 2 (Pd104)Di BC104 9687.522 -0.9994696 9687.522
## 3 (Pd105)Di BC105 8924.638 -0.9989271 8924.638
## 4 (Pd106)Di BC106 8016.669 -0.9997822 8016.669
## 5 (Pd108)Di BC108 9043.869 -0.9999974 9043.869
## 6 (Pd110)Di BC110 8204.455 -0.9999368 8204.455
## 0 keywords are stored in the 'description' slot
The debarcoding scheme should be a binary table with sample IDs as row and numeric barcode masses as column names:
data(sample_key)
head(sample_key)
## 102 104 105 106 108 110
## A1 1 1 1 0 0 0
## A2 1 1 0 1 0 0
## A3 1 1 0 0 1 0
## A4 1 1 0 0 0 1
## A5 1 0 1 1 0 0
## B1 1 0 1 0 1 0
Provided with a flowFrame
and a compatible barcoding scheme (barcode masses must occur in the parameters of the flowFrame
), assignPrelim
will return a dbFrame
containing exprs
from the input flowFrame
, a numeric or character vector of event assignments in slot bc_ids
, separations between barcode populations on the normalized scale in slot deltas
, normalized barcode intensities in slot normed_bcs
, and the counts
and yields
matrices. Measurement intensities are normalized by population such that each is scaled to the 95% quantile of asinh transformed measurement intensities of events assigned to the respective barcode population.
re0 <- assignPrelim(x=sample_ff, y=sample_key, verbose=FALSE)
re0
## dbFrame object with
## 20000 events, 6 observables and 20 barcodes:
##
## Current assignments:
## 0 event(s) unassigned
## ID A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4
## Count 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
##
## ID C5 D1 D2 D3 D4 D5
## Count 1000 1000 1000 1000 1000 1000
estCutoffs
: Estimation of separation cutoffsAs opposed to a single global cutoff, estCutoffs
will estimate a sample-specific cutoff to deal with barcode population cell yields that decline in an asynchronous fashion. Thus, the choice of thresholds for the distance between negative and positive barcode populations can be i) automated and ii) independent for each barcode. Nevertheless, reviewing the yield plots (see below), checking and possibly refining separation cutoffs is advisable.
For the estimation of cutoff parameters we consider yields upon debarcoding as a function of the applied cutoffs. Commonly, this function will be characterized by an initial weak decline, where doublets are excluded, and subsequent rapid decline in yields to zero. Inbetween, low numbers of counts with intermediate barcode separation give rise to a plateau. To facilitate robust estimation, we fit a linear and a three-parameter log-logistic function (4) to the yields function with the LL.3
function of the drc R package (5) (Figure 1). As an adequate cutoff estimate, we target a point that marks the end of the plateau regime and on-set of yield decline to appropriately balance confidence in barcode assignment and cell yield.
The goodness of the linear fit relative to the log-logistic fit is weighed as follow: \[w = \frac{\text{RSS}_{log-logistic}}{\text{RSS}_{log-logistic}+\text{RSS}_{linear}}\]
The cutoffs for both functions are defined as:
\[c_{linear} = -\frac{\beta_0}{2\beta_1}\] \[c_{log-logistic}=\underset{x}{\arg\min}\:\frac{\vert\:f'(x)\:\vert}{f(x)} > 0.1\]
The final cutoff estimate \(c\) is defined as the weighted mean between these estimates:
\[c=(1-w)\cdot c_{log-logistic}+w\cdot c_{linear}\]