There are currently two general strategies for working with cytometry data in R. The first is to perform the entire analysis from within the R coding environment. There are some excellent packages and standards that allow the import of FCS files, compensation, data transformation, plotting, and exporting summary statistics. However, manual gating of flow cytometry data remains cumbersome and difficult to perform accurately. Of note, there are packages available now that enable automatic, data-driven gates that avoid these problems, but are themselves complex to properly prepare and validate the automatic gates. Moreover, these data-driven gates represent an unfamiliar workflow for the majority of cytometerists that are accustomed to GUI-based analyses such as those enabled by FlowJo\(^{TM}\), Kaluza\(^{TM}\), and other cytometry analysis software.
The second strategy is to first perform compensation, transformation, and gating in FlowJo, and then import the resulting workspace object into R using the flowWorkspace package. This has the advantage of allowing cytometerists to work in a familiar way while still enabling the use of cutting-edge cytometry tools. However, this approach lacks all of the other benefits that an R-exclusive cytometry package would allow. In particular, it is dependent on expensive, closed-source software and does not allow for easy commenting and version control. Nevertheless, manual gating remains sufficiently difficult in R as to make this the more common method.
flowGate was developed to fill in this missing ability for manual, GUI-based gating in R, finally enabling a familiar cytometry analysis pipeline completely within R. This vignette will demonstrate the flowGate function within the context of a complete cytometry analysis pipeline and is geared toward a researcher who has never used R for flow cytometry analysis at all.
flowGate is now submitted to the Bioconductor project, so installing it should be pretty straightforward. Note that it’s still possible to install the unstable devel version from github (see instructions below), but for general use, the Bioconductor version is the better choice.
If you’d prefer to install the devel version from GitHub, please complete steps 1 and 2 above, then proceed here instead:
if(!requireNamespace("BiocManager", quietly = TRUE)){
install.packages("BiocManager")
}
if(!requireNamespace("flowCore", quietly = TRUE)){
BiocManager::install("flowCore")
}
if(!requireNamespace("flowWorkspace", quietly = TRUE)){
BiocManager::install("flowWorkspace")
}
if(!requireNamespace("ggcyto", quietly = TRUE)){
BiocManager::install("ggcyto")
}
Once you have the BioConductor dependencies installed, you should be able to run the following code to download and install flowGate.
# install devtools if you don't already have it
if(!requireNamespace("devtools", quietly = TRUE)){
install.packages("devtools")
}
devtools::install_github("NKInstinct/flowGate")
Again, if you aren’t sure which version to use, please go with the Bioconductor version.
Before you can start drawing gates on your data, you need to read it into R and compensate it . If this seems a little complex, don’t worry—it can be a bit much to wrap your head around the first time you do it, but once you know the options you would like to use, you can write them all into a single function and then use that for automatic data import and transformation in the future. We’ll cover preparing just such a function at the end of this section in case you aren’t sure how to do that.
The example flow data used in this package comes from the GvHD data included in flowCore, the base package for flow cytometry data analysis in R. The GvHD data that comes with flowCore is already stored in a flowSet, so for this example, you can find the first three samples from GvHD bundled with flowGate as individual .FCS files, which we will turn into a flowSet here.
library(flowGate)
#> Loading required package: flowWorkspace
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#>
#> vignette("flowWorkspace-Introduction", "flowWorkspace")
#> Loading required package: ggcyto
#> Loading required package: ggplot2
#> Loading required package: flowCore
#> Loading required package: ncdfFlow
#> Loading required package: BH
library(flowCore)
path_to_fcs <- system.file("extdata", package = "flowGate")
The system.file()
command above is needed to get at the .FCS files that have shipped with flowGate. However, when you are ready to read your own flow data into R, you can provide a simple path to the directory you are interested in. For example, if your target directory is something called “Flow Data” in your current directory, you could simply pass path_to_fcs <- "./Flow Data/"
and that would do the same thing.
Running this command first finds all of the files at the location specified by path_to_fcs and checks if any of them ends in .FCS (that’s the pattern = ".FCS$"
part of the code). All of them that satisfy the pattern are loaded into a flowSet called fs. Note that there are a lot of customization options here—have a look at the flowCore vignettes if you want to change something about this default behaviour.
If you have a lot of flow data, you might not want to load it all into a flowSet following the above instructions. This is because the whole flowSet we just created exists in RAM, and if it is extremely large, it might cause problems depending on how much RAM you have available to you. Instead, you can use the ncdfFlow package to directly access the data on your hard drive. This will cause all operations on the data to be slightly slower, but will not consume an enormous chunk of system RAM. It also has some benefits for keeping all your data and analyses together, which we’ll touch on when we talk about saving gated cytometry data. For those reasons, I tend to prefer this NCDF approach for cytometry analysis, but it doesn’t really matter which one you use, especially when just starting out.
FlowSets are excellent containers for holding flow data, but they cannot store information about gating very well. The flowWorkspace package introduces a new, similar data structure called a GatingSet that holds both the flowSet we just made and all gating information about it. Thankfully, once we have made a flowSet, it is very easy to prepare a GatingSet.
So far we have created a flowSet and then turned it into a GatingSet. This GatingSet is a container that can hold many different kinds of information. Currently, it has the raw expression values recorded off of the cytometer and experimental metadata. Both of these were contained in the FCS file, and so were loaded into the flowSet and then brought over to the GatingSet. Next, we are going to add a compensation matrix to this container, so that the data are properly compensated. For these specific example data, the compensation matrix is stored in an external file which we will import and apply to the GatingSet. There are other ways to compensate, which we will mention below.
path_to_comp <- system.file("extdata",
"compdata",
"compmatrix",
package = "flowCore")
comp_matrix <- read.table(path_to_comp,
header = TRUE,
skip = 2,
check.names = FALSE)
comp <- compensation(comp_matrix)
gs <- compensate(gs, comp)
Although the above example uses an externally-stored compensation matrix, a common use-case will be that you have acquired flow data for which you performed instrument-level compensation. In this case, the compensation matrix is saved directly in the .FCS file upon export, and you can read that into the comp object instead of loading an external file.
There are several places in a .FCS file where a compensation matrix can be stored. My Fortessa X20 stores it in the $SPIL slot of a .FCS, but other cytometers may behave differently. To find out where yours is, you need to first call spillover()
on one of the samples in your flowSet (not the GatingSet!), and then look at the output. One of the slots that gets returned to you will look like a compensation matrix—take note of which one that is and then store it in a variable called comp.
Here is an example. Note that since the .FCS files used in this example do not have an acquisition-defined compensation, trying to run this code using these data will fail.
# Not run for the purposes of the vignette
# Find out which slot the compensation data are in.
spillover(some_new_fs[[3]])
# You need to select one of the samples contained in the flowSet. I chose the
# third one here ([[3]]), but that is completely arbitrary. The should all have
# the exact same compensation matrix.
# This command should output a list of several objects. One of them should look
# like a compensation matrix. If we were running this command on data from my
# Fortessa, it would be the first object in the list (the $SPIL slot), but look
# at your data and see which object you want to work with. Once you know which
# object is your compensation matrix, proceed.
comp_matrix <- spillover(some_new_fs[[3]])[[1]]
# Note that I have selected the first object in the list, which corresponds to
# where my acquisition-defined matrices are stored. If yours is in a different
# list object, put that object's number in place of the [[1]] above.
# From here, it is exactly like before:
comp <- compensation(comp_matrix)
some_new_gs <- compensate(some_new_gs, comp)
It is also possible, using the flowCore package, to automatically create a compensation matrix from single colour control samples. Exactly how to do this is beyond the scope of this vignette, so you are encouraged to look into the flowCore vignettes for further instructions.
As mentioned above, all of that seems like a lot of work just to import the flow data into R. However, a lot of the complexity of this import step comes from not knowing exactly how your specific experiment is set up. Once you know that, you can write all of this into a single function that holds all of your defaults, and then you can just call that function to import your data. If we were to do that with the above data import, it would look something like this:
import_gs <- function(path, pattern, compensation_matrix){
fs <- read.flowSet(path = path,
pattern = pattern,
full.names = TRUE)
gs <- GatingSet(fs)
comp <- compensation(compensation_matrix)
gs <- compensate(gs, comp)
return(gs)
}
Now that we have defined this function, we can do all of the above steps in a couple of lines of code:
#Setup the paths to your data as before
path_to_fcs <- system.file("extdata",
package = "flowGate")
path_to_comp<- system.file("extdata",
"compdata",
"compmatrix",
package = "flowCore")
comp_matrix <- read.table(path_to_comp,
header = TRUE,
skip = 2,
check.names = FALSE)
#Then run the import function we just created
gs <- import_gs(path = path_to_fcs,
pattern = ".FCS$",
compensation_matrix = comp_matrix)
This is a pretty basic function, and there’s a lot more we could do to make it more useful for other experiments with slightly different conditions, but it’s a good start for now and hopefully demonstrates that once you have the hang of it, importing cytometry data into R is neither difficult nor time consuming.
If you are used to working with flow data in R, you might be wondering why we haven’t applied transforms (biex, arcsinh, etc) to the fluorescent channels of these data yet. flowGate is designed so that you can apply biexponential scaling to your flow data directly when gating, rather than at this import step, meaning you can set the biex parameters yourself while looking at the data. Note that in this way, the data are transformed at the plotting level rather than the data level. We’ll talk a little more about that later.
However: if you have very large flow files, you might notice some odd behaviour when trying to draw polygon gates in particular, where your data seems to move around the plot. The gate that you’re drawing will move with them, so you can kind of “roll with it” if you like, but if you find this unworkable, you should apply your transformations first before trying to draw gates on your data. Because flowGate is written to allow on-the-fly changes to almost all aspects of your plot, R has to re-calculate the transformation every time you change anything else in the plot, which can cause this artifact when working with large amounts of data.
The flowCore and flowWorkspace vignettes go into great detail about how to apply transformations (start with the flowWorkspace one since it’s designed for GatingSet data), and if you just want a decent transform without any fuss, have a look at the estimateLogicle
function in flowWorkspace.
Now that we have a compensated and transformed GatingSet object holding all of our flow cytometry data, it is time to start gating through it. If you were working on your own data, you would probably be able to jump right in knowing what parameters are in your dataset, but since this is an example set, it is helpful to have a quick look at the channel names contained in the data.
There are two kinds of names for each channel in this GatingSet object: the detector name (such as “FL1-H”) and, if specified, the marker name (such as “CD15 FITC”). We can access the first kind of name with colnames()
like we did above, and the second kind with markernames()
:
colnames(gs)
#> [1] "FSC-H" "SSC-H" "FL1-H" "FL2-H" "FL3-H" "FL2-A" "FL4-H" "Time"
markernames(gs)
#> FL1-H FL2-H FL3-H FL4-H
#> "CD15 FITC" "CD45 PE" "CD14 PerCP" "CD33 APC"
Note that not every detector name has a corresponding marker name. Thankfully, flowGate can handle either kind of name interchangeably, so it’s not hard to use whichever is more appropriate for the situation.
As with most cytometry experiments, the first thing we are going to look at is cells, as defined by their forward and side scatter. Ironically, because this vignette is non-interactive, you will have to do some of the legwork here yourself to draw your gates. I will annotate this to help you follow along, but your best bet is to run this code while reading to see how it works.
#> [1] 2
When you first run the gs_gate_interactive command, a window like this should appear
The sidebar on the left lets you decide what kind of gate you want to draw, and the main window in the middle shows a plot of your data that you can interact with. You can also set the number of bins with a slider to the left - more bins means the data will be plotted with higher resolution (more, smaller dots).
To draw a gate, the first thing you need to do is pick what kind of gate you want to draw. It is very important that you pick the kind of gate you want to draw first, and then draw it second. Doing it the other way tends to cause errors.
To select your leukocytes, switch the gating mode over to polygon by clicking on the polygon radio button
Once you have clicked on the kind of gate you want, you can proceed to draw it. Polygon gates (like this one) are drawn by clicking multiple times to trace a polygon. Other gates are drawn differently (Rectangle and Span gates are drawn by clicking and dragging a rectangle, and Quadrant gates are drawn by clicking once where the four quadrants meet).
Go ahead and draw a polygon gate on your data.
If you think you’ve mis-clicked and want to start over, just hit the “Reset” button on the top left and then start drawing your polygon again. Once you are happy with it, click “Done” to close the window and apply the gate to your whole GatingSet.
When you are finished gating, R will return a list containing a few items that help you to re-create the gate you just drew. The first is the actual gate definition as a flowCore gate object. This will give you the filterID, the coordinates, gate type, and so on, and can be passed to other cytometry packages or added to other GatingSets like any gate. The second is the number of bins you chose to display the data as: this is useful if you want to exactly re-create the plot you gated on later. Finally, the third item in the list is the scaling parameters used in the gating. In this case, we didn’t turn on any scaling since we were looking at linear data, so it should return “unused”, but in a moment, we’ll look at what that means more directly.
If something goes wrong when you are drawing your gates (such as if you draw a polygon gate when you still have “rectangle gate” selected), the shiny app can hang. If you exit out of the shiny window without clicking on “done” first, or if you do click “done” but get some warnings or errors, it’s a good idea to make sure that the shiny app isn’t still running in the background. Have a look at your R Console window and see if there is a stop sign.
If that button is there, it means that the shiny app is still running and you should stop it before proceeding. Trying to do anything else in R while the shiny app runs in the background can cause all kinds of mysterious errors.
Now that we have drawn a leukocytes gate, it is a good idea to have a look at the plot and see that it looks the way we want it to. There are a number of ways to plot flow cytometry data in R—my favourite is with the ggcyto package, which is automatically installed with flowGate for visualization purposes. Since we only want to peek at the data right now to make sure our gate looks right, we can use the easy-but-rigid autoplot()
function, like so:
autoplot(gs[[1]], gate = "Leukocytes")
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
The important thing here is that you have a gate drawn on a hex plot with a percentage in the middle of it—don’t worry if it doesn’t look exactly like the one here, and don’t worry if the plot doesn’t look the way you want it to for publication. We’ll cover how to make very nice plots at the end.
Common Mistake: if, when you run autoplot, you don’t see a gate but you do see a big “0%” sitting roughly where your gate should be, that means you didn’t switch the window to polygon gate before drawing your polygon, and the program is trying to make a rectangle gate out of the very first point you clicked for your polygon (hence an invisibly-small rectangle with 0% of the events in it). Again, we’re working on making this mistake harder to make, but for now, just redo the gate by re-running gs_gate_interactive()
again. Just make sure you add regate = TRUE
to the command so flowGate knows to delete this Leukocytes gate before trying to add another Leukocytes gate to root.
Now that we have a Leukocytes gate drawn, we can drill down into them and gate on the other markers in our sample. The next likely gate will be to take all of the CD45+ cells for further analysis. For illustrative purposes, let’s gate this one with a 1-D span gate on a histogram.
#> [1] 3
Note that this time, unlike previously, we specify the dimensions (dims
) we want to work with (CD45), and also the subset
we want to look at (the “parent gate”, in this case Leukocytes). Running gs_gate_interactive()
with these arguments will draw a histogram instead of a dot plot, but otherwise the window will look as before.
Switch the gating mode over to “Span”
Then draw your gate. For span and rectangle gates, you draw them by clicking and dragging a rectangle on the plot. In fact, the only difference between span and rectangle gates is that span only considers the horizontal dimensions, while rectangle considers both. So for this gate, you can draw the rectangle as tall as you want, since span gates don’t care about the vertical dimensions of the rectangle. This can be useful for drawing the gate exactly where you want it, since you can use the vertical dimensions to help line it up with your histogram peaks.
As before, when you are happy with the gate, click Done to close the window and apply the gate. Unlike with polygon and quadrant gates, you don’t need to click Reset if you want to adjust the rectangle on your graph—you can just adjust or redraw it as many times as you like (note: nothing bad will happen if you click Reset when drawing rectangles, so don’t worry if you do).
Now, as before, we can have a peek at the gate. This time, however, we’re going to use the more expanded ggcyto command to start to get familiar with it. ggcyto uses the same grammar of graphics that ggplot uses, so if you are already familiar with ggplot, this should be very straightforward. If not, a full introduction to the grammar of graphics is beyond the scope of this vignette, but look at the code below and see if you can follow what is happening.
The idea behind the grammar of graphics is that you can draw any graph by first specifying the data the graph comes from, and then specifying the kind of image you want to make from those data. In this case, the first line of code tells ggcyto that we want to make a graph based on the first sample contained in gs (gs[[1]]
), and that we want to look at the CD45 dimension of these data. In the grammar of graphics, this is called an aesthetic, hence “aes”.
After this line specifying what sort of data we want to look at, the next three apply types of images to the data. The first, geom_density()
, draws a density plot of the CD45 aesthetic in our data. The second, geom_gate("CD45")
, adds the gate named “CD45” to our data. The third, geom_stats()
, adds all relevant stats (in this case the percent parent) to any gates drawn on the graph.
Using ggcyto()
instead of autoplot()
is a little more cumbersome to type, but provides much more customization to the resulting graph, and is well worth learning how to use if you are regularly going to use R and flowGate for flow analysis. For that reason, the rest of this vignette will use ggcyto()
calls to generate graphs, so we can get more familiar with what they look like.
The last gate to add to this plot is a quadrant gate on CD33 and CD15. As before, we’ll call gs_gate_interactive()
with the appropriate arguments, then switch our gating mode to quadrant, and then click exactly once, where you want the center of the quadrant gate to be. As with the polygon gate, if you mis-click, you can click on Reset to reset your selection and then try again.
Note this time, however, the fact that we haven’t transformed any fluorescent channels is really causing problems. Fortunately, you can click on the “use FlowJo Biex” checkbox to enable re-scaling this plot on a flowJo-style biexponential scale.