flowPloidy
is part of the BioConductor respository. To install it, you must first have the Bioconductor system installed. To do, from within R, call:
source("https://bioconductor.org/biocLite.R")
biocLite()
Then install flowPloidy
:
biocLite("flowPloidy")
The examples in this vignette, and the flowPloidy
help files, use data in the flowPloidyData
package, which is also on BioConductor. (You don’t need flowPloidyData
to use flowPloidy
on your own data, only for the examples).
biocLite("flowPloidyData")
You may also install the developmental version of flowPloidy
directly from its Github repository. To do so, you will need the devtools
package:
install.packages("devtools", dependencies = TRUE)
library("devtools")
install_github("plantarum/flowPloidy")
Note that the developmental version of flowPloidy
is also available directly from BioConductor, if you are using the development branch. Instructions on how to do so are available on the BioConductor website.
Once our packages are installed, we only need to load them into memory to use them:
library(flowPloidy)
library(flowPloidyData)
The flowPloidyData
package provides a single variable, flowPloidyFiles
. This is a vector of file names that points to the sample FCS files in your R installation that we will use in this vignette. The actual path to the files will differ slightly on different computers, and on different operating systems; R takes care of those details for us.
You will need to know which channel to use for your analysis. This will most likely remain the same for all analyses from the same flow cytometer. We can use the viewFlowChannels
function to see the options:
viewFlowChannels(flowPloidyFiles[1])
## [1] "FS.INT.LIN" "SS.INT.LIN" "TIME" "FL3.INT.LIN"
## [5] "FL3.PEAK.LIN" "FS.PEAK.LIN" "SS.PEAK.LIN" "FS.TOF.LIN"
## [9] "SS.TOF.LIN"
Hopefully you see something there that corresponds to the channel you’re interested in. In our case, it’s “FL3.INT.LIN
”.
With all the necessary packages installed, and the name of the channel of interest identified, we are ready to proceed with a flowPloidy
analysis.
fh1 <- FlowHist(file = flowPloidyFiles[1], channel = "FL3.INT.LIN",
analyze = FALSE)
(Note that we used the analyze = FALSE
option, to delay analysis until we have inspected the data, for the purposes of this tutorial. This is not necessary, and if you omit the analyze
argument analysis will proceed automatically.)
Now that we have our data loaded, plot
will display it for us:
plot(fh1)
During the loading process, flowPloidy
attempts to locate cell populations and identify initial values for model fitting. You can see this by setting the init = TRUE
argument in the plot
function:
plot(fh1, init = TRUE)
A number of features are displayed on the histogram:
a thick vertical blue line with a label at the top indicates the estimated position of the first sample (‘A’) G1 peak. A blue circle indicates the actual position on the histogram used as the initial value.
a thin vertical blue line without a label indicates the estimated position of the G2 peak corresponding to the A G1 peak.
as for Peak A, but orange
the thin, dashed grey line indicates the initial parameters used for the model-fitting.
The initial estimate is rough, but in this case it’s close enough for the model-fitting routines to find a solution, so we can immediately proceed to the complete analysis.
fh1 <- fhAnalyze(fh1)
## analyzing 188-15.LMD
Note that I have assigned the output of fhAnalyze()
to the original flowHist
object fh1
. The output of the analysis is stored in a new slot in the fh1
object – all the original data remains unaltered in that object.
Now that fh1
contains a fitted model, the results will be displayed when we call plot
again:
plot(fh1)
The plot has been updated to include the fitted model components in different colours:
The aggregate and S-phase components may be quite small, and therefore difficult to see.
In the upper right corner, we see a brief summary of the numerical results. For Sample A and Sample B, peak statistics are reported as position / cell counts / coefficient of variation
. We can see that the coarse initial estimates were indeed close enough to give us a reasonable model fitting. The numerical results are now available by printing the FlowHist
object:
fh1
## FlowHist object '188-15.LMD'
## channel: FL3.INT.LIN
## bins: 256
## linearity: variable
## debris: SC
## 7 model components: fA1, fA2, fB1, SC, AG, brA, brB
## 12 parameters: a1, Ma, Sa, a2, d, b1, Mb, Sb, SCa, Ap, BRA, BRB
## 5 special parameters: xx, SCvals, DBvals, TRvals, QDvals
## Model fit
##
## Analysis
## ========
##
## Ratio Peak A / Peak B: 0.728, SE: 4e-04
##
## | | counts| size| cvs|
## |:------|--------:|-------:|-----:|
## |Peak A | 1443.859| 99.041| 0.022|
## |Peak B | 2822.705| 136.115| 0.020|
##
## RCS: 1.59
Here we see the ratio between the peaks with the standard error of the ratio, and a table of peak data: nuclei counts (i.e., the number of nuclei modelled in the G1 peaks), peak size, the coefficient of variation for the peak, and the Reduced Chi-Square value (RCS) recommended by Bagwell (1993) as an objective assessment of model fit (but see Rabinovitch (1994) for a contrasting view).
Sometimes, particularly with noisy histograms, flowPloidy
will not produce useful starting values:
fh2 <- FlowHist(file = flowPloidyFiles["734.LMD"], channel = "FL3.INT.LIN")
## analyzing 734.LMD
plot(fh2, init = TRUE)
Here we can see that high spike around bin 50 was ignored, resulting in both the A and B peaks being placed together near bin 136. We can manually set the appropriate starting values with pickInit
:
fh2 <- pickInit(fh2)
flowHist
will prompt you to click on the peaks of the first and second peak manually, adding a circle to the plot at each point.
The new peak values are used to recalculate the initial values:
plot(fh2, init = TRUE)
Now we are ready for the analysis:
fh2 <- fhAnalyze(fh2)
## analyzing 734.LMD
plot(fh2)
fh2
## FlowHist object '734.LMD'
## channel: FL3.INT.LIN
## bins: 256
## linearity: variable
## debris: SC
## 7 model components: fA1, fA2, fB1, SC, AG, brA, brB
## 12 parameters: a1, Ma, Sa, a2, d, b1, Mb, Sb, SCa, Ap, BRA, BRB
## 5 special parameters: xx, SCvals, DBvals, TRvals, QDvals
## Model fit
##
## Analysis
## ========
##
## Ratio Peak A / Peak B: 0.366, SE: 0.00056
##
## | | counts| size| cvs|
## |:------|--------:|-------:|-----:|
## |Peak A | 1011.892| 50.903| 0.035|
## |Peak B | 1598.355| 138.932| 0.027|
##
## RCS: 2.332
By default, the histogram analysis uses the Single-Cut Debris model, (described in Bagwell et al. 1991). This works well for most of our data. However, Bagwell (1993) recommends the Multi-Cut model for fresh material, and in some cases it provides better RCS values (although peak parameters rarely change significantly).
There are a few ways to switch to the Multi-Cut model. You can update a FlowHist
object with the function updateFlowHist
:
fh2MC <- updateFlowHist(fh2, debris = "MC")
## updating FlowHist
## analyzing 734.LMD
plot(fh2MC)
In this case, the Multi-Cut model produced a RCS value nearly twice that of the Single-Cut model. Note, however, that the peak parameters hardly changed, and the A/B ratio is identical.
You may also choose which debris model to use as an argument to FlowHist
(and the related batchFlowHist
discussed below):
fh2MC <- FlowHist(file = flowPloidyFiles["734.LMD"],
channel = "FL3.INT.LIN", debris = "MC")
A third option for changing the debris model is to use the interactive plotting function browseFlowHist
, described below.
By default, linearity (the ratio between the G1 and G2 peaks of the same sample) is fit as a parameter. Ideally, linearity should be 2, but due to issues with the running of the flow cytometer, it frequently varies between ca 1.95 and 2.05. If you want to force linearity to be fixed at 2, you can use the linearity
argument to updateFlowHist
or FlowHist
, as we did for debris
above. In this case, possible values for linearity
are fixed
(fixed at 2), or variable
(fit as a parameter). When linearity
is fixed
, it is reported as NA
in plot and summary functions. You may also set/change linearity
with the browseFlowHist
GUI described below.
To summarize the results in a data table, use the tabulateFlowHist
command:
tabulateFlowHist(fh1)
## channel components totalEvents countsA
## 188-15.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 10351 1443.859
## countsB sizeA sizeB cvA cvB ratioAB
## 188-15.LMD 2822.705 99.04086 136.1154 0.02187847 0.01987391 0.7276241
## ratioSE rcs linearity
## 188-15.LMD 0.0003994698 1.590141 1.982862
To combine multiple objects in a single summary, combine them as a list:
tabulateFlowHist(list(fh1, fh2))
## channel components totalEvents countsA
## 188-15.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 10351 1443.859
## 734.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 8630 1011.892
## countsB sizeA sizeB cvA cvB ratioAB
## 188-15.LMD 2822.705 99.04086 136.1154 0.02187847 0.01987391 0.7276241
## 734.LMD 1598.355 50.90349 138.9319 0.03458740 0.02737535 0.3663918
## ratioSE rcs linearity
## 188-15.LMD 0.0003994698 1.590141 1.982862
## 734.LMD 0.0005609404 2.331781 1.952023
As a convenience, if you pass a file name to the file
argument of exportFlowHist
, the table will be saved to that file:
tabulateFlowHist(list(fh1, fh2), file = "flow-results.csv")
batchFlowHist
If you have a large number of files to process, flowHist
can read them all in together. The batchFlowHist
function provides this feature. To use it, you need a list of file names to process. The R function list.files()
is helpful here. If your FCS files are in the directory “datadir/
”, you can collect all their names using:
my.files <- list.files("datadir/", full.names = TRUE)
If you have a mix of FCS files and other files together, you can supply a pattern to match only your data:
my.files <- list.files("datadir/", pattern = ".LMD", full.names = TRUE)
If you want to combine directories, use c()
:
my.files <- list.files("datadir1/", full.names = TRUE)
my.files <- c(my.files, list.files("datadir2/", full.names = TRUE))
For this example, I’ll continue to use the sample data included with flowPloidyData
. Here, we’ll use all of them (i.e., flowPloidyFiles
), not just one (e.g. flowPloidyFiles[1]
).
batchFlowHist
The batchFlowHist()
function has two required arguments:
a vector of file names (character), such as we generated in the previous section.
a string (character) indicating the data channel to read, which we discovered using the function viewFlowChannels()
above.
The following optional arguments may also be used:
the number of bins to group the data into (default: 256)
boolean, if TRUE
(the default) report the filenames as they are processed. Useful for debugging, or tracking progress on long-running jobs
numeric, the size of the moving window used to identify local peaks (default: 20)
numeric, the size of the moving window used to reduce noise in the histogram during peak detection (default: 20)
character, either “SC” or “MC” to use the Single-Cut or Multi-Cut debris model, as described above. (default: “SC”)
character, either “fixed” or “variable” to set the linearity parameter to 2, or allow it to be fit as a model parameter, as described above. (default: “variable”)
If you have clean, narrow peaks, you may want to lower the values of window
and smooth
, say to 16 and 8 respectively. Don’t worry about finding perfect values, we provide a GUI interface (below) to allow for quick and painless correction of peak detection.
batch1 <- batchFlowHist(flowPloidyFiles, channel = "FL3.INT.LIN")
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//188-15.LMD
## analyzing 188-15.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//222.LMD
## analyzing 222.LMD
##
## *** Analysis Failed: 222.LMD ***
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//240+S.LMD
## analyzing 240+S.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//240-4-2+rad.LMD
## analyzing 240-4-2+rad.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//248+S.LMD
## analyzing 248+S.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//248+r.LMD
## analyzing 248+r.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//337.LMD
## analyzing 337.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//350.LMD
## analyzing 350.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//358.LMD
## analyzing 358.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//597.LMD
## analyzing 597.LMD
##
## *** Analysis Failed: 597.LMD ***
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//734.LMD
## analyzing 734.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//744.LMD
## analyzing 744.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//S.621.LMD
## analyzing S.621.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//SM239.LMD
## analyzing SM239.LMD
(You will see some Error messages in the output. That’s ok, we’ll fix the problems shortly!)
You can export the results with tabulateFlowHist()
:
tabulateFlowHist(batch1)
## channel components totalEvents
## 188-15.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 10351
## 222.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 9503
## 240+S.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 14453
## 240-4-2+rad.LMD FL3.INT.LIN fA1;fA2;fB1;fB2;SC;AG;brA;brB 5122
## 248+S.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 14739
## 248+r.LMD FL3.INT.LIN fA1;fA2;fB1;fB2;SC;AG;brA;brB 9026
## 337.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 18948
## 350.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 14426
## 358.LMD FL3.INT.LIN fA1;fB1;SC;AG;brA;brB 13453
## 597.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 9263
## 734.LMD FL3.INT.LIN fA1;fB1;SC;AG;brA;brB 8630
## 744.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 6432
## S.621.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 11400
## SM239.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB 5658
## countsA countsB sizeA sizeB cvA
## 188-15.LMD 1443.8587 2822.7054 99.04086 136.1154 0.02187847
## 222.LMD NA NA NA NA NA
## 240+S.LMD 164.7604 2026.9687 87.69784 164.8631 0.12545655
## 240-4-2+rad.LMD 461.7506 660.9406 63.99719 108.2691 0.03053639
## 248+S.LMD 2661.7095 2935.5363 77.76789 196.9778 0.02715978
## 248+r.LMD 894.2283 2079.3352 68.76870 111.2225 0.02520056
## 337.LMD 0.0000 2293.7212 88.03744 234.0855 0.03175282
## 350.LMD 2688.3866 1646.6759 124.26213 167.4479 0.03523762
## 358.LMD 282.6209 1441.7430 131.60593 200.5481 0.04889248
## 597.LMD NA NA NA NA NA
## 734.LMD 590.4361 924.2660 139.30629 139.0746 0.02733551
## 744.LMD 465.1411 1617.8503 102.66327 140.2114 0.02497280
## S.621.LMD 1584.5243 781.6297 66.64673 201.5002 0.04414191
## SM239.LMD 1150.2769 1612.0374 99.12578 140.9764 0.02740613
## cvB ratioAB ratioSE rcs linearity
## 188-15.LMD 0.01987391 0.7276241 0.0003994698 1.590141 1.982862
## 222.LMD NA NA NA NA NA
## 240+S.LMD 0.06401397 0.5319435 0.1190280689 5.128025 1.904515
## 240-4-2+rad.LMD 0.03418960 0.5910938 0.0016707736 2.112959 2.100000
## 248+S.LMD 0.02576721 0.3948053 0.0004575979 3.455624 1.979404
## 248+r.LMD 0.03593791 0.6182986 0.0012838065 3188.197683 1.925709
## 337.LMD 0.04936347 0.3760910 1.6495732764 2.506160 1.923674
## 350.LMD 0.05406628 0.7420944 0.0034498176 1.353658 1.962301
## 358.LMD 0.05989804 0.6562311 0.0141204064 2.706424 NA
## 597.LMD NA NA NA NA NA
## 734.LMD 0.02577074 1.0016662 0.7096797892 16.478613 NA
## 744.LMD 0.04243955 0.7322036 0.0025099492 3.235380 1.944316
## S.621.LMD 0.03758527 0.3307526 0.0019463835 2.648945 2.016584
## SM239.LMD 0.03064514 0.7031372 0.0005715006 1.286542 1.960193
You can scroll through the resulting plots with the following code:
parOld <- par(ask = TRUE)
lapply(batch1, FUN = plot)
## press enter to scroll through your files!
par(parOld)
However, that’s a bit tedious if you need to tweak the parameters for any of the individual FlowHist
datasets in the list. A more covenient approach is provided by the browseFlowHist
function, which also allows us to address the histograms for which our analysis failed. This is described in the next section.
browseFlowHist
browseFlowHist
uses the Shiny framework to provide an interface for viewing histograms. The main purpose of this interface is to allow users a simple, quick way to confirm that the fitted models are sensible, and to correct the most common problem: misidentified peaks that lead to bad starting parameters.
To use browseFlowHist
, pass it the list of FlowHist
objects we created with batchFlowHist
above:
batch1 <- browseFlowHist(batch1)
IMPORTANT remember to assign browseFlowHist(batch1)
to a variable. If you don’t, all of the corrections you make will be discarded when you close the app. If you want to keep the original values, assign it to a new variable, i.e., batch1_corrected <- browseFlowHist(batch1)
.
At this point, R should respond with:
Listening on http://127.0.0.1:3459
(the number after the :
will probably be different!) and your web browser will open to display something like this:
(You may need to maximize your browser window for the layout to look ‘correct’)
In the upper-left corner you will find the navigation. Click on Next
and Prev
to move forwards and backwards over your list of files. Note the counter - you can’t move past the first or last files, of course.
For the first histogram, file 188-15.LMD
, the model fit is quite good, so we don’t need to worry about improving the initial parameters. Click on the Linearity
or Debris Model
buttons to see how much difference these options make (not much in this case). Notice that as you change the options, the model fit updates automatically. The last selections you make before moving on to the next plot are the ones that will be saved!
The second plot, for 222.LMD
, doesn’t show a fitted model:
This is due to bad initial peak estimates, despite the fact that they look ok. To fix this, move the Samp. A
peak estimate by clicking around near the top of the highest peak. You might need to try a few different spots before you find one that works (I don’t know why this sample is so tricky!).
As you click on the plot, the analysis updates using the new initial peak estimate. If you want to move the peak for sample B, select that in the checkbox above the plot.
Once a good initial peak is selected, the analysis succeeds with a good fit:
The analysis for 240+S.LMD
succeeded, and even produced a not-terrible RCS value:
However, a quick inspection of the plot shows that it’s not a very good fit at all. So we need to move the A
and/or B
peaks until we get something sensible:
That’s better. However, in this case, the Multi-Cut debris model fits even better:
Browse through the rest of the histograms to see which ones succeeded, and which ones need correction. You should be able to get acceptable fits (RCS < 4) for all of them, despite the fact that some of these files are poor quality data.
When you’re done, click on Return to R
to get back to the command line. (You can close the browser window after you click this, not before). Now all you’re corrected analyses will be stored in the variable you passed the results of browseFlowHist()
to. You can check this with:
plot(batch1[["240+S.LMD"]])
We have the corrected model fits, so we can export the data for further analysis:
tabulateFlowHist(batch1)
Remember, if you don’t pass a file
argument to tabulateFlowHist
it only prints the results to he screen, they aren’t saved in a file (which file could it be?). To save the results of your analysis to disk, use:
tabulateFlowHist(batch1, file = "my-analysis-output.csv")
Bagwell, C. Bruce. 1993. “Theoretical Aspects of Flow Cytometry Data Analysis.” In Clinical Flow Cytometry: Principles and Applications, edited by Kennth D. Bauer, Ricardo E. Duque, and T. Vincent Shankey, 41–61. Williams & Wilkins.
Bagwell, C. Bruce, Sara W. Mayo, Sherry D. Whetstone, Shelly A. Hitchcox, David R. Baker, Donald J. Herbert, Donald L. Weaver, Michael A. Jones, and Edmund J. Lovett. 1991. “DNA Histogram Debris Theory and Compensation.” Cytometry 12 (2). Wiley Subscription Services, Inc., A Wiley Company: 107–18. doi:10.1002/cyto.990120203.
Rabinovitch, Peter S. 1994. “DNA Content Histogram and Cell-Cycle Analysis.” In Methods in Cell Biology, edited by Zbigniew Darzynkiewicz, J. Paul Robinson, and Harry A. Crissman, 41:263–96. San Diego, California: Academic Press.