
1 Introduction

The purpose of the alevinQC package is to generate a summary QC report based on the output of an alevin (Srivastava et al. 2018) run. The QC report can be generated as an HTML or PDF file, or launched as a shiny application.

2 Installation

alevinQC can be installed from Bioconductor using the BiocManager package, which is available from CRAN.

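# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install alevinQC from Bioconductor
BiocManager::install("alevinQC")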

After installation, load the package into the R session.

library(alevinQC)

3 Assumed output directory structure

For more information about running alevin, we refer to the documentation. When invoked, alevin generates several output files in the specified output directory. alevinQC assumes that this structure is retained and will return an error if it is not; thus, it is not recommended to move or rename the output files from alevin. Specifically, alevinQC assumes that the following files (in the indicated structure) are available in the provided baseDir. Note that, currently, alevin must be invoked with the --dumpFeatures flag in order to generate the full set of files.

baseDir
  |--alevin
  |    |--featureDump.txt
  |    |--filtered_cb_frequency.txt
  |    |--MappedUmi.txt
  |    |--quants_mat_cols.txt
  |    |--quants_mat_rows.txt
  |    |--quants_mat.gz
  |    |--raw_cb_frequency.txt
  |    |--whitelist.txt
  |--aux_info
  |    |--meta_info.json
  |--cmd_info.json

4 Check that all required alevin files are available

The report generation functions (see below) will check that all the required files are available in the provided base directory. However, you can also call the function checkAlevinInputFiles() to run the check manually. If one or more files are missing, the function will raise an error indicating the missing file(s).

# Path to the example alevin output that ships with alevinQC
baseDir <- system.file("extdata/alevin_example", package = "alevinQC")
checkAlevinInputFiles(baseDir = baseDir)
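For illustration only, the idea behind this check can be approximated in base R by comparing the expected layout from Section 3 against the files present on disk. This is a minimal sketch: the helper findMissingAlevinFiles() is hypothetical and is not how checkAlevinInputFiles() is implemented.

```r
# Hypothetical helper (not part of alevinQC): return the expected alevin
# output files that are missing from baseDir. The relative paths mirror the
# directory structure described in Section 3.
findMissingAlevinFiles <- function(baseDir) {
  expected <- c(
    file.path("alevin", c("featureDump.txt", "filtered_cb_frequency.txt",
                          "MappedUmi.txt", "quants_mat_cols.txt",
                          "quants_mat_rows.txt", "quants_mat.gz",
                          "raw_cb_frequency.txt", "whitelist.txt")),
    file.path("aux_info", "meta_info.json"),
    "cmd_info.json"
  )
  # Keep only the relative paths that do not exist under baseDir
  expected[!file.exists(file.path(baseDir, expected))]
}

# Example: build an incomplete output directory in a temporary location
tmpBase <- file.path(tempdir(), "alevin_sketch")
dir.create(file.path(tmpBase, "alevin"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(tmpBase, "cmd_info.json")))

missingFiles <- findMissingAlevinFiles(tmpBase)
if (length(missingFiles) > 0) {
  message("Missing files: ", paste(missingFiles, collapse = ", "))
}
```

Returning the missing paths rather than stopping immediately makes it easy to report all problems at once, which matches the behaviour described above of naming every missing file.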

5 Generate QC report

The alevinQCReport() function generates the QC report from the alevin output. Depending on the file extension of the outputFile argument and the value of outputFormat, the function generates either an HTML or a PDF report.

# Write the report to a temporary directory, overwriting any existing file
outputDir <- tempdir()
alevinQCReport(baseDir = baseDir, sampleId = "testSample",
               outputFile = "alevinReport.html",
               outputFormat = "html_document",
               outputDir = outputDir, forceOverwrite = TRUE)
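One plausible reading of the rule described above is that outputFormat, when left unset, is inferred from the extension of outputFile, and otherwise must agree with it. The following sketch encodes that assumption; the helper inferReportFormat() is hypothetical, and the checks actually performed by alevinQCReport() may differ (see ?alevinQCReport).

```r
# Hypothetical helper (not part of alevinQC): choose an rmarkdown output
# format from the report file name, or check that an explicitly requested
# format agrees with the file extension.
inferReportFormat <- function(outputFile, outputFormat = NULL) {
  ext <- tolower(tools::file_ext(outputFile))
  formatByExt <- c(html = "html_document", pdf = "pdf_document")
  if (!ext %in% names(formatByExt)) {
    stop("outputFile must end in .html or .pdf")
  }
  expectedFormat <- unname(formatByExt[ext])
  # No explicit format: derive it from the extension
  if (is.null(outputFormat)) {
    return(expectedFormat)
  }
  # Explicit format: it must match the extension
  if (!identical(outputFormat, expectedFormat)) {
    stop("outputFormat '", outputFormat,
         "' does not match the extension of ", outputFile)
  }
  outputFormat
}

inferReportFormat("alevinReport.html")                 # "html_document"
inferReportFormat("alevinReport.pdf", "pdf_document")  # "pdf_document"
```

Failing early on a mismatch avoids silently writing, say, PDF content to a file named `.html`.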

6 Create shiny app

In addition to static reports, alevinQC can also generate a shiny application containing the same summary figures as the PDF and HTML reports.

app <- alevinQCShiny(baseDir = baseDir, sampleId = "testSample")

Once created, the app can be launched using the runApp() function from the shiny package.

shiny::runApp(app)

7 Generate individual plots

The individual plots included in the QC reports can also be generated independently. To do so, we must first read the alevin output into an R object.

alevin <- readAlevinQC(baseDir = baseDir)
#> reading in alevin gene-level counts across cells
#> Joining, by = "CB"

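The resulting list contains three entries: cbTable, a data frame with one row per cell barcode; summaryTables, a list of summary tables for the full data set and for the initial and final whitelists; and versionTable, with information about the alevin run.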

head(alevin$cbTable)
#>                 CB originalFreq ranking collapsedFreq mappingRate
#> 1 GACTGCGAGGGCATGT       121577       1        123419    0.853256
#> 2 GGTGCGTAGGCTACGA       110467       2        111987    0.844339
#> 3 ATGAGGGAGTAGTGCG       106446       3        108173    0.826177
#> 4 ACTGTCCTCATGCTCC       104794       4        106085    0.778442
#> 5 CGAACATTCTGATACG       104616       5        106072    0.802634
#> 6 ACTGTCCCATATGGTC        99208       6        100776    0.811999
#>   duplicationRate dedupRate nbrGenesAboveMean nbrMappedUMI totalUMICount
#> 1     0.000510955  0.293416              7345       105308         74409
#> 2     0.000541694  0.292190              7306        94555         66927
#> 3     0.000541090  0.294305              6876        89370         63068
#> 4     0.000393819  0.299899              6733        82581         57815
#> 5     0.000501289  0.303393              7142        85137         59307
#> 6     0.000597173  0.300086              6637        81830         57274
#>   nbrGenesAboveZero inFinalWhiteList inFirstWhiteList
#> 1              7532             TRUE             TRUE
#> 2              7520             TRUE             TRUE
#> 3              7078             TRUE             TRUE
#> 4              6925             TRUE             TRUE
#> 5              7344             TRUE             TRUE
#> 6              6831             TRUE             TRUE

knitr::kable(alevin$summaryTables$fullDataset)

Total number of processed reads                   7197662
Number of reads with valid cell barcode (no Ns)   7162300
Total number of observed cell barcodes            188613

knitr::kable(alevin$summaryTables$initialWhitelist)

Number of barcodes in initial whitelist                        299
Fraction reads in initial whitelist barcodes                   87.41%
Mean number of reads per cell (initial whitelist)              20939
Median number of reads per cell (initial whitelist)            342
Median number of detected genes per cell (initial whitelist)   205
Total number of detected genes (initial whitelist)             31396
Median UMI count per cell (initial whitelist)                  212

knitr::kable(alevin$summaryTables$finalWhitelist)

Number of barcodes in final whitelist                        98
Fraction reads in final whitelist barcodes                   83.8%
Mean number of reads per cell (final whitelist)              61242
Median number of reads per cell (final whitelist)            58349
Median number of detected genes per cell (final whitelist)   5269
Total number of detected genes (final whitelist)             31050
Median UMI count per cell (final whitelist)                  31939

knitr::kable(alevin$versionTable)

Start time        Tue Nov 20 15:43:04 2018
Salmon version    0.11.4
Index             /mnt/scratch5/avi/alevin/data/mohu/salmon_index/
R1file            /mnt/scratch5/avi/alevin/data/10x/mohu/100/all_bcs.fq
R2file            /mnt/scratch5/avi/alevin/data/10x/mohu/100/all_reads.fq
tgMap             /mnt/scratch5/avi/alevin/data/mohu/gtf/txp2gene.tsv
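The whitelist summaries above are derived from per-barcode information of the kind stored in cbTable. As a rough illustration, the sketch below computes comparable statistics from a small toy data frame that reuses some of the cbTable column names shown earlier (originalFreq, nbrGenesAboveZero, totalUMICount, inFirstWhiteList, inFinalWhiteList). The toy values are invented, summarizeWhitelist() is a hypothetical helper, and the definitions are assumptions: in particular, the read fraction here uses the reads summed over all barcodes in the table as its denominator, which need not be how alevinQC computes it.

```r
# Toy stand-in for alevin$cbTable (values invented for illustration)
cbToy <- data.frame(
  CB = c("AAAC", "AAAG", "AACT", "ACGT", "TTTT"),
  originalFreq = c(1000, 800, 600, 40, 10),
  nbrGenesAboveZero = c(500, 450, 300, 20, 5),
  totalUMICount = c(900, 700, 500, 30, 8),
  inFirstWhiteList = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  inFinalWhiteList = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Hypothetical helper: summarize the barcodes flagged TRUE in a logical column
summarizeWhitelist <- function(cbTable, column) {
  sel <- cbTable[cbTable[[column]], ]
  c(nbrBarcodes = nrow(sel),
    # Assumed denominator: reads summed over all barcodes in the table
    fracReads = sum(sel$originalFreq) / sum(cbTable$originalFreq),
    medianReadsPerCell = median(sel$originalFreq),
    medianGenesPerCell = median(sel$nbrGenesAboveZero),
    medianUMIsPerCell = median(sel$totalUMICount))
}

summarizeWhitelist(cbToy, "inFirstWhiteList")
summarizeWhitelist(cbToy, "inFinalWhiteList")
```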

The plots can now be generated using the dedicated plotting functions provided with alevinQC (see the help file for the respective function for more information).

plotAlevinKneeRaw(alevin$cbTable)

plotAlevinBarcodeCollapse(alevin$cbTable)

plotAlevinQuant(alevin$cbTable)

plotAlevinKneeNbrGenes(alevin$cbTable)
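As the name plotAlevinKneeRaw() suggests, one of these figures is a knee plot, which shows the number of reads per barcode against the barcode rank. The cbTable output above suggests that the ranking column orders barcodes by decreasing originalFreq. Under that assumption, a basic knee plot can be sketched with base graphics on toy data; this is not the plotting code used by alevinQC, and the frequencies below are invented.

```r
# Toy data frame with the cbTable column originalFreq (values invented)
kneeToy <- data.frame(
  originalFreq = c(5, 120000, 800, 110000, 20, 90000, 50, 300)
)

# Rank barcodes by decreasing frequency (rank 1 = most frequent barcode),
# assumed to match the meaning of the 'ranking' column in cbTable
kneeToy <- kneeToy[order(kneeToy$originalFreq, decreasing = TRUE), , drop = FALSE]
kneeToy$ranking <- seq_len(nrow(kneeToy))

# Knee plot: reads per barcode against rank, both axes on a log scale
plot(kneeToy$ranking, kneeToy$originalFreq, log = "xy", type = "l",
     xlab = "Barcode rank", ylab = "Number of reads (originalFreq)")
```

The log scale on both axes makes the sharp drop between high-count barcodes (likely cells) and the long tail of low-count barcodes easy to see.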

8 Session info

sessionInfo()
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] alevinQC_1.0.0   BiocStyle_2.12.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.1           highr_0.8            later_0.8.0         
#>  [4] pillar_1.3.1         compiler_3.6.0       BiocManager_1.30.4  
#>  [7] RColorBrewer_1.1-2   plyr_1.8.4           tools_3.6.0         
#> [10] digest_0.6.18        evaluate_0.13        tibble_2.1.1        
#> [13] gtable_0.3.0         pkgconfig_2.0.2      rlang_0.3.4         
#> [16] shiny_1.3.2          GGally_1.4.0         crosstalk_1.0.0     
#> [19] yaml_2.2.0           xfun_0.6             dplyr_0.8.0.1       
#> [22] stringr_1.4.0        knitr_1.22           htmlwidgets_1.3     
#> [25] shinydashboard_0.7.1 cowplot_0.9.4        DT_0.5              
#> [28] grid_3.6.0           tidyselect_0.2.5     reshape_0.8.8       
#> [31] glue_1.3.1           R6_2.4.0             rmarkdown_1.12      
#> [34] bookdown_0.9         purrr_0.3.2          ggplot2_3.1.1       
#> [37] magrittr_1.5         promises_1.0.1       scales_1.0.0        
#> [40] htmltools_0.3.6      tximport_1.12.0      assertthat_0.2.1    
#> [43] xtable_1.8-4         mime_0.6             colorspace_1.4-1    
#> [46] httpuv_1.5.1         labeling_0.3         stringi_1.4.3       
#> [49] lazyeval_0.2.2       munsell_0.5.0        rjson_0.2.20        
#> [52] crayon_1.3.4

References

Srivastava, Avi, Laraib Malik, Tom Sean Smith, Ian Sudbery, and Rob Patro. 2018. “Alevin Efficiently Estimates Accurate Gene Abundances from dscRNA-seq Data.” bioRxiv. doi:10.1101/335000.