1 Basics

1.1 Install `qsvaR`

R is an open-source statistical environment which can be easily modified to enhance its functionality via packages. qsvaR is a R package available via the Bioconductor repository for packages. R can be installed on any operating system from CRAN after which you can install qsvaR by using the following commands in your R session:

## To install Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("qsvaR")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

## You can install the development version from GitHub with:
BiocManager::install("LieberInstitute/qsvaR")

1.2 Required knowledge

qsvaR is based on many other packages and in particular in those that have implemented the infrastructure needed for dealing with RNA-seq data. That is, packages like SummarizedExperiment. Here it might be useful for you to check the qSVA framework manuscript (Jaffe et al, PNAS, 2017).

If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.

1.3 Asking for help

As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R and Bioconductor have a steep learning curve so it is critical to learn where to ask for help. The blog post quoted above mentions some but we would like to highlight the Bioconductor support site as the main resource for getting help: remember to use the qsvaR tag and check the older posts. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.

1.4 Citing `qsvaR`

We hope that qsvaR will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation("qsvaR")
#> To cite package 'qsvaR' in publications use:
#> 
#>   Stolz JM, Tnani H, Collado-Torres L (2024). _qsvaR_.
#>   doi:10.18129/B9.bioc.qsvaR <https://doi.org/10.18129/B9.bioc.qsvaR>,
#>   https://github.com/LieberInstitute/qsvaR/qsvaR - R package version
#>   1.8.0, <http://www.bioconductor.org/packages/qsvaR>.
#> 
#>   Stolz JM, Tnani H, Tao R, Jaffe AE, Collado-Torres L (2024). "qsvaR."
#>   _bioRxiv_. doi:10.1101/TODO <https://doi.org/10.1101/TODO>,
#>   <https://www.biorxiv.org/content/10.1101/TODO>.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

2 `qsvaR` Overview

library("qsvaR")
library("limma")
library("BiocFileCache")

2.1 Significant Transcripts

Differential expressions analysis requires the ability normalize complex datasets. In the case of postmortem brain tissue we are tasked with removing the effects of bench degradation. Our current work expands the scope of qSVA by generating degradation profiles (5 donors across 4 degradation time points: 0, 15, 30, and 60 minutes) from six human brain regions (n = 20 * 6 = 120): dorsolateral prefrontal cortex (DLPFC), hippocampus (HPC), medial prefrontal cortex (mPFC), subgenual anterior cingulate cortex (sACC), caudate, amygdala (AMY). We identified an average of 80,258 transcripts associated (FDR < 5%) with degradation time across the six brain regions. Testing for an interaction between brain region and degradation time identified 45,712 transcripts (FDR < 5%). A comparison of regions showed a unique pattern of expression changes associated with degradation time particularly in the DLPFC, implying that this region may not be representative of the effects of degradation on gene expression in other tissues. Furthermore previous work was done by analyzing expressed regions (Collado-Torres et al, NAR, 2017), and while this is an effective method of analysis, expressed regions are not a common output for many pipelines and are computationally expensive to identify, thus creating a barrier for the use of any qSVA software. In our most recent work expression quantification was performed at the transcript level using Salmon (Patro et al, Nat Methods, 2017). The changes from past work on qSVs to now is illustrated in the below cartoon.

The diagram on the right shows the 2016 experimental design for qSVA.The diagram on the right shows the 2022 experimental design for qSVA.

Figure 1: The diagram on the right shows the 2016 experimental design for qSVA.The diagram on the right shows the 2022 experimental design for qSVA

The qsvaR (Stolz, Tnani, and Collado-Torres, 2024) package combines an established method for removing the effects of degradation from RNA-seq data with easy to use functions. The first step in this workflow is to create an RangedSummarizedExperiment object with the transcripts identified in our qSVA experiment. If you already have a RangedSummarizedExperiment of transcripts we can do this with the getDegTx() function as shown below.If not this can be generated with the SPEAQeasy (a RNA-seq pipeline maintained by our lab) pipeline usinge the --qsva flag. If you already have a RangedSummarizedExperiment object with transcripts then you do not need to run SPEAQeasy. This flag requires a full path to a text file, containing one Ensembl transcript ID per line for each transcript desired in the final transcripts R output object (called rse_tx). The sig_transcripts argument in this package should contain the same Ensembl transcript IDs as the text file for the --qsva flag. The goal of qsvaR is to provide software that can remove the effects of bench degradation from RNA-seq data.

bfc <- BiocFileCache::BiocFileCache()

## Download brainseq phase 2 ##
rse_file <- BiocFileCache::bfcrpath(
    "https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata",
    x = bfc
)
load(rse_file, verbose = TRUE)
#> Loading objects:
#>   rse_tx

2.2 Get Degradation Matrix

In this next step we subset for the transcripts associated with degradation. These were determined by Joshua M. Stolz et al, 2022. We have provided three models to choose from. Here the names "cell_component", "top1500", and "standard" refer to models that were determined to be effective in removing degradation effects. The "standard" model involves taking the union of the top 1000 transcripts associated with degradation from the interaction model and the main effect model. The "top1500" model is the same as the "standard" model except the union of the top 1500 genes associated with degradation is selected. The most effective of our models, "cell_component", involved deconvolution of the degradation matrix to determine the proportion of cell types within our studied tissue. These proportions were then added to our model.matrix() and the union of the top 1000 transcripts in the interaction model, the main effect model, and the cell proportions model were used to generate this model of quality surrogate variables (qSVs). In this example we will choose "cell_component" when using the getDegTx() and select_transcripts() functions.

Figure 2: The above venn diagram shows the overlap between transcripts in each of the previously mentioned models

# obtain transcripts with degradation signature
DegTx <- getDegTx(rse_tx, type = "cell_component")
dim(DegTx)
#> [1] 2976  900

2.3 Generate principal components

The qSVs are derived from taking the principal components (PCs) of the selected transcript expression data. This can be done with the function getPCs. qSVs are essentially pricipal components from an rna-seq experiment designed to model bench degradation. For more on principal components you can read and introductory article here. rse_tx specifies a RangedSummarizedExperiment object that has the specified degraded transcripts. For us this is DegTx. Here tpm is the name of the assay we are using within the RangedSummarizedExperiment object, where TPM stands for transcripts per million.

pcTx <- getPCs(rse_tx = DegTx, assayname = "tpm")

2.4 Calculate Number of PCs Needed

Next we use the k_qsvs() function to calculate how many PCs we will need to account for the variation. A model matrix accounting for relevant variables should be used. Common variables such as Age, Sex, Race and Religion are often included in the model. Again we are using our RangedSummarizedExperiment DegTx as the rse_tx option. Next we specify the mod with our model.matrix(). model.matrix() creates a design (or model) matrix, e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly. For more information on creating a design matrix for your experiment see the documentation here. Again we use the assayname option to specify that we are using the tpm assay.

# design a basic model matrix to model the number of pcs needed
mod <- model.matrix(~ Dx + Age + Sex + Race + Region,
    data = colData(rse_tx)
)

## To ensure that the results are reproducible, you will need to set a
## random seed with the set.seed() function. Internally, we are using
## sva::num.sv() which needs a random seed to ensure reproducibility of the
## results.
set.seed(20230621)

# use k qsvs function to return a integer of pcs needed
k <- k_qsvs(rse_tx = DegTx, mod = mod, assayname = "tpm")
print(k)
#> [1] 34

2.5 Return qSV Matrix

Finally we subset our data to the calculated number of PCs. The output of this function will be the qsvs for each sample. Here we use the qsvPCs argument to enter the principal components (pcTx). Here the argument k is the number of PCs we are going to include as calculated in the previous step.

# get_qsvs use k to subset our pca analysis
qsvs <- get_qsvs(qsvPCs = pcTx, k = k)
dim(qsvs)
#> [1] 900  34

This can be done in one step with our wrapper function qSVA which just combinds all the previous mentioned functions.

## To ensure that the results are reproducible, you will need to set a
## random seed with the set.seed() function. Internally, we are using
## sva::num.sv() which needs a random seed to ensure reproducibility of the
## results.
set.seed(20230621)

## Example use of the wrapper function qSVA()
qsvs_wrapper <- qSVA(rse_tx = rse_tx, type = "cell_component", mod = mod, assayname = "tpm")
dim(qsvs_wrapper)
#> [1] 900  34

2.6 Differential Expression

Next we can use a standard limma package approach to do differential expression on the data. The key here is that we add our qsvs to the model.matrix(). Here we input our RangedSummarizedExperiment object and our model.matrix() with qSVs. Note here that the RangedSummarizedExperiment object is the original object loaded with the full list of transcripts, not the the one we subsetted for qSVs. This is because while PCs can be generated from a subset of genes, differential expression is best done on the full dataset. The expected output is a sigTx object that shows the results of differential expression.

### should be done by cbinding mod to pcs
## subset to an expression cutoff
rse_tx <- rse_tx[rowData(rse_tx)$passExprsCut, ]

# create a model.matrix with demographic infor and qsvs
mod_qSVA <- cbind(mod, qsvs)

# log tranform transcript expression
txExprs <- log2(assays(rse_tx)$tpm + 1)

# linear model differential expression
fitTx <- lmFit(txExprs, mod_qSVA)

# generate empirical bayes for DE
eBTx <- eBayes(fitTx)

# get DE results table
sigTx <- topTable(eBTx,
    coef = 2,
    p.value = 1, number = nrow(rse_tx)
)
head(sigTx)
#>                         logFC   AveExpr         t      P.Value    adj.P.Val
#> ENST00000553142.5 -0.06547988 2.0390889 -5.981344 3.242087e-09 0.0003006452
#> ENST00000552074.5 -0.12911383 2.4347985 -5.372043 1.002462e-07 0.0046480161
#> ENST00000530589.1 -0.10297938 2.4271711 -4.918614 1.044003e-06 0.0242568671
#> ENST00000510632.1  0.08994392 0.9073516  4.918168 1.046321e-06 0.0242568671
#> ENST00000450454.6  0.08446871 1.0042440  4.814104 1.746151e-06 0.0256773930
#> ENST00000572236.1 -0.05358333 0.6254025 -4.806373 1.813172e-06 0.0256773930
#>                           B
#> ENST00000553142.5 10.661953
#> ENST00000552074.5  7.426876
#> ENST00000530589.1  5.230075
#> ENST00000510632.1  5.228001
#> ENST00000450454.6  4.749607
#> ENST00000572236.1  4.714453

If we look at a plot of our most significant transcript we can see that at this level we don’t have that much fold change in our data at any individual transcript. These transcripts are however significant and it might be valuable to repeat the analysis at gene level. At gene level the results can be used to do gene ontology enrichment with packages such as clusterProfiler.

# get expression for most significant gene
yy <- txExprs[rownames(txExprs) == rownames(sigTx)[1], ]

## Visualize the expression of this gene using boxplots
p <- boxplot(yy ~ rse_tx$Dx,
    outline = FALSE,
    ylim = range(yy), ylab = "log2 Exprs", xlab = "",
    main = paste(rownames(sigTx)[1])
)

We can assess the effectiveness of our analysis by first performing DE without qSVs

# log tranform transcript expression
txExprs_noqsv <- log2(assays(rse_tx)$tpm + 1)

# linear model differential expression with generic model
fitTx_noqsv <- lmFit(txExprs_noqsv, mod)

# generate empirical bayes for DE
eBTx_noqsv <- eBayes(fitTx_noqsv)

# get DE results table
sigTx_noqsv <- topTable(eBTx_noqsv,
    coef = 2,
    p.value = 1, number = nrow(rse_tx)
)

## Explore the top results
head(sigTx_noqsv)
#>                        logFC   AveExpr         t      P.Value    adj.P.Val
#> ENST00000399220.2 -0.4976648 1.8820796 -8.227630 6.687751e-16 6.201686e-11
#> ENST00000302632.3 -0.5046321 2.7424034 -7.767379 2.187319e-14 1.014172e-09
#> ENST00000409584.5  0.2110462 0.6115977  7.686040 3.980683e-14 1.230456e-09
#> ENST00000540372.5 -0.1844651 0.5463078 -7.621955 6.356374e-14 1.473598e-09
#> ENST00000550948.1 -0.3359378 1.3968139 -7.496492 1.573811e-13 2.443954e-09
#> ENST00000412814.1 -0.2589885 0.6288506 -7.495831 1.581302e-13 2.443954e-09
#>                          B
#> ENST00000399220.2 25.15571
#> ENST00000302632.3 21.85502
#> ENST00000409584.5 21.28881
#> ENST00000540372.5 20.84637
#> ENST00000550948.1 19.98955
#> ENST00000412814.1 19.98506

Next we should subset our differential expression output to just the t-statistic

## Subset the topTable() results to keep just the t-statistic
DE_noqsv <- data.frame(t = sigTx_noqsv$t, row.names = rownames(sigTx_noqsv))
DE <- data.frame(t = sigTx$t, row.names = rownames(sigTx))

## Explore this data.frame()
head(DE)
#>                           t
#> ENST00000553142.5 -5.981344
#> ENST00000552074.5 -5.372043
#> ENST00000530589.1 -4.918614
#> ENST00000510632.1  4.918168
#> ENST00000450454.6  4.814104
#> ENST00000572236.1 -4.806373

Using our DEqual function we can make a plot comparing the t-statistics from degradation and our differential expression output. In the first model below there is a 0.5 correlation between degradation t-statistics and our differential expression. This means the data is likely confounded for degradation and will lead to many false positives.

## Generate a DEqual() plot using the model results without qSVs
DEqual(DE_noqsv)

(#fig:Generate DEqual without qSVs)Result of Differential Expression without qSVA normalization.

In the plot below when we add qSVs to our model we reduce the association with degradation to -0.014, which is very close to 0.

## Generate a DEqual() plot using the model results with qSVs
DEqual(DE)

(#fig:Generate DEqual with qSVs)Result of Differential Expression with qSVA normalization.

3 Conclusion

We have shown that this method is effective for removing the effects of degradation from RNA-seq data. We found that the qsvaR is simpler to use than the previous version from 2016 that used expressed regions instead of transcripts making this software package preferable for users. I would encourage users to read how each set of degradation transcripts was selected as not all models may be appropriate for every experiment. Thank you for your interest and for using qsvaR (Stolz, Tnani, and Collado-Torres, 2024)!

4 Acknowledgements

We would like to thank:

Heena Divecha for proofreading the documentation of qsvaR
Louise A. Huuki-Myers for guidance with understanding R code and editing qsvaR
Nicholas J. Eagles for help processing RNA-seq data with SPEAQeasy
Aja Hope for proofreading the documentation of qsvaR

5 Reproducibility

The qsvaR package (Stolz, Tnani, and Collado-Torres, 2024) was made possible thanks to:

R (R Core Team, 2024)
BiocFileCache (Shepherd and Morgan, 2024)
BiocStyle (Oleś, 2024)
covr (Hester, 2023)
ggplot2 (Wickham, 2016)
knitr (Xie, 2024)
limma (Ritchie, Phipson, Wu, Hu, Law, Shi, and Smyth, 2015)
RefManageR (McLean, 2017)
rmarkdown (Allaire, Xie, Dervieux, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2024)
sessioninfo (Wickham, Chang, Flight, Müller, and Hester, 2021)
testthat (Wickham, 2011)
SummarizedExperiment (Morgan, Obenchain, Hester, and Pagès, 2024)
sva (Leek, Johnson, Parker, Fertig, Jaffe, Zhang, Storey, and Torres, 2024)

This package was developed using biocthis.

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("Intro_qsvaR.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("Intro_qsvaR.Rmd", tangle = TRUE)

Date the vignette was generated.

#> [1] "2024-05-01 01:37:52 EDT"

Wallclock time spent generating the vignette.

#> Time difference of 10.551 mins

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 beta (2024-04-15 r86425)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-05-01
#>  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package              * version     date (UTC) lib source
#>  abind                  1.4-5       2016-07-21 [2] CRAN (R 4.4.0)
#>  annotate               1.82.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  AnnotationDbi          1.66.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  backports              1.4.1       2021-12-13 [2] CRAN (R 4.4.0)
#>  bibtex                 0.5.1       2023-01-26 [2] CRAN (R 4.4.0)
#>  Biobase              * 2.64.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  BiocFileCache        * 2.12.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  BiocGenerics         * 0.50.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  BiocManager            1.30.22     2023-08-08 [2] CRAN (R 4.4.0)
#>  BiocParallel           1.38.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  BiocStyle            * 2.32.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  Biostrings             2.72.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  bit                    4.0.5       2022-11-15 [2] CRAN (R 4.4.0)
#>  bit64                  4.0.5       2020-08-30 [2] CRAN (R 4.4.0)
#>  blob                   1.2.4       2023-03-17 [2] CRAN (R 4.4.0)
#>  bookdown               0.39        2024-04-15 [2] CRAN (R 4.4.0)
#>  bslib                  0.7.0       2024-03-29 [2] CRAN (R 4.4.0)
#>  cachem                 1.0.8       2023-05-01 [2] CRAN (R 4.4.0)
#>  cli                    3.6.2       2023-12-11 [2] CRAN (R 4.4.0)
#>  codetools              0.2-20      2024-03-31 [3] CRAN (R 4.4.0)
#>  colorspace             2.1-0       2023-01-23 [2] CRAN (R 4.4.0)
#>  crayon                 1.5.2       2022-09-29 [2] CRAN (R 4.4.0)
#>  curl                   5.2.1       2024-03-01 [2] CRAN (R 4.4.0)
#>  DBI                    1.2.2       2024-02-16 [2] CRAN (R 4.4.0)
#>  dbplyr               * 2.5.0       2024-03-19 [2] CRAN (R 4.4.0)
#>  DelayedArray           0.30.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  digest                 0.6.35      2024-03-11 [2] CRAN (R 4.4.0)
#>  dplyr                  1.1.4       2023-11-17 [2] CRAN (R 4.4.0)
#>  edgeR                  4.2.0       2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  evaluate               0.23        2023-11-01 [2] CRAN (R 4.4.0)
#>  fansi                  1.0.6       2023-12-08 [2] CRAN (R 4.4.0)
#>  farver                 2.1.1       2022-07-06 [2] CRAN (R 4.4.0)
#>  fastmap                1.1.1       2023-02-24 [2] CRAN (R 4.4.0)
#>  filelock               1.0.3       2023-12-11 [2] CRAN (R 4.4.0)
#>  genefilter             1.86.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  generics               0.1.3       2022-07-05 [2] CRAN (R 4.4.0)
#>  GenomeInfoDb         * 1.40.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  GenomeInfoDbData       1.2.12      2024-04-16 [2] Bioconductor
#>  GenomicRanges        * 1.56.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  ggplot2                3.5.1       2024-04-23 [2] CRAN (R 4.4.0)
#>  glue                   1.7.0       2024-01-09 [2] CRAN (R 4.4.0)
#>  gtable                 0.3.5       2024-04-22 [2] CRAN (R 4.4.0)
#>  highr                  0.10        2022-12-22 [2] CRAN (R 4.4.0)
#>  htmltools              0.5.8.1     2024-04-04 [2] CRAN (R 4.4.0)
#>  httr                   1.4.7       2023-08-15 [2] CRAN (R 4.4.0)
#>  IRanges              * 2.38.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  jquerylib              0.1.4       2021-04-26 [2] CRAN (R 4.4.0)
#>  jsonlite               1.8.8       2023-12-04 [2] CRAN (R 4.4.0)
#>  KEGGREST               1.44.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  knitr                  1.46        2024-04-06 [2] CRAN (R 4.4.0)
#>  labeling               0.4.3       2023-08-29 [2] CRAN (R 4.4.0)
#>  lattice                0.22-6      2024-03-20 [3] CRAN (R 4.4.0)
#>  lifecycle              1.0.4       2023-11-07 [2] CRAN (R 4.4.0)
#>  limma                * 3.60.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  locfit                 1.5-9.9     2024-03-01 [2] CRAN (R 4.4.0)
#>  lubridate              1.9.3       2023-09-27 [2] CRAN (R 4.4.0)
#>  magrittr               2.0.3       2022-03-30 [2] CRAN (R 4.4.0)
#>  Matrix                 1.7-0       2024-03-22 [3] CRAN (R 4.4.0)
#>  MatrixGenerics       * 1.16.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  matrixStats          * 1.3.0       2024-04-11 [2] CRAN (R 4.4.0)
#>  memoise                2.0.1       2021-11-26 [2] CRAN (R 4.4.0)
#>  mgcv                   1.9-1       2023-12-21 [3] CRAN (R 4.4.0)
#>  munsell                0.5.1       2024-04-01 [2] CRAN (R 4.4.0)
#>  nlme                   3.1-164     2023-11-27 [3] CRAN (R 4.4.0)
#>  pillar                 1.9.0       2023-03-22 [2] CRAN (R 4.4.0)
#>  pkgconfig              2.0.3       2019-09-22 [2] CRAN (R 4.4.0)
#>  plyr                   1.8.9       2023-10-02 [2] CRAN (R 4.4.0)
#>  png                    0.1-8       2022-11-29 [2] CRAN (R 4.4.0)
#>  purrr                  1.0.2       2023-08-10 [2] CRAN (R 4.4.0)
#>  qsvaR                * 1.8.0       2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
#>  R6                     2.5.1       2021-08-19 [2] CRAN (R 4.4.0)
#>  Rcpp                   1.0.12      2024-01-09 [2] CRAN (R 4.4.0)
#>  RefManageR           * 1.4.0       2022-09-30 [2] CRAN (R 4.4.0)
#>  rlang                  1.1.3       2024-01-10 [2] CRAN (R 4.4.0)
#>  rmarkdown              2.26        2024-03-05 [2] CRAN (R 4.4.0)
#>  RSQLite                2.3.6       2024-03-31 [2] CRAN (R 4.4.0)
#>  S4Arrays               1.4.0       2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  S4Vectors            * 0.42.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  sass                   0.4.9       2024-03-15 [2] CRAN (R 4.4.0)
#>  scales                 1.3.0       2023-11-28 [2] CRAN (R 4.4.0)
#>  sessioninfo          * 1.2.2       2021-12-06 [2] CRAN (R 4.4.0)
#>  SparseArray            1.4.0       2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  statmod                1.5.0       2023-01-06 [2] CRAN (R 4.4.0)
#>  stringi                1.8.3       2023-12-11 [2] CRAN (R 4.4.0)
#>  stringr                1.5.1       2023-11-14 [2] CRAN (R 4.4.0)
#>  SummarizedExperiment * 1.34.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  survival               3.6-4       2024-04-24 [3] CRAN (R 4.4.0)
#>  sva                    3.52.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  tibble                 3.2.1       2023-03-20 [2] CRAN (R 4.4.0)
#>  tidyselect             1.2.1       2024-03-11 [2] CRAN (R 4.4.0)
#>  timechange             0.3.0       2024-01-18 [2] CRAN (R 4.4.0)
#>  UCSC.utils             1.0.0       2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  utf8                   1.2.4       2023-10-22 [2] CRAN (R 4.4.0)
#>  vctrs                  0.6.5       2023-12-01 [2] CRAN (R 4.4.0)
#>  viridisLite            0.4.2       2023-05-02 [2] CRAN (R 4.4.0)
#>  withr                  3.0.0       2024-01-16 [2] CRAN (R 4.4.0)
#>  xfun                   0.43        2024-03-25 [2] CRAN (R 4.4.0)
#>  XML                    3.99-0.16.1 2024-01-22 [2] CRAN (R 4.4.0)
#>  xml2                   1.3.6       2023-12-04 [2] CRAN (R 4.4.0)
#>  xtable                 1.8-4       2019-04-21 [2] CRAN (R 4.4.0)
#>  XVector                0.44.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  yaml                   2.3.8       2023-12-11 [2] CRAN (R 4.4.0)
#>  zlibbioc               1.50.0      2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#> 
#>  [1] /tmp/RtmpNm7ftI/Rinst3ed670511efd0
#>  [2] /home/biocbuild/bbs-3.19-bioc/R/site-library
#>  [3] /home/biocbuild/bbs-3.19-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

6 Bibliography

This vignette was generated using BiocStyle (Oleś, 2024) with knitr (Xie, 2024) and rmarkdown (Allaire, Xie, Dervieux et al., 2024) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.26. 2024. URL: https://github.com/rstudio/rmarkdown.

[2] J. Hester. covr: Test Coverage for Packages. R package version 3.6.4. 2023. URL: https://CRAN.R-project.org/package=covr.

[3] J. T. Leek, W. E. Johnson, H. S. Parker, et al. sva: Surrogate Variable Analysis. R package version 3.52.0. 2024.

[4] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[5] M. Morgan, V. Obenchain, J. Hester, et al. SummarizedExperiment: SummarizedExperiment container. R package version 1.34.0. 2024. URL: https://bioconductor.org/packages/SummarizedExperiment.

[6] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.32.0. 2024. URL: https://github.com/Bioconductor/BiocStyle.

[7] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2024. URL: https://www.R-project.org/.

[8] M. E. Ritchie, B. Phipson, D. Wu, et al. “limma powers differential expression analyses for RNA-sequencing and microarray studies”. In: Nucleic Acids Research 43.7 (2015), p. e47. DOI: 10.1093/nar/gkv007.

[9] L. Shepherd and M. Morgan. BiocFileCache: Manage Files Across Sessions. R package version 2.12.0. 2024.

[10] J. M. Stolz, H. Tnani, and L. Collado-Torres. qsvaR. https://github.com/LieberInstitute/qsvaR/qsvaR - R package version 1.8.0. 2024. DOI: 10.18129/B9.bioc.qsvaR. URL: http://www.bioconductor.org/packages/qsvaR.

[11] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. URL: https://ggplot2.tidyverse.org.

[12] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[13] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.2. 2021. URL: https://CRAN.R-project.org/package=sessioninfo.

[14] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.46. 2024. URL: https://yihui.org/knitr/.

Introduction to qsvaR

1 May 2024

Package

1 Basics

1.1 Install `qsvaR`

1.2 Required knowledge

1.3 Asking for help

1.4 Citing `qsvaR`

2 `qsvaR` Overview

2.1 Significant Transcripts

2.2 Get Degradation Matrix

2.3 Generate principal components

2.4 Calculate Number of PCs Needed

2.5 Return qSV Matrix

2.6 Differential Expression

3 Conclusion

4 Acknowledgements

5 Reproducibility

6 Bibliography

Introduction to qsvaR

1 May 2024

Package

1 Basics

1.1 Install qsvaR

1.2 Required knowledge

1.3 Asking for help

1.4 Citing qsvaR

2 qsvaR Overview

2.1 Significant Transcripts

2.2 Get Degradation Matrix

2.3 Generate principal components

2.4 Calculate Number of PCs Needed

2.5 Return qSV Matrix

2.6 Differential Expression

3 Conclusion

4 Acknowledgements

5 Reproducibility

6 Bibliography

1.1 Install `qsvaR`

1.4 Citing `qsvaR`

2 `qsvaR` Overview