systemPipeRdata 2.11.0
systemPipeRdata provides data analysis workflow templates compatible with the
systemPipeR software package (H Backman and Girke 2016). The latter is a Workflow
Management System (WMS) for designing and running end-to-end analysis workflows
with automated report generation for a wide range of data analysis applications.
Support for running external software is provided by a command-line interface
(CLI) that adopts the Common Workflow Language (CWL). How to use systemPipeR
is explained in its main vignette.
The workflow templates provided by systemPipeRdata come equipped with sample
data and the necessary parameter files required to run a selected workflow.
This setup simplifies the learning process of using systemPipeR, facilitates
testing of workflows, and serves as a foundation for designing new workflows.
The standardized directory structure (Figure 1) utilized by the workflow
templates and their sample data is outlined in the Directory Structure
section of systemPipeR's main vignette.
Figure 1: Directory structure of systemPipeR's workflows. For details, see the
Directory Structure section of systemPipeR's main vignette.
The systemPipeRdata package is available at Bioconductor and can be installed from within R as follows.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("systemPipeRdata")
library("systemPipeRdata") # Loads the package
library(help = "systemPipeRdata") # Lists package info
vignette("systemPipeRdata") # Opens vignette
An overview table of the workflow templates included in systemPipeRdata can be
returned as shown below. By clicking the URLs in the last column of the
workflow list below, users can view the Rmd source file of a workflow, as well
as the final HTML report generated after running a workflow on the provided test
data. A list of the default data analysis steps included in each workflow is
given here. Additional workflow templates are available
on this project’s GitHub organization (for details, see
below). To create an empty workflow template without
any test data included, users can choose the new template, which includes
only the required directory structure and parameter files.
availableWF()
Name | Description | URL |
---|---|---|
new | Generic Workflow Template | Rmd, HTML |
rnaseq | RNA-Seq Workflow Template | Rmd, HTML |
riboseq | RIBO-Seq Workflow Template | Rmd, HTML |
chipseq | ChIP-Seq Workflow Template | Rmd, HTML |
varseq | VAR-Seq Workflow Template | Rmd, HTML |
SPblast | BLAST Workflow Template | Rmd, HTML |
SPcheminfo | Cheminformatics Drug Similarity Template | Rmd, HTML |
SPscrna | Basic Single-Cell Workflow Template | Rmd, HTML |
Table 1: Workflow templates
The chosen example below uses the genWorkenvir function from the
systemPipeRdata package to create an RNA-Seq workflow environment (selected
under workflow="rnaseq") that is fully populated with a small test data set,
including FASTQ files, reference genome and annotation data. The name of the
resulting workflow directory can be specified under the mydirname argument.
The default NULL uses the name of the chosen workflow. An error is issued if
a directory of the same name and path exists already. After this, the user’s R
session needs to be directed into the resulting rnaseq directory (here with
setwd). The other workflow templates from the above
table can be loaded the same way.
library(systemPipeRdata)
genWorkenvir(workflow = "rnaseq")
setwd("rnaseq")
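Because genWorkenvir stops with an error when the target directory already exists, a script that may be rerun can check for the directory first. The following is only a minimal sketch of that idea: the helper create_workenvir and the directory name rnaseq_test are hypothetical and not part of the package, and only the workflow and mydirname arguments described above are used.

```r
library(systemPipeRdata)

# Hypothetical helper (not part of systemPipeRdata): create the workflow
# environment only if the target directory does not exist yet, then
# return the absolute path of that directory.
create_workenvir <- function(workflow, mydirname = workflow) {
    if (dir.exists(mydirname)) {
        message("Directory '", mydirname, "' already exists; skipping genWorkenvir().")
    } else {
        genWorkenvir(workflow = workflow, mydirname = mydirname)
    }
    invisible(normalizePath(mydirname))
}

# Illustrative directory name; any name not yet in use works the same way.
wf_dir <- create_workenvir("rnaseq", mydirname = "rnaseq_test")
setwd(wf_dir)
```

Calling the helper a second time with the same arguments leaves the existing directory untouched instead of raising an error.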
On Linux and OS X systems, the same can be achieved from the command line of a terminal with the following commands.
$ Rscript -e "systemPipeRdata::genWorkenvir(workflow='rnaseq', mydirname='rnaseq')"
$ cd rnaseq
For running and working with systemPipeR workflows, users should consult
systemPipeR’s main vignette.
The following gives only a very brief preview of how to run workflows and create scientific
and technical reports.
After a workflow environment (directory) has been created and the corresponding
R session directed into the resulting directory (here rnaseq), the workflow
can be loaded from the included R Markdown file (Rmd, here systemPipeRNAseq.Rmd).
This template provides common data analysis steps that are typical for RNA-Seq
workflows. Users can add, remove, or modify workflow steps either by
applying these changes to the sal workflow management container directly, or
by updating the Rmd file first and then updating sal accordingly.
library(systemPipeR)
sal <- SPRproject()
sal <- importWF(sal, file_path = "systemPipeRNAseq.Rmd", verbose = FALSE)
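After the import, it can help to look at what the sal container holds before running anything. The lines below are a hedged sketch that assumes sal was created with SPRproject and importWF as shown above; length and stepName are used here as accessors on the workflow container, and their exact output may differ between systemPipeR versions.

```r
library(systemPipeR)

# Assumes 'sal' already holds the imported RNA-Seq workflow (see above).
sal                # printing the container summarizes its steps and their status
length(sal)        # number of steps in the workflow
stepName(sal)      # names of the individual steps
```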
The default analysis steps of the imported RNA-Seq workflow are listed below. Users can modify the existing steps, add new ones or remove steps as needed.
Default analysis steps in RNA-Seq Workflow
HISAT2 (or any other RNA-Seq aligner)
Once the workflow has been loaded into sal, it can be executed from start to
finish (or partially) with the runWF command. However, running the workflow
will only be possible if all dependent CL software is installed on a user’s
system. Their names and availability on a system can be listed with
listCmdTools(sal, check_path=TRUE).
sal <- runWF(sal)
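A partial run, as mentioned above, might look like the following sketch. It assumes sal holds the imported RNA-Seq workflow and uses the steps argument of runWF to restrict execution. The range 1:2 is purely illustrative, and which steps succeed depends on the command-line tools installed on the system.

```r
library(systemPipeR)

# Assumes 'sal' holds the imported RNA-Seq workflow (see above).
# First report which command-line tools required by the workflow are found.
listCmdTools(sal, check_path = TRUE)

# Then run only a subset of steps; the range 1:2 is an illustrative choice.
sal <- runWF(sal, steps = 1:2)
```

Running a small subset first is a quick way to confirm that the environment is set up before starting the full workflow.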
Workflows can be visualized as topology graphs using the plotWF function.
plotWF(sal)