1 About the template

This is a cheminformatics workflow template of the systemPipeRdata package, a companion package to systemPipeR (H Backman and Girke 2016). Like other workflow templates, it can be loaded with a single command. Users can use the template as is or modify it as needed. More in-depth information on designing workflows can be found in the main vignette of systemPipeRdata. This template serves as a starting point for conducting structure similarity searching and clustering of small molecules. Most of its steps use functions of the ChemmineR package from Bioconductor. No command-line (CL) software tools are required to run this workflow in its current form, as all steps are based on R functions.

The Rmd file (SPcheminfo.Rmd) associated with this vignette serves a dual purpose. It acts both as a template for executing the workflow and as a template for generating a reproducible scientific analysis report. Thus, users will want to customize the text (and/or code) of this vignette to describe their experimental design and analysis results. This typically involves deleting the instructions on how to work with this workflow and customizing the text describing experimental designs, other metadata, and analysis results.

The following data analysis routines are included in this workflow template:

  • Import of small molecules from a structure definition file (SDF)
  • Plotting of small molecule structures
  • Computation of atom pairs and fingerprints for structure searching
  • All-against-all structure comparisons of small molecules
  • Heatmap plot of resulting distance matrix

2 Workflow environment

The environment of the chosen workflow is generated with the genWorkenvir function. After this, the user’s R session needs to be directed into the resulting directory (here SPcheminfo).

systemPipeRdata::genWorkenvir(workflow = "SPcheminfo", mydirname = "SPcheminfo")
setwd("SPcheminfo")
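Before continuing, it can help to confirm that the session really is inside the generated directory. The following is a minimal base-R sketch; `check_workenv` is a hypothetical helper written for this illustration (it is not part of systemPipeR or systemPipeRdata), and it only checks for the two items this vignette uses later, the `SPcheminfo.Rmd` file and the `results` directory.

```r
# Hypothetical helper (not a systemPipeR function): return the names of
# expected files/directories that are missing under `path`.
check_workenv <- function(path = ".",
                          expected = c("SPcheminfo.Rmd", "results")) {
    present <- file.exists(file.path(path, expected))
    expected[!present]
}

missing_items <- check_workenv(".")
if (length(missing_items) > 0) {
    message("Not found in ", getwd(), ": ",
            paste(missing_items, collapse = ", "))
}
```

An empty result suggests the working directory is the workflow environment; anything else usually means `setwd()` was skipped or pointed elsewhere.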

The SPRproject function initializes a new workflow project instance. This function call creates an empty SAL workflow container and at the same time a linked project log directory (default name .SPRproject) that acts as a flat-file database of a workflow. For additional details, please visit this section in systemPipeR's main vignette.

library(systemPipeR)
sal <- SPRproject()
sal

The importWF function imports all the workflow steps outlined in the source Rmd file of this vignette into a SAL (SYSargsList) workflow container. Once imported, the entire workflow can be executed from start to finish using the runWF function. More details regarding this process are provided in the following section here.

sal <- importWF(sal, "SPcheminfo.Rmd")
sal <- runWF(sal)
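Since several steps in this template are marked as optional, it may be useful to run only the mandatory ones first and then inspect the step status. The sketch below assumes a `sal` container that has already been populated with importWF as shown above. The guard at the top is an addition for this illustration so that the block does nothing when systemPipeR or `sal` is unavailable.

```r
# Guard added for illustration: only proceed if systemPipeR is installed
# and a SYSargsList object named `sal` exists in the session.
can_run <- requireNamespace("systemPipeR", quietly = TRUE) &&
    exists("sal") && inherits(sal, "SYSargsList")

if (can_run) {
    library(systemPipeR)
    # Names of the imported steps
    print(stepName(sal))
    # Execute only steps whose run_step is "mandatory"
    sal <- runWF(sal, run_step = "mandatory")
    # Per-step status after the run
    print(statusWF(sal))
}
```

The optional steps can be executed afterwards with another runWF call, for instance one using `run_step = "ALL"`.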

2.1 Step 1: Load packages

The first step loads the systemPipeR and ChemmineR packages.

appendStep(sal) <- LineWise(code = {
    library(systemPipeR)
    library(ChemmineR)
}, step_name = "load_packages")

2.2 Step 2: Import molecule structures

This step imports 100 small molecule structures from an SDF file with the read.SDFset function. The structures are stored in an SDFset object, a class defined by the ChemmineR package.

appendStep(sal) <- LineWise(code = {
    sdfset <- read.SDFset("https://cluster.hpcc.ucr.edu/~tgirke/Documents/R_BioCond/Samples/sdfsample.sdf")
}, step_name = "load_data", dependency = "load_packages")
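After an import, a quick look at the resulting object is often worthwhile. The following sketch works outside the workflow container and, to avoid a download, uses the `sdfsample` data set bundled with ChemmineR as a stand-in for the file at the URL above (this substitution is an assumption of the example). The `requireNamespace` guard is likewise added for illustration.

```r
if (requireNamespace("ChemmineR", quietly = TRUE)) {
    library(ChemmineR)
    data(sdfsample)                  # SDFset shipped with ChemmineR
    sdfset <- sdfsample
    n_mols <- length(sdfset)         # number of molecules in the container
    print(n_mols)
    print(head(cid(sdfset), 4))      # compound identifiers of the first four
    valid <- validSDF(sdfset)        # logical vector flagging usable entries
    sdfset <- sdfset[valid]          # keep only valid molecules
}
```

Filtering with validSDF is a cautious default when working with SDF files from other sources, since malformed entries can cause problems in later steps.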

2.3 Step 3: Visualize molecule structures

The structures of selected molecules (here the first four) are visualized with the plot function.

appendStep(sal) <- LineWise(code = {
    png("results/mols_plot.png", 700, 600)
    # Here only first 4 are plotted. Please choose the ones
    # you want to plot.
    ChemmineR::plot(sdfset[1:4])
    dev.off()
}, step_name = "vis_mol", dependency = "load_data", run_step = "optional")

2.4 Step 4: Physicochemical properties

Basic physicochemical properties are computed for the small molecules stored in sdfset. In this example, atom frequencies, molecular weight, and molecular formula are computed. For more options, users may want to consult the vignette of the ChemmineR package.

appendStep(sal) <- LineWise(code = {
    propma <- data.frame(MF = MF(sdfset), MW = MW(sdfset), atomcountMA(sdfset))
    readr::write_csv(propma, "results/basic_mol_info.csv")
}, step_name = "basic_mol_info", dependency = "load_data", run_step = "optional")
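To see how the property table is laid out, the same call can be tried interactively on a few molecules. This is a sketch under the assumption that the `sdfsample` data set bundled with ChemmineR is an acceptable stand-in for the imported file. The guard is added for illustration only.

```r
if (requireNamespace("ChemmineR", quietly = TRUE)) {
    library(ChemmineR)
    data(sdfsample)
    sdfset <- sdfsample[1:4]         # small subset for a quick look
    propma <- data.frame(MF = MF(sdfset), MW = MW(sdfset),
                         atomcountMA(sdfset))
    # Columns 1-2 hold formula and weight; the rest are per-element counts
    print(colnames(propma))
    # Molecules ordered from heaviest to lightest
    print(propma[order(propma$MW, decreasing = TRUE), c("MF", "MW")])
}
```

The column layout matters for later plotting, because only the atom count columns (from the third column onward) are numeric frequencies.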

2.5 Step 5: Box plots of properties

In this example, the extracted property data is visualized using a box plot.

appendStep(sal) <- LineWise(code = {
    png("results/atom_req.png", 700, 700)
    boxplot(propma[, 3:ncol(propma)], col = "#6cabfa", main = "Atom Frequency")
    dev.off()
}, step_name = "mol_info_plot", dependency = "basic_mol_info",
    run_step = "optional")
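The `dependency` arguments in the five steps above define which step must finish before another can start. The base-R sketch below copies the step names and dependencies from those appendStep calls and derives one valid execution order. The `run_order` function is a hypothetical illustration of dependency resolution and makes no claim about how systemPipeR schedules steps internally.

```r
# Step names and their dependencies, copied from the steps above
deps <- list(
    load_packages  = character(0),
    load_data      = "load_packages",
    vis_mol        = "load_data",
    basic_mol_info = "load_data",
    mol_info_plot  = "basic_mol_info"
)

# Hypothetical helper: repeatedly schedule every step whose
# dependencies have all been scheduled already.
run_order <- function(deps) {
    done <- character(0)
    while (length(done) < length(deps)) {
        pending <- setdiff(names(deps), done)
        is_ready <- vapply(deps[pending],
                           function(d) all(d %in% done), logical(1))
        ready <- pending[is_ready]
        if (length(ready) == 0) stop("circular or unresolved dependency")
        done <- c(done, ready)
    }
    done
}

print(run_order(deps))
```

In this order, vis_mol and basic_mol_info both become runnable once load_data has finished, while mol_info_plot has to wait for basic_mol_info because it reads the propma object created there.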