Example S1: Installation and data examples

The stable version of this package is available on Bioconductor. You can install it by running the following:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("vidger")

The latest development version of ViDGER can be installed from GitHub using the devtools package:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("btmonier/vidger", ref = "devel")

Once installed, you will have access to the following functions:

  • vsBoxPlot()
  • vsScatterPlot()
  • vsScatterMatrix()
  • vsDEGMatrix()
  • vsMAPlot()
  • vsMAMatrix()
  • vsVolcano()
  • vsVolcanoMatrix()
  • vsFourWay()

How these functions work is explained later in the documentation. For the following examples, three toy data sets will be used: df.cuff, df.deseq, and df.edger. Each of these data sets corresponds to one of the three RNA-seq analysis tools this package covers. They can be loaded into the R workspace with the following command:

data(<data_set>)

Here, <data_set> is one of the previously mentioned data sets. Two arguments recur across these functions: type and d.factor. The type argument tells the function how to process the data for each analytical type (i.e. "cuffdiff", "deseq", or "edger"). The d.factor argument is used specifically for DESeq2 objects, which we will discuss in the DESeq2 sections. All other arguments are described in the help file for each function (e.g. ?vsScatterPlot).
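
As a minimal sketch of the loading step, the snippet below loads all three toy data sets in one call and reports the class of each object. It assumes the vidger package is installed; the toy_sets variable is a name introduced here for illustration, and the guard simply skips the step when the package is missing.

```r
# Names of the three toy data sets described above
toy_sets <- c("df.cuff", "df.deseq", "df.edger")

# Assumption: the vidger package is installed. The guard keeps the
# script from failing when it is not.
if (requireNamespace("vidger", quietly = TRUE)) {
    # data() accepts a character vector of data set names via `list`
    data(list = toy_sets, package = "vidger")
    for (nm in toy_sets) {
        # Report the class of each loaded object
        cat(nm, ":", class(get(nm))[1], "\n")
    }
}
```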

An overview of the data used

As mentioned earlier, three toy data sets are included with this package. In addition to these data sets, five “real-world” data sets were also used. All real-world data are currently unpublished and come from ongoing collaborations. Summaries of these data can be found in the following tables:

Table 1: An overview of the toy data sets included in this package. In this table, each data set is summarized in terms of what analytical software was used, organism ID, experimental layout (replicates and treatments), number of transcripts (IDs), and size of the data object in terms of megabytes (MB).

Data      Software  Organism         Reps  Treat.  IDs    Size (MB)
df.cuff   Cuffdiff  H. sapiens       2     3       1200   0.2
df.deseq  DESeq2    D. melanogaster  2     3       29391  2.3
df.edger  edgeR     A. thaliana      2     3       724    0.1

Table 2: “Real-world” (RW) data set statistics. To test the reliability of our package, real data from human collections and several plant samples were used. Each data set is summarized in terms of organism ID, number of experimental samples (n), experimental conditions, and number of transcripts (IDs).

Data  Organism                         n   Exp. Conditions                       IDs
RW-1  H. sapiens                       10  Two treatment dosages taken at two    198002
                                           time points and one control sample
                                           taken at one time point
RW-2  M. domestica                     24  Two phenotypes taken at four time     63517
                                           points (three replicates each)
RW-3  V. riparia: bud                  6   Two conditions (three replicates      59262
                                           each)
RW-4  V. riparia: shoot-tip (7 days)   6   Two conditions (three replicates      17962
                                           each)
RW-5  V. riparia: shoot-tip (21 days)  6   Two conditions (three replicates      19064
                                           each)

Example S2: Creating box plots

Box plots are a useful way to determine the distribution of data. In this case we can determine the distribution of FPKM or CPM values by using the vsBoxPlot() function. This function allows you to extract necessary results-based data from analytical objects to create a box plot comparing \(log_{10}\) (FPKM or CPM) distributions for experimental treatments.
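
To make the quantity on the y-axis concrete, here is a minimal base R sketch of a \(log_{10}\) CPM box plot. It is not how vsBoxPlot() is implemented: the count matrix is simulated, the sample names are hypothetical, and the pseudocount of 1 is an assumption made to avoid taking the logarithm of zero.

```r
set.seed(1)

# Hypothetical toy count matrix: 100 genes x 4 samples
# (two treatments, A and B, with two replicates each)
counts <- matrix(
    rnbinom(400, mu = 50, size = 2), nrow = 100,
    dimnames = list(paste0("gene", 1:100), c("A_1", "A_2", "B_1", "B_2"))
)

# Counts per million: divide each column by its library size
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# log10 transform; the pseudocount of 1 is an assumption of this sketch
log_cpm <- log10(cpm + 1)

# Pool replicate columns by treatment (text before the underscore)
treatment <- sub("_.*", "", colnames(log_cpm))
by_treat <- lapply(
    split(seq_along(treatment), treatment),
    function(idx) as.vector(log_cpm[, idx])
)

# One box per treatment
boxplot(by_treat, xlab = "Treatment", ylab = "log10(CPM + 1)")
```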

With Cuffdiff

vsBoxPlot(
    data = df.cuff, d.factor = NULL, type = 'cuffdiff', title = TRUE, 
    legend = TRUE, grid = TRUE
)

Figure 1: A box plot example using the vsBoxPlot() function with Cuffdiff data. In this example, FPKM distributions for each treatment within an experiment are shown in the form of a box and whisker plot.

With DESeq2

vsBoxPlot(
    data = df.deseq, d.factor = 'condition', type = 'deseq', 
    title = TRUE, legend = TRUE, grid = TRUE
)

Figure 2: A box plot example using the vsBoxPlot() function with DESeq2 data. In this example, FPKM distributions for each treatment within an experiment are shown in the form of a box and whisker plot.

With edgeR

vsBoxPlot(
    data = df.edger, d.factor = NULL, type = 'edger', 
    title = TRUE, legend = TRUE, grid = TRUE
)

Figure 3: A box plot example using the vsBoxPlot() function with edgeR data. In this example, CPM distributions for each treatment within an experiment are shown in the form of a box and whisker plot.

Aesthetic variants to box plots

vsBoxPlot() can display the data distribution in several styles, selected with the aes parameter. Currently, there are six variants:

  • box: standard box plot
  • violin: violin plot
  • boxdot: box plot with dot plot overlay
  • viodot: violin plot with dot plot overlay
  • viosumm: violin plot with summary stats overlay
  • notch: box plot with notch

box variant

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "box"
)

Figure 4: A box plot example using the aes parameter: box

violin variant

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "violin"
)

Figure 5: A box plot example using the aes parameter: violin

boxdot variant

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "boxdot"
)

Figure 6: A box plot example using the aes parameter: boxdot

viodot variant

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "viodot"
)

Figure 7: A box plot example using the aes parameter: viodot

viosumm variant

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "viosumm"
)

Figure 8: A box plot example using the aes parameter: viosumm

notch variant

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "notch"
)

Figure 9: A box plot example using the aes parameter: notch

Color palette variants to box plots

In addition to aesthetic changes, the fill color of each variant can also be changed. This can be implemented by modifying the fill.color parameter.

The palettes that can be used for this parameter are based on the palettes found in the RColorBrewer package. A visual list of all the palettes is available in the RColorBrewer documentation.
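
The palette names used in the following examples can also be inspected from within R. The sketch below assumes the RColorBrewer package is installed; the palettes variable is a name introduced here for illustration.

```r
# Palette names used in the color variant examples
palettes <- c("RdGy", "Paired", "Greys")

# Assumption: the RColorBrewer package is installed
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
    # Maximum number of colors and category for each palette
    print(RColorBrewer::brewer.pal.info[palettes, ])

    # Hex codes for five colors drawn from the "RdGy" palette
    print(RColorBrewer::brewer.pal(5, "RdGy"))

    # Draw swatches of every available palette on the active device
    RColorBrewer::display.brewer.all()
}
```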

Color variant example 1

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "box", fill.color = "RdGy"
)

Figure 10: Color variant 1
A box plot example using the fill.color parameter: RdGy.

Color variant example 2

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "viosumm", fill.color = "Paired"
)

Figure 11: Color variant 2
A violin plot example using the fill.color parameter: Paired with the aes parameter: viosumm.

Color variant example 3

data("df.edger")
vsBoxPlot(
   data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
   legend = TRUE, grid = TRUE, aes = "notch", fill.color = "Greys"
)

Figure 12: Color variant 3
A notched box plot example using the fill.color parameter: Greys with the aes parameter: notch. Using these parameters, we can also generate grey-scale plots.

Example S3: Creating scatter plots

This example will look at a basic scatter plot function, vsScatterPlot(). This function allows you to visualize comparisons of \(log_{10}\) values of either FPKM or CPM measurements of two treatments depending on analytical type.

With Cuffdiff

vsScatterPlot(
    x = 'hESC', y = 'iPS', data = df.cuff, type = 'cuffdiff',
    d.factor = NULL, title = TRUE, grid = TRUE
)

Figure 13: A scatterplot example using the vsScatterPlot() function with Cuffdiff data. In this visualization, \(log_{10}\) comparisons are made of fragments per kilobase of transcript per million mapped reads (FPKM) measurements. The dashed line represents the regression line for the comparison.

With DESeq2

vsScatterPlot(
    x = 'treated_paired.end', y = 'untreated_paired.end', 
    data = df.deseq, type = 'deseq', d.factor = 'condition', 
    title = TRUE, grid = TRUE
)

Figure 14: A scatterplot example using the vsScatterPlot() function with DESeq2 data. In this visualization, \(log_{10}\) comparisons are made of fragments per kilobase of transcript per million mapped reads (FPKM) measurements. The dashed line represents the regression line for the comparison.

With edgeR

vsScatterPlot(
    x = 'WM', y = 'MM', data = df.edger, type = 'edger',
    d.factor = NULL, title = TRUE, grid = TRUE
)

Figure 15: A scatterplot example using the vsScatterPlot() function with edgeR data. In this visualization, \(log_{10}\) comparisons are made of counts per million (CPM) measurements. The dashed line represents the regression line for the comparison.

Example S4: Creating scatter plot matrices

This example looks at vsScatterMatrix(), an extension of the vsScatterPlot() function. It creates a matrix of all possible pairwise comparisons of treatments within an experiment, along with expression distributions and correlation values.
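
The idea of "all possible comparisons" can be sketched in base R as follows. This is an illustration only, not the package's implementation: the expression values are simulated, the treatment names T1 to T3 are hypothetical, and Pearson correlation is an assumption of the sketch.

```r
set.seed(2)

# Hypothetical log10 expression values: 50 genes x 3 treatments
expr <- matrix(
    rnorm(150, mean = 2), nrow = 50,
    dimnames = list(NULL, c("T1", "T2", "T3"))
)

# Every unordered pair of treatments (one pair per column)
pairs_idx <- combn(colnames(expr), 2)

# Pearson correlation for each pair
corr <- apply(pairs_idx, 2, function(p) cor(expr[, p[1]], expr[, p[2]]))
names(corr) <- apply(pairs_idx, 2, paste, collapse = " vs ")
print(round(corr, 3))

# Base R scatter plot matrix of the same comparisons
pairs(expr)
```

With three treatments there are three pairwise comparisons; in general, k treatments give k(k - 1)/2 panels.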

With Cuffdiff

vsScatterMatrix(
    data = df.cuff, d.factor = NULL, type = 'cuffdiff', 
    comp = NULL, title = TRUE, grid = TRUE, man.title = NULL
)

Figure 16: A scatterplot matrix example using the vsScatterMatrix() function with Cuffdiff data. Similar to the scatterplot function, this visualization allows for all comparisons to be made within an experiment. In addition to the scatterplot visuals, FPKM distributions (histograms) and correlation (Corr) values are generated.

With DESeq2

vsScatterMatrix(
    data = df.deseq, d.factor = 'condition', type = 'deseq',
    comp = NULL, title = TRUE, grid = TRUE, man.title = NULL
)

Figure 17: A scatterplot matrix example using the vsScatterMatrix() function with DESeq2 data. Similar to the scatterplot function, this visualization allows for all comparisons to be made within an experiment. In addition to the scatterplot visuals, FPKM distributions (histograms) and correlation (Corr) values are generated.

With edgeR

vsScatterMatrix(
    data = df.edger, d.factor = NULL, type = 'edger', comp = NULL,
    title = TRUE, grid = TRUE, man.title = NULL
)

Figure 18: A scatterplot matrix example using the vsScatterMatrix() function with edgeR data. Similar to the scatterplot function, this visualization allows for all comparisons to be made within an experiment. In addition to the scatterplot visuals, CPM distributions (histograms) and correlation (Corr) values are generated.

Example S5: Creating differential gene expression matrices

The vsDEGMatrix() function allows the user to visualize the number of differentially expressed genes (DEGs) at a given adjusted p-value (the padj argument) for each experimental treatment level. Higher color intensity corresponds to a higher number of DEGs.
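
The counting behind such a matrix can be sketched in base R. The results table below is simulated, the column names x, y, and padj are hypothetical, and the inclusive cutoff (padj <= 0.05) is an assumption of this sketch rather than a statement about how vsDEGMatrix() filters.

```r
set.seed(3)

treatments <- c("T1", "T2", "T3")
cmp <- t(combn(treatments, 2))   # one row per pairwise comparison

# Hypothetical results table: 200 genes per comparison
res <- data.frame(
    x = rep(cmp[, 1], each = 200),
    y = rep(cmp[, 2], each = 200),
    padj = runif(600)
)

padj_cutoff <- 0.05

# Symmetric matrix holding the number of DEGs per treatment pair
deg <- matrix(
    0L, nrow = 3, ncol = 3,
    dimnames = list(treatments, treatments)
)
for (i in seq_len(nrow(cmp))) {
    a <- cmp[i, 1]
    b <- cmp[i, 2]
    n <- sum(res$padj[res$x == a & res$y == b] <= padj_cutoff)
    deg[a, b] <- n
    deg[b, a] <- n
}
print(deg)
```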

With Cuffdiff

vsDEGMatrix(
    data = df.cuff, padj = 0.05, d.factor = NULL, type = 'cuffdiff', 
    title = TRUE, legend = TRUE, grid = TRUE
)

Figure 19: A matrix of differentially expressed genes (DEGs) at a given p-value using the vsDEGMatrix() function with Cuffdiff data. With this function, the user is able to visualize the number of DEGs at a given adjusted p-value for each experimental treatment level. Higher color intensity correlates to a higher number of DEGs.

With DESeq2

vsDEGMatrix(
    data = df.deseq, padj = 0.05, d.factor = 'condition', 
    type = 'deseq', title = TRUE, legend = TRUE, grid = TRUE
)

Figure 20: A matrix of differentially expressed genes (DEGs) at a given p-value using the vsDEGMatrix() function with DESeq2 data. With this function, the user is able to visualize the number of DEGs at a given adjusted p-value for each experimental treatment level. Higher color intensity correlates to a higher number of DEGs.

With edgeR

vsDEGMatrix(
    data = df.edger, padj = 0.05, d.factor = NULL, type = 'edger', 
    title = TRUE, legend = TRUE, grid = TRUE
)

Figure 21: A matrix of differentially expressed genes (DEGs) at a given p-value using the vsDEGMatrix() function with edgeR data. With this function, the user is able to visualize the number of DEGs at a given adjusted p-value for each experimental treatment level. Higher color intensity correlates to a higher number of DEGs.

Grey-scale DEG matrices

A grey-scale option is available for this function if you wish to use a grey-to-white gradient instead of the classic blue-to-white gradient. This can be invoked by setting the grey.scale parameter to TRUE.

vsDEGMatrix(data = df.deseq, d.factor = "condition", type = "deseq",
    grey.scale = TRUE
)

Example S6: Creating MA plots

vsMAPlot() visualizes the variance between two samples in terms of gene expression values, where logarithmic fold changes of count data are plotted against mean counts. For more information on how each of the aesthetics is plotted, please refer to the figure captions and Method S1.
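
As a worked illustration of the two axes, the base R sketch below computes a \(log_2\) fold change (M) and a mean count (A) for a handful of made-up genes. The values, the direction of the ratio (y relative to x), and the reference lines at plus and minus 1 are assumptions of this sketch, not the package's internals.

```r
# Hypothetical mean normalized counts for six genes in two treatments
x <- c(10, 200, 50, 1000, 5, 80)
y <- c(40, 190, 50, 250, 20, 85)

# M: log2 fold change of y relative to x
M <- log2(y / x)

# A: mean count across the two treatments
A <- (x + y) / 2

print(data.frame(x, y, A, M = round(M, 2)))

# Mean counts on a log-scaled x-axis, fold changes on the y-axis
plot(A, M, log = "x", xlab = "Mean counts", ylab = "log2 fold change")
abline(h = c(-1, 1), lty = 2)   # illustrative LFC reference lines
```

For the first gene, a count of 10 in x and 40 in y is a four-fold increase, so M is 2; the fourth gene drops four-fold, so M is -2.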

With Cuffdiff

vsMAPlot(
    x = 'iPS', y = 'hESC', data = df.cuff, d.factor = NULL, 
    type = 'cuffdiff', padj = 0.05, y.lim = NULL, lfc = NULL, 
    title = TRUE, legend = TRUE, grid = TRUE
)

Figure 22: MA plot visualization using the vsMAPlot() function with Cuffdiff data. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

With DESeq2

vsMAPlot(
    x = 'treated_paired.end', y = 'untreated_paired.end', 
    data = df.deseq, d.factor = 'condition', type = 'deseq', 
    padj = 0.05, y.lim = NULL, lfc = NULL, title = TRUE, 
    legend = TRUE, grid = TRUE
)

Figure 23: MA plot visualization using the vsMAPlot() function with DESeq2 data. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

With edgeR

vsMAPlot(
    x = 'WW', y = 'MM', data = df.edger, d.factor = NULL, 
    type = 'edger', padj = 0.05, y.lim = NULL, lfc = NULL, 
    title = TRUE, legend = TRUE, grid = TRUE
)

Figure 24: MA plot visualization using the vsMAPlot() function with edgeR data. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

Example S7: Creating MA plot matrices

Similar to a scatter plot matrix, vsMAMatrix() will produce visualizations for all comparisons within your data set. For more information on how the aesthetics are plotted in these visualizations, please refer to the figure caption and Method S1.

With Cuffdiff

vsMAMatrix(
    data = df.cuff, d.factor = NULL, type = 'cuffdiff', 
    padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE, 
    grid = TRUE, counts = TRUE, data.return = FALSE
)

Figure 25: An MA plot matrix using the vsMAMatrix() function with Cuffdiff data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a matrix of MA plots for all comparisons within an experiment. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

With DESeq2

vsMAMatrix(
    data = df.deseq, d.factor = 'condition', type = 'deseq', 
    padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE, 
    grid = TRUE, counts = TRUE, data.return = FALSE
)

Figure 26: An MA plot matrix using the vsMAMatrix() function with DESeq2 data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a matrix of MA plots for all comparisons within an experiment. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

With edgeR

vsMAMatrix(
    data = df.edger, d.factor = NULL, type = 'edger', 
    padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE, 
    grid = TRUE, counts = TRUE, data.return = FALSE
)

Figure 27: An MA plot matrix using the vsMAMatrix() function with edgeR data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a matrix of MA plots for all comparisons within an experiment. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

Example S8: Creating volcano plots

The next few visualizations will focus on ways to display differential gene expression between two or more treatments. Volcano plots visualize the variance between two samples in terms of gene expression values, where the \(-log_{10}\) of calculated p-values (y-axis) is plotted against the \(log_2\) fold changes (x-axis). These plots can be visualized with the vsVolcano() function. For more information on how each of the aesthetics is plotted, please refer to the figure captions and Method S1.
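
The two transformed axes and a simple significance classification can be sketched in base R. The values below are made up, the cutoffs are illustrative, and the status labels are a simplification introduced for this sketch; they are not the package's color scheme.

```r
# Hypothetical log2 fold changes and adjusted p-values for five genes
lfc  <- c(2.5, -3.0, 0.2, 1.5, -0.4)
padj <- c(0.001, 0.01, 0.5, 0.2, 0.03)

# Illustrative cutoffs (assumptions of this sketch)
lfc_cutoff  <- 1
padj_cutoff <- 0.05

# y-axis of a volcano plot: -log10 of the adjusted p-value
neg_log_p <- -log10(padj)

# Classify each gene by significance first, then by fold change
status <- ifelse(
    padj > padj_cutoff, "not significant",
    ifelse(
        lfc > lfc_cutoff, "up",
        ifelse(lfc < -lfc_cutoff, "down", "significant, small change")
    )
)
print(data.frame(lfc, padj, neg_log_p = round(neg_log_p, 2), status))

# Vertical lines mark the LFC cutoffs, the horizontal line the padj cutoff
plot(lfc, neg_log_p, xlab = "log2 fold change", ylab = "-log10(padj)")
abline(v = c(-lfc_cutoff, lfc_cutoff), h = -log10(padj_cutoff), lty = 2)
```

Because of the \(-log_{10}\) transform, smaller p-values sit higher on the plot: an adjusted p-value of 0.001 maps to 3, while 0.5 maps to about 0.3.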

With Cuffdiff

vsVolcano(
    x = 'iPS', y = 'hESC', data = df.cuff, d.factor = NULL, 
    type = 'cuffdiff', padj = 0.05, x.lim = NULL, lfc = NULL, 
    title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
)

Figure 28: A volcano plot example using the vsVolcano() function with Cuffdiff data. In this visualization, comparisons are made between the \(-log_{10}\) p-value and the \(log_2\) fold change (LFC) between two treatments. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Left and right brackets (< and >) represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal lines indicate user-defined LFC and adjusted p-values, respectively.

With DESeq2

vsVolcano(
    x = 'treated_paired.end', y = 'untreated_paired.end', 
    data = df.deseq, d.factor = 'condition', type = 'deseq', 
    padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE, 
    legend = TRUE, grid = TRUE, data.return = FALSE
)

Figure 29: A volcano plot example using the vsVolcano() function with DESeq2 data. In this visualization, comparisons are made between the \(-log_{10}\) p-value and the \(log_2\) fold change (LFC) between two treatments. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Left and right brackets (< and >) represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal lines indicate user-defined LFC and adjusted p-values, respectively.

With edgeR

vsVolcano(
    x = 'WW', y = 'MM', data = df.edger, d.factor = NULL, 
    type = 'edger', padj = 0.05, x.lim = NULL, lfc = NULL, 
    title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
)