Welcome to SeqPlots

SeqPlots - An interactive tool for visualizing NGS signals and sequence motif densities along genomic features using average plots and heatmaps.

Examples of SeqPlots interface and outputs

Summary

SeqPlots is a web browser tool for plotting average track signals (e.g. read coverage) and sequence motif densities over user specified genomic features. The data can be visualized in linear plots with error estimates or as series of heatmaps that can be sorted and clustered. The software can be run locally on a desktop or deployed on a server and allows easy data sharing. SeqPlots pre-calculates and stores binary result matrices, allowing rapid plot generation. Plots can also be run in batch.

Key features

Adding and managing files

Supported file formats

Tracks:

Features:

Files must be formatted according to UCSC guidelines. All widely used chromosome naming conventions are accepted, e.g. for human files either 'chr1' or '1' can be used; however, these conventions should not be mixed within a single file.
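The no-mixing rule can be checked before upload. The sketch below is only an illustration: the function names are hypothetical and not part of SeqPlots, and the check only distinguishes names with and without the 'chr' prefix.

```python
def naming_styles(chrom_names):
    """Return the set of naming styles ('prefixed' or 'bare') found in chrom_names."""
    return {"prefixed" if name.startswith("chr") else "bare" for name in chrom_names}


def is_consistent(chrom_names):
    """True when all names follow a single convention (or the list is empty)."""
    return len(naming_styles(chrom_names)) <= 1


print(is_consistent(["chr1", "chr2", "chrX"]))  # True: one convention
print(is_consistent(["chr1", "2", "chrX"]))     # False: conventions are mixed
```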

Adding files

Pressing the Add files button brings up the file upload panel.

File upload panel

You can drag and drop files here or press the Add files... button to open a file selection menu. Before starting the upload, the following mandatory information must be provided about each file:

Comments are optional.

The contents of a text field can be copied to all files by clicking the icon at the left of the field. The default values can be set using the Set defaults... button. Default values are stored in browser cookies, so the settings will be remembered across sessions as long as the same web browser is used. File extensions that are not supported will raise an error.

File upload panel with 4 files selected

Individual files can be uploaded by pressing 'start' next to the file name or all files can be uploaded at once by pressing the Start upload button at the top of file upload panel.

During the upload process a progress bar is displayed. After the upload, SeqPlots either reports that the upload was successful or gives an error message. Common errors are misformatted files or chromosome names that do not match the reference genome. For more information please refer to the errors documentation.

Feedback on successfully uploaded files

To dismiss the upload window, click on X or outside the window.

Downloading and removing files

Clicking the New plot set button brings up the file collection window. The primary function of this window is to choose signal tracks and feature files to use for calculating the plots. However, it also provides basic file management capabilities. Information on files can be reviewed and files can be downloaded or deleted. Files can be searched, filtered and sorted by any column. The red x button on the right side of the file table removes a single file from the collection, while the Remove selected files button erases all selected files.

The file collection window

Running the plot-set jobs

The file collection modal allows choosing signal tracks and feature files from the collection to calculate average plots and heatmaps. Press the New plot set button to bring it up. If you wish to upload more files please refer to the adding new files documentation. This window has three tabs:

The file collection modal

Selecting files

Both the Tracks and Features tabs allow you to review all the information about files, filter them and sort by any column. The “Search:” dialog quickly filters the files by any field, while the dropdowns below the file grid allow more advanced filtering on specific columns.

Files are selected by clicking on the file name, or any other part of the row except the Show comment, Download and Remove buttons. Chosen files are highlighted in light blue. Clicking the file name again cancels the selection. At least one signal track or motif and one feature file must be selected before starting the calculation.

Setting up plot options

The set of options controlling the plot settings is available below the file grid/motif option:

  1. Bin track @ [bp]: - this numeric input determines the resolution of data acquisition; the default value 10 means that 10bp intervals within the plotting range will be summarized by calculating the mean. Higher values increase the speed of calculation and produce smoother plots. See the explanations.
  2. Choose the plot type - this radio box determines the mode of plots

  3. Ignore strand - the directionality (strand) will be ignored, that is, +, - and * ranges will be centered on start and plotted in the same direction

  4. Ignore zeros - signal values equal to 0 in the track will be ignored, that is, excluded from the mean and error calculations

  5. Calculate heatmap - this checkbox determines whether the heatmap matrix should be saved; unchecking it will speed up the calculation, but only average plots will be available in this plot set.

  6. Plotting distances in [bp] - the distances to be plotted:
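To make the Bin track and Ignore zeros options more concrete, here is a minimal sketch of mean binning. It is an illustration only, not the SeqPlots implementation: the function name `bin_means`, the handling of a partial last bin and the use of `None` for empty bins are assumptions.

```python
def bin_means(signal, bin_size=10, ignore_zeros=False):
    """Summarize per-base signal into consecutive bins by taking the mean.

    With ignore_zeros=True, zero values are dropped before averaging;
    a bin with no remaining values yields None (an assumption of this sketch).
    """
    binned = []
    for start in range(0, len(signal), bin_size):
        values = signal[start:start + bin_size]
        if ignore_zeros:
            values = [v for v in values if v != 0]
        binned.append(sum(values) / len(values) if values else None)
    return binned


signal = [0, 2, 4, 0, 0, 0, 6, 0, 0, 0]
print(bin_means(signal, bin_size=5))                     # [1.2, 1.2]
print(bin_means(signal, bin_size=5, ignore_zeros=True))  # [3.0, 6.0]
```

Larger bins mean fewer values to compute and plot, which is why they give faster calculation and smoother curves.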

Plotting sequence motif density

The Sequence features tab allows calculating and plotting the motif density around genomic features using the reference sequence package. Motif plots can be mixed with signal plots from track files. The following options can be set here:

  1. DNA motif - the DNA motif
  2. Sliding window size in base pairs [bp] - the size of the sliding window for motif calculation. The value (number of matching motifs within the window) is reported in the middle of the window, e.g. if the window is set to 200bp, the DNA motif is “CG” and there are 8 CpGs in the first 200 bp of the chromosome, the value 8 will be reported at the 100th bp.
  3. Display name - the name of the motif that will be shown in the key and heatmap labels. Leave blank to use the DNA motif value.
  4. Plot heatmap or error estimates - this checkbox determines whether the heatmap matrix and error estimates should be calculated. If unchecked, a much faster algorithm will be used for motif density calculation, but only the average plot without error estimates will be available.
  5. Match reverse complement as well - determines whether the reverse complement of the motif should be reported as well. For example, the GATA motif will report both GATA and TATC with this option selected.
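The sliding window and reverse complement options can be sketched as follows. This is an illustration under stated assumptions, not the SeqPlots algorithm: the function names are hypothetical, overlapping motif occurrences are counted, only full windows are reported, and the sequence is plain upper-case A/C/G/T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return motif.translate(COMPLEMENT)[::-1]


def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def window_density(seq, motif, window, both_strands=False):
    """Map the 1-based centre of each full window to its motif count."""
    motifs = {motif}
    if both_strands:
        # A set avoids double counting motifs equal to their reverse complement
        motifs.add(reverse_complement(motif))
    density = {}
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        centre = start + window // 2  # first 200 bp window -> position 100
        density[centre] = sum(count_motif(chunk, m) for m in motifs)
    return density


print(reverse_complement("GATA"))                  # TATC
print(window_density("ACGCGT", "CG", window=4))    # {2: 1, 3: 2, 4: 1}
```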

Sequence motifs selection tab

Clicking the Add button adds the motif to the plot set, while Reset All clears the motif selection. To the right of the motif settings panel is a summary list of the included motifs.

Starting the plot set calculation

The options are executed by pressing the Run calculation button. This dismisses the file collection modal and brings up the calculation dialog, which shows the progress. On Linux and Mac OS X (systems supporting fork-based parallelization) the calculation can be stopped using the Cancel button; this restores all settings in the file collection modal.

The calculation progress dialog

After successful execution the plot array will appear. In case of an error, an informative pop-up will explain the problem. Please refer to the error section for further information.

The plot array

Plotting

This section focuses on average (line) plots and options common between these and heatmaps. For heatmap options please refer to the heatmaps documentation.

Previewing plot

After calculating or loading a plot set, select the pairs of features and tracks/motifs using the plot array checkboxes. Clicking on a column name (tracks/motifs) toggles the selection of the whole column. Similarly, clicking on a row name (features) toggles the selection of the whole row. Clicking on the top-left cell of the plot array toggles the selection of the whole array.

Plot preview plus `Line plot`, `Heatmap` and `refresh`
buttons

If at least one pair in the plot array is selected, pressing the Line plot button produces an average plot preview and the Heatmap button a heatmap preview. Finally, pressing the refresh button or the [RETURN] key applies the new selection and options. These operations are done automatically in reactive mode.

The tab selection area - icons represent seven
panels

Below the plotting buttons are seven panels. On application start the first panel, responsible for bringing up the file upload, management and plot set calculation modals, is active. The next three panels hold common plot settings.

Titles and axis panel

This panel groups settings influencing the plot main title, axis labels, various font sizes plus vertical and horizontal plot limits.

The view on titles and axis panel

Guide lines and data scaling

Controls in this panel govern the display of guide lines and error estimates, and allow log scaling the signal prior to plotting.

The view on guide lines and data scaling

Keys, labels and colors panel

This panel groups two types of controls. Colors, Label and Priority/Order are checkboxes revealing further controls on the plot set grid, specific to a feature-track pair or sub-heatmap. Show plot key, Show error estimate key and Legend font size are global controls specific to average plots. Inputs on the plot set grid do not have specific labels, but a tooltip explaining their meaning is shown on mouse hover.

The view on keys, labels and colors panel (left). Color picker,
label text input and Priority/Order checkboxes revealed on plot set
grid (right).

Plotting and adjusting heatmaps

Heatmaps can be more informative than average plots. If the variability in signal along a given genomic feature comes from different biological classes, the average plot might be insufficient for proper examination of the signal, or even misleading. SeqPlots implements heatmap plotting in a similar way to Galaxy, plotting track-feature pairs as sub-heatmaps horizontally aligned on a single figure. All sub-heatmaps must have the same number of data rows, hence in single plot mode simultaneous plotting is possible only on a single feature or on features containing exactly the same number of ranges. The heatmaps can be sorted and clustered by k-means, hierarchical clustering and super self-organising maps (SuperSOM).
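The requirement for equal row counts follows from the fact that one row order is shared by all horizontally aligned sub-heatmaps. The sketch below illustrates this idea only; sorting by row mean and the function names are assumptions, not a description of the SeqPlots internals.

```python
def sort_order(matrix, decreasing=True):
    """Row indices of matrix ordered by row mean."""
    means = [sum(row) / len(row) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: means[i], reverse=decreasing)


def reorder(matrix, order):
    """Apply a row order to a matrix given as a list of rows."""
    return [matrix[i] for i in order]


track_a = [[1, 1], [5, 5], [3, 3]]   # sub-heatmap used for sorting
track_b = [[9, 9], [8, 8], [7, 7]]   # second sub-heatmap, same features

order = sort_order(track_a)
print(order)                    # [1, 2, 0]
print(reorder(track_b, order))  # [[8, 8], [7, 7], [9, 9]]
```

Because row i must describe the same genomic range in every sub-heatmap, an order computed on one matrix can only be applied to matrices with the same number of rows.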

An example of heatmap plot

Heatmap setup tab

This tab groups heatmap-specific options that control various data processing and graphical settings.

The view on Heatmap setup tab (left). Color picker, Label text
input, Priority/Order checkboxes, Choose individual heatmaps for
sorting/clustering control and Set individual color key limits numeric
inputs revealed on plot set grid (right).

Other options controlling heatmap appearance

Many options from other tabs influence heatmap output. Here we provide the list of these inputs; please refer to “Viewing and manipulating plots” for further reference.

Output files and batch operations

Plots can be downloaded as Portable Document Format (PDF) files by clicking the Line plot or Heatmap buttons in the “Download:” section of the tool panel (above the plot preview).

"Download:" section of tool panel with `Line plot` and `Heatmap` buttons

Small buttons next to Line plot and Heatmap produce additional output files:

The cluster report contains the following columns:

Sample report:

chromosome  start   end     width   strand  metadata_group  originalOrder   ClusterID   SortingOrder    FinalOrder
chrI        9065087 9070286 5200    +       g1              1               1           3               3
chrI        5171285 5175522 4238    -       g1              2               3           50              43
chrI        9616508 9618109 1602    -       g1              3               3           13              43
chrI        3608395 3611844 3450    +       g1              4               3           11              12

Table view:

chromosome  start    end      width  strand  metadata_group  originalOrder  ClusterID  SortingOrder  FinalOrder
chrI        9065087  9070286  5200   +       g1              1              1          3             3
chrI        5171285  5175522  4238   -       g1              2              3          50            43
chrI        9616508  9618109  1602   -       g1              3              3          13            43
chrI        3608395  3611844  3450   +       g1              4              3          11            12
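A report like the sample can be post-processed outside SeqPlots. The following sketch parses the sample rows and groups them by cluster. It assumes whitespace-separated columns as displayed here; the delimiter of the actual downloaded file may differ and should be checked, and the function name `parse_report` is hypothetical.

```python
from collections import defaultdict

SAMPLE = """\
chromosome start end width strand metadata_group originalOrder ClusterID SortingOrder FinalOrder
chrI 9065087 9070286 5200 + g1 1 1 3 3
chrI 5171285 5175522 4238 - g1 2 3 50 43
chrI 9616508 9618109 1602 - g1 3 3 13 43
chrI 3608395 3611844 3450 + g1 4 3 11 12
"""


def parse_report(text):
    """Parse a whitespace-separated report into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_report(SAMPLE)

# Group the original row numbers by cluster
clusters = defaultdict(list)
for row in rows:
    clusters[row["ClusterID"]].append(int(row["originalOrder"]))
print(dict(clusters))  # {'1': [1], '3': [2, 3, 4]}

# In the sample rows, width equals end - start + 1 (inclusive coordinates)
print(all(int(r["end"]) - int(r["start"]) + 1 == int(r["width"]) for r in rows))  # True
```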

PDF output size

The last tab (Batch operation and setup) on the tool panel includes batch operations and various other settings, including PDF output size. By default the output PDF will be A4 landscape. This can be changed using the drop-down list to one of the following settings:

The view on top part of batch operation and setup
panel

Batch operations

Controls to plot multiple plots at once are located on the Batch operation and setup tab, just below the PDF paper options. It is possible to output the plots to a multipage PDF, plot an array of plots on a single page (for average plots) or mix these options together.

The view on bottom part of batch operation and setup
panel

The first drop-down controls the type of the plot - either average or heatmap. The second drop down determines the strategy to traverse the plot grid. The options include:

The multi plot grid option controls how many plots will be placed on each page of the PDF output, e.g. 1x1 means one plot per page, while 3x4 means 3 columns and 4 rows of plots. If the number of plots exceeds the number of slots on a page, a new page will be added to the PDF.
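The relation between the grid setting and the number of PDF pages is simple arithmetic, sketched here with a hypothetical helper (not a SeqPlots function):

```python
import math


def pages_needed(n_plots, columns, rows):
    """Number of PDF pages for n_plots laid out on a columns x rows grid."""
    return math.ceil(n_plots / (columns * rows))


print(pages_needed(12, 3, 4))  # 1: twelve plots fill one 3x4 page exactly
print(pages_needed(13, 3, 4))  # 2: the thirteenth plot starts a new page
print(pages_needed(5, 1, 1))   # 5: one plot per page
```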

Filter names will apply a filter to plot titles, which are based on uploaded file names. For example, if you uploaded 100 files starting with a prefix of “my_experiment_”, you can remove this fragment from each plot title and/or heatmap caption by putting this string in Filter names.
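The effect of Filter names can be pictured as removing a fragment from every title. The sketch assumes plain substring removal; whether SeqPlots interprets the filter as a pattern is not stated here. The helper and the file names are hypothetical.

```python
def filter_title(title, fragment):
    """Remove every occurrence of fragment from a plot title."""
    return title.replace(fragment, "") if fragment else title


titles = ["my_experiment_H3K4me3.bw", "my_experiment_input.bw"]
print([filter_title(t, "my_experiment_") for t in titles])
# ['H3K4me3.bw', 'input.bw']
```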

Finally, pressing Get PDF produces the final output file. Please see example below:

Batch plot usage example - multiple average plots arranged in 6x2
plot grid

Saving and loading plot sets

SeqPlots allows saving plot sets as binary R files. This makes it possible to quickly load a pre-calculated set for replotting. Furthermore, saved plot sets can be shared with other SeqPlots users.

Load or save plotset

The following controls are available on the “Load or save plotset” panel:

All saved datasets can be found in data location/publicFiles. Any SeqPlots Rdata binaries put in this folder will become available for loading in the Load saved plot set control.

The view on the "Load or save plotset" panel

Plot set files structure

The plot set files can also be loaded directly in R. This allows further processing and customization of the plots. The data structure is a nested list, whose elements can be accessed with the [[ R operator. The nesting goes as follows:

The example structure:

List of 2
 $ HTZ1_Differential_genes_TOP100_v2.gff:List of 2
  ..$ HTZ1_JA00001_IL1andIL2_F_N2_L3_NORM_linear_1bp_IL010andIL009_averaged.bw    :List of 7
  .. ..$ means   : num [1:501] 2.52 2.52 2.52 2.53 2.54 ...
  .. ..$ stderror: num [1:501] 0.114 0.112 0.111 0.11 0.109 ...
  .. ..$ conint  : num [1:501] 0.226 0.223 0.221 0.218 0.217 ...
  .. ..$ all_ind : num [1:501] -1000 -995 -990 -985 -980 -975 -970 -965 -960 -955 ...
  .. ..$ e       : NULL
  .. ..$ desc    : chr "HTZ1_JA00001_IL1andIL2_F_N2_L3_NORM_linear_1bp_IL010andIL009_averaged\n@HTZ1_Differential_genes_TOP100_v2"
  .. ..$ heatmap : num [1:100, 1:501] 2.36 5.25 2.2 3.48 4.32 ...
  ..$ HTZ1_JA00001_IL3andIIL5_F_lin35_L3_NORM_linear_1bp_IL008andIL011_averaged.bw:List of 7
  .. ..$ means   : num [1:501] 2.36 2.35 2.35 2.36 2.38 ...
  .. ..$ stderror: num [1:501] 0.126 0.125 0.125 0.126 0.125 ...
  .. ..$ conint  : num [1:501] 0.249 0.249 0.247 0.251 0.249 ...
  .. ..$ all_ind : num [1:501] -1000 -995 -990 -985 -980 -975 -970 -965 -960 -955 ...
  .. ..$ e       : NULL
  .. ..$ desc    : chr "HTZ1_JA00001_IL3andIIL5_F_lin35_L3_NORM_linear_1bp_IL008andIL011_averaged\n@HTZ1_Differential_genes_TOP100_v2"
  .. ..$ heatmap : num [1:100, 1:501] 2.61 3.17 1.42 2.46 4.26 ...
 $ HTZ1_Differential_genes_BOTTOM100.gff:List of 2
  ..$ HTZ1_JA00001_IL1andIL2_F_N2_L3_NORM_linear_1bp_IL010andIL009_averaged.bw    :List of 7
  .. ..$ means   : num [1:501] 1.57 1.57 1.58 1.6 1.62 ...
  .. ..$ stderror: num [1:501] 0.0996 0.0985 0.1003 0.1022 0.1018 ...
  .. ..$ conint  : num [1:501] 0.198 0.195 0.199 0.203 0.202 ...
  .. ..$ all_ind : num [1:501] -1000 -995 -990 -985 -980 -975 -970 -965 -960 -955 ...
  .. ..$ e       : NULL
  .. ..$ desc    : chr "HTZ1_JA00001_IL1andIL2_F_N2_L3_NORM_linear_1bp_IL010andIL009_averaged\n@HTZ1_Differential_genes_BOTTOM100"
  .. ..$ heatmap : num [1:100, 1:501] 1.64 1.37 1.61 1.77 1.86 ...
  ..$ HTZ1_JA00001_IL3andIIL5_F_lin35_L3_NORM_linear_1bp_IL008andIL011_averaged.bw:List of 7
  .. ..$ means   : num [1:501] 1.94 1.94 1.95 1.96 1.97 ...
  .. ..$ stderror: num [1:501] 0.123 0.123 0.124 0.126 0.128 ...
  .. ..$ conint  : num [1:501] 0.244 0.245 0.246 0.251 0.253 ...
  .. ..$ all_ind : num [1:501] -1000 -995 -990 -985 -980 -975 -970 -965 -960 -955 ...
  .. ..$ e       : NULL
  .. ..$ desc    : chr "HTZ1_JA00001_IL3andIIL5_F_lin35_L3_NORM_linear_1bp_IL008andIL011_averaged\n@HTZ1_Differential_genes_BOTTOM100"
  .. ..$ heatmap : num [1:100, 1:501] 1.61 1.37 1.29 3.04 3.77 ...
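The nesting shown above (feature file, then track, then result fields) can be pictured with plain dictionaries. The sketch is written in Python purely for illustration and does not read the Rdata file. It copies the first three `means`, `conint` and `all_ind` values from the example, uses shortened hypothetical keys (`top100.gff`, `HTZ1_N2.bw`), and assumes that `conint` is the half-width of the band drawn around the mean.

```python
plotset = {
    "top100.gff": {                     # first level: feature file
        "HTZ1_N2.bw": {                 # second level: track
            "means": [2.52, 2.52, 2.52],
            "conint": [0.226, 0.223, 0.221],
            "all_ind": [-1000, -995, -990],
        }
    }
}

# Two chained lookups, analogous to chaining [[ in R: feature, then track
result = plotset["top100.gff"]["HTZ1_N2.bw"]

# Lower and upper band per bin, assuming conint is a half-width around the mean
band = [
    (x, round(m - c, 3), round(m + c, 3))
    for x, m, c in zip(result["all_ind"], result["means"], result["conint"])
]
print(band[0])  # (-1000, 2.294, 2.746)
```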

Advanced options

Some additional SeqPlots options are located at the very bottom of the Batch operation and setup tab:

The view on 'Advanced options' section of the batch operation and
setup panel

Error messages

Adding the files:

Problem with line N: "line_text" [internal_error]

The import of the feature file (GFF or BED) was not successful because the file is misformatted.


Chromosome names provided in the file does not match ones defined in 
reference genome. 
INPUT: [chr3R, chr2L, chr2R, chr3L] 
GENOME: [chrI, chrII, chrIII, chrIV, chrV, ...]

There are unexpected chromosome names in the input file. The following genomes: Arabidopsis thaliana, Caenorhabditis elegans, Cyanidioschyzon merolae, Drosophila melanogaster, Homo sapiens, Oryza sativa, Populus trichocarpa, Saccharomyces cerevisiae and Zea mays support chromosome name remapping between different naming conventions, including: AGPv2, ASM9120v1, Ensembl, JGI2_0, MSU6, NCBI, TAIR10 and UCSC. If you see the above error for one of these genomes, there are still unexpected names after the correction. The problematic chromosome names are given in the error message. Remove the GFF/BED lines corresponding to them or upgrade the genome to one containing proper naming. Alternatively, set the genome to NA.
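To find the offending names before upload, the chromosome names of a file can be compared with those of the reference genome. The sketch below is a simplified illustration with hypothetical helper names; toggling the 'chr' prefix stands in for the much broader remapping between naming conventions described above.

```python
def unexpected_names(input_names, genome_names):
    """Names present in the input file but absent from the reference genome."""
    return sorted(set(input_names) - set(genome_names))


def toggle_chr_prefix(name):
    """Switch between prefixed and bare style: 'chrI' -> 'I', 'I' -> 'chrI'."""
    return name[3:] if name.startswith("chr") else "chr" + name


genome = ["chrI", "chrII", "chrIII", "chrIV", "chrV"]

print(unexpected_names(["chr3R", "chr2L", "chrI"], genome))  # ['chr2L', 'chr3R']

# Bare names match the genome once the prefix is added
remapped = [toggle_chr_prefix(n) for n in ["I", "II"]]
print(unexpected_names(remapped, genome))  # []
```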


File already exists, change the name or remove old one.

A file with this name already exists in the database; two files cannot share the same filename.


ERROR: solving row 300: negative widths are not allowed

Row 300 has an end coordinate smaller than its start, hence the width is negative. To fix it, the start and end coordinates should be swapped. This error often happens when negative strand (-) ranges are misformatted.
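Such rows can be located and repaired before upload. The following sketch works on already parsed ranges; the tuple layout and the function name are assumptions made for illustration, and reading the actual GFF or BED columns is left out.

```python
def fix_negative_widths(ranges):
    """Swap start and end wherever end < start.

    ranges is a list of (chrom, start, end, strand) tuples.
    Returns the fixed ranges and the 1-based numbers of the swapped rows.
    """
    fixed, swapped_rows = [], []
    for row_number, (chrom, start, end, strand) in enumerate(ranges, start=1):
        if end < start:
            start, end = end, start
            swapped_rows.append(row_number)
        fixed.append((chrom, start, end, strand))
    return fixed, swapped_rows


ranges = [("chrI", 100, 200, "+"), ("chrI", 500, 300, "-")]
fixed, swapped = fix_negative_widths(ranges)
print(swapped)   # [2]
print(fixed[1])  # ('chrI', 300, 500, '-')
```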


Explanations


Session Information

## R version 3.1.1 (2014-07-10)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BiocStyle_1.3.12                    
##  [2] seqplots_0.99.2                     
##  [3] jpeg_0.1-8                          
##  [4] png_0.1-7                           
##  [5] RSelenium_1.3.1                     
##  [6] XML_3.98-1.1                        
##  [7] caTools_1.17.1                      
##  [8] RJSONIO_1.3-0                       
##  [9] RCurl_1.95-4.3                      
## [10] bitops_1.0-6                        
## [11] BSgenome.Celegans.UCSC.ce10_1.3.1000
## [12] BSgenome_1.33.9                     
## [13] rtracklayer_1.25.16                 
## [14] Biostrings_2.33.14                  
## [15] XVector_0.5.8                       
## [16] GenomicRanges_1.17.42               
## [17] GenomeInfoDb_1.1.20                 
## [18] IRanges_1.99.28                     
## [19] S4Vectors_0.2.4                     
## [20] testthat_0.8.1                      
## [21] BiocGenerics_0.11.5                 
## [22] devtools_1.5                        
## 
## loaded via a namespace (and not attached):
##  [1] BatchJobs_1.3            BBmisc_1.7              
##  [3] BiocParallel_0.99.22     brew_1.0-6              
##  [5] Cairo_1.5-5              checkmate_1.4           
##  [7] class_7.3-11             codetools_0.2-9         
##  [9] DBI_0.3.0                digest_0.6.4            
## [11] evaluate_0.5.5           fail_1.2                
## [13] fields_7.1               foreach_1.4.2           
## [15] formatR_1.0              futile.logger_1.3.7     
## [17] futile.options_1.0.0     GenomicAlignments_1.1.30
## [19] grid_3.1.1               htmltools_0.2.6         
## [21] httpuv_1.3.0             httr_0.5                
## [23] iterators_1.0.7          knitr_1.6.18            
## [25] kohonen_2.0.15           lambda.r_1.1.6          
## [27] maps_2.3-9               markdown_0.7.4          
## [29] MASS_7.3-34              memoise_0.2.1           
## [31] mime_0.1.2               parallel_3.1.1          
## [33] plotrix_3.5-7            Rcpp_0.11.2             
## [35] roxygen2_4.0.2           Rsamtools_1.17.34       
## [37] RSQLite_0.11.4           rstudio_0.98.1028       
## [39] rstudioapi_0.1           sendmailR_1.1-2         
## [41] shiny_0.10.1             spam_1.0-1              
## [43] stringr_0.6.2            tools_3.1.1             
## [45] whisker_0.3-2            xtable_1.7-4            
## [47] zlibbioc_1.11.1

References

R project and Bioconductor

JavaScript and CSS

Important conceptual contribution to the project

Server deployment

Publications containing figures made by SeqPlots