Package version: pqsfinder 1.2.3

1 Introduction

The main functionality of the pqsfinder package is to detect DNA sequence patterns that are likely to fold into an intramolecular G-quadruplex (G4). G4 is a DNA structure that can form as an alternative to the canonical B-DNA. G4s are believed to be involved in the regulation of diverse biological processes, such as telomere maintenance, DNA replication, chromatin formation, transcription, recombination or mutation (Maizels and Gray 2013; Kejnovsky, Tokan, and Lexa 2015). Our algorithmic approach is based on the fact that G4 structures arise from compact sequence motifs composed of four consecutive and possibly imperfect guanine runs (G-runs) interrupted by loops of semi-arbitrary lengths. The algorithm first identifies four consecutive G-run sequences. Subsequently, it examines the potential of such G-runs to form a stable G4 and assigns a corresponding quantitative score to each. Non-overlapping potential quadruplex-forming sequences (PQS) with a positive score are then reported.
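
The motif described above can be illustrated with a deliberately simplified base R sketch. It is not the pqsfinder algorithm: it matches only perfect G-runs (no bulges or mismatches), assigns no score, and the limits used (runs of at least 3 guanines, loops of 1 to 12 nucleotides) are assumptions chosen for illustration. The function name `find_perfect_g4` is hypothetical.

```r
# Simplified illustration only: four perfect G-runs separated by three loops.
# Assumed limits: runs of >= 3 guanines, loops of 1-12 nucleotides.
find_perfect_g4 <- function(dna) {
  pattern <- "(G{3,}[ACGT]{1,12}?){3}G{3,}"
  m <- gregexpr(pattern, dna, perl = TRUE)[[1]]
  if (m[1] == -1) {
    # No match: return an empty table with the same columns.
    return(data.frame(start = integer(0), width = integer(0)))
  }
  data.frame(start = as.integer(m), width = attr(m, "match.length"))
}

dna <- "TTTTGGGCGGGAGGAGTGGAGTTTTTAACCCCAAAAATTTGGGAGGGTGGGTGGGAGAA"
hits <- find_perfect_g4(dna)
hits
```

On this sequence, which is also used in section 2.1, the sketch finds only the perfect motif starting at position 41. Detecting motifs with imperfect G-runs requires the defect handling that pqsfinder provides.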

It is important to note that, unlike many other approaches, our algorithm is able to detect sequences responsible for G4s folded from imperfect G-runs containing bulges or mismatches, and as such is more sensitive than competing algorithms (see note 1 at the end of this document). We also believe the presented solution is the most scalable, since it can be easily and quickly customized (see chapter Customizing the detection algorithm for details). The program can be made to detect novel or experimental G4 types that might be discovered or studied in the future.

For those interested in non-B DNA, we have previously authored a similar package that can be used to search for triplexes, another type of non-B DNA structure. For details, please see the triplex package landing page.

2 G4-quadruplex detection

As usual, before first package use, it is necessary to load the pqsfinder package using the following command:

library(pqsfinder)

Identification of potential quadruplex-forming sequences (PQS) in DNA is performed using the pqsfinder function. This function has one required parameter representing the studied DNA sequence in the form of a DNAString object, and several modifying options with predefined values. For a complete description, please see the pqsfinder function man page.

2.1 Basic quadruplex detection

As a simple example, let’s find all PQS in a short DNA sequence.

seq <- DNAString("TTTTGGGCGGGAGGAGTGGAGTTTTTAACCCCAAAAATTTGGGAGGGTGGGTGGGAGAA")
pqs <- pqsfinder(seq)
pqs
## Searching on sense strand...
## Searching on antisense strand...
##   PQS views on a 59-letter DNAString subject
## subject: TTTTGGGCGGGAGGAGTGGAGTTTTTAACCCCAAAAATTTGGGAGGGTGGGTGGGAGAA
## quadruplexes:
##     start width score strand
## [1]     5    17    49      + [GGGCGGGAGGAGTGGAG]
## [2]    41    15    89      + [GGGAGGGTGGGTGGG]

Detected PQS are returned as a PQSViews object, a container for a set of views on the same input sequence, based on the XStringViews class from the Biostrings package. Each PQS in the view is defined by its start location, width, strand and score. These values can be accessed with the standard functions start(x), width(x), strand(x) and score(x) for further manipulation and analysis. For example:

data.frame(b = start(pqs), w = width(pqs), s = strand(pqs), sc = score(pqs))
##    b  w s sc
## 1  5 17 + 49
## 2 41 15 + 89
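
Because each PQS is described by a start and a width, its inclusive end coordinate follows as start + width - 1. The base R sketch below recomputes this from the values printed above and applies a simple score filter. The data frame is typed in by hand for illustration and is not produced by the package, and the threshold of 50 is arbitrary.

```r
# Values copied by hand from the output above (for illustration only).
pqs_df <- data.frame(start = c(5, 41), width = c(17, 15), score = c(49, 89))

# Inclusive end coordinate of each PQS.
pqs_df$end <- pqs_df$start + pqs_df$width - 1

# Keep only the PQS whose score exceeds an arbitrary example threshold.
strong <- pqs_df[pqs_df$score > 50, ]
strong
```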

By default, the pqsfinder function reports only the locally best non-overlapping PQS, ignoring any others that would overlap them. However, it is possible to get the number of all overlapping PQS at each position of the input sequence. To do so, call the density(x) function on the PQSViews object (see note 2 at the end of this document):

density(pqs)
##  [1]   0   0   0   0 106 123 133 133 226 245 253 253 269 273 273 277 277
## [18] 281 284 284 284 283 283 283 283 283 283 283 283 283 283 283 283 283
## [35] 283 283 283 283 283 283 287 287 287 280 280 280 280 246 246 244 241
## [52] 141 141 135 126   9   9   0   0
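
Conceptually, such a vector is a per-position count of the candidates covering each nucleotide. A minimal base R sketch of this counting follows. The candidate intervals are made up, since the overlapping candidates themselves are not listed in the output above, and the function name `interval_density` is hypothetical, not part of pqsfinder.

```r
# Count how many intervals cover each position of a sequence of length n.
# starts/widths describe hypothetical candidate PQS (1-based, inclusive).
interval_density <- function(starts, widths, n) {
  counts <- integer(n)
  for (i in seq_along(starts)) {
    idx <- starts[i]:(starts[i] + widths[i] - 1)
    counts[idx] <- counts[idx] + 1L
  }
  counts
}

# Three made-up overlapping candidates on a 12-letter sequence.
d <- interval_density(starts = c(2, 3, 5), widths = c(4, 5, 3), n = 12)
d
```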

The following example shows how such a density vector can be visualized along the input sequence using the Gviz package from Bioconductor.

library(Gviz)
ss <- DNAStringSet(seq)
names(ss) <- "chr1"
dtrack <- DataTrack(
  start = 1:length(density(pqs)), width = 1, data = density(pqs),
  chromosome = "chr1", genome = "", name = "density")
strack <- SequenceTrack(ss, chromosome = "chr1", name = "sequence")
suppressWarnings(plotTracks(c(dtrack, strack), type = "h"))

2.2 Modifying basic algorithm options

Depending on the particular type of PQS you want to detect, the algorithm options can be tuned to find the PQS effectively and exclusively. The table below gives an overview of all basic algorithm options and their descriptions.

Option name      Description
strand           Strand specification (+, - or *).
max_len          Maximal total length of PQS.
min_score        Minimal score of PQS to be reported.
run_min_len      Minimal length of each PQS run (G-run).
run_max_len      Maximal length of each PQS run.
loop_min_len     Minimal length of each PQS inner loop.
loop_max_len     Maximal length of each PQS inner loop.
max_bulges       Maximal number of runs containing a bulge.
max_mismatches   Maximal number of runs containing a mismatch.
max_defects      Maximal number of defects in total (#bulges + #mismatches).

The more you narrow these options in terms of shorter PQS length, narrower run or loop length ranges and lower number of defects, the faster the detection process will be, with a possible loss of sensitivity.

Important note: In each G-run, the algorithm allows at most one type of defect, and at least one G-run must be perfect, that is, without any defect. Therefore the values of max_bulges, max_mismatches and max_defects must fall into the range from 0 to 3.

Example 1: If you are interested solely in G-quadruplexes with perfect G-runs, just restrict max_defects to zero:

pqsfinder(seq, max_defects = 0)
## Searching on sense strand...
## Searching on antisense strand...
##   PQS views on a 59-letter DNAString subject
## subject: TTTTGGGCGGGAGGAGTGGAGTTTTTAACCCCAAAAATTTGGGAGGGTGGGTGGGAGAA
## quadruplexes:
##     start width score strand
## [1]    41    15    89      + [GGGAGGGTGGGTGGG]

Example 2: In case you don’t mind defects in G-runs, but you want to report only higher quality PQS, increase min_score value:

pqsfinder(seq, min_score = 80)
## Searching on sense strand...
## Searching on antisense strand...
##   PQS views on a 59-letter DNAString subject
## subject: TTTTGGGCGGGAGGAGTGGAGTTTTTAACCCCAAAAATTTGGGAGGGTGGGTGGGAGAA
## quadruplexes:
##     start width score strand
## [1]    41    15    89      + [GGGAGGGTGGGTGGG]

3 Exporting results

As mentioned above, the results of detection are stored in the PQSViews object. Because the PQSViews class is an extension of the XStringViews class, all operations that apply to an XStringViews object can also be applied to a PQSViews object.

Additionally, the PQSViews class supports a conversion mechanism to create GRanges objects. Thus, all detected PQS can be easily transformed into elements of a GRanges object and saved as a GFF3 file, for example.

3.1 GRanges conversion and export to GFF3

In this example, the output of the pqsfinder function will be stored in a GRanges object and subsequently exported as a GFF3 file. First, let’s do the conversion using the following command:

gr <- as(pqs, "GRanges")
gr
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |     score
##          <Rle> <IRanges>  <Rle> | <numeric>
##   [1]     chr1  [ 5, 21]      + |        49
##   [2]     chr1  [41, 55]      + |        89
##   -------
##   seqinfo: 1 sequence from an unspecified genome

Please note that the chromosome name is arbitrarily set to chr1, but it can be freely changed to any other value afterwards. In the next step the resulting GRanges object is exported as a GFF3 file.

library(rtracklayer)
export(gr, "test.gff", version = "3")

Please note that it is necessary to load the rtracklayer library before running the export command. The contents of the resulting GFF3 file are:

text <- readLines("test.gff", n = 10)
cat(strwrap(text, width = 80, exdent = 3), sep = "\n")
## ##gff-version 3
## ##date 2016-11-30
## chr1 rtracklayer sequence_feature 5 21 49 + .
## chr1 rtracklayer sequence_feature 41 55 89 + .
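
For downstream work without rtracklayer, a GFF3 file can also be read back as a plain tab-separated table. The sketch below uses base R only and writes its own small example file. The file contents, including the ninth (attributes) column, are assumptions modelled on the records shown above rather than the exact bytes rtracklayer writes.

```r
# Write a small GFF3-like file (hand-made example data, 9 tab-separated columns).
gff_path <- tempfile(fileext = ".gff")
writeLines(c(
  "##gff-version 3",
  paste("chr1", "rtracklayer", "sequence_feature", 5, 21, 49, "+", ".", ".", sep = "\t"),
  paste("chr1", "rtracklayer", "sequence_feature", 41, 55, 89, "+", ".", ".", sep = "\t")
), gff_path)

# Lines starting with '#' are directives or comments and are skipped.
gff <- read.delim(gff_path, header = FALSE, comment.char = "#",
                  col.names = c("seqid", "source", "type", "start", "end",
                                "score", "strand", "phase", "attributes"),
                  stringsAsFactors = FALSE)
gff[, c("start", "end", "score", "strand")]
```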

Another possibility of utilizing the results of detection is to transform the PQSViews object into a DNAStringSet object, another commonly used class of the Biostrings package. PQS stored inside DNAStringSet can be exported into a FASTA file, for example.

3.2 DNAStringSet conversion and export to FASTA

In this example, the output of the pqsfinder function will be stored in a DNAStringSet object and subsequently exported as a FASTA file. First, let’s do the conversion using the following command:

dss <- as(pqs, "DNAStringSet")
dss
##   A DNAStringSet instance of length 2
##     width seq                                          names               
## [1]    17 GGGCGGGAGGAGTGGAG                            start=5;end=21;st...
## [2]    15 GGGAGGGTGGGTGGG                              start=41;end=55;s...

In the next step, the DNAStringSet object is exported as a FASTA file.

writeXStringSet(dss, file = "test.fa", format = "fasta")

The contents of the resulting FASTA file are:

text <- readLines("test.fa", n = 10)
cat(text, sep = "\n")
## >start=5;end=21;strand=+;score=49;
## GGGCGGGAGGAGTGGAG
## >start=41;end=55;strand=+;score=89;
## GGGAGGGTGGGTGGG

Please note that all attributes of detection, such as start position, end position and score value, are stored in the names of the DNAStringSet, and thus they also appear in the header line of the FASTA format (the line with the initial > symbol).
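
When such a FASTA file is processed later, the attributes can be recovered from the header line. A minimal base R sketch follows, assuming headers of exactly the `key=value;` form shown above. The function name `parse_pqs_header` is hypothetical.

```r
# Parse a header such as ">start=5;end=21;strand=+;score=49;" into a named list.
parse_pqs_header <- function(header) {
  header <- sub("^>", "", header)                 # drop the FASTA '>' marker if present
  fields <- strsplit(header, ";", fixed = TRUE)[[1]]
  fields <- fields[nzchar(fields)]                # ignore empty pieces
  kv <- strsplit(fields, "=", fixed = TRUE)
  values <- lapply(kv, function(p) p[2])
  names(values) <- vapply(kv, function(p) p[1], character(1))
  # Convert the numeric attributes; strand stays a character string.
  for (key in c("start", "end", "score")) {
    values[[key]] <- as.numeric(values[[key]])
  }
  values
}

info <- parse_pqs_header(">start=5;end=21;strand=+;score=49;")
str(info)
```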

4 A real world example

In the following example, we load the human genome from the BSgenome package and identify all potential G4 (PQS) in the region of the AHNAK gene on chromosome 11. We then export the identified positions into a genome annotation track (via a GFF3 file) and an additional FASTA file. Finally, we plot some graphs showing the PQS score distribution and the distribution of PQS along the studied genomic sequence.

  1. Load necessary libraries and genomes.

    library(pqsfinder)
    library(biomaRt)
    library(BSgenome.Hsapiens.UCSC.hg38)
    library(rtracklayer)
    library(Gviz)
  2. Retrieve the AHNAK gene annotation from Biomart.

    gnm <- "hg38"
    gene <- "AHNAK"
    mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    btrack <- BiomartGeneRegionTrack(biomart = mart, genome = gnm,  symbol = gene,  name = gene)
  3. Get AHNAK sequence from BSgenome package extended by 1000 nucleotides on both sides.

    extend <- 1000
    seq_start <- min(start(btrack)) - extend
    seq_end <- max(end(btrack)) + extend
    chr <- chromosome(btrack)
    seq <- Hsapiens[[chr]][seq_start:seq_end]
  4. Search for PQS on both strands.

    pqs <- pqsfinder(seq)
  5. Display the results.

    pqs
    ##   PQS views on a 124694-letter DNAString subject
    ## subject: GCGGGTGTCTGTAATCCCAGCTACTTGGGAGG...TGCACCAGCTGCACCTAGCATTTTCAGATCC
    ## quadruplexes:
    ##        start width score strand
    ##   [1]    778    29   111      + [GGGGAGGGGGAGCAAGGGGTGTAAGAGGG]
    ##   [2]    927    42    55      + [GTGGCCAACTCTCCCAGGG...TCACAACGCCAGGGAAGGG]
    ##   [3]   1071    39   104      - [CCCCCTCTAGTCCCAAACCTAAGCCCACACTCTGTCCCC]
    ##   [4]   1499    32    47      + [GGGATAGATTTGGAAAGGATCCTGCTGTTGGG]
    ##   [5]   1605    37    62      - [CCCGAGTCTGTCCTTTTGGTGCCCCCCATACAAGCCC]
    ##   [6]   1781    39    57      - [CCCTTTCCTCTTCATGAACCCTTTCTGCTTCATGAACCC]
    ##   [7]   1846    29    81      - [CCCTTCACCTTCCCTCCCTGTCGTCTCCC]
    ##   [8]   1918    37    89      - [CCCCGAGTCACACCCAACTCATCCCCATCGGACTCCC]
    ##   [9]   1986    44    54      - [CCTGGCCTGGTCTCTCTCA...ACGTGGGCCTGCATGCCCC]
    ##   ...    ...   ...   ...    ...
    ## [687] 123669    41    45      + [GGACCATGGAGTCGGGTCGCCAGCTCTATGTGGTGTTTGGG]
    ## [688] 123830    26    68      + [GGAGAGCCCAGCGGGGATGGGAAGGG]
    ## [689] 123875    28    64      + [GACTCAGGCCTCCCTGGGGCAGGGCGGG]
    ## [690] 123977    44   110      - [CTCCCCTACCCACCACAGC...GCCCTTGCATCCCACCCCC]
    ## [691] 124037    25    46      - [CCCAGCCTTCCTGGGGCCACAACCC]
    ## [692] 124101    47    51      - [CCCTAGCCCAAGTGCAAGC...AACAACAACCTGGCCACCC]
    ## [693] 124273    45   101      - [CCAGGCCCGCTGCTGCCCA...CCCAGAACCCCCGACCCAC]
    ## [694] 124355    37    44      - [CCCTCTAAAACCTAAACTACTCTTCCAACTTGGTCCC]
    ## [695] 124604    31    72      - [CCAACTCCTGGCCTCAGCCATCCCCCCACCC]
  6. Sort the results by score to see the best one.

    pqs_s <- pqs[order(score(pqs), decreasing = TRUE)]
    pqs_s
    ##   PQS views on a 124694-letter DNAString subject
    ## subject: GCGGGTGTCTGTAATCCCAGCTACTTGGGAGG...TGCACCAGCTGCACCTAGCATTTTCAGATCC
    ## quadruplexes:
    ##        start width score strand
    ##   [1] 114398    42   163      + [GGGACCCGGGAGTGGGCTG...GGAGGGGGGCCGCTGGGGG]
    ##   [2] 113317    47   153      + [GGGAGTTGGGCGGGGGGAA...TCCGGAGGGGAAGGGGCGG]
    ##   [3]  73196    37   151      - [CCCCCGACACACCTCCCCCTACTCTCCACCCGCCCCC]
    ##   [4] 103330    36   151      - [CCCTGCCCTTCCCTCCAACACCCCCACCGACCCCCC]
    ##   [5] 114459    36   137      - [CCCCCTCCCCGCATCCACTGCCCCCTGTCCTGTCCC]
    ##   [6]  72217    41   134      - [CCTGCCCCACCCCCTACCCTGCCCCCTGCAGCCCAGCATCC]
    ##   [7] 109897    26   132      - [CCCCATCACCCCCTTCCCCTGGCCCC]
    ##   [8]  59388    46   131      - [CCCCACGTTTACCACTACC...ACACACCCCAACCCAGCCC]
    ##   [9] 111482    29   131      - [CCCCAGAGCCCCACACACCCCTCCGCCCC]
    ##   ...    ...   ...   ...    ...
    ## [687] 104090    38    42      - [CCCAAGGGATAGCCAGGTCCTCTGCTACCCAGGCCAGC]
    ## [688] 105599    31    42      + [GGGGGGTGCCTCACCAGCTGGCAGCTCTGGG]
    ## [689] 105752    28    42      - [CTGCCACACCAGGCCCTGAGCCAGGCCC]
    ## [690] 108832    42    42      - [CCCACCACACACACAAGGC...GAGAAGCAGTCCTCATCCC]
    ## [691] 114123    38    42      + [GGGCTCAGCCCCAGGGCACGGCCATAGCCGGCGCGAGG]
    ## [692] 115401    41    42      - [CCCAGCAGTCAACTCTGCTGCCAGCCCTGGGGGCTGTGTCC]
    ## [693] 117569    31    42      + [GGGTTTCACCATGTTGGCCAGGGTGGTCTCG]
    ## [694] 118299    38    42      - [CCCTTGCCTTTTCTACAATAGATAGCATCCTGCATCCC]
    ## [695] 122286    34    42      - [CTGAGCCAGTACCACCAGCCCAAGGTATTCTCCC]
  7. Export all PQS into a GFF3-formatted file.

    export(as(pqs, "GRanges"), "test.gff", version = "3")

    The contents of the GFF3 file are as follows (the first 5 records only):

    ## ##gff-version 3
    ## ##date 2016-11-30
    ## chr1 rtracklayer sequence_feature 778 806 111 + .
    ## chr1 rtracklayer sequence_feature 927 968 55 + .
    ## chr1 rtracklayer sequence_feature 1071 1109 104 - .
    ## chr1 rtracklayer sequence_feature 1499 1530 47 + .
    ## chr1 rtracklayer sequence_feature 1605 1641 62 - .
  8. Export all PQS into a FASTA format file.

    writeXStringSet(as(pqs, "DNAStringSet"), file = "test.fa", format = "fasta")

    The contents of the FASTA file are as follows (the first 5 records only):

    ## >start=778;end=806;strand=+;score=111;
    ## GGGGAGGGGGAGCAAGGGGTGTAAGAGGG
    ## >start=927;end=968;strand=+;score=55;
    ## GTGGCCAACTCTCCCAGGGATTCTCACAACGCCAGGGAAGGG
    ## >start=1071;end=1109;strand=-;score=104;
    ## CCCCCTCTAGTCCCAAACCTAAGCCCACACTCTGTCCCC
    ## >start=1499;end=1530;strand=+;score=47;
    ## GGGATAGATTTGGAAAGGATCCTGCTGTTGGG
    ## >start=1605;end=1641;strand=-;score=62;
    ## CCCGAGTCTGTCCTTTTGGTGCCCCCCATACAAGCCC
  9. Show histogram for score distribution of detected PQS.

    hist(score(pqs), breaks = 20, main = "Histogram of PQS score")

  10. Show PQS score and density distribution along AHNAK gene annotation using Gviz package.

    strack <- DataTrack(
      # Local positions are 1-based, so subtract 1 when shifting by seq_start.
      start = start(pqs)+seq_start-1, end = end(pqs)+seq_start-1,
      data = score(pqs), chromosome = chr, genome = gnm, name = "score")
    dtrack <- DataTrack(
      start = (seq_start):(seq_start+length(density(pqs))-1), width = 1,
      data = density(pqs), chromosome = chr, genome = gnm,
      name = "density")
    atrack <- GenomeAxisTrack()
    suppressWarnings(plotTracks(c(btrack, strack, dtrack, atrack), type = "h"))
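
The plotting step relies on translating positions within the extracted sequence back to chromosome coordinates. Since the first nucleotide of the extracted sequence sits at seq_start, a 1-based local position p corresponds to chromosome position seq_start + p - 1. The base R sketch below shows this with a made-up seq_start value and the first PQS from the listing above. The helper name `to_genomic` is hypothetical.

```r
# Map 1-based positions within an extracted subsequence back to chromosome
# coordinates. seq_start is the chromosome position of the first extracted base.
to_genomic <- function(local_pos, seq_start) {
  seq_start + local_pos - 1
}

seq_start   <- 1000001               # made-up value, for illustration only
local_start <- 778                   # first PQS in the listing above
local_end   <- local_start + 29 - 1  # width 29, so the local end is 806

to_genomic(c(local_start, local_end), seq_start)
```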

5 Customizing the detection algorithm

The underlying detection algorithm is almost fully customizable; it can even be set up to find fundamentally different types of G-quadruplexes. The simplest way to change the detection behavior is to tune the scoring bonuses, penalties and factors. Supported options are summarized in the table below:

Option name        Description
tetrad_bonus       Score bonus for each G tetrad, regardless of whether the tetrad contains mismatches or not.
bulge_penalty      Penalty for a run with a bulge.
mismatch_penalty   Penalty for a run with a mismatch.
loop_mean_factor   Penalization factor of the loop length mean.
loop_sd_factor     Penalization factor of the loop length standard deviation.

5.1 Customizing the scoring function

A more involved way to influence the algorithm output is to implement a custom scoring function and pass it through the custom_scoring_fn option. Before you start experimenting with this feature, please consider that a custom scoring function can degrade the overall performance of the algorithm considerably, particularly on long sequences. The best use case for this feature is rapid prototyping of novel scoring techniques, which can later be implemented efficiently, for example in the next version of this package. Thus, if you have any suggestions on how to further improve the default scoring system (DSS), please let us know; we would highly appreciate it.

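Basically, the custom scoring function should take the following 10 arguments, in this order: subject (the input DNAString object), score (the score assigned by the DSS, if enabled), start and width (the start position and width of the PQS), and loop_1, run_2, loop_2, run_3, loop_3 and run_4 (the start positions of the corresponding loops and runs).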

The function should return the new score as a single integer value. Please note that if use_default_scoring is enabled, the custom scoring function is evaluated after the DSS, but only if the DSS resulted in a positive score (for performance reasons). On the other hand, when use_default_scoring is disabled, the custom scoring function is evaluated on every PQS.
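
Judging from the two example functions in this chapter, the positional arguments are 1-based start positions, so run and loop lengths are obtained by subtraction. The base R sketch below spells this out for the motif GGGCGGGCGGGCGGG starting at position 1. The helper `pqs_layout` and the hand-typed positions are illustrative assumptions, not part of the package API.

```r
# Derive run and loop lengths from the start positions passed to a custom
# scoring function (interpretation inferred from the examples in this chapter).
pqs_layout <- function(start, width, loop_1, run_2, loop_2, run_3, loop_3, run_4) {
  list(
    run_lengths  = c(loop_1 - start, loop_2 - run_2, loop_3 - run_3,
                     start + width - run_4),
    loop_lengths = c(run_2 - loop_1, run_3 - loop_2, run_4 - loop_3)
  )
}

# GGGCGGGCGGGCGGG at position 1: runs start at 1, 5, 9, 13; loops at 4, 8, 12.
layout <- pqs_layout(start = 1, width = 15, loop_1 = 4, run_2 = 5,
                     loop_2 = 8, run_3 = 9, loop_3 = 12, run_4 = 13)
layout
```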

Example: Imagine you would like to assign a particular type of quadruplex a more favourable score. For example, you might want to reflect that G-quadruplexes in which every loop consists of just a single cytosine tend to be more stable than similar ones with a different nucleotide in the same place. This can be implemented by the following custom scoring function:

c_loop_bonus <- function(subject, score, start, width, loop_1,
                         run_2, loop_2, run_3, loop_3, run_4) {
  l1 <- run_2 - loop_1
  l2 <- run_3 - loop_2
  l3 <- run_4 - loop_3
  if (l1 == l2 && l1 == l3 && subject[loop_1] == DNAString("C") &&
      subject[loop_1] == subject[loop_2] &&
      subject[loop_1] == subject[loop_3]) {
    score <- score + 20
  }
  return(score)
}

Without the custom scoring function, the two PQS found in the example sequence will have the same score.

seq <- DNAString("GGGCGGGCGGGCGGGAAAAAAAAAAAAAGGGAGGGAGGGAGGG")
pqsfinder(seq)
## Searching on sense strand...
## Searching on antisense strand...
##   PQS views on a 43-letter DNAString subject
## subject: GGGCGGGCGGGCGGGAAAAAAAAAAAAAGGGAGGGAGGGAGGG
## quadruplexes:
##     start width score strand
## [1]     1    15    89      + [GGGCGGGCGGGCGGG]
## [2]    29    15    89      + [GGGAGGGAGGGAGGG]

However, if the custom scoring function presented above is applied, the two PQS are clearly distinguishable by score:

pqsfinder(seq, custom_scoring_fn = c_loop_bonus)
## Searching on sense strand...
## Searching on antisense strand...
##   PQS views on a 43-letter DNAString subject
## subject: GGGCGGGCGGGCGGGAAAAAAAAAAAAAGGGAGGGAGGGAGGG
## quadruplexes:
##     start width score strand
## [1]     1    15   109      + [GGGCGGGCGGGCGGG]
## [2]    29    15    89      + [GGGAGGGAGGGAGGG]

5.2 Complete replacement of the default scoring system

There might be use cases where it is undesirable to have the default scoring system (DSS) enabled. In this example we show how to change the detection algorithm behavior to find quite a different type of sequence motif, an interstrand G-quadruplex (isG4) (Kudlicki 2016). Unlike a standard intramolecular G-quadruplex, an isG4 is defined by alternating runs of guanines and cytosines. Its canonical form can be described by the pattern G{n}N{a}C{n}N{b}G{n}N{c}C{n}, where n is the run length and a, b and c are the loop lengths.

To detect isG4s with the pqsfinder function, three options must be changed. First, disable the DSS by setting use_default_scoring to FALSE. Second, specify a custom regular expression defining one run of the quadruplex by setting run_re to G{3,6}|C{3,6}. Third, define a custom scoring function that validates each PQS:

isG4 <- function(subject, score, start, width, loop_1,
                 run_2, loop_2, run_3, loop_3, run_4) {
  r1 <- loop_1 - start
  r2 <- loop_2 - run_2
  r3 <- loop_3 - run_3
  r4 <- start + width - run_4
  
  if (!(r1 == r2 && r1 == r3 && r1 == r4))
    return(0)
  
  # Parentheses are required: `:` binds tighter than `+` in R.
  run_1_s <- subject[start:(start+r1-1)]
  run_2_s <- subject[run_2:(run_2+r2-1)]
  run_3_s <- subject[run_3:(run_3+r3-1)]
  run_4_s <- subject[run_4:(run_4+r4-1)]
  
  if (length(grep("^G+$", run_1_s)) && length(grep("^C+$", run_2_s)) &&
      length(grep("^G+$", run_3_s)) && length(grep("^C+$", run_4_s)))
    return(r1 * 20)
  else
    return(0)
}

Let’s see how it all works together:

pqsfinder(DNAString("AAAAGGGATCCCTAAGGGGTCCC"), strand = "+",
          use_default_scoring = FALSE, run_re = "G{3,6}|C{3,6}",
          custom_scoring_fn = isG4)
## Searching on sense strand...
##   PQS views on a 23-letter DNAString subject
## subject: AAAAGGGATCCCTAAGGGGTCCC
## quadruplexes:
##     start width score strand
## [1]     5    19    60      + [GGGATCCCTAAGGGGTCCC]
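
The reported score can be checked by hand. The base R sketch below uses plain strings instead of Biostrings objects and assumes one possible decomposition of the match (runs starting at positions 5, 10, 16 and 21, each 3 nucleotides long). The actual boundaries chosen by the algorithm are not shown in the output above.

```r
s <- "AAAAGGGATCCCTAAGGGGTCCC"

run_starts <- c(5, 10, 16, 21)  # assumed run start positions
run_len    <- 3                 # assumed common run length

# Note on ranges: in R, `:` binds tighter than `+`, so a range must be
# written as a:(a + n - 1); substring() avoids the issue by taking both ends.
runs <- substring(s, run_starts, run_starts + run_len - 1)

# Runs must alternate between guanines and cytosines.
expected <- c("^G+$", "^C+$", "^G+$", "^C+$")
ok <- all(mapply(grepl, expected, runs))

# Same scoring rule as in the isG4 function: 20 points per run nucleotide.
score <- if (ok) run_len * 20 else 0
score
```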

6 Session info

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.42.0                  
##  [3] biomaRt_2.30.0                    rtracklayer_1.34.1               
##  [5] Gviz_1.18.1                       GenomicRanges_1.26.1             
##  [7] GenomeInfoDb_1.10.1               pqsfinder_1.2.3                  
##  [9] Biostrings_2.42.1                 XVector_0.14.0                   
## [11] IRanges_2.8.1                     S4Vectors_0.12.1                 
## [13] BiocGenerics_0.20.0               BiocStyle_2.2.1                  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8                   biovizBase_1.22.0            
##  [3] lattice_0.20-34               Rsamtools_1.26.1             
##  [5] assertthat_0.1                rprojroot_1.1                
##  [7] digest_0.6.10                 mime_0.5                     
##  [9] R6_2.2.0                      plyr_1.8.4                   
## [11] backports_1.0.4               acepack_1.4.1                
## [13] RSQLite_1.1                   evaluate_0.10                
## [15] BiocInstaller_1.24.0          httr_1.2.1                   
## [17] ggplot2_2.2.0                 zlibbioc_1.20.0              
## [19] GenomicFeatures_1.26.0        lazyeval_0.2.0               
## [21] data.table_1.9.8              rpart_4.1-10                 
## [23] Matrix_1.2-7.1                rmarkdown_1.2                
## [25] splines_3.3.2                 BiocParallel_1.8.1           
## [27] AnnotationHub_2.6.4           stringr_1.1.0                
## [29] foreign_0.8-67                RCurl_1.95-4.8               
## [31] munsell_0.4.3                 shiny_0.14.2                 
## [33] httpuv_1.3.3                  htmltools_0.3.5              
## [35] nnet_7.3-12                   SummarizedExperiment_1.4.0   
## [37] tibble_1.2                    gridExtra_2.2.1              
## [39] htmlTable_1.7                 interactiveDisplayBase_1.12.0
## [41] Hmisc_4.0-0                   matrixStats_0.51.0           
## [43] XML_3.98-1.5                  GenomicAlignments_1.10.0     
## [45] bitops_1.0-6                  xtable_1.8-2                 
## [47] gtable_0.2.0                  DBI_0.5-1                    
## [49] magrittr_1.5                  scales_0.4.1                 
## [51] stringi_1.1.2                 latticeExtra_0.6-28          
## [53] Formula_1.2-1                 RColorBrewer_1.1-2           
## [55] ensembldb_1.6.2               tools_3.3.2                  
## [57] dichromat_2.0-0               Biobase_2.34.0               
## [59] survival_2.40-1               yaml_2.1.14                  
## [61] AnnotationDbi_1.36.0          colorspace_1.3-1             
## [63] cluster_2.0.5                 memoise_1.0.0                
## [65] VariantAnnotation_1.20.2      knitr_1.15.1

References

Kejnovsky, Eduard, Viktor Tokan, and Matej Lexa. 2015. “Transposable Elements and G-Quadruplexes.” Chromosome Research 23. doi:10.1007/s10577-015-9491-7.

Kudlicki, Andrzej S. 2016. “G-Quadruplexes Involving Both Strands of Genomic DNA Are Highly Abundant and Colocalize with Functional Sites in the Human Genome.” PLoS ONE 11 (1). doi:10.1371/journal.pone.0146174.

Maizels, Nancy, and Lucas T. Gray. 2013. “The G4 Genome.” PLoS Genet 9 (4). Public Library of Science: 1–6. doi:10.1371/journal.pgen.1003468.


  1. We have tested pqsfinder on experimentally verified G4 sequences. The results of that work are reflected in default settings of searches. Details of these tests will be presented elsewhere.

  2. Clusters of overlapping PQS usually have steep edges when the number of neighboring G-runs is low, but could be more spread out in other situations.