shuffle_sequences {universalmotif}    R Documentation

Shuffle input sequences.

Description

Given a set of input sequences, shuffle the letters within those sequences with any k-let size.

Usage

shuffle_sequences(sequences, k = 1, method = "linear",
  leftovers = "asis", progress = FALSE, BP = FALSE)

Arguments

sequences

XStringSet. For method = 'markov', DNAStringSet and RNAStringSet only.

k

numeric(1) K-let size.

method

character(1) One of c('markov', 'linear', 'random'). Only relevant if k > 1. See details.

leftovers

character(1) For method = 'random'. One of c('asis', 'first', 'split', 'discard'). See details.

progress

logical(1) Show progress. Not recommended if BP = TRUE.

BP

logical(1) Allows the use of BiocParallel within shuffle_sequences(). See BiocParallel::register() to change the default backend. Setting BP = TRUE is only recommended for large jobs (such as shuffling billions of letters). Furthermore, the behaviour of progress = TRUE is changed if BP = TRUE; the default BiocParallel progress bar will be shown (which unfortunately is much less informative).

Details

If method = 'markov', then a Markov model is used to generate sequences which will maintain (on average) the k-let frequencies. Please note that this method is not a 'true' shuffling, and for short sequences (e.g. < 100 bp) this can result in slightly more dissimilar sequences compared with true shuffling. See Fitch (1983) and Altschul and Erickson (1985) for a discussion on the topic.
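To make the Markov idea concrete, the following base R sketch generates a new sequence from a first-order Markov chain estimated from the input, which maintains 2-let frequencies on average. It is an illustration only and not the universalmotif implementation: the helper name markov_sketch is hypothetical, it works on a plain character string rather than an XStringSet, and it assumes k = 2 and a non-empty input.

```r
# Hypothetical helper, not part of universalmotif: generate a sequence of
# the same length from a first-order Markov chain estimated from `x`.
markov_sketch <- function(x) {
  s <- strsplit(x, "", fixed = TRUE)[[1]]
  n <- length(s)
  alphabet <- sort(unique(s))
  # Transition counts: rows are the current letter, columns the next letter.
  trans <- table(factor(s[-n], levels = alphabet),
                 factor(s[-1], levels = alphabet))
  out <- character(n)
  out[1] <- sample(s, 1)
  for (i in seq_len(n - 1) + 1) {
    probs <- as.numeric(trans[out[i - 1], ])
    # A letter seen only at the very end has no outgoing counts;
    # fall back to the overall letter frequencies in that case.
    if (sum(probs) == 0) {
      probs <- as.numeric(table(factor(s, levels = alphabet)))
    }
    out[i] <- sample(alphabet, 1, prob = probs)
  }
  paste(out, collapse = "")
}

set.seed(1)
markov_sketch("ACAGATAGACCC")
```

Because each letter is drawn from estimated probabilities, the output keeps the input length but generally not the exact letter counts, which is why this is not a 'true' shuffling.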

If method = 'linear', then the input sequences are split linearly every k letters; for example, for k = 3 'ACAGATAGACCC' becomes 'ACA GAT AGA CCC', after which these 3-lets are shuffled randomly.

If method = 'random', then k-lets are picked from the sequence completely randomly. This, however, can leave 'leftover' letters: lone letter islands smaller than k. A few options are provided to deal with these: leftovers = 'asis' will leave these letter islands in place; leftovers = 'first' will place these letters at the beginning of the sequence; leftovers = 'split' will place half of the leftovers at the beginning and half at the end of the sequence; leftovers = 'discard' simply gets rid of the leftovers.
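The linear splitting and three of the leftover options can be sketched in base R as follows. Both helpers are hypothetical and are not the universalmotif code: they operate on plain character strings, the linear sketch assumes a non-empty input and shuffles a trailing piece shorter than k like any other piece, and the 'split' sketch assumes that an odd leftover letter goes to the end. The 'asis' option is left out because it depends on the original positions of the leftover letters.

```r
# Hypothetical helper: split a string every k letters and shuffle the pieces.
shuffle_linear_sketch <- function(x, k = 3) {
  n <- nchar(x)
  starts <- seq(1, n, by = k)
  klets <- substring(x, starts, pmin(starts + k - 1, n))
  paste(sample(klets), collapse = "")
}

# Hypothetical helper: place leftover letters around an already shuffled
# string, mimicking the 'first', 'split' and 'discard' options.
place_leftovers_sketch <- function(shuffled, leftovers,
                                   how = c("first", "split", "discard")) {
  how <- match.arg(how)
  n <- length(leftovers)
  front <- leftovers[seq_len(n) <= n %/% 2]
  back <- leftovers[seq_len(n) > n %/% 2]
  switch(how,
    first = paste0(paste(leftovers, collapse = ""), shuffled),
    split = paste0(paste(front, collapse = ""), shuffled,
                   paste(back, collapse = "")),
    discard = shuffled
  )
}

set.seed(1)
shuffle_linear_sketch("ACAGATAGACCC", k = 3)

place_leftovers_sketch("GATACA", c("C", "T"), "first")    # "CTGATACA"
place_leftovers_sketch("GATACA", c("C", "T"), "split")    # "CGATACAT"
place_leftovers_sketch("GATACA", c("C", "T"), "discard")  # "GATACA"
```

Note that only 'discard' changes the total number of letters in this sketch.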

Note, however, that the method parameter is only relevant for k > 1. For k = 1, a simple sample call is performed.
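For the single-letter case, a plain sample call amounts to something like the base R sketch below. The helper name shuffle_letters_sketch is hypothetical, and the sketch works on an ordinary character string rather than an XStringSet, so it only illustrates the idea.

```r
# Hypothetical helper: shuffle individual letters of a string with sample().
shuffle_letters_sketch <- function(x) {
  letters_vec <- strsplit(x, "", fixed = TRUE)[[1]]
  paste(sample(letters_vec), collapse = "")
}

set.seed(1)
shuffle_letters_sketch("ACAGATAGACCC")
```

The result has the same length and exactly the same letter counts as the input; only the order changes.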

Value

XStringSet The input sequences will be returned with identical names and lengths.

Author(s)

Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca

References

Altschul SF, Erickson BW (1985). “Significance of Nucleotide Sequence Alignments: A Method for Random Sequence Permutation That Preserves Dinucleotide and Codon Usage.” Molecular Biology and Evolution, 2, 526-538.

Fitch WM (1983). “Random sequences.” Journal of Molecular Biology, 163, 171-176.

See Also

create_sequences(), scan_sequences(), enrich_motifs(), shuffle_motifs()

Examples

# Create example sequences, then shuffle them as 2-lets.
sequences <- create_sequences()
sequences.shuffled <- shuffle_sequences(sequences, k = 2)


[Package universalmotif version 1.0.22 Index]