OrientNucleotides {DECIPHER} | R Documentation |
Orients nucleotide sequences to match the directionality and complementarity of specified reference sequences.
OrientNucleotides(myXStringSet, reference = which.max(width(myXStringSet)), type = "sequences", orientation = "all", threshold = 0.05, verbose = TRUE, processors = 1)
myXStringSet |
A |
reference |
The index of reference sequences with the same (desired) orientation. By default the first sequence with maximum width will be used. |
type |
Character string indicating the type of results desired. This should be (an abbreviation of) either "sequences", "orientations", or "both". |
orientation |
Character string(s) indicating the allowed reorientation(s) of non-reference sequences. This should be (an abbreviation of) either |
threshold |
Numeric giving the decrease in k-mer distance required to adopt the alternative orientation. |
verbose |
Logical indicating whether to display progress. |
processors |
The number of processors to use, or |
Biological sequences can sometimes have inconsistent orientation that interferes with their analysis. OrientNucleotides will reorient sequences by changing their directionality and/or complementarity to match specified reference sequences in the same set. The process works by finding the k-mer distance between the reference sequence(s) and each allowed orientation of the sequences. Alternative orientations that lessen the distance by at least threshold are adopted. Note that this procedure requires a moderately similar reference sequence be available for each sequence that needs to be reoriented. Sequences for which a corresponding reference is unavailable will most likely be left alone because alternative orientations will not pass the threshold. For this reason, it is recommended to specify several markedly different sequences as references.
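The comparison described above can be illustrated with a minimal base-R sketch on plain character strings. This is not DECIPHER's implementation: the helpers kmer_counts, kmer_distance, rev_str, comp_str and choose_orientation are hypothetical names introduced here, the distance formula (one minus the fraction of shared k-mers) is an assumed stand-in for whatever k-mer distance the package computes, and the input is assumed to contain only uppercase A, C, G and T.

```r
# Illustrative only: hypothetical helpers, not DECIPHER's implementation.

# Count overlapping k-mers in one string (assumes nchar(s) >= k).
kmer_counts <- function(s, k = 5) {
  n <- nchar(s) - k + 1
  tab <- table(substring(s, seq_len(n), seq_len(n) + k - 1))
  setNames(as.vector(tab), names(tab))
}

# Assumed distance: 1 - (shared k-mer occurrences / smaller k-mer total).
kmer_distance <- function(a, b, k = 5) {
  ca <- kmer_counts(a, k)
  cb <- kmer_counts(b, k)
  shared <- intersect(names(ca), names(cb))
  overlap <- sum(pmin(ca[shared], cb[shared]))
  1 - overlap / min(sum(ca), sum(cb))
}

# The three alternative orientations, keyed by the codes used in this page.
rev_str  <- function(s) paste(rev(strsplit(s, "")[[1]]), collapse = "")
comp_str <- function(s) chartr("ACGT", "TGCA", s)
orientations <- list(
  r  = rev_str,
  c  = comp_str,
  rc = function(s) rev_str(comp_str(s))
)

# Adopt the best alternative only if it lowers the distance by >= threshold.
choose_orientation <- function(query, reference, threshold = 0.05, k = 5) {
  d0 <- kmer_distance(query, reference, k)
  d_alt <- vapply(orientations,
                  function(f) kmer_distance(f(query), reference, k),
                  numeric(1))
  best <- which.min(d_alt)
  if (d0 - d_alt[[best]] >= threshold) names(d_alt)[best] else ""
}

ref <- "ATGGCGTACGTTAGCCTAGGATCCGATTACGCAGTCAGGCTTAACGGATACCGTTGACCA"
choose_orientation(ref, ref)                     # already matches: ""
choose_orientation(rev_str(comp_str(ref)), ref)  # reverse complement: "rc"
```

In this sketch a query with no similar reference keeps its original orientation, because no alternative improves the distance by the required margin. That mirrors why several markedly different references are recommended.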
OrientNucleotides can return two types of results: the relative orientations of sequences and/or the reoriented sequences. If type is "sequences" (the default) then the reoriented sequences are returned. If type is "orientations" then a character vector is returned that specifies whether sequences were reversed ("r"), complemented ("c"), reversed complemented ("rc"), or in the same orientation ("") as the reference sequences (marked by NA). If type is "both" then the output is a list with the first component containing the "orientations" and the second component containing the "sequences".
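As a hedged usage sketch of the "both" result type, the following reuses the example database that ships with DECIPHER. The variable names res, ori and seqs are illustrative choices, and the list components are accessed by position because only their order is stated above.

```r
library(DECIPHER)

# Load the example sequences distributed with the package
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
dna <- SearchDB(db, remove="all")

set.seed(1)            # make the subsample reproducible
DNA <- dna
s <- sample(169, 30)   # flip a subset of the non-reference sequences
DNA[s] <- reverseComplement(dna[s])

# Request both the orientation codes and the reoriented sequences
res <- OrientNucleotides(DNA, reference=170:175, type="both", verbose=FALSE)
ori <- res[[1]]        # character vector of orientation codes
seqs <- res[[2]]       # reoriented sequences

table(ori, useNA="ifany")  # tally the codes; NA marks the references
which(ori == "rc")         # indices reported as reverse complemented
```

Inspecting the orientation codes before using the sequences can help spot inputs that were left unchanged because no sufficiently similar reference was supplied.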
Erik Wright eswright@pitt.edu
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
dna <- SearchDB(db, remove="all")
DNA <- dna # 175 sequences

# reorient subsamples of the first 169 sequences
s <- sample(169, 30)
DNA[s] <- reverseComplement(dna[s])
s <- sample(169, 30)
DNA[s] <- reverse(dna[s])
s <- sample(169, 30)
DNA[s] <- complement(dna[s])

DNA <- OrientNucleotides(DNA, reference=170:175)
DNA==dna # all were correctly reoriented