scoreSequences {KinSwingR} | R Documentation |
Scores each input sequence against every PWM built by buildPWM() and generates a p-value for each score. The output of this function is used to build the swing metric, the predicted activity of kinases.
scoreSequences(input_data = NULL, pwm_in = NULL, background = "random", n = 1000, force_trim = FALSE, verbose = FALSE)
input_data |
A data.frame of phosphopeptide data. It must contain 4 columns in the following order: Column 1 - Annotation, Column 2 - centered peptide sequence, Column 3 - Fold Change [-ve to +ve], Column 4 - p-value [0-1] |
pwm_in |
List of PWMs created using buildPWM() |
background |
Either "random", which generates a random background for PWM scoring from the input list, or a data.frame of peptides to use as background. A background data.frame must contain two columns: Column 1 - Annotation, Column 2 - centered peptide sequence. The sequences must be centered. Default: "random" |
n |
Number of permutations to perform for generating the background. Default: 1000 |
force_trim |
If TRUE, input sequences whose length differs from that of the PWM models (provided in pwm_in) are trimmed to the same length as the PWM models. If a background is provided, it is also trimmed to the same width as the PWM models. Options: TRUE, FALSE. Default: FALSE |
verbose |
Turn verbose output on or off. To turn it on, set verbose = TRUE. Options: TRUE, FALSE. Default: FALSE |
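The four-column layout required for input_data, and the idea of trimming centered sequences to a PWM width, can be sketched in base R. This is only an illustration: the column names, the toy peptides and the helper trim_centered() are hypothetical and are not part of KinSwingR, and the sketch assumes odd-length sequences so that a single central residue exists.

```r
# Toy input in the documented column order:
# annotation, centered peptide, fold change, p-value.
# Column names and values are illustrative only.
input_data <- data.frame(
  annotation  = c("site_1", "site_2", "site_3"),
  peptide     = c("PRRVRNLSAVLAART", "LDKASLGSDDGAQTK", "AAAAAAASAAAAAAA"),
  fold_change = c(-1.2, 0.4, 2.1),
  p_value     = c(0.01, 0.60, 0.003),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a KinSwingR function): trim centered sequences
# symmetrically to an odd target width, keeping the central residue.
trim_centered <- function(seqs, width) {
  len <- nchar(seqs)
  stopifnot(width %% 2 == 1, all(len %% 2 == 1), all(len >= width))
  centre <- (len + 1) / 2
  half <- (width - 1) / 2
  substr(seqs, centre - half, centre + half)
}

# Trim the 15-mers to 7 residues around the central position.
trimmed <- trim_centered(input_data$peptide, width = 7)
```

With force_trim = TRUE the function performs its own trimming; the helper above only shows what "trimming to the PWM width" means for a centered peptide.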
A list with the following elements: 1) PWM-substrate scores: substrate_scores$peptide_scores; 2) PWM-substrate p-values: substrate_scores$peptide_p; 3) the background used, returned for reproducibility: substrate_scores$background; 4) input_data, returned in case it was trimmed.
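How a score, a background and a p-value relate to one another can be illustrated with a minimal base R sketch. It is not KinSwingR's implementation: the toy PWM holds random values, the background is drawn from uniformly sampled residues rather than from the input list, and the additive score and the "+1" empirical p-value are assumptions made for the example.

```r
set.seed(1234)

amino_acids <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
                 "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
width <- 7

# Toy PWM: rows = residues, columns = positions (random values, illustration only).
pwm <- matrix(rnorm(length(amino_acids) * width),
              nrow = length(amino_acids),
              dimnames = list(amino_acids, NULL))

# Assumed scoring rule: sum the PWM entry selected by the residue at each position.
score_peptide <- function(peptide, pwm) {
  residues <- strsplit(peptide, "")[[1]]
  idx <- cbind(match(residues, rownames(pwm)), seq_along(residues))
  sum(pwm[idx])
}

# Random background of n peptides (assumption: uniform residue sampling).
n <- 1000
background <- replicate(
  n, paste(sample(amino_acids, width, replace = TRUE), collapse = "")
)
background_scores <- vapply(background, score_peptide, numeric(1), pwm = pwm)

# Empirical p-value: share of background scores at least as high as the observed one.
observed <- score_peptide("RNLSAVL", pwm)
p_value <- (sum(background_scores >= observed) + 1) / (n + 1)
```

A larger n gives a finer-grained p-value at the cost of run time, which is why the example in this page uses a small n for demonstration.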
## import data
data(example_phosphoproteome)
data(phosphositeplus_human)

## clean up the annotations
## sample 100 data points for demonstration
sample_data <- head(example_phosphoproteome, 100)
annotated_data <- cleanAnnotation(input_data = sample_data)

## build the PWM models:
set.seed(1234)
sample_pwm <- phosphositeplus_human[sample(nrow(phosphositeplus_human), 1000),]
pwms <- buildPWM(sample_pwm)

## score the PWM - substrate matches
## Using a "random" background, to calculate the p-value of the matches
## Using n=10 for demonstration
## set.seed for reproducibility
set.seed(1234)
substrate_scores <- scoreSequences(input_data = annotated_data,
                                   pwm_in = pwms,
                                   background = "random",
                                   n = 10)