To install this package, start R (version “3.6”) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("pulsedSilac")
To install the development version of the package, please do it from the development branch in github.
BiocManager::install('marcpaga/pulsedSilac')
SILAC (Stable Isotope Label by Amino acids in Cell culture) is a method that is often used to differentially compare protein expression between samples. Pulsed SILAC is a variant that is used to measure protein turnover. Rather than comparing samples that are labeled to completion with either heavy or light isotopes of amino acids, in pulsed SILAC cells are harvested at a series of time points after the growth media containing light amino acids was switched to media containing heavy amino acids. Each sample is then processed and analyzed by mass spectrometry. The isotope ratio each peptide at each time point can now be used to calculate the turnover rate of each measured protein in a proteome-wide fashion (Pratt et al. (2002), Doherty et al. (2005), Schwanhäusser et al. (2009)).
Figure 1: Pulsed SILAC overview
This package contains a set of functions to analyze pulsed SILAC quantitative data. Functions are provided to organize the data, calculate isotope ratios, isotope fractions, model protein turnover, compare turnover models, estimate cell growth and estimate isotope recycling. Several visualization tools are also included to perform basic data exploration, quality control, condition comparison, individual model inspection and model comparison.
Many Bioconductor packages use SummarizedExperiment
objects or derivatives from it to organize omics data. This is a rectangular class of object that contains assays (matrix-like organized data) in which rows represent features (genes, proteins, metabolites, …) and columns represent samples. If you are unfamiliar with this format, please read this introductory vignette.
This package uses two SummarizedExperiment
derived classes: SilacProteinExperiment
and SilacPeptideExperiment
. It also introduces a new class, the SilacProteomicsExperiment
, which encapsulates both the previous classes.
SilacProteinExperiment
and SilacPeptideExperiment
The SilacProteinExperiment
and SilacPeptideExperiment
are, as their name indicates, dedicated to store either protein or peptide related data. These two objects are almost identical to a SummarizedExperiment
object. Their only difference is the initialization of the metadata
slot with what will be called metaoptions
. These metaoptions
contain values that are indicative of where certain information, required for the analysis, is located in the object. For example, the conditionCol
value is used to indicate which column of colData
has the information about the different experiment conditions.
Figure 2: SilacProteinExperiment/SilacPeptideExperiment object structure
Constructing a SilacProteinExperiment
or a SilacPeptideExperiment
is extremely similar to constructing a SummarizedExperiment
. The only additional arguments are the different metaoptions
values.
Here is an example of constructing a SilacProteinExperiment
:
require(pulsedSilac)
## assays
assays_protein <- list(expression = matrix(1:9, ncol = 3))
## colData
colData <- data.frame(sample = c('A1', 'A2', 'A3'),
condition = c('A', 'A', 'A'),
time = c(1, 2, 3))
## rowData
rowData_protein <- data.frame(prot_id = LETTERS[1:3])
## construct the SilacProteinExperiment
protExp <- SilacProteinExperiment(assays = assays_protein,
rowData = rowData_protein,
colData = colData,
conditionCol = 'condition',
timeCol = 'time')
protExp
## class: SilacProteinExperiment
## dim: 3 3
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
Here is an example of constructing a SilacPeptideExperiment
:
## assays
assays_peptide <- list(expression = matrix(1:15, ncol = 3))
## colData
colData <- data.frame(sample = c('A1', 'A2', 'A3'),
condition = c('A', 'A', 'A'),
time = c(1, 2, 3))
## rowData
rowData_peptide <- data.frame(pept_id = letters[1:5],
prot_id = c('A', 'A', 'B', 'C', 'C'))
## construct the SilacProteinExperiment
peptExp <- SilacPeptideExperiment(assays = assays_peptide,
rowData = rowData_peptide,
colData = colData,
conditionCol = 'condition',
timeCol = 'time')
peptExp
## class: SilacPeptideExperiment
## dim: 5 3
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
The accessor functions are exactly the same as in a SummarizedExperiment
object: rowData()
, colData()
, assays()
and metadata()
. To access the metaoptions
slot the both the metaoptions()
and metadata()
functions can be used.
## assays
assays(protExp)
## List of length 1
## names(1): expression
assays(peptExp)
## List of length 1
## names(1): expression
## rowData
rowData(protExp)
## DataFrame with 3 rows and 1 column
## prot_id
## <character>
## 1 A
## 2 B
## 3 C
rowData(peptExp)
## DataFrame with 5 rows and 2 columns
## pept_id prot_id
## <character> <character>
## 1 a A
## 2 b A
## 3 c B
## 4 d C
## 5 e C
## colData
colData(protExp)
## DataFrame with 3 rows and 3 columns
## sample condition time
## <character> <character> <numeric>
## 1 A1 A 1
## 2 A2 A 2
## 3 A3 A 3
colData(peptExp)
## DataFrame with 3 rows and 3 columns
## sample condition time
## <character> <character> <numeric>
## 1 A1 A 1
## 2 A2 A 2
## 3 A3 A 3
## metaoptions
metaoptions(protExp)
## $conditionCol
## [1] "condition"
##
## $timeCol
## [1] "time"
metaoptions(peptExp)[['proteinCol']] <- 'prot_id'
metaoptions(peptExp)
## $conditionCol
## [1] "condition"
##
## $timeCol
## [1] "time"
##
## $proteinCol
## [1] "prot_id"
Similar to the accessor function, most of the SummarizedExperiment
operations, such as subsetting and aggregating, can be applied to the two classes. For subsetting these are mainly: [
, subset
, $
; and for aggregation these are mainly: cbind
, rbind
and merge
. For a more detailed explanation please refer to the manual.
## subsetting by rows and columns
protExp[1, 1:2]
## class: SilacProteinExperiment
## dim: 1 2
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
peptExp[1, 1:2]
## class: SilacPeptideExperiment
## dim: 1 2
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
## subsetting by rows based on rowData
subset(protExp, prot_id == 'A')
## class: SilacProteinExperiment
## dim: 1 3
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
subset(peptExp, pept_id %in% c('a', 'b'))
## class: SilacPeptideExperiment
## dim: 2 3
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
## quick acces to colData
protExp$sample
## [1] "A1" "A2" "A3"
peptExp$condition
## [1] "A" "A" "A"
## combining by columns
cbind(protExp[, 1], protExp[, 2:3])
## class: SilacProteinExperiment
## dim: 3 3
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
## combining by rows
rbind(peptExp[1:3,], peptExp[4:5, ])
## class: SilacPeptideExperiment
## dim: 5 3
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
## combine rows and columns
merge(peptExp[1:3, 1], peptExp[3:5, 2:3])
## class: SilacPeptideExperiment
## dim: 5 3
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
SilacProteomicsExperiment
The SilacProteomicsExperiment
object contains both a SilacProteinExperiment
and a SilacPeptideExperiment
object. Therefore it can be used to store both levels of information in a single object. The protein and peptide level are aligned/linked by the colData
slot and by the linkerDf
slot. The linkerDf
slot contains a data.frame
that indicates which peptides are assigned to each protein and vice versa.
Figure 3: SilacProteomicsExperiment object structure
To construct a SilacProteomicsExperiment
we need both a SilacProteinExperiment
object and a SilacPeptideExperiment
object.
ProteomicsExp <- SilacProteomicsExperiment(SilacProteinExperiment = protExp,
SilacPeptideExperiment = peptExp)
ProteomicsExp
## class: SilacProteomicsExperiment
## metadata(7): idColProt idColPept ... timeCol proteinCol
## protein:
## |-- dim: 3 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 5 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
In this case, the levels are only aligned by columns. There is no information on which are the relationships between proteins and peptides. This information can be added in the linkerDf
slot. Which contains a data.frame
that indicates the relationships between the two levels features. One can be constructed with the buildLinkerDf()
function.
## list with the relationships
protein_to_peptide <- list(A = c('a', 'b'), B = c('c'), C = c('d', 'e'))
## function to build the data.frame
linkerDf <- buildLinkerDf(protIDs = LETTERS[1:3],
pepIDs = letters[1:5],
protToPep = protein_to_peptide)
linkerDf
## protID pepID protRow pepRow
## 1 A a 1 1
## 2 A b 1 2
## 3 B c 2 3
## 4 C d 3 4
## 5 C e 3 5
ProteomicsExp <- SilacProteomicsExperiment(SilacProteinExperiment = protExp,
SilacPeptideExperiment = peptExp,
linkerDf = linkerDf)
Linked SilacProteomicsExperiment
objects can be useful to apply linked subsetting operations or to recalculate protein level data from peptide data.
If the slot is general to the SilacProteomicsExperiment
object then the same functions as in a SummarizedExperiment
can be used.
## colData
colData(ProteomicsExp)
## DataFrame with 3 rows and 3 columns
## sample condition time
## <character> <character> <numeric>
## 1 A1 A 1
## 2 A2 A 2
## 3 A3 A 3
## linkerDf
linkerDf(ProteomicsExp)
## protID pepID protRow pepRow
## 1 A a 1 1
## 2 A b 1 2
## 3 B c 2 3
## 4 C d 3 4
## 5 C e 3 5
## metaoptions
metaoptions(ProteomicsExp)
## $conditionCol
## [1] "condition"
##
## $timeCol
## [1] "time"
##
## $idColProt
## [1] NA
##
## $idColPept
## [1] NA
##
## $linkedSubset
## [1] TRUE
##
## $subsetMode
## [1] "protein"
##
## $proteinCol
## [1] NA
But for names shared between protein and peptide slots, like assays
and rowData
, there are two options to access them. Using the SummarizedExperiment
generic will return a named list with the two elements.
## assays
assays(ProteomicsExp)
## $protein
## List of length 1
## names(1): expression
##
## $peptide
## List of length 1
## names(1): expression
## rowData
rowData(ProteomicsExp)
## $protein
## DataFrame with 3 rows and 1 column
## prot_id
## <character>
## 1 A
## 2 B
## 3 C
##
## $peptide
## DataFrame with 5 rows and 2 columns
## pept_id prot_id
## <character> <character>
## 1 a A
## 2 b A
## 3 c B
## 4 d C
## 5 e C
One can use the same functions and just add Prot
or Pept
as suffixes to specifically access a slot at the protein or the peptide level.
## assays of protein level
assaysProt(ProteomicsExp)
## List of length 1
## names(1): expression
## assays of peptide level
assaysPept(ProteomicsExp)
## List of length 1
## names(1): expression
## rowData of protein level
rowDataProt(ProteomicsExp)
## DataFrame with 3 rows and 1 column
## prot_id
## <character>
## 1 A
## 2 B
## 3 C
## rowData of peptide level
rowDataPept(ProteomicsExp)
## DataFrame with 5 rows and 2 columns
## pept_id prot_id
## <character> <character>
## 1 a A
## 2 b A
## 3 c B
## 4 d C
## 5 e C
Finally, to extract the SilacProteinExperiment
and SilacPeptideExperiment
one can use the ProtExp
and PeptExp
functions.
ProtExp(ProteomicsExp)
## class: SilacProteinExperiment
## dim: 3 3
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
PeptExp(ProteomicsExp)
## class: SilacPeptideExperiment
## dim: 5 3
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
When subsetting row-wise on a SilacProteomicsExperiment
there will be two main questions:
The answer to these questions is written in the metaoptions
values. The first one is subsetMode
which can be protein
or peptide
. The second one is linkedSubset
, which can be TRUE
or FALSE
. Moreover, if the latter is set to TRUE
then the idColProt
and idColPept
will also be required.
Here are some examples of using the [
in different situations.
Subset at protein level without affecting the peptide level.
## indicate which rowDat columns have unique ids for proteins and peptides
metaoptions(ProteomicsExp)[['idColProt']] <- 'prot_id'
metaoptions(ProteomicsExp)[['idColPept']] <- 'pept_id'
## indicate that we want to apply the subset at protein level
metaoptions(ProteomicsExp)[['subsetMode']] <- 'protein'
## and not extend it to the peptide level
metaoptions(ProteomicsExp)[['linkedSubset']] <- FALSE
ProteomicsExp[1:2,]
## class: SilacProteomicsExperiment
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein:
## |-- dim: 2 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 5 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
Note that the protein level has two proteins while the peptide level kept all five peptides.
Subset at protein level and extending the subset to the peptide level.
## to extend we set the metaoption to TRUE
metaoptions(ProteomicsExp)[['linkedSubset']] <- TRUE
ProteomicsExp[1:2,]
## class: SilacProteomicsExperiment
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein:
## |-- dim: 2 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 3 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
In this case the protein level has two proteins and the peptide level has also been subset to the three peptides that are assigned to the two proteins in the linkerDf
.
The same process can also be applied at peptide level.
## indicate that we want to apply the subset at protein level
metaoptions(ProteomicsExp)[['subsetMode']] <- 'peptide'
## to extend we set the metaoption to TRUE
metaoptions(ProteomicsExp)[['linkedSubset']] <- TRUE
ProteomicsExp[1:2,]
## class: SilacProteomicsExperiment
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein:
## |-- dim: 1 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 2 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
In this case, the first two peptides, which belong to the first protein, are selected along with that protein.
Other subset operations can be done using subsetProt()
and subsetPep()
, which call the generic subset()
on the rowData
slot of the protein and peptide level respectively. Note that subsetProt()
and subsetPep()
will behave differently depending on the metaoption linkedSubset
as seen above with [
.
## without linked Subset
metaoptions(ProteomicsExp)[['linkedSubset']] <- FALSE
subsetProt(ProteomicsExp, prot_id == 'B')
## class: SilacProteomicsExperiment
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein:
## |-- dim: 1 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 5 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
## with linked Subset
metaoptions(ProteomicsExp)[['linkedSubset']] <- TRUE
subsetProt(ProteomicsExp, prot_id == 'B')
## class: SilacProteomicsExperiment
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein:
## |-- dim: 1 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 1 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
Finally, cbind
, rbind
and merge
can be used for aggregation.
For cbind
the number of proteins and peptides must be the same and they are expected to be in the same order.
## cbind
cbind(ProteomicsExp[, 1], ProteomicsExp[, 2])
## class: SilacProteomicsExperiment
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein:
## |-- dim: 3 2
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 5 2
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
For rbind
the number of samples must be the same and they are expected to be in the same order.
## rbind
rbind(ProteomicsExp[1:2,], ProteomicsExp[3,])
## class: SilacProteomicsExperiment
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein:
## |-- dim: 2 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 3 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
merge
can be used when samples have to be aggregated in one object but the proteins/peptides are not the same. It requires IDs that can be matched between experiments.
## merge
merge(ProteomicsExp[1:3, 1], ProteomicsExp[3:4, 2:3])
## class: SilacProteomicsExperiment
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein:
## |-- dim: 3 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(1): prot_id
## peptide:
## |-- dim: 4 3
## |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
metaoptions
The metaoptions
are a set of “option” values used to indicate where critical analysis information is required. The idea is that these values are defined at the beginning of the analysis so that it is not required to give them as arguments in each function call. Nevertheless, all the analysis functions can have these values passed as arguments. There are several metaoptions
values by default, some of them are shared between the three classes while others are unique.
The following are related to data analysis:
conditionCol
: which column in colData
indicates the different experiment conditions.timeCol
: which column in colData
indicates the different timepoints of the experiment.proteinCol
: which column in rowData
indicates the assigned protein to a peptide. (unique to SilacPeptideExperiment
)The following are related to SilacProteomicsExperiment
subsetting operations:
idColProt
: which column in rowDataProt
indicates unique protein IDs. (unique to SilacProteomicsExperiment
)idColPept
: which column in rowDataPept
indicates unique peptide IDs. (unique to SilacProteomicsExperiment
)linkedSubset
: logical that indicates if the subsetting operation should be applied to both data levels (protein and peptide). (unique to SilacProteomicsExperiment
)subsetMode
: which level should be used for reference when subsetting (protein or peptide). (unique to SilacProteomicsExperiment
)These can be extended by the user to accommodate other experiment design and data analysis operations.
To demonstrate the use of the package functions, a reduced version of the pulsed-SILAC dataset from Visscher et al. (2016) has been built into a SilacProteomicsExperiment
object. It is stored in an object named wormsPE
. This object has been constructed from the output files from MaxQuant ‘proteinGroups.txt’ and ‘peptides.txt’.
wormsPE
## class: SilacProteomicsExperiment
## metadata(7): idColProt idColPept ... timeCol proteinCol
## protein:
## |-- dim: 250 14
## |-- assays(4): int_total int_light int_heavy ratio
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(22): Protein.IDs Majority.protein.IDs ...
## Peptide.is.razor protein_id
## peptide:
## |-- dim: 3574 14
## |-- assays(4): int_total int_light int_heavy ratio
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
## |-- rownames: NULL
## |-- rowData names(46): Sequence N.term.cleavage.window ... id
## Protein.group.IDs
## colnames: NULL
## colData names(2): line timepoint
The dataset contains only the first 250 proteins and their corresponding peptides. It has 4 assays with the total intensity, light isotope and heavy isotope intensities and heavy/light isotope ratio. In the experiment there are 2 worm cell lines: OW40, which express \(\alpha\)-synuclein-YFP (model for Parkinson’s Disease) and OW450, which express YFP only. Worms were harvested after 4h, 6h, 8h, 13h, 24h, 28h and 32h of light to heavy isotope medium transition. Note that the analysis of the data presented in the paper by Visscher et al was not performed using this package. However, all data in that paper has been reanalyzed using this package, which resulted in general the same but also more extended conclusions.
To reliably model a protein’s turnover, several timepoints are required. Ideally a protein would be measured in all the timepoints of the experiment, but this is rarely the case. To count how many proteins/peptides have been measured in each sample the barplotCounts()
can be used.
barplotCounts(wormsPE, assayName = 'ratio')
There is some variability between samples but nothing too extreme. More importantly is knowing whether a protein has been measured across different timepoints. The function barplotTimeCoverage()
can be used to visualize in how many timepoints each protein has been measured per condition.
barplotTimeCoverage(wormsPE, assayName = 'ratio')
As it can be seen, most of the proteins are not measured in all timepoints. In this case there is a majority for proteins that have not been measured in any timepoint, this is because the dataset has been truncated for only the OW40 and OW450 lines. The original search was one with additional worm lines and the missing proteins were detected in the lines excluded from this vignette.
To filter proteins that have not been detected in sufficient timepoints the filterByMissingTimepoints()
function can be used. With maxMissing
the amount of missing values can be tuned. With strict
proteins have to meet the maxMissing criteria only in one condition or in all.
wormsPE2 <- filterByMissingTimepoints(wormsPE,
assayName = 'ratio',
maxMissing = 2,
strict = FALSE)
barplotTimeCoverage(wormsPE2, assayName = 'ratio')
Because strict
is set to FALSE
, there are still some proteins that have been detected in less than 5 timepoints in one of the conditions. To visualize the overlap between conditions, the upsetTimeCoverage()
function can be used. This can be especially useful when more the experiment contains more than two conditions. Here the overlap of proteins is shown.
upsetTimeCoverage(ProtExp(wormsPE2),
assayName = 'ratio',
maxMissing = 2)
In this case there are 68 proteins that have been measured in at least 5 out of 7 timepoints in both conditions. These would be proteins for which turnover models can be compared between conditions.
Filtering by missing timepoints can also be done at peptide level if desired by changing subsetMode
to peptide
.
Other filters can be easily applied using the subset
method. For example, proteins with a low number of unique peptides can be filtered out.
subsetProt(wormsPE, Unique.peptides > 2)
Also, proteins that are known as common contaminants or identified using the reverse database can be filtered out.
subsetProt(wormsPE, Potential.contaminant != '+')
subsetProt(wormsPE, Reverse != '+')
To model protein turnover the isotope fractions from each timepoint are used. The isotope fraction can be calculated in many ways. A simple approach is to derive them from isotope ratios, which many raw mass spectrometry data analysis software report. If not, one can use the calculateIsotopeRatio()
function, which will add a new assay named ratio
to the object. In this example, the ratio of newly synthesized protein (heavy isotope) over old protein (light isotope) will be used.
## calculate the ratio of new istope over old isotope
wormsPE <- calculateIsotopeRatio(x = wormsPE,
newIsotopeAssay = 'int_heavy',
oldIsotopeAssay = 'int_light')
\[ratio = \frac{isotope_{A}}{isotope_{B}}\]
And then the fraction can be calculated as follows:
\[fraction_{A} = \frac{ratio}{1 + ratio}\]
This can be easily done using the calculateIsotopeFraction()
function, which adds a new assay named fraction
. In this case the heavy isotope fraction is calculated, the light isotope fraction would just be 1-fractionA.
wormsPE <- calculateIsotopeFraction(wormsPE, ratioAssay = 'ratio')
assaysProt(wormsPE)
## List of length 5
## names(5): int_total int_light int_heavy ratio fraction
assaysPept(wormsPE)
## List of length 5
## names(5): int_total int_light int_heavy ratio fraction
Unfortunately, this approach might might lead to the loss some values because mass spectrometry has a bias towards the measurement of high abundance peptides. For example: if a protein has a very slow turnover rate, it might be that at the earliest timepoint, the new isotope signal was below the detection limit and a ratio could not be calculated. To also account for these proteins the calculateIsotopeFraction
function has additional arguments to indicate which assays contain each isotope’s expression data and which timepoints are considered “early” or “late”.
wormsPE <- calculateIsotopeFraction(wormsPE,
newIsoAssay = 'int_heavy',
oldIsoAssay = 'int_light',
earlyTimepoints = 1,
lateTimepoints = 7)
To visualize data stored in the assays
slot there are two functions plotDistributionAssay()
and scatterCompareAssays()
. The first one will plot the distributions of the data for a selected assay for each timepoint and condition. The latter will plot each protein/peptide value of a condition against the other for each timepoint.
plotDistributionAssay()
plotDistributionAssay(wormsPE, assayName = 'fraction')