Contents

1 Installation

To install this package, start R (version “3.6”) and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("pulsedSilac")

To install the development version of the package, please do it from the development branch in github.

BiocManager::install('marcpaga/pulsedSilac')

2 Introduction

SILAC (Stable Isotope Label by Amino acids in Cell culture) is a method that is often used to differentially compare protein expression between samples. Pulsed SILAC is a variant that is used to measure protein turnover. Rather than comparing samples that are labeled to completion with either heavy or light isotopes of amino acids, in pulsed SILAC cells are harvested at a series of time points after the growth media containing light amino acids was switched to media containing heavy amino acids. Each sample is then processed and analyzed by mass spectrometry. The isotope ratio each peptide at each time point can now be used to calculate the turnover rate of each measured protein in a proteome-wide fashion (Pratt et al. (2002), Doherty et al. (2005), Schwanhäusser et al. (2009)).

Figure 1: Pulsed SILAC overview

Figure 1: Pulsed SILAC overview

This package contains a set of functions to analyze pulsed SILAC quantitative data. Functions are provided to organize the data, calculate isotope ratios, isotope fractions, model protein turnover, compare turnover models, estimate cell growth and estimate isotope recycling. Several visualization tools are also included to perform basic data exploration, quality control, condition comparison, individual model inspection and model comparison.

3 Data organization

Many Bioconductor packages use SummarizedExperiment objects or derivatives from it to organize omics data. This is a rectangular class of object that contains assays (matrix-like organized data) in which rows represent features (genes, proteins, metabolites, …) and columns represent samples. If you are unfamiliar with this format, please read this introductory vignette.

This package uses two SummarizedExperiment derived classes: SilacProteinExperiment and SilacPeptideExperiment. It also introduces a new class, the SilacProteomicsExperiment, which encapsulates both the previous classes.

3.1 SilacProteinExperiment and SilacPeptideExperiment

The SilacProteinExperiment and SilacPeptideExperiment are, as their name indicates, dedicated to store either protein or peptide related data. These two objects are almost identical to a SummarizedExperiment object. Their only difference is the initialization of the metadata slot with what will be called metaoptions. These metaoptions contain values that are indicative of where certain information, required for the analysis, is located in the object. For example, the conditionCol value is used to indicate which column of colData has the information about the different experiment conditions.

Figure 2: SilacProteinExperiment/SilacPeptideExperiment object structure

Figure 2: SilacProteinExperiment/SilacPeptideExperiment object structure

3.1.1 Object construction

Constructing a SilacProteinExperiment or a SilacPeptideExperiment is extremely similar to constructing a SummarizedExperiment. The only additional arguments are the different metaoptions values.

Here is an example of constructing a SilacProteinExperiment:

require(pulsedSilac)

## assays
assays_protein <- list(expression = matrix(1:9, ncol = 3))

## colData
colData <- data.frame(sample = c('A1', 'A2', 'A3'),
                      condition = c('A', 'A', 'A'),
                      time = c(1, 2, 3))
## rowData
rowData_protein <- data.frame(prot_id = LETTERS[1:3])

## construct the SilacProteinExperiment
protExp <- SilacProteinExperiment(assays = assays_protein, 
                                  rowData = rowData_protein, 
                                  colData = colData, 
                                  conditionCol = 'condition', 
                                  timeCol = 'time')
protExp
## class: SilacProteinExperiment 
## dim: 3 3 
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time

Here is an example of constructing a SilacPeptideExperiment:

## assays
assays_peptide <- list(expression = matrix(1:15, ncol = 3))

## colData
colData <- data.frame(sample = c('A1', 'A2', 'A3'),
                      condition = c('A', 'A', 'A'),
                      time = c(1, 2, 3))
## rowData
rowData_peptide <- data.frame(pept_id = letters[1:5], 
                              prot_id = c('A', 'A', 'B', 'C', 'C')) 
## construct the SilacProteinExperiment
peptExp <- SilacPeptideExperiment(assays = assays_peptide, 
                                  rowData = rowData_peptide, 
                                  colData = colData, 
                                  conditionCol = 'condition', 
                                  timeCol = 'time')
peptExp
## class: SilacPeptideExperiment 
## dim: 5 3 
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

3.1.2 Accessor functions

The accessor functions are exactly the same as in a SummarizedExperiment object: rowData(), colData(), assays() and metadata(). To access the metaoptions slot the both the metaoptions() and metadata() functions can be used.

## assays
assays(protExp)
## List of length 1
## names(1): expression
assays(peptExp)
## List of length 1
## names(1): expression
## rowData
rowData(protExp)
## DataFrame with 3 rows and 1 column
##       prot_id
##   <character>
## 1           A
## 2           B
## 3           C
rowData(peptExp)
## DataFrame with 5 rows and 2 columns
##       pept_id     prot_id
##   <character> <character>
## 1           a           A
## 2           b           A
## 3           c           B
## 4           d           C
## 5           e           C
## colData
colData(protExp)
## DataFrame with 3 rows and 3 columns
##        sample   condition      time
##   <character> <character> <numeric>
## 1          A1           A         1
## 2          A2           A         2
## 3          A3           A         3
colData(peptExp)
## DataFrame with 3 rows and 3 columns
##        sample   condition      time
##   <character> <character> <numeric>
## 1          A1           A         1
## 2          A2           A         2
## 3          A3           A         3
## metaoptions
metaoptions(protExp)
## $conditionCol
## [1] "condition"
## 
## $timeCol
## [1] "time"
metaoptions(peptExp)[['proteinCol']] <- 'prot_id'
metaoptions(peptExp)
## $conditionCol
## [1] "condition"
## 
## $timeCol
## [1] "time"
## 
## $proteinCol
## [1] "prot_id"

3.1.3 Subsetting and aggregation

Similar to the accessor function, most of the SummarizedExperiment operations, such as subsetting and aggregating, can be applied to the two classes. For subsetting these are mainly: [, subset, $; and for aggregation these are mainly: cbind, rbind and merge. For a more detailed explanation please refer to the manual.

## subsetting by rows and columns
protExp[1, 1:2]
## class: SilacProteinExperiment 
## dim: 1 2 
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
peptExp[1, 1:2]
## class: SilacPeptideExperiment 
## dim: 1 2 
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
## subsetting by rows based on rowData
subset(protExp, prot_id == 'A')
## class: SilacProteinExperiment 
## dim: 1 3 
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
subset(peptExp, pept_id %in% c('a', 'b'))
## class: SilacPeptideExperiment 
## dim: 2 3 
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
## quick acces to colData
protExp$sample
## [1] "A1" "A2" "A3"
peptExp$condition
## [1] "A" "A" "A"
## combining by columns
cbind(protExp[, 1], protExp[, 2:3])
## class: SilacProteinExperiment 
## dim: 3 3 
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
## combining by rows
rbind(peptExp[1:3,], peptExp[4:5, ])
## class: SilacPeptideExperiment 
## dim: 5 3 
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
## combine rows and columns
merge(peptExp[1:3, 1], peptExp[3:5, 2:3])
## class: SilacPeptideExperiment 
## dim: 5 3 
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

3.2 SilacProteomicsExperiment

The SilacProteomicsExperiment object contains both a SilacProteinExperiment and a SilacPeptideExperiment object. Therefore it can be used to store both levels of information in a single object. The protein and peptide level are aligned/linked by the colData slot and by the linkerDf slot. The linkerDf slot contains a data.frame that indicates which peptides are assigned to each protein and vice versa.

Figure 3: SilacProteomicsExperiment object structure

Figure 3: SilacProteomicsExperiment object structure

3.2.1 Object construction

To construct a SilacProteomicsExperiment we need both a SilacProteinExperiment object and a SilacPeptideExperiment object.

ProteomicsExp <- SilacProteomicsExperiment(SilacProteinExperiment = protExp, 
                                           SilacPeptideExperiment = peptExp)
ProteomicsExp
## class: SilacProteomicsExperiment 
## metadata(7): idColProt idColPept ... timeCol proteinCol
## protein: 
##   |-- dim: 3 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 5 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

In this case, the levels are only aligned by columns. There is no information on which are the relationships between proteins and peptides. This information can be added in the linkerDf slot. Which contains a data.frame that indicates the relationships between the two levels features. One can be constructed with the buildLinkerDf() function.

## list with the relationships
protein_to_peptide <- list(A = c('a', 'b'), B = c('c'), C = c('d', 'e'))
## function to build the data.frame
linkerDf <- buildLinkerDf(protIDs = LETTERS[1:3], 
                          pepIDs  = letters[1:5], 
                          protToPep = protein_to_peptide)
linkerDf
##   protID pepID protRow pepRow
## 1      A     a       1      1
## 2      A     b       1      2
## 3      B     c       2      3
## 4      C     d       3      4
## 5      C     e       3      5
ProteomicsExp <- SilacProteomicsExperiment(SilacProteinExperiment = protExp, 
                                           SilacPeptideExperiment = peptExp, 
                                           linkerDf = linkerDf)

Linked SilacProteomicsExperiment objects can be useful to apply linked subsetting operations or to recalculate protein level data from peptide data.

3.2.2 Accessor functions

If the slot is general to the SilacProteomicsExperiment object then the same functions as in a SummarizedExperiment can be used.

## colData
colData(ProteomicsExp)
## DataFrame with 3 rows and 3 columns
##        sample   condition      time
##   <character> <character> <numeric>
## 1          A1           A         1
## 2          A2           A         2
## 3          A3           A         3
## linkerDf
linkerDf(ProteomicsExp)
##   protID pepID protRow pepRow
## 1      A     a       1      1
## 2      A     b       1      2
## 3      B     c       2      3
## 4      C     d       3      4
## 5      C     e       3      5
## metaoptions
metaoptions(ProteomicsExp)
## $conditionCol
## [1] "condition"
## 
## $timeCol
## [1] "time"
## 
## $idColProt
## [1] NA
## 
## $idColPept
## [1] NA
## 
## $linkedSubset
## [1] TRUE
## 
## $subsetMode
## [1] "protein"
## 
## $proteinCol
## [1] NA

But for names shared between protein and peptide slots, like assays and rowData, there are two options to access them. Using the SummarizedExperiment generic will return a named list with the two elements.

## assays
assays(ProteomicsExp)
## $protein
## List of length 1
## names(1): expression
## 
## $peptide
## List of length 1
## names(1): expression
## rowData
rowData(ProteomicsExp)
## $protein
## DataFrame with 3 rows and 1 column
##       prot_id
##   <character>
## 1           A
## 2           B
## 3           C
## 
## $peptide
## DataFrame with 5 rows and 2 columns
##       pept_id     prot_id
##   <character> <character>
## 1           a           A
## 2           b           A
## 3           c           B
## 4           d           C
## 5           e           C

One can use the same functions and just add Prot or Pept as suffixes to specifically access a slot at the protein or the peptide level.

## assays of protein level
assaysProt(ProteomicsExp)
## List of length 1
## names(1): expression
## assays of peptide level
assaysPept(ProteomicsExp)
## List of length 1
## names(1): expression
## rowData of protein level
rowDataProt(ProteomicsExp)
## DataFrame with 3 rows and 1 column
##       prot_id
##   <character>
## 1           A
## 2           B
## 3           C
## rowData of peptide level
rowDataPept(ProteomicsExp)
## DataFrame with 5 rows and 2 columns
##       pept_id     prot_id
##   <character> <character>
## 1           a           A
## 2           b           A
## 3           c           B
## 4           d           C
## 5           e           C

Finally, to extract the SilacProteinExperiment and SilacPeptideExperiment one can use the ProtExp and PeptExp functions.

ProtExp(ProteomicsExp)
## class: SilacProteinExperiment 
## dim: 3 3 
## metadata(2): conditionCol timeCol
## assays(1): expression
## rownames: NULL
## rowData names(1): prot_id
## colnames: NULL
## colData names(3): sample condition time
PeptExp(ProteomicsExp)
## class: SilacPeptideExperiment 
## dim: 5 3 
## metadata(3): conditionCol timeCol proteinCol
## assays(1): expression
## rownames: NULL
## rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

3.2.3 Subsetting and aggregation

When subsetting row-wise on a SilacProteomicsExperiment there will be two main questions:

  • To which level (protein or peptide) is the subset applied?
  • Is the subset from one level applied also to the other? For example, when I subset a protein, do I also want to subset also only those peptides related to that protein.

The answer to these questions is written in the metaoptions values. The first one is subsetMode which can be protein or peptide. The second one is linkedSubset, which can be TRUE or FALSE. Moreover, if the latter is set to TRUE then the idColProt and idColPept will also be required.

Here are some examples of using the [ in different situations.

Subset at protein level without affecting the peptide level.

## indicate which rowDat columns have unique ids for proteins and peptides
metaoptions(ProteomicsExp)[['idColProt']] <- 'prot_id'
metaoptions(ProteomicsExp)[['idColPept']] <- 'pept_id'
## indicate that we want to apply the subset at protein level
metaoptions(ProteomicsExp)[['subsetMode']] <- 'protein'
## and not extend it to the peptide level
metaoptions(ProteomicsExp)[['linkedSubset']] <- FALSE

ProteomicsExp[1:2,]
## class: SilacProteomicsExperiment 
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein: 
##   |-- dim: 2 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 5 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

Note that the protein level has two proteins while the peptide level kept all five peptides.

Subset at protein level and extending the subset to the peptide level.

## to extend we set the metaoption to TRUE
metaoptions(ProteomicsExp)[['linkedSubset']] <- TRUE

ProteomicsExp[1:2,]
## class: SilacProteomicsExperiment 
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein: 
##   |-- dim: 2 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 3 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

In this case the protein level has two proteins and the peptide level has also been subset to the three peptides that are assigned to the two proteins in the linkerDf.

The same process can also be applied at peptide level.

## indicate that we want to apply the subset at protein level
metaoptions(ProteomicsExp)[['subsetMode']] <- 'peptide'
## to extend we set the metaoption to TRUE
metaoptions(ProteomicsExp)[['linkedSubset']] <- TRUE

ProteomicsExp[1:2,]
## class: SilacProteomicsExperiment 
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein: 
##   |-- dim: 1 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 2 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

In this case, the first two peptides, which belong to the first protein, are selected along with that protein.

Other subset operations can be done using subsetProt() and subsetPep(), which call the generic subset() on the rowData slot of the protein and peptide level respectively. Note that subsetProt() and subsetPep() will behave differently depending on the metaoption linkedSubset as seen above with [.

## without linked Subset
metaoptions(ProteomicsExp)[['linkedSubset']] <- FALSE
subsetProt(ProteomicsExp, prot_id == 'B')
## class: SilacProteomicsExperiment 
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein: 
##   |-- dim: 1 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 5 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time
## with linked Subset
metaoptions(ProteomicsExp)[['linkedSubset']] <- TRUE
subsetProt(ProteomicsExp, prot_id == 'B')
## class: SilacProteomicsExperiment 
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein: 
##   |-- dim: 1 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 1 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

Finally, cbind, rbind and merge can be used for aggregation.

For cbind the number of proteins and peptides must be the same and they are expected to be in the same order.

## cbind
cbind(ProteomicsExp[, 1], ProteomicsExp[, 2])
## class: SilacProteomicsExperiment 
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein: 
##   |-- dim: 3 2 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 5 2 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

For rbind the number of samples must be the same and they are expected to be in the same order.

## rbind
rbind(ProteomicsExp[1:2,], ProteomicsExp[3,])
## class: SilacProteomicsExperiment 
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein: 
##   |-- dim: 2 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 3 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

merge can be used when samples have to be aggregated in one object but the proteins/peptides are not the same. It requires IDs that can be matched between experiments.

## merge
merge(ProteomicsExp[1:3, 1], ProteomicsExp[3:4, 2:3])
## class: SilacProteomicsExperiment 
## metadata(7): conditionCol timeCol ... subsetMode proteinCol
## protein: 
##   |-- dim: 3 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(1): prot_id
## peptide: 
##   |-- dim: 4 3 
##   |-- assays(1): expression
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(2): pept_id prot_id
## colnames: NULL
## colData names(3): sample condition time

3.3 metaoptions

The metaoptions are a set of “option” values used to indicate where critical analysis information is required. The idea is that these values are defined at the beginning of the analysis so that it is not required to give them as arguments in each function call. Nevertheless, all the analysis functions can have these values passed as arguments. There are several metaoptions values by default, some of them are shared between the three classes while others are unique.

The following are related to data analysis:

  • conditionCol: which column in colData indicates the different experiment conditions.
  • timeCol: which column in colData indicates the different timepoints of the experiment.
  • proteinCol: which column in rowData indicates the assigned protein to a peptide. (unique to SilacPeptideExperiment)

The following are related to SilacProteomicsExperiment subsetting operations:

  • idColProt: which column in rowDataProt indicates unique protein IDs. (unique to SilacProteomicsExperiment)
  • idColPept: which column in rowDataPept indicates unique peptide IDs. (unique to SilacProteomicsExperiment)
  • linkedSubset: logical that indicates if the subsetting operation should be applied to both data levels (protein and peptide). (unique to SilacProteomicsExperiment)
  • subsetMode: which level should be used for reference when subsetting (protein or peptide). (unique to SilacProteomicsExperiment)

These can be extended by the user to accommodate other experiment design and data analysis operations.

4 Example data

To demonstrate the use of the package functions, a reduced version of the pulsed-SILAC dataset from Visscher et al. (2016) has been built into a SilacProteomicsExperiment object. It is stored in an object named wormsPE. This object has been constructed from the output files from MaxQuant ‘proteinGroups.txt’ and ‘peptides.txt’.

wormsPE
## class: SilacProteomicsExperiment 
## metadata(7): idColProt idColPept ... timeCol proteinCol
## protein: 
##   |-- dim: 250 14 
##   |-- assays(4): int_total int_light int_heavy ratio
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(22): Protein.IDs Majority.protein.IDs ...
##     Peptide.is.razor protein_id
## peptide: 
##   |-- dim: 3574 14 
##   |-- assays(4): int_total int_light int_heavy ratio
## Warning in sprintf(fmt, length(vals), lbls): 2 arguments not used by format '|-- rownames: NULL
## '
##   |-- rownames: NULL
##   |-- rowData names(46): Sequence N.term.cleavage.window ... id
##     Protein.group.IDs
## colnames: NULL
## colData names(2): line timepoint

The dataset contains only the first 250 proteins and their corresponding peptides. It has 4 assays with the total intensity, light isotope and heavy isotope intensities and heavy/light isotope ratio. In the experiment there are 2 worm cell lines: OW40, which express \(\alpha\)-synuclein-YFP (model for Parkinson’s Disease) and OW450, which express YFP only. Worms were harvested after 4h, 6h, 8h, 13h, 24h, 28h and 32h of light to heavy isotope medium transition. Note that the analysis of the data presented in the paper by Visscher et al was not performed using this package. However, all data in that paper has been reanalyzed using this package, which resulted in general the same but also more extended conclusions.

5 Filtering

5.1 Measurements across time

To reliably model a protein’s turnover, several timepoints are required. Ideally a protein would be measured in all the timepoints of the experiment, but this is rarely the case. To count how many proteins/peptides have been measured in each sample the barplotCounts() can be used.

barplotCounts(wormsPE, assayName = 'ratio')

There is some variability between samples but nothing too extreme. More importantly is knowing whether a protein has been measured across different timepoints. The function barplotTimeCoverage() can be used to visualize in how many timepoints each protein has been measured per condition.

barplotTimeCoverage(wormsPE, assayName = 'ratio')

As it can be seen, most of the proteins are not measured in all timepoints. In this case there is a majority for proteins that have not been measured in any timepoint, this is because the dataset has been truncated for only the OW40 and OW450 lines. The original search was one with additional worm lines and the missing proteins were detected in the lines excluded from this vignette.

To filter proteins that have not been detected in sufficient timepoints the filterByMissingTimepoints() function can be used. With maxMissing the amount of missing values can be tuned. With strict proteins have to meet the maxMissing criteria only in one condition or in all.

wormsPE2 <- filterByMissingTimepoints(wormsPE, 
                                      assayName = 'ratio', 
                                      maxMissing = 2, 
                                      strict = FALSE)

barplotTimeCoverage(wormsPE2, assayName = 'ratio')

Because strict is set to FALSE, there are still some proteins that have been detected in less than 5 timepoints in one of the conditions. To visualize the overlap between conditions, the upsetTimeCoverage() function can be used. This can be especially useful when more the experiment contains more than two conditions. Here the overlap of proteins is shown.

upsetTimeCoverage(ProtExp(wormsPE2), 
                  assayName = 'ratio', 
                  maxMissing = 2)

In this case there are 68 proteins that have been measured in at least 5 out of 7 timepoints in both conditions. These would be proteins for which turnover models can be compared between conditions.

Filtering by missing timepoints can also be done at peptide level if desired by changing subsetMode to peptide.

5.2 Other filters

Other filters can be easily applied using the subset method. For example, proteins with a low number of unique peptides can be filtered out.

subsetProt(wormsPE, Unique.peptides > 2)

Also, proteins that are known as common contaminants or identified using the reverse database can be filtered out.

subsetProt(wormsPE, Potential.contaminant != '+')
subsetProt(wormsPE, Reverse != '+')

6 Ratio and fraction calculation

To model protein turnover the isotope fractions from each timepoint are used. The isotope fraction can be calculated in many ways. A simple approach is to derive them from isotope ratios, which many raw mass spectrometry data analysis software report. If not, one can use the calculateIsotopeRatio() function, which will add a new assay named ratio to the object. In this example, the ratio of newly synthesized protein (heavy isotope) over old protein (light isotope) will be used.

## calculate the ratio of new istope over old isotope
wormsPE <- calculateIsotopeRatio(x = wormsPE, 
                                 newIsotopeAssay = 'int_heavy', 
                                 oldIsotopeAssay = 'int_light')

\[ratio = \frac{isotope_{A}}{isotope_{B}}\]

And then the fraction can be calculated as follows:

\[fraction_{A} = \frac{ratio}{1 + ratio}\]

This can be easily done using the calculateIsotopeFraction() function, which adds a new assay named fraction. In this case the heavy isotope fraction is calculated, the light isotope fraction would just be 1-fractionA.

wormsPE <- calculateIsotopeFraction(wormsPE, ratioAssay = 'ratio')

assaysProt(wormsPE)
## List of length 5
## names(5): int_total int_light int_heavy ratio fraction
assaysPept(wormsPE)
## List of length 5
## names(5): int_total int_light int_heavy ratio fraction

Unfortunately, this approach might might lead to the loss some values because mass spectrometry has a bias towards the measurement of high abundance peptides. For example: if a protein has a very slow turnover rate, it might be that at the earliest timepoint, the new isotope signal was below the detection limit and a ratio could not be calculated. To also account for these proteins the calculateIsotopeFraction function has additional arguments to indicate which assays contain each isotope’s expression data and which timepoints are considered “early” or “late”.

wormsPE <- calculateIsotopeFraction(wormsPE, 
                                    newIsoAssay = 'int_heavy',
                                    oldIsoAssay = 'int_light',
                                    earlyTimepoints = 1,
                                    lateTimepoints = 7)

7 Plotting assays

To visualize data stored in the assays slot there are two functions plotDistributionAssay() and scatterCompareAssays(). The first one will plot the distributions of the data for a selected assay for each timepoint and condition. The latter will plot each protein/peptide value of a condition against the other for each timepoint.

7.1 plotDistributionAssay()

plotDistributionAssay(wormsPE, assayName = 'fraction')