This file contains all the commands needed to perform a default SWATH ion library generation at the FGCZ. It is usually triggered by the B-Fabric system (Panse, Christian, Trachsel, and Türker 2022) and is intended for training and reproducibility.
specL 1.34.0
In the first step, the peptide identification result generated by a standard shotgun proteomics experiment has to be processed using the bibliospec software (Frewen and MacCoss 2007).
The ion library is then generated with the specL package. The workflow is described in (???).
The following R package has to be installed on the compute box.
library(specL)
## Loading required package: DBI
## Loading required package: protViz
## Loading required package: RSQLite
## Loading required package: seqinr
##
## Attaching package: 'specL'
## The following objects are masked from 'package:protViz':
##
## plot.psm, plot.psmSet, summary.psmSet
This file can be rendered by using the following code snippet.
library(rmarkdown)
library(BiocStyle)
report_file <- tempfile(fileext = '.Rmd')
file.copy(system.file("doc", "report.Rmd",
                      package = "specL"),
          report_file)
rmarkdown::render(report_file,
                  output_format = 'html_document',
                  output_file = '/tmp/report_specL.html')
If no INPUT is defined, the report uses the specL package’s data and the following default parameters.
if(!exists("INPUT")){
  INPUT <- list(
    FASTA_FILE = system.file("extdata", "SP201602-specL.fasta.gz",
                             package = "specL"),
    BLIB_FILTERED_FILE = system.file("extdata", "peptideStd.sqlite",
                                     package = "specL"),
    BLIB_REDUNDANT_FILE = system.file("extdata", "peptideStd_redundant.sqlite",
                                      package = "specL"),
    MIN_IONS = 5,
    MAX_IONS = 6,
    MZ_ERROR = 0.05,
    MASCOTSCORECUTOFF = 17,
    FRAGMENTIONMZRANGE = c(300, 1250),
    FRAGMENTIONRANGE = c(5, 200),
    NORMRTPEPTIDES = specL::iRTpeptides,
    OUTPUT_LIBRARY_FILE = tempfile(fileext = '.csv'),
    RDATA_LIBRARY_FILE = tempfile(fileext = '.RData'),
    ANNOTATE = TRUE
  )
}
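Because the guard above only checks whether INPUT exists, a partially specified INPUT would leave all other fields undefined (NULL). The following is a minimal sketch, assuming one wants to change only a few values: the overrides are merged into a copy of the defaults with base R's modifyList. The names default_input and user_input are hypothetical, the override values are invented, and only a subset of the parameters is shown.

```r
# Subset of the report's default parameters (values copied from the list above).
default_input <- list(
  MIN_IONS = 5,
  MAX_IONS = 6,
  MZ_ERROR = 0.05,
  MASCOTSCORECUTOFF = 17,
  FRAGMENTIONMZRANGE = c(300, 1250),
  FRAGMENTIONRANGE = c(5, 200)
)

# Hypothetical user overrides: only the values that should differ.
user_input <- list(MZ_ERROR = 0.02, MAX_IONS = 10)

# modifyList() replaces matching elements and keeps all others.
INPUT <- modifyList(default_input, user_input)

str(INPUT)
```

Defining INPUT this way before the report is rendered keeps every parameter populated while still bypassing the default block.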
The library generation workflow was performed using the following parameters:
parameter | value |
---|---|
FASTA_FILE | /tmp/RtmpUmHZSP/Rinst263036532fbe25/specL/extdata/SP201602-specL.fasta.gz |
BLIB_FILTERED_FILE | /tmp/RtmpUmHZSP/Rinst263036532fbe25/specL/extdata/peptideStd.sqlite |
BLIB_REDUNDANT_FILE | /tmp/RtmpUmHZSP/Rinst263036532fbe25/specL/extdata/peptideStd_redundant.sqlite |
MIN_IONS | 5 |
MAX_IONS | 6 |
MZ_ERROR | 0.05 |
MASCOTSCORECUTOFF | 17 |
FRAGMENTIONMZRANGE | 300, 1250 |
FRAGMENTIONRANGE | 5, 200 |
OUTPUT_LIBRARY_FILE | /tmp/RtmpVeXafY/file2637b376397fa9.csv |
RDATA_LIBRARY_FILE | /tmp/RtmpVeXafY/file2637b37f601d8b.RData |
The following R helper function is used for composing the in-silico fragment ions using protViz.
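fragmentIonFunction_specL <- function (b, y) {
  # monoisotopic masses in Da; Oxygen and Nitrogen are not used below
  Hydrogen <- 1.007825
  Oxygen <- 15.994915
  Nitrogen <- 14.003074
  # b and y ions are passed through unchanged
  b1_ <- (b )
  y1_ <- (y )
  # doubly charged b and y ions
  b2_ <- (b + Hydrogen) / 2
  y2_ <- (y + Hydrogen) / 2
  return( cbind(b1_, y1_, b2_, y2_) )
}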
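To make the shape of the helper's output concrete, the following sketch applies it to a few fragment m/z values. The input numbers are illustrative values for a toy peptide and are not output of this report; in the workflow the b and y vectors are presumably supplied by protViz. The helper is repeated (without its unused constants) so that the example is self-contained.

```r
# Helper repeated from the report so that this example runs on its own.
fragmentIonFunction_specL <- function (b, y) {
  Hydrogen <- 1.007825
  b1_ <- (b )
  y1_ <- (y )
  b2_ <- (b + Hydrogen) / 2
  y2_ <- (y + Hydrogen) / 2
  return( cbind(b1_, y1_, b2_, y2_) )
}

# Illustrative singly charged fragment m/z values (assumed toy input).
b <- c(58.02874, 129.06585, 186.08731)
y <- c(147.11280, 262.13974, 375.22380)

# One row per fragment position, one column per ion series.
ions <- fragmentIonFunction_specL(b, y)
print(ions)
```

The result is a numeric matrix with the columns b1_, y1_, b2_, and y2_, which is the layout passed on via the fragmentIonFUN argument.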
BLIB_FILTERED <- read.bibliospec(INPUT$BLIB_FILTERED_FILE)
## fetched 137 rows.
## assigning 28 modifications ...
summary(BLIB_FILTERED)
## Summary of a "psmSet" object.
## Number of precursor:
## 137
## Number of precursors in Filename(s)
## _methods\20140910_01_fetuin_400amol_1.raw 21
## _methods\20140910_07_fetuin_400amol_2.raw 116
## Number of annotated precursor:
## 0
BLIB_REDUNDANT <- read.bibliospec(INPUT$BLIB_REDUNDANT_FILE)
## fetched 184 rows.
## assigning 37 modifications ...
summary(BLIB_REDUNDANT)
## Summary of a "psmSet" object.
## Number of precursor:
## 184
## Number of precursors in Filename(s)
## _methods\20140910_01_fetuin_400amol_1.raw 32
## _methods\20140910_07_fetuin_400amol_2.raw 152
## Number of annotated precursor:
## 0
After the peptide spectrum matches (PSMs) have been processed with bibliospec, the protein information is gone, so the peptides are annotated again using the FASTA file. The read.fasta function is provided by the CRAN package seqinr.
if(INPUT$ANNOTATE){
  FASTA <- read.fasta(INPUT$FASTA_FILE,
                      seqtype = "AA",
                      as.string = TRUE)
  BLIB_FILTERED <- annotate.protein_id(BLIB_FILTERED,
                                       fasta = FASTA)
}
## start protein annotation ...
## time taken: 0.00124747753143311 minutes
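Conceptually, the annotation step maps each peptide back to the protein sequences that contain it. The following is a minimal base R sketch of that idea using fixed substring matching. It is not the implementation of annotate.protein_id, and the protein identifiers and sequences are made up for illustration; annotate_toy is a hypothetical helper.

```r
# Toy protein database: names are identifiers, values are amino acid sequences
# (hypothetical entries, one string per protein).
fasta <- c(
  "toy|X1|PROT_A" = "MKTAYIAKQRGAGSSEPVTGLDAKLLE",
  "toy|X2|PROT_B" = "MSTPVISGGPYEYRAAGAGSSEPVTGLDAKQ",
  "toy|X3|PROT_C" = "MAAAAAAAK"
)

peptides <- c("GAGSSEPVTGLDAK", "TPVISGGPYEYR", "LFLQFGAQGSPFLK")

# For every peptide, collect the identifiers of all proteins containing it.
annotate_toy <- function(peptides, fasta) {
  lapply(setNames(peptides, peptides), function(p) {
    names(fasta)[grepl(p, fasta, fixed = TRUE)]
  })
}

annotation <- annotate_toy(peptides, fasta)
print(annotation)
```

A peptide can map to several proteins or to none, which is why the sketch returns a list of character vectors rather than a single identifier per peptide.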
The following peptides are used for retention time (RT) normalization. The last column indicates (TRUE or FALSE) whether a peptide is included in the data. The rows are ordered by RT value.
row | peptide | rt | included |
---|---|---|---|
1 | LGGNEQVTR | -24.92000 | FALSE |
21 | LGGNETQVR | -24.92000 | FALSE |
2 | GAGSSEPVTGLDAK | 0.00000 | TRUE |
22 | AGGSSEPVTGLADK | 0.00000 | FALSE |
3 | AAVYHHFISDGVR | 10.48963 | FALSE |
4 | VEATFGVDESNAK | 12.39000 | TRUE |
23 | VEATFGVDESANK | 12.39000 | FALSE |
5 | YILAGVENSK | 19.79000 | FALSE |
24 | YILAGVESNK | 19.79000 | FALSE |
6 | HIQNIDIQHLAGK | 23.93091 | FALSE |
7 | TPVISGGPYEYR | 28.71000 | TRUE |
25 | TPVISGGPYYER | 28.71000 | FALSE |
8 | TPVITGAPYEYR | 33.38000 | TRUE |
26 | TPVITGAPYYER | 33.38000 | FALSE |
9 | DGLDAASYYAPVR | 42.26000 | TRUE |
27 | GDLDAASYYAPVR | 42.26000 | FALSE |
10 | TEVSSNHVLIYLDK | 43.54062 | FALSE |
11 | ADVTPADFSEWSK | 54.62000 | TRUE |
28 | DAVTPADFSEWSK | 54.62000 | FALSE |
12 | LVAYYTLIGASGQR | 64.15480 | FALSE |
13 | GTFIIDPGGVIR | 70.52000 | TRUE |
29 | TGFIIDPGGVIR | 70.52000 | FALSE |
14 | TEHPFTVEEFVLPK | 74.50968 | FALSE |
15 | TTNIQGINLLFSSR | 84.36927 | FALSE |
16 | GTFIIDPAAVIR | 87.23000 | FALSE |
30 | GTFIIDPAAIVR | 87.23000 | FALSE |
17 | LFLQFGAQGSPFLK | 100.00000 | TRUE |
31 | FLLQFGAQGSPLFK | 100.00000 | FALSE |
18 | NQGNTWLTAFVLK | 104.06935 | FALSE |
19 | DSPVLIDFFEDTER | 112.63426 | FALSE |
20 | ITPNLAEFAFSLYR | 122.24622 | FALSE |
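The included column and the normalization idea can be illustrated with a small sketch. It assumes, as a simplification, that measured retention times relate linearly to the normalized RT scale, and it fits that relation with lm on the standard peptides found in the data. The measured retention times below are invented, and the fitting procedure inside specL may differ.

```r
# Normalized RT values of the eight standard peptides flagged TRUE in the table.
irt <- data.frame(
  peptide = c("GAGSSEPVTGLDAK", "VEATFGVDESNAK", "TPVISGGPYEYR",
              "TPVITGAPYEYR", "DGLDAASYYAPVR", "ADVTPADFSEWSK",
              "GTFIIDPGGVIR", "LFLQFGAQGSPFLK"),
  rt = c(0.00, 12.39, 28.71, 33.38, 42.26, 54.62, 70.52, 100.00)
)

# Hypothetical measured retention times in minutes (invented for this sketch:
# an exact linear relation, measured = 10 + 0.5 * normalized RT).
observed <- data.frame(
  peptide = irt$peptide,
  rt_measured = 10 + 0.5 * irt$rt
)

# The 'included' flag is a membership test of standard peptides in the data.
candidates <- c("LGGNEQVTR", "GAGSSEPVTGLDAK", "YILAGVENSK")
included <- candidates %in% observed$peptide

# Fit the normalized RT as a linear function of the measured retention time.
calib <- merge(irt, observed, by = "peptide")
fit <- lm(rt ~ rt_measured, data = calib)

# Map an arbitrary measured retention time (minutes) onto the normalized scale.
normalized <- predict(fit, newdata = data.frame(rt_measured = 40))
print(included)
print(normalized)
```

With real data the relation is noisy, so the quality of the fit depends on how many standard peptides are found in each raw file.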
specLibrary <- specL::genSwathIonLib(
  data = BLIB_FILTERED,
  data.fit = BLIB_REDUNDANT,
  max.mZ.Da.error = INPUT$MZ_ERROR,
  topN = INPUT$MAX_IONS,
  fragmentIonMzRange = INPUT$FRAGMENTIONMZRANGE,
  fragmentIonRange = INPUT$FRAGMENTIONRANGE,
  fragmentIonFUN = fragmentIonFunction_specL,
  mascotIonScoreCutOFF = INPUT$MASCOTSCORECUTOFF,
  iRT = INPUT$NORMRTPEPTIDES
)
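The effect of the m/z window and of the top-N setting can be pictured with the following sketch. It is an assumption-laden illustration and not the code path of genSwathIonLib: select_top_ions is a hypothetical helper, the spectrum is invented, and the way the minimum ion count is enforced here is a guess at the intent of the parameters.

```r
# Hypothetical helper: keep fragments inside an m/z window, then the top_n
# most intense ones; return NULL if fewer than min_ions remain.
select_top_ions <- function(mz, intensity, mz_range, top_n, min_ions) {
  in_range <- mz >= mz_range[1] & mz <= mz_range[2]
  mz <- mz[in_range]
  intensity <- intensity[in_range]
  keep <- head(order(intensity, decreasing = TRUE), top_n)
  if (length(keep) < min_ions) return(NULL)
  data.frame(mz = mz[keep], intensity = intensity[keep])
}

# Toy spectrum (invented values).
mz        <- c(250.10, 503.29, 705.35, 868.41, 925.44, 996.48, 1143.54, 1300.70, 640.30)
intensity <- c(80,     11,     35,     14,     100,    56,     55,      90,      5)

selected <- select_top_ions(mz, intensity,
                            mz_range = c(300, 1250), top_n = 6, min_ions = 5)
print(selected)
```

In this toy spectrum the two intense peaks outside the 300 to 1250 window are discarded first, and the weakest of the seven remaining peaks is dropped by the top-6 cut.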
The total number of PSMs with a Mascot e-value < 0.05 in your search is 184. The number of unique precursors is 137. The generated ion library contains 131 entries, which means that 95.62 % of the unique precursors fulfilled the filtering criteria.
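The reported percentage follows directly from the counts in the paragraph above. The variable names in this short check are chosen for illustration only.

```r
# Counts taken from the text above.
n_psm <- 184                # PSMs in the redundant library
n_unique_precursors <- 137  # precursors in the filtered library
n_library <- 131            # entries in the generated ion library

# Share of unique precursors that passed the filtering criteria.
pct <- 100 * n_library / n_unique_precursors
n_dropped <- n_unique_precursors - n_library

sprintf("%.2f %% kept, %d precursors dropped", pct, n_dropped)
```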
summary(specLibrary)
## Summary of a "specLSet" object.
##
## Parameter:
##
## Number of precursor (q1 and peptideModSeq) = 131
## Number of unique precursor
## (q1.in-silico and peptideModSeq) = 122
## Number of iRT peptide(s) = 8
## Which std peptides (iRTs) where found in which raw files:
## _methods\20140910_01_fetuin_400amol_1.raw GAGSSEPVTGLDAK
## _methods\20140910_01_fetuin_400amol_1.raw TPVITGAPYEYR
## _methods\20140910_01_fetuin_400amol_1.raw VEATFGVDESNAK
## _methods\20140910_07_fetuin_400amol_2.raw ADVTPADFSEWSK
## _methods\20140910_07_fetuin_400amol_2.raw DGLDAASYYAPVR
## _methods\20140910_07_fetuin_400amol_2.raw GTFIIDPGGVIR
## _methods\20140910_07_fetuin_400amol_2.raw LFLQFGAQGSPFLK
## _methods\20140910_07_fetuin_400amol_2.raw TPVISGGPYEYR
##
## Number of transitions frequency:
## 5 16
## 6 115
##
## Number of annotated precursor = 1855
## Number of file(s)
## 2
##
## Number of precursors in Filename(s)
## _methods\20140910_01_fetuin_400amol_1.raw 19
## _methods\20140910_07_fetuin_400amol_2.raw 112
##
## Misc:
##
## Memory usage = 763280 bytes
The following two code snippets print and plot the first element of the ion library:
# slotNames(specLibrary@ionlibrary[[1]])
specLibrary@ionlibrary[[1]]
## An "specL" object.
##
##
## content:
## group_id = ADQPQC[+57.0]LSLAWSTDGQTLFAGYSDNTIR.3
## peptide_sequence = ADQPQCLSLAWSTDGQTLFAGYSDNTIR
## proteinInformation = sp|O18640|GBLP_DROME
## q1 = 1039.151
## q1.in_silico = 3172.464
## q3 = 925.436 1143.542 996.4756 705.3505 868.4149 503.2933
## q3.in_silico = 925.4374 1143.543 996.4745 705.3526 868.4159 503.2936
## prec_z = 3
## frg_type = y y y y y y
## frg_nr = 8 10 9 6 7 4
## frg_z = 1 1 1 1 1 1
## relativeFragmentIntensity = 100 56 56 35 14 11
## irt = 95.97
## peptideModSeq = ADQPQC[+57.0]LSLAWSTDGQTLFAGYSDNTIR
## mZ.error = 0.001407 0.001031 0.001095 0.002066 0.001004 0.000286
## \ctrachse_20140910_Nuclei_diff_extraction_methods\20140910_07_fetuin_400amol_2.raw
## score = 15.83609
##
## size:
## Memory usage: 4224 bytes
plot(specLibrary@ionlibrary[[1]])
plot(specLibrary)
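As a plausibility check on the printed entry, the measured and in-silico q3 values can be compared against the configured mass tolerance. The sketch below copies the rounded values from the output above, so the differences agree with the reported mZ.error only up to print precision; the variable names are chosen for illustration.

```r
# Values copied from the printed ion library entry (rounded as displayed).
q3           <- c(925.436, 1143.542, 996.4756, 705.3505, 868.4149, 503.2933)
q3_in_silico <- c(925.4374, 1143.543, 996.4745, 705.3526, 868.4159, 503.2936)

# Tolerance used in this report (MZ_ERROR).
mz_error_limit <- 0.05

# Absolute deviation between measured and theoretical fragment m/z.
abs_error <- abs(q3 - q3_in_silico)
within_tolerance <- all(abs_error <= mz_error_limit)

print(round(abs_error, 4))
print(within_tolerance)
```

All six transitions of this entry deviate by only a few thousandths of a dalton, well inside the 0.05 Da tolerance.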