1 Introduction

This R package implements statistical modelling of affinity purification–mass spectrometry (AP-MS) data. It computes confidence scores that help identify bona fide protein-protein interactions (PPIs).

2 Prepare Input Data

Prepare the input data as a data frame, datInput, with the following columns:

idRun         idBait   idPrey   countPrey           lenPrey
AP-MS run ID  Bait ID  Prey ID  Prey peptide count  Prey protein length

library(SMAD)
data("TestDatInput")
head(TestDatInput)
#>   idRun idBait   idPrey countPrey lenPrey
#> 1 70380  ISG20    ARPC2        17     300
#> 2 70380  ISG20     RPL4         8     427
#> 3 70380  ISG20 MARCKSL1         4     195
#> 4 70380  ISG20     RCN1         4     331
#> 5 70380  ISG20     YBX1         9     324
#> 6 70380  ISG20    BASP1         6     227

The test data are a subset of the unfiltered BioPlex 2.0 data, restricted to runs with apoptosis proteins as baits.
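To make the expected format concrete, here is a minimal sketch that builds a small input data frame by hand and checks it before scoring. The toy values and the helper check_input() are hypothetical and only for illustration; check_input() is not a function exported by SMAD.

```r
# Hypothetical toy input in the five-column format described above.
datInput <- data.frame(
  idRun     = c("run1", "run1", "run2", "run2"),
  idBait    = c("BaitA", "BaitA", "BaitB", "BaitB"),
  idPrey    = c("P1", "P2", "P1", "P3"),
  countPrey = c(17L, 8L, 4L, 9L),
  lenPrey   = c(300L, 427L, 300L, 324L),
  stringsAsFactors = FALSE
)

# Illustrative helper (an assumption, not part of SMAD):
# verifies that the required columns exist and hold sensible values.
check_input <- function(dat) {
  required <- c("idRun", "idBait", "idPrey", "countPrey", "lenPrey")
  missing_cols <- setdiff(required, colnames(dat))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(dat$countPrey) || !is.numeric(dat$lenPrey)) {
    stop("countPrey and lenPrey must be numeric")
  }
  if (any(dat$countPrey < 0) || any(dat$lenPrey <= 0)) {
    stop("countPrey must be non-negative and lenPrey must be positive")
  }
  invisible(TRUE)
}

check_input(datInput)
```

A check of this kind catches a missing lenPrey column early, which matters for scores that normalize by protein length.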

3 Methods

3.1 CompPASS

Comparative Proteomic Analysis Software Suite (CompPASS) is based on the spoke model. The algorithm was developed by Dr. Mathew Sowa to define the human deubiquitinating enzyme interaction landscape (Sowa, Mathew E., et al., 2009). The implementation here was inspired by Dr. Sowa’s online tutorial. The output includes the Z-score, S-score, D-score and WD-score. In BioPlex 1.0 (Huttlin, Edward L., et al., 2015) and BioPlex 2.0 (Huttlin, Edward L., et al., 2017), the CompPASS pipeline also included a naive Bayes classifier that learns to distinguish true interacting proteins from non-specific background and false positive identifications. This function was optimized from the source code.

scoreCompPASS <- CompPASS(TestDatInput)
head(scoreCompPASS)
#>   idBait idPrey AvePSM      scoreZ    scoreS    scoreD Entropy    scoreWD
#> 1  AIFM3   ACTB     19 -0.62875216  4.358899  4.358899       0 0.05209068
#> 2  AIFM3  ACTC1     15  0.03766105  4.135851  4.135851       0 0.06942625
#> 3  AIFM3  ACTN2      2 -0.36323536  2.081666  2.081666       0 0.05954275
#> 4  AIFM3  ACTN4      5 -1.00689296  2.271284  2.271284       0 0.05000186
#> 5  AIFM3   AHCY      6 -0.58996199  2.571025  2.571025       0 0.04553311
#> 6  AIFM3  AIFM1     20  1.95142442 10.871146 10.871146       0 0.41566770
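To give some intuition for how a spoke-model score behaves, the sketch below computes an S-score-like quantity, sqrt(k / f * X), where k is the number of baits, f is the number of baits in which a prey was detected, and X is the prey's average count for a given bait. This follows our reading of the S-score in Sowa et al. (2009). It is not the code used inside CompPASS(), and the toy data are hypothetical.

```r
# Hypothetical toy table: one row per bait-prey pair with an average count.
toy <- data.frame(
  idBait = c("B1", "B1", "B2", "B2", "B3"),
  idPrey = c("P1", "P2", "P1", "P3", "P1"),
  AvePSM = c(4, 9, 2, 16, 1),
  stringsAsFactors = FALSE
)

# k: total number of distinct baits in the data set.
k <- length(unique(toy$idBait))

# f: for each prey, the number of distinct baits in which it was detected.
f <- tapply(toy$idBait, toy$idPrey, function(b) length(unique(b)))

# S-score-like value: preys seen with few baits are up-weighted.
toy$scoreS <- sqrt(k / as.numeric(f[toy$idPrey]) * toy$AvePSM)
toy
```

In this toy example P1 is seen with every bait, so k / f equals 1 and its score reduces to the square root of its count, while P3, seen with a single bait, is up-weighted by a factor of k.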

Based on these scores, bait-prey interactions can be ranked for downstream analyses.
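As a minimal sketch of such a ranking, the snippet below sorts the six AIFM3 rows shown above by WD-score. The values are copied from the output. Ranking by scoreWD and keeping the top three are illustrative choices, not recommendations from the package.

```r
# The six AIFM3 rows from the CompPASS output above (scoreWD only).
scores <- data.frame(
  idBait  = rep("AIFM3", 6),
  idPrey  = c("ACTB", "ACTC1", "ACTN2", "ACTN4", "AHCY", "AIFM1"),
  scoreWD = c(0.05209068, 0.06942625, 0.05954275,
              0.05000186, 0.04553311, 0.41566770),
  stringsAsFactors = FALSE
)

# Sort from highest to lowest WD-score.
ranked <- scores[order(scores$scoreWD, decreasing = TRUE), ]

# Keep the top-ranked candidates; the cutoff of 3 is arbitrary.
top <- head(ranked, 3)
top
```

On the full output of CompPASS(), the same order() call can be applied to the whole data frame, or within each bait, before choosing a cutoff suited to the experiment.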

3.2 HGScore

HGScore is a scoring algorithm based on a hypergeometric distribution error model (Hart et al., 2007) that incorporates the normalized spectral abundance factor (NSAF) (Zybailov, Boris, et al., 2006). It was first introduced to predict the protein complex network of Drosophila melanogaster (Guruharsha, K. G., et al., 2011). The algorithm is based on the matrix model. Unlike CompPASS, it requires the protein length of each prey, supplied in the additional lenPrey column.

scoreHG <- HG(TestDatInput)
head(scoreHG)
#>   InteractorA InteractorB         HG
#> 1         A2M       ABCB1 8.27071788
#> 2         A2M       ABCC4 7.18157099
#> 3         A2M       ABCD3 0.75126610
#> 4         A2M      ACADSB 2.70799149
#> 5         A2M       ACAT1 0.09892875
#> 6         A2M        ACLY 0.38878476
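The need for prey protein length comes from the NSAF normalization. As a sketch, NSAF divides each prey's count by its length and then normalizes so that the values within a run sum to one. The helper nsaf_by_run() is hypothetical and not part of SMAD. The six rows are the head of TestDatInput shown earlier, so the resulting values are illustrative only, since the real run contains more preys.

```r
# First six rows of TestDatInput, as printed in Section 2.
run <- data.frame(
  idRun     = rep("70380", 6),
  idPrey    = c("ARPC2", "RPL4", "MARCKSL1", "RCN1", "YBX1", "BASP1"),
  countPrey = c(17, 8, 4, 4, 9, 6),
  lenPrey   = c(300, 427, 195, 331, 324, 227),
  stringsAsFactors = FALSE
)

# Illustrative helper (an assumption, not a SMAD function):
# NSAF = (count / length) / sum over the run of (count / length).
nsaf_by_run <- function(dat) {
  saf <- dat$countPrey / dat$lenPrey
  # ave() applies the normalization separately within each run.
  ave(saf, dat$idRun, FUN = function(x) x / sum(x))
}

run$nsaf <- nsaf_by_run(run)
run[order(run$nsaf, decreasing = TRUE), c("idPrey", "nsaf")]
```

Dividing by length keeps long proteins, which yield more peptides, from dominating the abundance estimate.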

Note that HG scoring implements the matrix model, which substantially increases the number of inferred protein-protein interactions.
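The difference in the number of candidate pairs can be sketched for a single purification. Under the spoke model only bait-prey pairs are considered, whereas under the matrix model every pair of co-purified proteins is a candidate. The snippet uses the bait and four preys from the head of TestDatInput. Treating the bait as one member of the matrix is our assumption about the general matrix model, not a statement about the internals of HG().

```r
# One purification: a bait and four of its preys from the example data.
bait  <- "ISG20"
preys <- c("ARPC2", "RPL4", "MARCKSL1", "RCN1")

# Spoke model: only bait-prey pairs.
spoke_pairs <- data.frame(A = bait, B = preys, stringsAsFactors = FALSE)

# Matrix model: all unordered pairs among the co-purified proteins.
members <- c(bait, preys)
matrix_pairs <- t(utils::combn(members, 2))
colnames(matrix_pairs) <- c("A", "B")

nrow(spoke_pairs)   # n preys give n pairs
nrow(matrix_pairs)  # n + 1 proteins give choose(n + 1, 2) pairs
```

With n preys the spoke model yields n pairs and the matrix model yields (n + 1) * n / 2, so the gap grows quadratically with the size of the purification.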