Fetching IUPred Predictions of Intrinsic Disorder

Quick Start

The functions iupred(), iupredAnchor(), and iupredRedox() are all designed to fetch predictions of intrinsic disorder from the IUPred2A REST API. To fetch results, a UniProt ID is needed.

Predictions are made on a scale of 0-1, where any residues with a score over 0.5 are predicted to be in a disordered region, and any residue scoring below 0.5 are predicted to be ordered.

iupredAnchor() and iupredRedox() provide contet-dependent predictions.

library(idpr) #Attach the package

p53_ID <- "P04637"
iupred(p53_ID,
       proteinName = "HUMAN P53")

If you use any iupred-based function, please cite the appropriate articles. The following are the most recent papers (as of 6/2020)

Background

The primary structure of a protein, also known as the amino acid sequence, can be an accurate predictor of protein folding. Because of this, many different tools have been developed to make probabilistic predictions of intrinsic disorder based on various known properties of Intrinsically Disordered Proteins (IDPs) (Li et al., 2015).

Some examples of these come from IUPred2A. IUPred2 analyzes an amino acid sequence and returns a score of intrinsic disorder depending on a model of the estimated energy potential for residue interactions (Mészáros, Erdős, & Dosztányi, 2018). This is because structured proteins have the ability to create a network of interactions, while IDPs lack abundant interactions. The reduced number of interactions leads to an IDP’s lack of secondary and tertiary structure (Dosztányi, 2018).

Predictions are made on a scale of 0-1, where any residues with a score over 0.5 are predicted to be in a disordered region, and any residue scoring below 0.5 are predicted to be ordered.

The IUPred2A website is located at https://iupred2a.elte.hu/. For detailed information on using IUPred2A, please refer to Erdős & Dosztányi (2020) “Analyzing protein disorder with IUPred2A”. Current Protocols in Bioinformatics, 70, e99. Additionally, please see Mészáros et al. (2019) for further information, theory, and applications of IUPred2A.

All iupred functions in idpr make a connection to the IUPred2A REST API based on the type of analysis and UniProt accession number within the function’s arguments. To make this connection, the user must provide the UniProt accession number for their protein of interest (UniProt Consortium, 2019). Additionally, the user will need a connection to the internet. The results are then formatted to match output of other idpr functions.

The results for all predictions fetched can be represented graphically, with plotResults = TRUE (default) or as a data frame with plotResults = FALSE. Both type of results will be shown for examples.

Installation

The idpr package can be installed from Bioconductor with the following line of code. It requires the BiocManager package to be installed.

#BiocManager::install("idpr")

The most recent version of the package can be installed with the following line of code. This requires the devtools package to be installed.

#devtools::install_github("wmm27/idpr")

iupred function and iupredType arguments.

iupred() has is the core prediction of intrinsic disorder. The argument iupredType = is important to specify depending on the goal of the analysis.

plotResults = FALSE returns a data frame with 3 columns. The first column is “Position”, which indicates the numeric position of the residue in the submitted sequence. The second column is “AA”, which indicates the amino acid residue as a single letter. The third column is “IUPred2”, which indicates the prediction of intrinsic disorder by IUPred2.

library(idpr) #Attach the package

iupredType = “long”

iupredType = “long” is the default setting, and the setting that is recommended for predicting intrinsic disorder in proteins (Dosztányi, 2018). This predits relevant disordered segments like those curated within the DisProt Database (Dosztányi, 2018; Hatos et al., 2019).

p53_ID <- "P04637"
iupred(p53_ID,
       proteinName = "HUMAN P53",
       iupredType = "long")

iupredLongDF <- iupred(p53_ID,
                       proteinName = "HUMAN P53",
                       iupredType = "long",
                       plotResults = FALSE)
head(iupredLongDF)
#>   Position AA  IUPred2
#> 1        1  M 0.980739
#> 2        2  E 0.984871
#> 3        3  E 0.987531
#> 4        4  P 0.969315
#> 5        5  Q 0.965700
#> 6        6  S 0.971713

iupredType = “short”

iupredType = “short” is the setting to predict small regions of intrinsic disorder in proteins, optimized for missing regions of protein structures saved to the Protein Databank (PDB). Its goal is to predict regions that are not represented in crystallographic experiments. It is important to note that this tends to favor disorder at the N- and C- terminus (Dosztányi, 2018).

p53_ID <- "P04637"
iupred(p53_ID,
       proteinName = "HUMAN P53",
       iupredType = "short")

iupredShortDF <- iupred(p53_ID,
                        iupredType = "short",
                        plotResults = FALSE)
head(iupredShortDF)
#>   Position AA IUPred2
#> 1        1  M  0.9935
#> 2        2  E  0.9904
#> 3        3  E  0.9867
#> 4        4  P  0.9624
#> 5        5  Q  0.9447
#> 6        6  S  0.9348

iupredType = “glob”

iupredType = “glob” is the setting that is to help reduce the noise of small disordered regions in otherwise ordered regions and to help identify sequences that are likely to have a specific and rigid fold. (Dosztányi, 2018).

p53_ID <- "P04637"
iupred(p53_ID,
       proteinName = "HUMAN P53",
       iupredType = "glob")

iupredGlobDF <- iupred(p53_ID,
                       iupredType = "glob",
                       plotResults = FALSE)
head(iupredGlobDF)
#>   Position AA  IUPred2
#> 1        1  M 0.971713
#> 2        2  E 0.976226
#> 3        3  E 0.976226
#> 4        4  P 0.936162
#> 5        5  Q 0.932927
#> 6        6  S 0.947281

iupredAnchor

IDPs and IDRs serve many important roles in a cell, one prominent role is the ability to act as a hub for protein-protein interactions (Uversky, 2013). Additionally, many disordered regions undergo what is known as “induced folding”. This is a phenomenon where under native conditions the IDP is unstructured, however when entering a specific environment, such as those that occur when binding to other proteins, higher-order structures may form and allow the IDP to execute its function (Kovacs, Szabo, Pancsa, & Tompa, 2013). It is important to note that not all IDPs experience induced folding.

ANCHOR2 is a context-dependent predictor of binding regions for protein-protein interactions (Mészáros et al., 2018). Similarly to IUPred2, ANCHOR2 gives a score of 0-1 indicating if a region is predicted to be involved in protein-protein interactions.

iupredAnchor() is used to combine the output of IUPred2 long (plot is the same as shown prior) with ANCHOR2 predictions (shown as a maroon line).

p53_ID <- "P04637"
iupredAnchor(p53_ID,
             proteinName = "HUMAN P53")

The data frame for iupredAnchor has a similar layout to iupred(), with an additional column for ANCHOR2 scores.

iupredAnchorDF <- iupredAnchor(p53_ID, 
                               plotResults = FALSE)
head(iupredAnchorDF)
#>   Position AA  IUPred2   ANCHOR2
#> 1        1  M 0.980739 0.8488478
#> 2        2  E 0.984871 0.8330916
#> 3        3  E 0.987531 0.8186782
#> 4        4  P 0.969315 0.8124137
#> 5        5  Q 0.965700 0.8040271
#> 6        6  S 0.971713 0.7987758

iupredRedox

Another factor influencing the environmental chemistry is the redox potential. As mentioned before, under native conditions IDPs are unstructured, however when entering a different environment higher-order structures may form and allow IDPs to execute their function (Kovacs et al., 2013).

iupredRedox() is used to predict redox-sensitive regions that may experience induced folding upon changing environments. This is a context-dependent predictor of disordered regions depending on a reducing (plus) or oxidizing (minus) environment. The prediction is done by replacing all cystine residues to serine when simulating a reducing or “redox-plus” environment. This eliminates any structural stabilization by disulfide bonds (Mészáros et al., 2018).

Redox-plus predictions are shown in blue, Redox-minus predictions are shown in purple. Any region identified as “Redox Sensitive” will be highlighted in light green (does not appear if there are no sensitive regions predicted).

p53_ID <- "P04637"
iupredRedox(p53_ID,
             proteinName = "HUMAN P53")

The data frame has two IUPred2 long scores. One in a redox-plus environment (Cys –> Ser) and a redox-minus environment (standard prediction). An additional column is provided of logical values indicating if a redox sensitive region was predicted. When redoxSensitive == TRUE, the residue is predicted to be in a redox sensitive region, when FALSE the residue is not predicted to be in a redox sensitive region.

iupredRedoxDF <- iupredRedox(p53_ID, 
                             plotResults = FALSE)
head(iupredRedoxDF)
#>   Position AA iupredPlus iupredMinus redoxSensitive
#> 1        1  M   0.980739    0.980739          FALSE
#> 2        2  E   0.984871    0.984871          FALSE
#> 3        3  E   0.987531    0.987531          FALSE
#> 4        4  P   0.969315    0.969315          FALSE
#> 5        5  Q   0.965700    0.965700          FALSE
#> 6        6  S   0.971713    0.971713          FALSE

Additional Example

While the aesthetics of the plots above are meant to represent a middleground of the graphics available on and the other plots generated by idpr, a user may wish to use the data frames for data analysis or unique graphics. Another way to represent the data is using the sequenceMap() function.

iupredLongDF <- iupred(p53_ID,
                       proteinName = "HUMAN P53",
                       iupredType = "long",
                       plotResults = FALSE)

sequenceMap(sequence = iupredLongDF$AA,
            property = iupredLongDF$IUPred2,
            customColors = c("darkolivegreen3", "grey65", "darkorchid1")) + 
  ggplot2::labs(title = "Prediction of Intrinsic Disorder in HUMAN P53",
                subtitle = "By IUPred2A long")

For further details, please refer to idpr’s “Sequence Map Vignette” file.

Getting the UniProt Accession

To make a connection to the IUPred2A REST API, a UniProt Accession ID is required. If a user does not have the ID, it is reccomended to first search for it via the UniProt website at https://www.uniprot.org/ . If a user does not have the protein name or info to search, a BLAST search on UniProt may be helpful at https://www.uniprot.org/blast/ (UniProt Consortium, 2019).

Use

Please note that these functions are only meant to access the IUPred2A REST API. The functions within idpr are not designed by the IUPred2A developers. The authors of idpr do not control, manage, or maintain any aspect of IUPred2A. Therefore, idpr is unable to guarantee access to the API.

The user MUST follow the IUPred2A Terms of Use in addition to the terms for use of idpr.

When publishing or using any data generated with IUPred2A, the user must cite the appropriate publication(s) for the IUPred2A service. This may change as the program updates or improves. idpr does not control updates to IUPred2A.

The current website (as of 10/15/20) for IUPred2A is found here: https://iupred2a.elte.hu/. The authors of idpr strongly recommend visiting this page to follow any updates and changes as well as confirming appropriate use per the IUPred2A terms of use.

References

Dosztányi, Z. (2018). Prediction of protein disorder based on IUPred. Protein Sci, 27(1), 331-340. doi:10.1002/pro.3334

Erdős, G., & Dosztányi, Z. (2020). Analyzing protein disorder with IUPred2A. Current Protocols in Bioinformatics, 70, e99. https://doi.org/10.1002/cpbi.99

Hatos, A., Hajdu-Soltész, B., Monzon, A. M., Palopoli, N., Álvarez, L., Aykac-Fas, B., . . . Piovesan, D. (2019). DisProt: intrinsic protein disorder annotation in 2020. Nucleic acids research, 48(D1), D269-D276. doi:10.1093/nar/gkz975

Kovacs, D., Szabo, B., Pancsa, R., & Tompa, P. (2013). Intrinsically disordered proteins undergo and assist folding transitions in the proteome. Archives of Biochemistry and Biophysics, 531(1), 80-89. doi:https://doi.org/10.1016/j.abb.2012.09.010

Li, J., Feng, Y., Wang, X., Li, J., Liu, W., Rong, L., & Bao, J. (2015). An Overview of Predictors for Intrinsically Disordered Proteins over 2010-2014. International journal of molecular sciences, 16(10), 23446-23462. doi:10.3390/ijms161023446

Mészáros, B., Erdős, G., & Dosztányi, Z. (2018). IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic acids research, 46(W1), W329-W337.

UniProt Consortium. (2019). UniProt: a worldwide hub of protein knowledge. Nucleic acids research, 47(D1), D506-D515.

Uversky, V. N. (2013). Unusual biophysics of intrinsically disordered proteins. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics, 1834(5), 932-951. doi:https://doi.org/10.1016/j.bbapap.2012.12.008

citation("ggplot2")
#> 
#> To cite ggplot2 in publications, please use:
#> 
#>   H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
#>   Springer-Verlag New York, 2016.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Book{,
#>     author = {Hadley Wickham},
#>     title = {ggplot2: Elegant Graphics for Data Analysis},
#>     publisher = {Springer-Verlag New York},
#>     year = {2016},
#>     isbn = {978-3-319-24277-4},
#>     url = {https://ggplot2.tidyverse.org},
#>   }

Additional Information

R Version

R.version.string
#> [1] "R version 4.1.1 (2021-08-10)"

System Information

as.data.frame(Sys.info())
#>                                                 Sys.info()
#> sysname                                              Linux
#> release                                   5.4.0-88-generic
#> version        #99-Ubuntu SMP Thu Sep 23 17:29:00 UTC 2021
#> nodename                                         nebbiolo2
#> machine                                             x86_64
#> login                                            biocbuild
#> user                                             biocbuild
#> effective_user                                   biocbuild

citation()

To cite R in publications use:

R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

A BibTeX entry for LaTeX users is

@Manual{, title = {R: A Language and Environment for Statistical Computing}, author = {{R Core Team}}, organization = {R Foundation for Statistical Computing}, address = {Vienna, Austria}, year = {2021}, url = {https://www.R-project.org/}, }

We have invested a lot of time and effort in creating R, please cite it when using it for data analysis. See also ‘citation(“pkgname”)’ for citing R packages.

IUPred Vignette

William McFadden

2021-10-26