analyzePFAM {IsoformSwitchAnalyzeR}    R Documentation

Import Result of PFAM analysis

Description

Allows for easy integration of the result of Pfam (an external tool for sequence analysis of protein domains) into the IsoformSwitchAnalyzeR workflow. Please note that, because of the 'removeNoncodinORFs' argument of analyzeCPAT, we recommend running analyzeCPAT before analyzePFAM and analyzeSignalP if you have predicted the ORFs with analyzeORF.

Usage

analyzePFAM(
    switchAnalyzeRlist,
    pathToPFAMresultFile,
    showProgress=TRUE,
    quiet=FALSE
)

Arguments

switchAnalyzeRlist

A switchAnalyzeRlist object

pathToPFAMresultFile

A string indicating the full path to the Pfam result file. See Details for suggestions on how to run the Pfam tool and obtain its result.

showProgress

A logical indicating whether to show a progress bar (if TRUE) or not (if FALSE). Default is TRUE.

quiet

A logical indicating whether to avoid printing progress messages (incl. progress bar). Default is FALSE.

Details

Notes on how to run the external tool:
Use default parameters. If you want to use the web server, it is easily done as follows:
1) Go to https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan
2) Switch to the "Upload a File" tab.
3) Upload the amino acid file (_AA) created with extractSequence and add your email address. This is important because there is currently no way of downloading the web output, so you need the result sent to your email.
4) Check that Pfam is selected in the "HMM database" window.
5) Submit your job.
6) Wait until you receive the email with the result (usually quite fast).
7) Copy/paste the result part of the email (ONLY what is below the line starting with "seq id") into an empty plain text document (Notepad, Sublime Text, TextEdit or similar; not Word).
8) Save the document and supply the path to that document to analyzePFAM().
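Step 7 above can also be scripted. The following is a minimal sketch, assuming the text of the result email is available in R as a character vector with one element per line. keepPfamResultLines() is a hypothetical helper and not part of IsoformSwitchAnalyzeR, and the email content shown is made up for illustration only:

```r
# Hypothetical helper (not part of IsoformSwitchAnalyzeR): keep only the
# lines below the first line starting with "seq id"
keepPfamResultLines <- function(emailLines) {
    headerIndex <- grep("^[[:space:]]*seq id", emailLines)
    if (length(headerIndex) == 0) {
        stop("No line starting with 'seq id' was found")
    }
    firstResultLine <- headerIndex[1] + 1
    if (firstResultLine > length(emailLines)) {
        return(character(0))
    }
    emailLines[firstResultLine:length(emailLines)]
}

# Made-up email text; the real result fields are not reproduced here
emailText <- c(
    "Dear user,",
    "your job has finished.",
    "seq id <remaining header fields>",
    "isoform_1 <result fields>",
    "isoform_2 <result fields>"
)

# Write the retained lines to a plain text file
pfamLines <- keepPfamResultLines(emailText)
pfamFile  <- tempfile(fileext = ".txt")
writeLines(pfamLines, pfamFile)
```

The path stored in pfamFile could then be passed as pathToPFAMresultFile, provided the retained lines are a genuine Pfam result.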

To run Pfam locally, use the pfam_scan.pl script as described in the readme at ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools/ and supply the path to the result file to analyzePFAM().

Protein domains are only added to isoforms annotated as having an ORF, even if other isoforms exist in the file. This means that if you quantify the same isoforms many times, you can run Pfam once on all isoforms and then supply the entire file to analyzePFAM().
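The idea can be illustrated in base R. This is a conceptual sketch only, not the package's implementation; the isoform ids, domain names and column names are hypothetical:

```r
# Hypothetical isoform ids annotated as having an ORF
isoformsWithOrf <- c("iso1", "iso3")

# Hypothetical Pfam hits from a result file covering all isoforms
pfamHits <- data.frame(
    isoform_id = c("iso1", "iso2", "iso3", "iso4"),
    domain     = c("domA", "domB", "domC", "domA"),
    stringsAsFactors = FALSE
)

# Only hits belonging to isoforms with an annotated ORF are retained;
# the extra rows in the file are simply ignored
usedHits <- pfamHits[pfamHits$isoform_id %in% isoformsWithOrf, ]
usedHits$isoform_id
```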

Value

A column called 'domain_identified' is added to isoformFeatures containing a binary indication (yes/no) of whether a transcript contains any protein domains or not. Furthermore, the data.frame 'domainAnalysis' is added to the switchAnalyzeRlist, containing the details about domain name(s) and position for each transcript (where domain(s) were found).

The data.frame added has one row per isoform and contains the columns:

Furthermore, depending on the exact tool used (local vs. web server), additional columns are added with information such as E score and type.
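The additions described above can be inspected directly. The following is a minimal sketch using the example data and result file shipped with the package; the guard on the file path is an added assumption so that the code is skipped when the package is not installed:

```r
# Path to the example Pfam result file shipped with the package;
# system.file() returns "" if the package (or file) is not found
pfamFile <- system.file("extdata/pfam_results.txt", package = "IsoformSwitchAnalyzeR")

if (nzchar(pfamFile)) {
    library(IsoformSwitchAnalyzeR)
    data("exampleSwitchListIntermediary")

    exampleSwitchListAnalyzed <- analyzePFAM(
        switchAnalyzeRlist   = exampleSwitchListIntermediary,
        pathToPFAMresultFile = pfamFile,
        showProgress         = FALSE
    )

    # Number of isoforms with and without any identified protein domain
    print(table(exampleSwitchListAnalyzed$isoformFeatures$domain_identified))

    # First rows of the domain details added to the switchAnalyzeRlist
    print(head(exampleSwitchListAnalyzed$domainAnalysis))
}
```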

Author(s)

Kristoffer Vitting-Seerup

References

See Also

createSwitchAnalyzeRlist
extractSequence
analyzeCPAT
analyzeSignalP
analyzeSwitchConsequences

Examples

### Load example data (matching the result files also stored in IsoformSwitchAnalyzeR)
data("exampleSwitchListIntermediary")
exampleSwitchListIntermediary

### Add PFAM analysis
exampleSwitchListAnalyzed <- analyzePFAM(
    switchAnalyzeRlist   = exampleSwitchListIntermediary,
    pathToPFAMresultFile = system.file("extdata/pfam_results.txt", package = "IsoformSwitchAnalyzeR"),
    showProgress         = FALSE
)

exampleSwitchListAnalyzed

[Package IsoformSwitchAnalyzeR version 1.4.0 Index]