importRdata {IsoformSwitchAnalyzeR} | R Documentation |
A general-purpose interface for constructing a switchAnalyzeRlist from standard R objects containing expression and annotation information. The data needed for this function are:
1
: Normalized biological replicate isoform expression data
2
: Isoform annotation (both genomic exon coordinates and which gene the isoform belongs to). This can also be supplied as the path to a GTF file where the information can be found.
3
: A design matrix indicating which samples belong to which condition
Furthermore, it is possible to specify which comparisons to make using the comparisonsToMake
argument (default is all possible pairwise comparisons of the conditions indicated by the design matrix).
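As a rough illustration of the three inputs described above, the following base R sketch builds toy versions of them. All sample names, isoform ids and counts are made up; only the column names 'isoform_id', 'sampleID', 'condition', 'condition_1' and 'condition_2' are taken from this page. The isoform annotation is omitted because its exact layout is not spelled out here.

```r
# Toy inputs; every sample name, isoform id and value below is invented.

# Design matrix: one row per biological replicate.
designMatrix <- data.frame(
    sampleID  = c("ctrl_1", "ctrl_2", "kd_1", "kd_2", "oe_1", "oe_2"),
    condition = c("ctrl", "ctrl", "kd", "kd", "oe", "oe"),
    stringsAsFactors = FALSE
)

# Count matrix: an 'isoform_id' column plus one column per sample,
# with column names matching the sample names in the design matrix.
set.seed(1)
isoformCountMatrix <- data.frame(
    isoform_id = c("iso_A", "iso_B", "iso_C"),
    matrix(
        rpois(3 * nrow(designMatrix), lambda = 50),
        nrow = 3,
        dimnames = list(NULL, designMatrix$sampleID)
    ),
    stringsAsFactors = FALSE
)

# All pairwise comparisons between the conditions in the design matrix,
# written out explicitly in the two-column layout of comparisonsToMake.
condPairs <- t(combn(unique(designMatrix$condition), 2))
comparisonsToMake <- data.frame(
    condition_1 = condPairs[, 1],
    condition_2 = condPairs[, 2],
    stringsAsFactors = FALSE
)
comparisonsToMake
```

Restricting the analysis to a subset of comparisons would then amount to keeping only some rows of this data.frame.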
importRdata(
    isoformCountMatrix,
    isoformRepExpression,
    designMatrix,
    isoformExonAnnoation,
    comparisonsToMake=NULL,
    addAnnotatedORFs=TRUE,
    onlyConsiderFullORF=FALSE,
    removeNonConvensionalChr=FALSE,
    PTCDistance=50,
    foldChangePseudoCount=0.01,
    showProgress=TRUE,
    quiet=FALSE
)
isoformCountMatrix |
A data.frame with unfiltered biological (not technical) replicate isoform (estimated) fragment counts. Must have a column called 'isoform_id' with the isoform_id that matches the isoform_id in |
isoformRepExpression |
Optional but recommended: A data.frame with unfiltered normalized biological (not technical) replicate isoform expression. Ideal for supplying quantification measured in Transcripts Per Million (TxPM) or RPKM/FPKM. Must have a column called 'isoform_id' with the isoform_id. The name of the expression columns must match the sample names in the |
designMatrix |
A data.frame with the information of which samples originate from which conditions. Must be a data.frame containing these two columns:
Additional columns can be used to describe other co-factors such as batch effects or patient ids (for paired sample analysis). Additional co-factors can only be analyzed with |
isoformExonAnnoation |
Can either be:
|
comparisonsToMake |
A data.frame indicating which pairwise comparisons the switchAnalyzeRlist created should contain. The two columns, called 'condition_1' and 'condition_2', indicate which conditions should be compared, and the strings indicated here must match the strings in the |
addAnnotatedORFs |
Only used if a GTF file is supplied to |
onlyConsiderFullORF |
A logic indicating whether the ORFs added should only be added if they are fully annotated. Here fully annotated is defined as those that have both an annotated 'start_codon' and an annotated 'stop_codon' in the 'type' column (column 3). This argument exists because CDS regions that are not fully annotated are highly problematic and do not resemble true ORFs, as >50% of CDS without a stop_codon annotated contain multiple stop codons (see Vitting-Seerup et al 2017 - supplementary materials). This argument is only considered if addAnnotatedORFs=TRUE. Default is FALSE. |
removeNonConvensionalChr |
A logic indicating whether non-conventional chromosomes, here defined as chromosome names containing a '_', should be removed. These regions are typically used to annotate regions that cannot be associated to a specific region (such as the human 'chr1_gl000191_random') or regions quite different due to different haplotypes (e.g. the 'chr6_cox_hap2'). This argument is only considered if a GTF file was supplied to |
PTCDistance |
Only used if a GTF file is supplied to |
foldChangePseudoCount |
A numeric indicating the pseudocount added to each of the average expression values before the log2 fold change is calculated. Done to prevent log2 fold changes of Inf or -Inf. Default is 0.01 |
showProgress |
A logic indicating whether to make a progress bar (if TRUE) or not (if FALSE). Default is TRUE. |
quiet |
A logic indicating whether to avoid printing progress messages (incl. progress bar). Default is FALSE |
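The definition of a non-conventional chromosome used by removeNonConvensionalChr (a chromosome name containing a '_') can be illustrated with a small base R sketch. The vector of chromosome names is hypothetical, and this is only a sketch of the stated rule, not the package's internal code.

```r
# Hypothetical chromosome names, as they might appear in a GTF file.
chrNames <- c("chr1", "chr6_cox_hap2", "chrX", "chr1_gl000191_random")

# A name counts as non-conventional if it contains an underscore.
isNonConventional <- grepl("_", chrNames, fixed = TRUE)

# Names that would remain after removing the non-conventional ones.
keptChr <- chrNames[!isNonConventional]
keptChr
```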
For each gene in each replicate sample, the expression of all isoforms belonging to that gene (as annotated in isoformExonAnnoation
) is summed to get the gene expression. It is therefore very important that the isoformRepExpression
is unfiltered. For each gene/isoform in each condition (as indicated by designMatrix
) the mean and standard error of the mean (s.e.m.) are calculated. Since all samples are considered, it is very important that the isoformRepExpression
does not contain technical replicates. The comparisons indicated by comparisonsToMake
(or all pairwise comparisons if not supplied) are then constructed, and the mean gene and isoform expression values are used to calculate log2 fold changes (using foldChangePseudoCount
) and Isoform Fraction (IF) values. The whole analysis is then wrapped in a SwitchAnalyzeRlist.
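The summarisation steps described above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: the expression values, the helper names (sem, perCondition) and the direction of the fold change (second condition over first) are assumptions made for the example.

```r
# Toy replicate isoform expression (made-up values):
# two conditions with two biological replicates each.
isoExp <- data.frame(
    isoform_id = c("iso_A", "iso_B", "iso_C"),
    gene_id    = c("gene_1", "gene_1", "gene_2"),
    ctrl_1 = c(10, 30, 5), ctrl_2 = c(14, 26, 7),
    kd_1   = c(30, 10, 0), kd_2   = c(34, 14, 0),
    stringsAsFactors = FALSE
)
design <- data.frame(
    sampleID  = c("ctrl_1", "ctrl_2", "kd_1", "kd_2"),
    condition = c("ctrl", "ctrl", "kd", "kd"),
    stringsAsFactors = FALSE
)

# Gene expression per replicate: sum over all isoforms of each gene.
geneExp <- rowsum(as.matrix(isoExp[, design$sampleID]), group = isoExp$gene_id)

# Standard error of the mean (hypothetical helper).
sem <- function(x) sd(x) / sqrt(length(x))

# Apply a summary function per condition (hypothetical helper);
# returns a matrix with one row per gene and one column per condition.
perCondition <- function(mat, f) {
    sapply(split(design$sampleID, design$condition), function(ids) {
        apply(mat[, ids, drop = FALSE], 1, f)
    })
}
geneMean <- perCondition(geneExp, mean)
geneSem  <- perCondition(geneExp, sem)

# log2 fold change with a pseudocount, so a mean of 0 does not give Inf/-Inf.
# Assumption: the fold change is taken as 'kd' over 'ctrl'.
pseudoCount <- 0.01
geneLog2FC <- log2(
    (geneMean[, "kd"] + pseudoCount) / (geneMean[, "ctrl"] + pseudoCount)
)
geneLog2FC
```

Because gene_2 has zero expression in the 'kd' samples, the example also shows why the pseudocount is needed: without it the fold change would be -Inf.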
Changes in isoform usage are measured as the difference in isoform fraction (dIF) values, where isoform fraction (IF) values are calculated as <isoform_exp> / <gene_exp>.
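A minimal worked example of the IF and dIF calculation, using made-up mean expression values for two isoforms of one gene. The column names IF1, IF2 and dIF, and the convention that dIF is the second condition minus the first, are assumptions made for this sketch.

```r
# Made-up mean isoform expression in two conditions.
isoMean <- data.frame(
    isoform_id = c("iso_A", "iso_B"),
    gene_id    = c("gene_1", "gene_1"),
    ctrl       = c(12, 28),
    kd         = c(32, 12),
    stringsAsFactors = FALSE
)

# Gene expression = sum of the isoform expression within each gene.
geneCtrl <- ave(isoMean$ctrl, isoMean$gene_id, FUN = sum)
geneKd   <- ave(isoMean$kd,   isoMean$gene_id, FUN = sum)

# Isoform fraction: <isoform_exp> / <gene_exp>, per condition.
isoMean$IF1 <- isoMean$ctrl / geneCtrl
isoMean$IF2 <- isoMean$kd   / geneKd

# Difference in isoform fraction (assumed direction: kd minus ctrl).
isoMean$dIF <- isoMean$IF2 - isoMean$IF1
isoMean
```

Since the IF values of a gene sum to 1 in each condition, the dIF values of its isoforms sum to 0: an increase in the usage of one isoform is always balanced by a decrease in others.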
A SwitchAnalyzeRlist containing the data supplied stored into the SwitchAnalyzeRlist format (created by createSwitchAnalyzeRlist()
). For details about the format see details of createSwitchAnalyzeRlist
.
If a GTF file was supplied to isoformExonAnnoation
and addAnnotatedORFs=TRUE
a data.frame
containing the details of the ORF analysis has been added to the switchAnalyzeRlist under the name 'orfAnalysis'. The data.frame added has one row per isoform and contains 11 columns:
isoform_id
: The name of the isoform analyzed. Matches the 'isoform_id' entry in the 'isoformFeatures' entry of the switchAnalyzeRlist
orfTransciptStart
: The start position of the ORF in transcript coordinates, here defined as the position of the 'A' in the 'AUG' start motif.
orfTransciptEnd
: The end position of the ORF in transcript coordinates, here defined as the last nucleotide before the STOP codon (meaning the stop codon is not included in these coordinates).
orfTransciptLength
: The length of the ORF
orfStarExon
: The exon in which the start codon is
orfEndExon
: The exon in which the stop codon is
orfStartGenomic
: The start position of the ORF in genomic coordinates, here defined as the position of the 'A' in the 'AUG' start motif.
orfEndGenomic
: The end position of the ORF in genomic coordinates, here defined as the last nucleotide before the STOP codon (meaning the stop codon is not included in these coordinates).
stopDistanceToLastJunction
: Distance from stop codon to the last exon-exon junction
stopIndex
: The index, counting from the last exon (which is 0), of the exon in which the stop codon is located.
PTC
: A logic indicating whether the isoform is classified as having a Premature Termination Codon. This is defined as having a stop codon more than PTCDistance
(default is 50) nt upstream of the last exon-exon junction.
NA means no information was available, i.e. no ORF (passing the minORFlength
filter) was found.
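The PTC rule stated above can be sketched as a simple comparison in base R. The toy data and the column name stopUpstreamNt are hypothetical: the sketch assumes the distance is expressed as the number of nucleotides the stop codon lies upstream of the last exon-exon junction (negative when the stop codon is in the last exon), which may differ from the sign convention used in the stopDistanceToLastJunction column.

```r
PTCDistance <- 50

# Hypothetical ORF summary; 'stopUpstreamNt' is an assumed column giving how
# many nt the stop codon lies upstream of the last exon-exon junction.
orfToy <- data.frame(
    isoform_id     = c("iso_A", "iso_B", "iso_C", "iso_D"),
    stopUpstreamNt = c(-35, 20, 120, NA),
    stringsAsFactors = FALSE
)

# PTC: stop codon more than PTCDistance nt upstream of the last junction.
# Isoforms without an ORF (NA distance) stay NA.
orfToy$PTC <- orfToy$stopUpstreamNt > PTCDistance
orfToy
```

In this toy table only iso_C is classified as PTC; iso_D has no ORF information and therefore remains NA, mirroring the NA convention described above.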
Kristoffer Vitting-Seerup
Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Mol. Cancer Res. (2017).
createSwitchAnalyzeRlist
importIsoformExpression
preFilter
### Construct data needed from example data in cummeRbund package
### (The recommended way of analyzing Cufflinks/Cuffdiff data is via importCufflinksCummeRbund().
### This is just an easy way to get some example data.)
cuffDB <- prepareCuffExample()
isoRepCount <- repCountMatrix(isoforms(cuffDB))
isoRepCount$isoform_id <- rownames(isoRepCount)

### Design matrix
designMatrix <- cummeRbund::replicates(cuffDB)[,c('rep_name','sample_name')]
colnames(designMatrix) <- c('sampleID','condition')

### Isoform annotation (import() is the GTF importer from the rtracklayer package)
localAnnotaion <- rtracklayer::import(system.file("extdata/chr1_snippet.gtf", package="cummeRbund"))[,c('transcript_id','gene_id')]
colnames(localAnnotaion@elementMetadata)[1] <- 'isoform_id'

### Please note
# 1) The way of importing the GTF file in the example with
#    system.file('pathToFile', package="cummeRbund") is
#    specialized to access the sample data in the cummeRbund package
#    and not something you need to do - just supply the string e.g.
#    "/myAnnotation/myGenome/gersionQuantified.gtf" to the import function
# 2) importRdata also supports direct import of a GTF file - just supply the
#    path to the isoformExonAnnoation argument

### Take a look at the data
head(isoRepCount)
head(designMatrix)
head(localAnnotaion)

### Create switchAnalyzeRlist
aSwitchList <- importRdata(
    isoformCountMatrix=isoRepCount,
    designMatrix=designMatrix,
    isoformExonAnnoation=localAnnotaion
)
aSwitchList