buildSummarized {consensusDE} | R Documentation |
This function creates a summarized file describing reads from RNA-seq experiments that overlap a set of transcript features. Transcript features can be supplied either as an imported GTF-formatted table or as a txdb object. It is designed to be straightforward, with a minimal set of parameters, for first-pass batch RNA-seq analyses.
buildSummarized(sample_table = NULL, bam_dir = NULL, gtf = NULL, tx_db = NULL, mapping_mode = "Union", read_format = NULL, ignore_strand = FALSE, fragments = TRUE, summarized = NULL, output_log = NULL, filter = FALSE, BamFileList_yiedsize = NA_integer_, n_cores = 1, force_build = FALSE, verbose = FALSE)
sample_table |
A data.frame describing samples. For paired mode it must contain 3 columns, with the names "file", "group" and "pairs". The filename is the name in the directory supplied with the "bam_dir" parameter below. This is not required if an existing summarized file is provided. Default=NULL |
bam_dir |
Full path to location of bam files listed in the "file" column in the sample_table provided above. This is not required if an existing summarized file is provided. Default=NULL |
gtf |
Full path to a gtf file describing the transcript coordinates to map the RNA-seq reads to. GTF file is not required if providing a pre-computed summarized experiment file previously generated using buildSummarized() OR a tx_db object (below). Default = NULL |
tx_db |
An R txdb object. E.g. TxDb.Dmelanogaster.UCSC.dm3.ensGene. Default = NULL |
mapping_mode |
Options are "Union", "IntersectionStrict" and "IntersectionNotEmpty". See "mode" in ?summarizeOverlaps for an explanation. Default = "Union" |
read_format |
Are the reads from single-end or paired-end data? Options are "paired" or "single". An option must be selected. Default = NULL |
ignore_strand |
Ignore strand when mapping reads? See "ignore.strand" in ?summarizeOverlaps for an explanation. Default = FALSE |
fragments |
When read_format = "paired", include reads from pairs that do not map with their corresponding pair? See "fragments" in ?summarizeOverlaps for an explanation. Default = TRUE |
summarized |
Full path to a summarized experiment file. If buildSummarized() has already been performed, the output summarized file, saved in "/output_log/se.R" can be used as the input (e.g. if filtering is to be done). Default = NULL |
output_log |
Full path to directory for output of log files and saved summarized experiment generated. |
filter |
Perform filtering of low count and missing data from the summarized experiment file? This uses default filtering via "filterByExpr". See ?filterByExpr for further information. Default=FALSE |
BamFileList_yiedsize |
If running into memory problems, set this to an integer number of records to read from each BAM file at a time. See the "yieldSize" description in ?BamFileList for an explanation. Default = NA_integer_ |
n_cores |
Number of cores to use for reading in BAM files. Use with caution, as this can cause memory issues if BamFileList_yiedsize is not set. Default = 1 |
force_build |
If the sample_table contains fewer than two replicates per group, force a SummarizedExperiment object to be built. Otherwise buildSummarized will halt. Default = FALSE |
verbose |
Verbosity ON/OFF. Default=FALSE |
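As a rough sketch of the sample_table layout described above for paired mode, the following base R snippet builds a table with the three named columns and counts replicates per group, which is the condition force_build refers to. The file names are hypothetical, and reading "pairs" as a label that ties matched samples together is an assumption rather than something stated on this page.

```r
# Hypothetical BAM file names; replace with the files present in bam_dir.
# Assumption: "pairs" labels which samples belong to the same matched pair.
sample_table <- data.frame(
  file  = c("treat1.bam", "untreat1.bam", "treat2.bam", "untreat2.bam"),
  group = c("treat", "untreat", "treat", "untreat"),
  pairs = c("1", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# Count replicates per group. Any group with fewer than two replicates
# would need force_build = TRUE for buildSummarized() to proceed.
reps_per_group <- table(sample_table$group)
needs_force_build <- any(reps_per_group < 2)
```

Here each group has two replicates, so needs_force_build is FALSE and the table could be passed as sample_table without forcing the build.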
A summarized experiment
## Extract summarized following example in the vignette
## The example below will return a summarized experiment
## tx_db is obtained from TxDb.Dmelanogaster.UCSC.dm3.ensGene library
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
## bam files are obtained from the GenomicAlignments package

## 1. Build a sample table that lists files and groupings
## - obtain list of files
file_list <- list.files(system.file("extdata", package="GenomicAlignments"),
                        recursive = TRUE,
                        pattern = "*bam$",
                        full = TRUE)
bam_dir <- as.character(gsub(basename(file_list)[1], "", file_list[1]))

## - create a sample table to be used with buildSummarized() below
## must be comprised of a minimum of two columns, named "file" and "group",
sample_table <- data.frame("file" = basename(file_list),
                           "group" = c("treat", "untreat"))

summarized_dm3 <- buildSummarized(sample_table = sample_table,
                                  bam_dir = bam_dir,
                                  tx_db = TxDb.Dmelanogaster.UCSC.dm3.ensGene,
                                  read_format = "paired",
                                  force_build = TRUE)