get_phix_genome {ORFik}R Documentation

Download genome (fasta), annotation (GTF) and contaminants

Description

This function automatically downloads (if files not already exists) genomes and contaminants specified for genome alignment. Will create a R transcript database (TxDb object) from the annotation.
It will also index the genome for you
If you misspelled something or crashed, delete wrong files and run again.
Do remake = TRUE, to do it all over again.

Usage

get_phix_genome(phix, output.dir, gunzip)

Arguments

phix

logical, default FALSE, download phix sequence to filter out with. Phix is used as a contaminant genome. Only use if illumina sequencing. Phix is used in Illumina sequencers for sequencing quality control. Genome is: refseq, Escherichia virus phiX174

output.dir

directory to save downloaded data

gunzip

logical, default TRUE, uncompress downloaded files that are zipped when downloaded, should be TRUE!

Details

If you want custom genome or gtf from you hard drive, assign it after you run this function, like this:
annotation <- getGenomeAndAnnotation(GTF = FALSE, genome = FALSE)
annotation["genome"] = "path/to/genome.fasta"
annotation["gtf"] = "path/to/gtf.gtf"

Value

a named character vector of path to genomes and gtf downloaded, and additional contaminants if used. If merge_contaminants is TRUE, will not give individual fasta files to contaminants, but only the merged one.

See Also

Other STAR: STAR.align.folder(), STAR.align.single(), STAR.allsteps.multiQC(), STAR.index(), STAR.install(), STAR.multiQC(), STAR.remove.crashed.genome(), install.fastp()

Examples


## Get Saccharomyces cerevisiae genome and gtf (create txdb for R)
#getGenomeAndAnnotation("Saccharomyces cerevisiae", tempdir(), assembly_type = "toplevel")
## Get Danio rerio genome and gtf (create txdb for R)
#getGenomeAndAnnotation("Danio rerio", tempdir())

output.dir <- "/Bio_data/references/zebrafish"
## Get Danio rerio and Phix contamints to deplete during alignment
#getGenomeAndAnnotation("Danio rerio", output.dir, phix = TRUE)

## Optimize for ORFik (speed up for large annotations like human or zebrafish)
#getGenomeAndAnnotation("Danio rerio", tempdir(), optimize = TRUE)

## How to save malformed refseq gffs:
## First run function and let it crash:
#annotation <- getGenomeAndAnnotation(organism = "Arabidopsis thaliana", output.dir = "~/Desktop/test_plant/",
#  assembly_type = "primary_assembly", db = "refseq")
## Then apply a fix (example for linux, too long rows):
# \code{system("cat ~/Desktop/test_plant/Arabidopsis_thaliana_genomic_refseq.gff | awk '{ if (length($0) < 32768) print }' > ~/Desktop/test_plant/Arabidopsis_thaliana_genomic_refseq_trimmed2.gff")}
## Then updated arguments:
annotation <- c("~/Desktop/test_plant/Arabidopsis_thaliana_genomic_refseq_trimmed.gff",
 "~/Desktop/test_plant/Arabidopsis_thaliana_genomic_refseq.fna")
names(annotation) <- c("gtf", "genome")
# Make the txdb (for faster R use)
# makeTxdbFromGenome(annotation["gtf"], annotation["genome"], organism = "Arabidopsis thaliana")


[Package ORFik version 1.12.13 Index]