GScores-class {GenomicScores} | R Documentation |
The goal of the GenomicScores package is to provide support for storing and retrieving genomic scores associated with physical nucleotide positions along a genome. This is achieved through the GScores class of objects, which is a container for genomic score values.
The GScores class aims to provide compact storage and efficient retrieval of genomic score values that have typically been processed and stored using some form of lossy compression. This class is currently based on a former version of the SNPlocs class defined in the BSgenome package, with the following slots:
provider (character): the data provider, such as UCSC.
provider_version (character): the version of the data as given by the data provider, typically a date in some compact format.
download_url (character): the URL of the data provider from which the original data were downloaded.
download_date (character): the date on which the data were downloaded.
reference_genome (GenomeDescription): object with information about the reference genome whose physical positions have the genomic scores.
data_pkgname (character): name given to the set of genomic scores associated with a particular genome. When the genomic scores are stored within an annotation package, this corresponds to the name of that package.
data_dirpath (character): absolute path to the local directory where the genomic scores are stored, in one file per genome sequence.
data_serialized_objnames (character): named vector of filenames pointing to the files that contain the genomic scores, one file per genome sequence. The names of this vector correspond to the genome sequence names.
.data_cache (environment): data structure where objects storing genomic scores are cached in main memory.
The GScores class is designed to load into main memory only the objects associated with the queried sequences, which minimizes the memory footprint. This may be advantageous in workflows that parallelize access to genomic scores by genome sequence.
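The per-sequence loading described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the file names, the helper get_seq_scores() and the variable names are hypothetical, and the scores are stored as plain numeric vectors in RDS files only for the sake of the example.

```r
## Hypothetical sketch: none of these names belong to the GenomicScores API.
## One RDS file per genome sequence is written to a temporary directory.
data_dirpath <- tempfile("gscores_demo_")
dir.create(data_dirpath)

## Named vector of file names; the names are the genome sequence names.
objnames <- c(chr1 = "demo.chr1.rds", chr2 = "demo.chr2.rds")
saveRDS(c(0.1, 0.9, 0.5), file.path(data_dirpath, objnames[["chr1"]]))
saveRDS(c(0.3, 0.7), file.path(data_dirpath, objnames[["chr2"]]))

## An environment plays the role of the in-memory cache.
data_cache <- new.env(parent = emptyenv())

## Return the scores of one sequence, reading its file only on first access.
get_seq_scores <- function(seqname) {
  if (!exists(seqname, envir = data_cache, inherits = FALSE)) {
    path <- file.path(data_dirpath, objnames[[seqname]])
    assign(seqname, readRDS(path), envir = data_cache)
  }
  get(seqname, envir = data_cache, inherits = FALSE)
}

chr2_scores <- get_seq_scores("chr2")
cached <- ls(data_cache)   # only "chr2" has been loaded so far
```

Because only the queried sequence is read, a worker that processes a single chromosome never pays the memory cost of the others.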
GScores objects are created either from AnnotationHub resources or when loading specific annotation packages that store genomic score values. Two such annotation packages are:
phastCons100way.UCSC.hg19: nucleotide-level phastCons conservation scores from the UCSC Genome Browser, calculated from multiple genome alignments of the human genome version hg19 to 99 vertebrate species.
phastCons100way.UCSC.hg38: nucleotide-level phastCons conservation scores from the UCSC Genome Browser, calculated from multiple genome alignments of the human genome version hg38 to 99 vertebrate species.
GScores(provider, provider_version, download_url, download_date, reference_genome, data_pkgname, data_dirpath, data_serialized_objnames): Creates a GScores object. In principle, the end user does not need to call this function.
provider: character, containing the data provider.
provider_version: character, containing the version of the data as given by the data provider.
download_url: character, containing the URL of the data provider from which the original data were downloaded.
reference_genome: GenomeDescription, storing the information about the associated reference genome.
data_pkgname: character, name given to the set of genomic scores stored through this object.
data_dirpath: character, absolute path to the local directory where the genomic scores are stored.
data_serialized_objnames: character vector, containing the filenames where the genomic scores are stored.
name(x): get the name of the set of genomic scores.
type(x): get the substring of the name of the set of genomic scores from its first character up to the first period. This should typically match the type of genomic scores, such as phastCons, phyloP, etc.
provider(x): get the data provider.
providerVersion(x): get the provider version.
organism(x): get the organism associated with the genomic scores.
referenceGenome(x): get the GenomeDescription object associated with the genome on which the genomic scores are defined.
seqlevelsStyle(x): get the genome sequence style.
seqinfo(x): get the genome sequence information.
seqnames(x): get the genome sequence names.
seqlengths(x): get the genome sequence lengths.
qfun(x): get the quantizer function.
dqfun(x): get the dequantizer function.
citation(x): get citation information for the genomic scores data in the form of a bibentry object.
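To make the roles of the quantizer and dequantizer functions more concrete, here is a minimal base R sketch of lossy compression by rounding. The functions quantize() and dequantize() are hypothetical and assume scores between 0 and 1 rounded to one decimal place; the functions actually returned by qfun(x) and dqfun(x) depend on the set of genomic scores.

```r
## Hypothetical quantizer: round scores in [0, 1] to one decimal place and
## store them as raw bytes (values 0 to 10).
quantize <- function(x) as.raw(round(x * 10))

## Hypothetical dequantizer: map the stored bytes back to numeric scores.
dequantize <- function(q) as.integer(q) / 10

scores <- c(0.012, 0.457, 0.999)
stored <- quantize(scores)       # one byte per score
recovered <- dequantize(stored)  # 0.0 0.5 1.0
```

The recovered values differ from the originals by at most 0.05, which is the price paid for storing each score in a single byte.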
R. Castelo
scores()
phastCons100way.UCSC.hg19
phastCons100way.UCSC.hg38
## supporting annotation packages with genomic scores
if (require(phastCons100way.UCSC.hg19)) {
  library(GenomicRanges)

  gsco <- phastCons100way.UCSC.hg19
  gsco
  scores(gsco, GRanges(seqnames="chr7", IRanges(start=117232380, width=5)))
}

## supporting AnnotationHub resources
## Not run: 
availableGScores()
gsco <- getGScores("phastCons100way.UCSC.hg19")
gsco
scores(gsco, GRanges(seqnames="chr7", IRanges(start=117232380, width=5)))

## End(Not run)

## meta data from a GScores object
name(gsco)
type(gsco)
provider(gsco)
providerVersion(gsco)
organism(gsco)
referenceGenome(gsco)
seqlevelsStyle(gsco)
seqinfo(gsco)
head(seqnames(gsco))
head(seqlengths(gsco))
qfun(gsco)
dqfun(gsco)
citation(gsco)
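
The rule documented for type(x) can also be reproduced without any annotation package installed. The following base R sketch applies that rule to a package name, assuming the period itself is excluded from the result; it is an illustration and not the package's own code.

```r
## Base R sketch of the documented rule: keep everything before the
## first period of the name of the set of genomic scores.
pkgname <- "phastCons100way.UCSC.hg19"
score_type <- sub("\\..*$", "", pkgname)
score_type   # "phastCons100way"
```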