alabaster.matrix 1.2.0
The alabaster.matrix package implements methods to save matrix-like objects to file artifacts and load them back into R. Check out the alabaster.base package for more details on the motivation and the alabaster framework.
Given an array-like object, we can use stageObject() to save it inside a staging directory:
library(Matrix)
y <- rsparsematrix(1000, 100, density=0.05)
library(alabaster.matrix)
tmp <- tempfile()
dir.create(tmp)
meta <- stageObject(y, tmp, "my_sparse_matrix")
library(alabaster.base)
.writeMetadata(meta, tmp)
## $type
## [1] "local"
##
## $path
## [1] "my_sparse_matrix/matrix.h5"
list.files(tmp, recursive=TRUE)
## [1] "my_sparse_matrix/matrix.h5" "my_sparse_matrix/matrix.h5.json"
We then load it back into our R session with loadObject().
This creates an HDF5-backed DelayedArray that can be easily coerced into the desired format, e.g., a dgCMatrix.
meta <- acquireMetadata(tmp, "my_sparse_matrix/matrix.h5")
roundtrip <- loadObject(meta, tmp)
class(roundtrip)
## [1] "H5SparseMatrix"
## attr(,"package")
## [1] "HDF5Array"
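The coercion mentioned above can be sketched as follows. This is a minimal example that repeats the staging steps so it runs on its own; it assumes that the `as()` coercion from a DelayedMatrix to a `dgCMatrix` provided by DelayedArray applies to the loaded object, and the variable name `mat` is only illustrative.

```r
library(Matrix)
library(alabaster.base)
library(alabaster.matrix)

# Stage a sparse matrix, as in the steps above.
y <- rsparsematrix(1000, 100, density=0.05)
tmp <- tempfile()
dir.create(tmp)
meta <- stageObject(y, tmp, "my_sparse_matrix")
.writeMetadata(meta, tmp)

# Load it back as a file-backed DelayedArray.
meta <- acquireMetadata(tmp, "my_sparse_matrix/matrix.h5")
roundtrip <- loadObject(meta, tmp)

# Realize the file-backed matrix as an in-memory dgCMatrix.
# (Assumes the DelayedMatrix -> dgCMatrix coercion is available.)
mat <- as(roundtrip, "dgCMatrix")
class(mat)
dim(mat)
```

Realizing the matrix pulls all of its contents into memory, so this is only sensible when the matrix is small enough to fit.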
This process is supported for all base arrays, Matrix objects and DelayedArray objects.
For DelayedArrays, we may instead choose to save the delayed operations themselves to file, using the chihaya package.
This creates an HDF5 file following the chihaya format, containing the delayed operations rather than the results of their evaluation.
library(DelayedArray)
y <- DelayedArray(rsparsematrix(1000, 100, 0.05))
y <- log1p(abs(y) / 1:100) # adding some delayed ops.
preserveDelayedOperations(TRUE)
meta <- stageObject(y, tmp, "delayed")
.writeMetadata(meta, tmp)
## $type
## [1] "local"
##
## $path
## [1] "delayed/delayed.h5"
meta <- acquireMetadata(tmp, "delayed/delayed.h5")
roundtrip <- loadObject(meta, tmp)
class(roundtrip)
## [1] "DelayedMatrix"
## attr(,"package")
## [1] "DelayedArray"
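One way to check that the operations were stored rather than evaluated is to inspect the tree of delayed operations on the reloaded object. The sketch below repeats the staging steps so that it is self-contained and uses `showtree()` from DelayedArray; the exact tree that is printed depends on how the operations are reconstructed on load, so no output is shown here.

```r
library(Matrix)
library(DelayedArray)
library(alabaster.base)
library(alabaster.matrix)

# Build a DelayedArray with some delayed operations, as above.
y <- DelayedArray(rsparsematrix(1000, 100, 0.05))
y <- log1p(abs(y) / 1:100)

tmp <- tempfile()
dir.create(tmp)

# Ask stageObject() to store the operations instead of their result.
preserveDelayedOperations(TRUE)
meta <- stageObject(y, tmp, "delayed")
.writeMetadata(meta, tmp)

meta <- acquireMetadata(tmp, "delayed/delayed.h5")
roundtrip <- loadObject(meta, tmp)

# Print the tree of delayed operations attached to the reloaded object.
showtree(roundtrip)
```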
However, it is probably best to avoid preserving delayed operations for file-backed DelayedArrays if you want the artifacts to be re-usable on different filesystems.
For example, HDF5Arrays will be saved with a reference to an absolute file path, which will not be portable.
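To stage the evaluated values instead, the preservation setting can be switched back off before calling stageObject(). The following is a minimal sketch, assuming that preserveDelayedOperations(FALSE) reverses the earlier preserveDelayedOperations(TRUE) call; the directory name "evaluated" is hypothetical, and an in-memory DelayedArray is used here only to keep the example short.

```r
library(Matrix)
library(DelayedArray)
library(alabaster.base)
library(alabaster.matrix)

y <- DelayedArray(rsparsematrix(1000, 100, 0.05))
y <- log1p(abs(y) / 1:100)

tmp <- tempfile()
dir.create(tmp)

# Turn off preservation so that the evaluated values are written out.
# (Assumes FALSE is accepted in the same way as TRUE above.)
preserveDelayedOperations(FALSE)

# "evaluated" is an illustrative name for the staged subdirectory.
meta <- stageObject(y, tmp, "evaluated")
.writeMetadata(meta, tmp)

# Inspect what was written into the staging directory.
staged <- list.files(tmp, recursive=TRUE)
staged
```

The resulting artifact should not depend on any file outside the staging directory, at the cost of evaluating the delayed operations when saving.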
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DelayedArray_0.28.0 SparseArray_1.2.0 S4Arrays_1.2.0
## [4] abind_1.4-5 IRanges_2.36.0 S4Vectors_0.40.0
## [7] MatrixGenerics_1.14.0 matrixStats_1.0.0 BiocGenerics_0.48.0
## [10] alabaster.base_1.2.0 alabaster.matrix_1.2.0 Matrix_1.6-1.1
## [13] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.7 compiler_4.3.1 BiocManager_1.30.22
## [4] crayon_1.5.2 Rcpp_1.0.11 rhdf5filters_1.14.0
## [7] jquerylib_0.1.4 yaml_2.3.7 fastmap_1.1.1
## [10] lattice_0.22-5 jsonvalidate_1.3.2 R6_2.5.1
## [13] XVector_0.42.0 curl_5.1.0 knitr_1.44
## [16] chihaya_1.2.0 bookdown_0.36 bslib_0.5.1
## [19] rlang_1.1.1 V8_4.4.0 cachem_1.0.8
## [22] HDF5Array_1.30.0 xfun_0.40 sass_0.4.7
## [25] cli_3.6.1 Rhdf5lib_1.24.0 zlibbioc_1.48.0
## [28] digest_0.6.33 grid_4.3.1 alabaster.schemas_1.2.0
## [31] rhdf5_2.46.0 evaluate_0.22 rmarkdown_2.25
## [34] tools_4.3.1 htmltools_0.5.6.1