TileDBArray 1.12.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.929558309 0.708750712 0.129839139 . -1.4233618 -0.5089131
## [2,] -0.330073990 -0.004954285 -0.897382562 . 0.1480774 -1.5188298
## [3,] 1.277900056 -2.182809214 0.810853572 . -0.1299423 -1.0009916
## [4,] 0.378528881 1.245569941 0.628655059 . -1.0941735 0.1916714
## [5,] -1.106048251 0.426457456 0.908533227 . 0.3022614 0.7996078
## ... . . . . . .
## [96,] -1.36966047 -0.58822976 -0.30957282 . 0.5297854 1.0339910
## [97,] -2.21072003 1.39651195 -0.07258274 . 1.0293996 0.8019768
## [98,] -0.69860623 -0.61462320 -0.72082249 . -0.9296716 -1.6808507
## [99,] 1.64174971 -1.10833100 -2.16041507 . 0.7794137 0.1662348
## [100,] 1.79358187 -0.63200737 -0.60942013 . 0.1944432 0.1266513
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.929558309 0.708750712 0.129839139 . -1.4233618 -0.5089131
## [2,] -0.330073990 -0.004954285 -0.897382562 . 0.1480774 -1.5188298
## [3,] 1.277900056 -2.182809214 0.810853572 . -0.1299423 -1.0009916
## [4,] 0.378528881 1.245569941 0.628655059 . -1.0941735 0.1916714
## [5,] -1.106048251 0.426457456 0.908533227 . 0.3022614 0.7996078
## ... . . . . . .
## [96,] -1.36966047 -0.58822976 -0.30957282 . 0.5297854 1.0339910
## [97,] -2.21072003 1.39651195 -0.07258274 . 1.0293996 0.8019768
## [98,] -0.69860623 -0.61462320 -0.72082249 . -0.9296716 -1.6808507
## [99,] 1.64174971 -1.10833100 -2.16041507 . 0.7794137 0.1662348
## [100,] 1.79358187 -0.63200737 -0.60942013 . 0.1944432 0.1266513
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0.00 0.00
## [2,] 0 0 0 . 0.00 0.00
## [3,] 0 0 0 . 0.00 0.00
## [4,] 0 0 0 . 0.00 0.00
## [5,] 0 0 0 . 0.17 0.00
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
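To confirm that sparsity is preserved in the backend, a minimal sketch is shown below. It uses `is_sparse()` from the DelayedArray framework. The `sparse=` argument to `writeTileDBArray()` is an assumption here and should be checked against `?writeTileDBArray` for the installed version; the smaller matrix and the variable names are only for illustration.

```r
library(TileDBArray)

# A small sparse matrix, to keep the dense write below cheap.
Y_small <- Matrix::rsparsematrix(100, 100, density=0.01)

# By default, the sparsity of the input decides the storage layout.
sp <- writeTileDBArray(Y_small)
is_sparse(sp)

# Assumed argument: request dense storage for the same data.
dn <- writeTileDBArray(Y_small, sparse=FALSE)
is_sparse(dn)
```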
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . TRUE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
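The logical case is shown above; an integer matrix can be handled in the same way. The following is a small sketch in which the `counts` matrix and its Poisson parameters are made up for illustration; `type()` reports the type of the values held in the backend.

```r
library(TileDBArray)

# rpois() returns integers, so this is an integer matrix.
counts <- matrix(rpois(1000, lambda=5), ncol=10)
storage.mode(counts)

int_arr <- writeTileDBArray(counts)
type(int_arr)

# Reading the values back should recover the original counts.
all(as.matrix(int_arr) == counts)
```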
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.929558309 0.708750712 0.129839139 . -1.4233618 -0.5089131
## GENE_2 -0.330073990 -0.004954285 -0.897382562 . 0.1480774 -1.5188298
## GENE_3 1.277900056 -2.182809214 0.810853572 . -0.1299423 -1.0009916
## GENE_4 0.378528881 1.245569941 0.628655059 . -1.0941735 0.1916714
## GENE_5 -1.106048251 0.426457456 0.908533227 . 0.3022614 0.7996078
## ... . . . . . .
## GENE_96 -1.36966047 -0.58822976 -0.30957282 . 0.5297854 1.0339910
## GENE_97 -2.21072003 1.39651195 -0.07258274 . 1.0293996 0.8019768
## GENE_98 -0.69860623 -0.61462320 -0.72082249 . -0.9296716 -1.6808507
## GENE_99 1.64174971 -1.10833100 -2.16041507 . 0.7794137 0.1662348
## GENE_100 1.79358187 -0.63200737 -0.60942013 . 0.1944432 0.1266513
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.9295583 -0.3300740 1.2779001 0.3785289 -1.1060483 0.5542262
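Since the dimension names are stored alongside the data, subsetting by name should also be possible, as with an ordinary matrix. The sketch below rebuilds a named matrix so that it can be run on its own; the particular gene and sample names chosen are arbitrary.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Subset by row and column names rather than by position.
sub <- out[c("GENE_1", "GENE_3"), c("SAMP_2", "SAMP_10")]
dim(sub)

# Pull the selected values into an ordinary in-memory matrix.
as.matrix(sub)
```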
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.929558309 0.708750712 0.129839139 -1.732498856 -0.441560405
## GENE_2 -0.330073990 -0.004954285 -0.897382562 -0.848580979 1.467152348
## GENE_3 1.277900056 -2.182809214 0.810853572 0.992073694 0.864299543
## GENE_4 0.378528881 1.245569941 0.628655059 2.210148108 -0.067710818
## GENE_5 -1.106048251 0.426457456 0.908533227 -0.939379261 0.486224190
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.859116617 1.417501425 0.259678279 . -2.8467235 -1.0178263
## GENE_2 -0.660147980 -0.009908571 -1.794765125 . 0.2961548 -3.0376596
## GENE_3 2.555800113 -4.365618429 1.621707145 . -0.2598846 -2.0019833
## GENE_4 0.757057763 2.491139881 1.257310118 . -2.1883470 0.3833428
## GENE_5 -2.212096502 0.852914912 1.817066454 . 0.6045228 1.5992156
## ... . . . . . .
## GENE_96 -2.7393209 -1.1764595 -0.6191456 . 1.0595708 2.0679820
## GENE_97 -4.4214401 2.7930239 -0.1451655 . 2.0587993 1.6039536
## GENE_98 -1.3972125 -1.2292464 -1.4416450 . -1.8593431 -3.3617014
## GENE_99 3.2834994 -2.2166620 -4.3208301 . 1.5588275 0.3324697
## GENE_100 3.5871637 -1.2640147 -1.2188403 . 0.3888864 0.2533026
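To make the delayed behaviour more concrete, the sketch below stacks two operations, inspects the pending operations with `showtree()` from DelayedArray, and only then realizes the values with `as.matrix()`. It recreates `X` and `out` so that it is self-contained; the name `delayed` is illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Nothing is computed yet; the operations are merely recorded.
delayed <- out[1:5, 1:5] * 2
class(delayed)
showtree(delayed)

# The values are computed only upon realization.
realized <- as.matrix(delayed)
all.equal(realized, X[1:5, 1:5] * 2)

# The data in the TileDB backend still match the original matrix.
all.equal(as.matrix(out), X)
```

If the result should itself be stored in TileDB rather than in memory, passing the delayed object to `writeTileDBArray()` is one way to create a new backend containing the realized values.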
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -5.7026963 -0.4681445 1.6300003 18.9307014 6.8542918 3.7101758 2.2767689
## SAMP_8 SAMP_9 SAMP_10
## -3.2228711 0.5993411 -4.9288266
out %*% runif(ncol(out))
## <100 x 1> DelayedMatrix object of type "double":
## y
## GENE_1 0.3393299
## GENE_2 0.1624330
## GENE_3 1.4314493
## GENE_4 2.6782522
## GENE_5 -0.1515377
## ... .
## GENE_96 -1.2643402
## GENE_97 2.0813893
## GENE_98 -0.5008821
## GENE_99 -0.9803960
## GENE_100 2.5481826
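As a sanity check, these results can be compared against the same operations on the in-memory matrix. This is only a sketch: it recreates `X` and `out`, and compares plain vectors so that differences in dimension names between the two results are ignored.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Column sums should agree up to floating-point tolerance.
all.equal(colSums(out), colSums(X))

# Use the same vector for both matrix products.
v <- runif(ncol(out))
prod_tiledb <- as.vector(as.matrix(out %*% v))
prod_memory <- as.vector(X %*% v)
all.equal(prod_tiledb, prod_memory)
```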
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.160527647 2.721774203 -0.378241384 . 0.15203287 -0.63644596
## [2,] 0.878587396 -0.997936216 0.725517131 . -0.08073936 -0.22325749
## [3,] 1.349267131 0.726267819 -0.004527957 . -0.24787455 0.80861855
## [4,] 0.527563793 0.235495643 0.797130187 . -0.22757613 -1.19722246
## [5,] -0.107688189 -0.100832789 1.186051322 . -0.26265585 -1.10890469
## ... . . . . . .
## [96,] 2.1658066 -0.4625497 -0.5238744 . 1.21220096 -0.66602999
## [97,] 0.3822704 -1.7331442 0.8078355 . -0.05236811 0.57188141
## [98,] -1.0765114 0.3610272 -0.6542989 . -0.56628450 -1.08775994
## [99,] -0.7896832 1.8521445 1.2013592 . -0.89275899 0.98626253
## [100,] 0.6839981 0.6419564 -0.4801739 . 1.34689220 -0.03361355
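One benefit of choosing the path explicitly is that the same backend can be opened again later. The sketch below assumes that the `TileDBArray()` constructor accepts a path and an `attr=` argument naming the attribute to read; this should be checked against `?TileDBArray`. The variable names are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
written <- writeTileDBArray(X, path=path, attr="WHEE")

# The backend now exists on disk at the requested location.
file.exists(path)

# Assumed usage: reopen the existing backend by path and attribute name.
reopened <- TileDBArray(path, attr="WHEE")
all.equal(as.matrix(reopened), X)
```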
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.160527647 2.721774203 -0.378241384 . 0.15203287 -0.63644596
## [2,] 0.878587396 -0.997936216 0.725517131 . -0.08073936 -0.22325749
## [3,] 1.349267131 0.726267819 -0.004527957 . -0.24787455 0.80861855
## [4,] 0.527563793 0.235495643 0.797130187 . -0.22757613 -1.19722246
## [5,] -0.107688189 -0.100832789 1.186051322 . -0.26265585 -1.10890469
## ... . . . . . .
## [96,] 2.1658066 -0.4625497 -0.5238744 . 1.21220096 -0.66602999
## [97,] 0.3822704 -1.7331442 0.8078355 . -0.05236811 0.57188141
## [98,] -1.0765114 0.3610272 -0.6542989 . -0.56628450 -1.08775994
## [99,] -0.7896832 1.8521445 1.2013592 . -0.89275899 0.98626253
## [100,] 0.6839981 0.6419564 -0.4801739 . 1.34689220 -0.03361355
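Because a global setting persists for the rest of the session, it is usually worth unsetting it once it is no longer needed. The sketch below assumes that calling `setTileDBPath(NULL)` restores the default behaviour of writing to a fresh temporary location; this is an assumption about the setter and should be confirmed in `?setTileDBPath`.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Direct the next coercion to a known location.
path2 <- tempfile()
setTileDBPath(path2)
first <- as(X, "TileDBArray")
file.exists(path2)

# Assumed behaviour: NULL unsets the global path, so that later
# coercions do not try to write to the same location again.
setTileDBPath(NULL)
second <- as(X, "TileDBArray")
all.equal(as.matrix(second), X)
```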
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.14 TileDBArray_1.12.0 DelayedArray_0.28.0
## [4] SparseArray_1.2.0 S4Arrays_1.2.0 abind_1.4-5
## [7] IRanges_2.36.0 S4Vectors_0.40.0 MatrixGenerics_1.14.0
## [10] matrixStats_1.0.0 BiocGenerics_0.48.0 Matrix_1.6-1.1
## [13] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.7 compiler_4.3.1
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.11
## [7] jquerylib_0.1.4 yaml_2.3.7 fastmap_1.1.1
## [10] lattice_0.22-5 R6_2.5.1 RcppCCTZ_0.2.12
## [13] XVector_0.42.0 tiledb_0.21.1 knitr_1.44
## [16] bookdown_0.36 bslib_0.5.1 rlang_1.1.1
## [19] cachem_1.0.8 xfun_0.40 sass_0.4.7
## [22] bit64_4.0.5 cli_3.6.1 zlibbioc_1.48.0
## [25] spdl_0.0.5 digest_0.6.33 grid_4.3.1
## [28] data.table_1.14.8 evaluate_0.22 nanotime_0.3.7
## [31] zoo_1.8-12 rmarkdown_2.25 tools_4.3.1
## [34] htmltools_0.5.6.1