TileDBArray 1.14.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.5360689 0.1397672 0.2189454 . 0.04877428 0.90738301
## [2,] 0.3007553 -0.8572769 -1.1337691 . -2.31167914 0.82336928
## [3,] 1.1246496 0.3495153 -1.0028358 . 1.22388503 0.02501761
## [4,] 1.2024192 -0.8437763 0.4306445 . -2.68908550 0.68673577
## [5,] -2.2300744 1.1625106 -1.1886471 . 0.83981536 1.82377805
## ... . . . . . .
## [96,] 2.10734550 0.46903990 0.12467023 . 0.3968614 -0.3377328
## [97,] -0.06838776 0.61681610 -2.12193735 . 0.3429669 -1.9061811
## [98,] 0.98507599 -2.48575813 -2.47251598 . -0.3618537 -0.7340382
## [99,] -0.52058777 -0.23206590 0.37281305 . -0.5259811 -0.4005703
## [100,] 1.16810488 0.78618043 -0.09277216 . -1.2632079 -0.6354131
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.5360689 0.1397672 0.2189454 . 0.04877428 0.90738301
## [2,] 0.3007553 -0.8572769 -1.1337691 . -2.31167914 0.82336928
## [3,] 1.1246496 0.3495153 -1.0028358 . 1.22388503 0.02501761
## [4,] 1.2024192 -0.8437763 0.4306445 . -2.68908550 0.68673577
## [5,] -2.2300744 1.1625106 -1.1886471 . 0.83981536 1.82377805
## ... . . . . . .
## [96,] 2.10734550 0.46903990 0.12467023 . 0.3968614 -0.3377328
## [97,] -0.06838776 0.61681610 -2.12193735 . 0.3429669 -1.9061811
## [98,] 0.98507599 -2.48575813 -2.47251598 . -0.3618537 -0.7340382
## [99,] -0.52058777 -0.23206590 0.37281305 . -0.5259811 -0.4005703
## [100,] 1.16810488 0.78618043 -0.09277216 . -1.2632079 -0.6354131
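As a quick sanity check on either route, the sketch below realizes the TileDB-backed matrix back into memory and compares it with the original. It assumes that as.matrix() on a TileDBMatrix behaves as it does for any other DelayedMatrix; the seed is arbitrary and only makes the sketch reproducible.

```r
library(TileDBArray)

set.seed(100)  # arbitrary seed, not part of the original example
X <- matrix(rnorm(1000), ncol=10)

# Write the matrix to a TileDB backend via coercion.
tdb <- as(X, "TileDBArray")

# Pull the values back from the backend into an ordinary matrix.
roundtrip <- as.matrix(tdb)

# Compare the realized values against the in-memory original.
all.equal(roundtrip, X)
```

The same check should apply to the result of writeTileDBArray(X), since both routes return a TileDBMatrix.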
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
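The printed corner of a sparse matrix is mostly zeros, so it says little about whether sparsity was preserved. The sketch below is one way to check this, assuming the is_sparse() generic and the coercion from DelayedMatrix to dgCMatrix that become available once DelayedArray and its dependencies are attached.

```r
library(TileDBArray)

Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
sp <- writeTileDBArray(Y)

# Report whether the backend is treated as sparse.
is_sparse(sp)

# Realize the data back into an in-memory sparse matrix.
Y2 <- as(sp, "dgCMatrix")
class(Y2)
```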
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.5360689 0.1397672 0.2189454 . 0.04877428 0.90738301
## GENE_2 0.3007553 -0.8572769 -1.1337691 . -2.31167914 0.82336928
## GENE_3 1.1246496 0.3495153 -1.0028358 . 1.22388503 0.02501761
## GENE_4 1.2024192 -0.8437763 0.4306445 . -2.68908550 0.68673577
## GENE_5 -2.2300744 1.1625106 -1.1886471 . 0.83981536 1.82377805
## ... . . . . . .
## GENE_96 2.10734550 0.46903990 0.12467023 . 0.3968614 -0.3377328
## GENE_97 -0.06838776 0.61681610 -2.12193735 . 0.3429669 -1.9061811
## GENE_98 0.98507599 -2.48575813 -2.47251598 . -0.3618537 -0.7340382
## GENE_99 -0.52058777 -0.23206590 0.37281305 . -0.5259811 -0.4005703
## GENE_100 1.16810488 0.78618043 -0.09277216 . -1.2632079 -0.6354131
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.5360689 0.3007553 1.1246496 1.2024192 -2.2300744 -0.7102085
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.53606886 0.13976716 0.21894536 0.09267435 -0.92346741
## GENE_2 0.30075533 -0.85727694 -1.13376911 -0.51981226 1.14640411
## GENE_3 1.12464957 0.34951534 -1.00283580 0.29463367 0.51979095
## GENE_4 1.20241919 -0.84377629 0.43064453 0.54912941 1.30502240
## GENE_5 -2.23007440 1.16251061 -1.18864712 0.87272744 1.69239969
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.0721377 0.2795343 0.4378907 . 0.09754857 1.81476602
## GENE_2 0.6015107 -1.7145539 -2.2675382 . -4.62335828 1.64673856
## GENE_3 2.2492991 0.6990307 -2.0056716 . 2.44777006 0.05003523
## GENE_4 2.4048384 -1.6875526 0.8612891 . -5.37817101 1.37347154
## GENE_5 -4.4601488 2.3250212 -2.3772942 . 1.67963073 3.64755610
## ... . . . . . .
## GENE_96 4.2146910 0.9380798 0.2493405 . 0.7937227 -0.6754655
## GENE_97 -0.1367755 1.2336322 -4.2438747 . 0.6859338 -3.8123622
## GENE_98 1.9701520 -4.9715163 -4.9450320 . -0.7237074 -1.4680764
## GENE_99 -1.0411755 -0.4641318 0.7456261 . -1.0519622 -0.8011406
## GENE_100 2.3362098 1.5723609 -0.1855443 . -2.5264157 -1.2708263
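To make the delayed behaviour more concrete, the sketch below chains a subset and a multiplication, inspects the pending operations, and only then realizes the result. It assumes showtree() from DelayedArray for displaying the operation stack; the variable names are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Subsetting and arithmetic are recorded rather than executed.
delayed <- out[1:5, 1:5] * 2
class(delayed)

# Display the stack of pending operations on top of the TileDB seed.
showtree(delayed)

# The operations are only applied when the values are realized.
realized <- as.matrix(delayed)
all.equal(realized, X[1:5, 1:5] * 2)
```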
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7 SAMP_8
## 12.005122 18.503028 -4.676772 -3.900299 17.042758 3.442345 10.413245 -7.276163
## SAMP_9 SAMP_10
## -9.802012 1.518822
out %*% runif(ncol(out))
## [,1]
## GENE_1 -2.84370865
## GENE_2 -0.23407065
## GENE_3 1.42402392
## GENE_4 0.41828695
## GENE_5 -1.69710633
## GENE_6 -0.41683110
## GENE_7 2.71071317
## GENE_8 -0.32513287
## GENE_9 -0.65611117
## GENE_10 1.62480002
## GENE_11 0.10949163
## GENE_12 0.06137767
## GENE_13 -0.15890686
## GENE_14 2.73941583
## GENE_15 -0.67859873
## GENE_16 -2.97459274
## GENE_17 0.58613523
## GENE_18 1.97892158
## GENE_19 2.51398769
## GENE_20 1.28904798
## GENE_21 -1.10293470
## GENE_22 1.86555358
## GENE_23 -2.28839950
## GENE_24 0.67890054
## GENE_25 -0.23251827
## GENE_26 0.53080803
## GENE_27 -1.21867294
## GENE_28 0.21406248
## GENE_29 -0.88864915
## GENE_30 -0.38957598
## GENE_31 1.10068654
## GENE_32 3.69728278
## GENE_33 0.39766559
## GENE_34 1.97941418
## GENE_35 1.48706162
## GENE_36 1.22305532
## GENE_37 0.05067343
## GENE_38 1.90227960
## GENE_39 0.61995533
## GENE_40 -2.64548043
## GENE_41 -0.96742804
## GENE_42 2.56411077
## GENE_43 1.33607395
## GENE_44 -1.29399208
## GENE_45 1.94189808
## GENE_46 -1.22510550
## GENE_47 1.48135568
## GENE_48 -1.37303545
## GENE_49 0.46901304
## GENE_50 3.14812294
## GENE_51 -0.17212068
## GENE_52 0.61566608
## GENE_53 1.86159699
## GENE_54 -2.37712460
## GENE_55 1.30630118
## GENE_56 -2.04640663
## GENE_57 -1.69199365
## GENE_58 0.42874058
## GENE_59 -2.46609755
## GENE_60 0.35093615
## GENE_61 1.47423973
## GENE_62 -0.31796701
## GENE_63 0.39779706
## GENE_64 0.71563781
## GENE_65 0.38681402
## GENE_66 -2.20169817
## GENE_67 2.61095714
## GENE_68 0.16314784
## GENE_69 0.96141014
## GENE_70 2.76262598
## GENE_71 -0.62149681
## GENE_72 -0.64782251
## GENE_73 -2.17101375
## GENE_74 -0.89033297
## GENE_75 1.07854904
## GENE_76 1.07634318
## GENE_77 -1.36119516
## GENE_78 -1.55253865
## GENE_79 0.20439659
## GENE_80 -0.47973134
## GENE_81 -0.23045470
## GENE_82 3.32226908
## GENE_83 1.26040789
## GENE_84 2.10107834
## GENE_85 -1.89863003
## GENE_86 0.85911837
## GENE_87 0.72923355
## GENE_88 0.77705334
## GENE_89 -2.00083888
## GENE_90 2.01249531
## GENE_91 -1.24043698
## GENE_92 0.20208439
## GENE_93 2.74388576
## GENE_94 -1.98036618
## GENE_95 -0.59615374
## GENE_96 3.06687565
## GENE_97 -1.63543334
## GENE_98 -1.82657385
## GENE_99 -0.52077559
## GENE_100 2.22946975
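Printing the full product is rarely necessary. A more compact sketch, assuming nothing beyond the functions used above, looks at the first few rows and checks the TileDB-backed results against their in-memory equivalents.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")
v <- runif(ncol(out))

# Column sums computed from the backend versus in memory.
all.equal(colSums(out), colSums(X))

# Matrix product with a vector; show only the first few rows.
prod <- out %*% v
head(prod)

# The product should agree with the ordinary matrix computation.
all.equal(as.matrix(prod), X %*% v)
```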
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below lets us control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.61933623 0.50920242 -2.57588003 . 1.06956460 1.61617798
## [2,] -1.48618187 -3.59013448 -0.06194473 . 0.50551103 -0.29782158
## [3,] 1.66522026 0.53906928 0.62300733 . -0.68578017 -0.24103089
## [4,] -1.68506937 0.72779564 1.27543364 . 0.08738388 -0.15612812
## [5,] 1.64484470 1.80474329 0.73916292 . 0.46445956 0.59238828
## ... . . . . . .
## [96,] 0.7954457 0.6882108 0.5258207 . -0.438338855 0.728252276
## [97,] 0.2489504 1.2034226 0.3587021 . 0.623229750 0.304156782
## [98,] -1.3292487 0.3013278 0.2265666 . 0.521731959 -1.024866051
## [99,] -0.7834908 0.1250877 2.8451783 . -0.266856282 0.849217512
## [100,] 0.5376486 -1.4056882 0.7575569 . -0.328033930 0.001560227
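Knowing the path is mainly useful for reopening the array later. The sketch below assumes that the TileDBArray() constructor accepts a path and passes an attr argument on to the seed constructor, which is how the package documentation describes it; treat the exact signature as an assumption to verify against your installed version.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing array from disk, naming the attribute to read.
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)

# The reopened array should hold the same values as the original.
all.equal(as.matrix(reopened), X)
```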
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.61933623 0.50920242 -2.57588003 . 1.06956460 1.61617798
## [2,] -1.48618187 -3.59013448 -0.06194473 . 0.50551103 -0.29782158
## [3,] 1.66522026 0.53906928 0.62300733 . -0.68578017 -0.24103089
## [4,] -1.68506937 0.72779564 1.27543364 . 0.08738388 -0.15612812
## [5,] 1.64484470 1.80474329 0.73916292 . 0.46445956 0.59238828
## ... . . . . . .
## [96,] 0.7954457 0.6882108 0.5258207 . -0.438338855 0.728252276
## [97,] 0.2489504 1.2034226 0.3587021 . 0.623229750 0.304156782
## [98,] -1.3292487 0.3013278 0.2265666 . 0.521731959 -1.024866051
## [99,] -0.7834908 0.1250877 2.8451783 . -0.266856282 0.849217512
## [100,] 0.5376486 -1.4056882 0.7575569 . -0.328033930 0.001560227
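Because the global path persists after it is set, a later coercion would presumably target the same location again. The sketch below confirms that the backend landed at the chosen path and then unsets the global. It assumes that setTileDBPath(NULL) clears the setting and that getTileDBPath() reports the current value, as suggested by the package's documentation of its globals.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

path2 <- tempfile()
setTileDBPath(path2)
out <- as(X, "TileDBArray")

# A local TileDB array is stored as a directory at the chosen path.
file.exists(path2)

# Unset the global so later coercions do not reuse this location
# (assumption: NULL restores the default behaviour).
setTileDBPath(NULL)
getTileDBPath()
```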
sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.18 TileDBArray_1.14.1 DelayedArray_0.30.1
## [4] SparseArray_1.4.8 S4Arrays_1.4.1 abind_1.4-8
## [7] IRanges_2.38.1 S4Vectors_0.42.1 MatrixGenerics_1.16.0
## [10] matrixStats_1.4.1 BiocGenerics_0.50.0 Matrix_1.7-0
## [13] BiocStyle_2.32.1
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.1
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13
## [7] nanoarrow_0.5.0.1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.44.0 tiledb_0.30.0
## [16] knitr_1.48 bookdown_0.40 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.47
## [22] sass_0.4.9 bit64_4.0.5 cli_3.6.3
## [25] zlibbioc_1.50.0 spdl_0.0.5 digest_0.6.37
## [28] grid_4.4.1 lifecycle_1.0.4 data.table_1.16.0
## [31] evaluate_0.24.0 nanotime_0.3.9 zoo_1.8-12
## [34] rmarkdown_2.28 tools_4.4.1 htmltools_0.5.8.1