1 Background

Ordinarily, direct support for a matrix representation would require the appropriate methods to be defined in beachmat at compile time. This is the case for the most widely used matrix classes but is somewhat restrictive for other community-contributed matrix representations. Fortunately, R provides a mechanism to link across shared libraries from different packages. This means that package developers who define a R matrix representation can also define C++ methods for native read/write support in beachmat-dependent code. By doing so, we can improve efficiency of access to these new classes by avoiding the need for block processing via R.

A functioning demonstration of this approach is available in the extensions test package. This vignette will provide an explanation of the code in extensions, and we suggest examining the source code at the same time:

system.file("extensions", package="beachmat")
## [1] "/tmp/RtmpddqHqW/Rinst3543339c4c6f/beachmat/extensions"

2 External linkage for input

2.1 Setting up in R

Assume that we have already defined a new matrix-like S4 class (here, AaronMatrix). To notify the beachmat API that direct input support is available, we need to:

  • define a method for the supportCppAccess() generic (from beachmat) for this class. This should return TRUE if direct support is available (obviously).
  • define a method for the type() generic from the DelayedArray package. This should return the type of the matrix, i.e., integer, logical, numeric or character.

It is possible to only have direct support for particular data types of the given matrix representation. The example in extensions only directly supports integer and character AaronMatrix objects1 Because I was too lazy to add all of them. and will only return TRUE for such types.

2.2 Setting up in C++

We will use integer matrices for demonstration, though it is simple to generalize this to all types by replacing _integer with, e.g., _character2 Some understanding of C++ templates will greatly simplify the definition of the same methods for different types.. First, we define a create() function that takes a SEXP object and returns a void pointer. This should presumably point to some C++ class that can contain intermediate data structures for efficient access.

void * ptr = AaronMatrix_integer_input_create(in /* SEXP */);

We define a clone() function that performs a deep copy of the aforementioned pointer.

void * ptr_copy = AaronMatrix_integer_input_clone(ptr /* void* */);

We define a destroy() function that frees the memory pointed to by ptr.

AaronMatrix_integer_input_destroy(ptr /* void* */);

We define a get_dim() function that records the number of rows and columns in the object pointed to by ptr. Note the pointers for nrow and ncol.

AaronMatrix_integer_input_dim(
    ptr, /* void* */
    nrow, /* size_t* */
    ncol /* size_t* */
);

A systematic naming scheme is used for all functions, consisting of:

  • The name of the matrix representation, i.e., AaronMatrix.
  • The data type, i.e., integer.
  • Whether it is an input or output class.
  • The purpose of the function, e.g., destroy.

2.3 Defining getter methods

2.3.1 For all types

In general, the getter functions follow the same structure as that described for the input API. We expect a get function to obtain a specified entry of the matrix:

AaronMatrix_integer_intput_get(
    ptr, /* void* */
    r, /* size_t */
    c, /* size_t */
    val /* int* */
);

Note that val is a pointer to the matrix type. For example, val should be a Rcpp::String* for character matrices, a double* for numeric matrices, and an int* for logical matrices.

Developers can assume that r and c are valid, i.e., within [0, nrow) and [0, ncol) respectively. These checks are performed by beachmat and do not have to be repeated within developer-defined functions3 Obviously, the dimensions of the matrix pointed to by ptr should not change!.

2.3.2 For non-numeric types

Here, we will use character matrices4 Character matrices tend to require some special attention, as character arrays need to be coerced to Rcpp::String objects to be returned in in. as an example. We expect a getCol function to obtain a column of the matrix:

AaronMatrix_character_input_getCol(
    ptr, /* void* */
    c, /* size_t */
    in, /* Rcpp::StringVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

… and another getRow function to obtain a row of the matrix:

AaronMatrix_character_input_getRow(
    ptr, /* void* */
    r, /* size_t */
    in*, /* Rcpp::StringVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

These are in camelcase to simplify parsing of the function names. We further expect a getCols function to obtain multiple columns:

AaronMatrix_character_input_getCols(
    ptr, /* void* */
    c, /* size_t */
    indices, /* Rcpp::IntegerVector::iterator* */
    n, /* size_t */
    in, /* Rcpp::StringVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

… and a getRows function to obtain multiple rows:

AaronMatrix_character_input_getRows(
    ptr, /* void* */
    r, /* size_t */
    indices, /* Rcpp::IntegerVector::iterator* */
    n, /* size_t */
    in, /* Rcpp::StringVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

In all cases, first and last can be assumed to be valid, i.e., first <= last and both in [0, nrow) or [0, ncol) (for column and row access, respectively). Indices in indices can also be assumed to be valid, i.e., within matrix dimensions and strictly increasing.

We stress that the various iterator arguments are pointers to iterators rather than the iterators themselves. This is to avoid potential issues with C++ classes when using C-style linkage via R’s R_GetCCallable() framework.

2.3.3 Numeric types

For integer, logical or numeric matrices, we need to account for type conversions. This is done by defining the following functions (using integer matrices as an example):

  • AaronMatrix_integer_input_getCol_integer, for getting a single column’s values as integers.
  • AaronMatrix_integer_input_getCol_numeric, for getting a single column’s values as double-precision values.
  • AaronMatrix_integer_input_getRow_integer, for getting a single row’s values as integers.
  • AaronMatrix_integer_input_getRow_numeric, for getting a single row’s values as double-precision values.
  • AaronMatrix_integer_input_getCols_integer, for getting multiple columns’ values as integers.
  • AaronMatrix_integer_input_getCols_numeric, for getting multiple columns’ values as double-precision values.
  • AaronMatrix_integer_input_getRows_integer, for getting multiple rows’ values as integers.
  • AaronMatrix_integer_input_getRows_numeric, for getting multiple rows’ values as double-precision values.

Taking the single-column getter as an example:

AaronMatrix_integer_input_getCol_integer(
    ptr, /* void* */
    c, /* size_t */
    in, /* Rcpp::IntegerVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

AaronMatrix_integer_input_getCol_numeric(
    ptr, /* void* */
    c, /* size_t */
    in, /* Rcpp::NumericVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

The function name now has an additional suffix to denote the destination type. We explicitly define conversions here as the cross-library linking framework does not support templating or overloading of in.

3 External linkage for output

3.1 Setting up in R

To notify the beachmat API that direct output support is available, we need to define flags in our package’s namespace.

beachmat_AaronMatrix_integer_output <- TRUE
beachmat_AaronMatrix_character_output <- TRUE

This indicates that support is available for AaronMatrix integer and character outputs. Missing or FALSE flags indicate that no support is available, in which case beachmat will write to an ordinary matrix by default.

3.2 Setting up in C++

Again, we will use integer matrices for demonstration. The required functions are mostly similar to the input case. For creation, we expect to have the number of rows nr and columns nc:

void * ptr = AaronMatrix_integer_output_create(
   nr /* size_t */,
   nc /* size_t */
);

We define a clone() function to perform a deep copy:

void * ptr_copy = AaronMatrix_integer_output_clone(ptr /* void* */);

We also define a destroy() function to free memory:

AaronMatrix_integer_output_destroy(ptr /* void* */);

In all cases, we use _output_ to indicate that we are dealing with an output matrix class.

3.3 Defining setter methods

3.3.1 For all types

In general, the setter functions follow the same structure as that described for the out API. We expect a set function to obtain a specified entry of the matrix:

AaronMatrix_integer_intput_set(
    ptr, /* void* */
    r, /* size_t */
    c, /* size_t */
    val /* int* */
);

Again, note that val is a pointer to the matrix type.

3.3.2 For non-numeric types

Here, we will use character matrices5 Character matrices tend to require some special attention, as character arrays need to be coerced to Rcpp::String objects to be returned in in. as an example. We expect a setCol function to obtain a column of the matrix:

AaronMatrix_character_output_setCol(
    ptr, /* void* */
    c, /* size_t */
    in, /* Rcpp::StringVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

… and another setRow function to obtain a row of the matrix:

AaronMatrix_character_output_setRow(
    ptr, /* void* */
    r, /* size_t */
    in*, /* Rcpp::StringVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

These are in camelcase to simplify parsing of the function names. We further expect a setColIndexed function to set specific elements of a column:

AaronMatrix_character_output_setColIndexed(
    ptr, /* void */
    c, /* size_t */
    n, /* size_t */
    idx, /* Rcpp::IntegerVector::iterator */
    in /* Rcpp::StringVector::iterator */
)

… where idx points to an array of n zero-indexed row indices and val points to an array of values. The function should assign each value to the corresponding row at column c of the output matrix.

Similarly, we expect a setRowIndexed function to set specific elements of a row:

AaronMatrix_character_output_setRowIndexed(
    ptr, /* void */
    r, /* size_t */
    n, /* size_t */
    idx, /* Rcpp::IntegerVector::iterator */
    in /* Rcpp::StringVector::iterator */
)

… where idx now contains column indices.

3.3.3 Numeric types

For integer, logical or numeric matrices, we need to account for type conversions. This is done by defining the following functions (using integer matrices as an example):

  • AaronMatrix_integer_output_setCol_integer, for setting a single column’s values from integers.
  • AaronMatrix_integer_output_setCol_numeric, for setting a single column’s values from double-precision values.
  • AaronMatrix_integer_output_setRow_integer, for setting a single row’s values from integers.
  • AaronMatrix_integer_output_setRow_numeric, for setting a single row’s values from double-precision values.
  • AaronMatrix_integer_output_setColIndexed_integer, for indexed setting of a single column’s values from integers.
  • AaronMatrix_integer_output_setColIndexed_numeric, for indexed setting of a single column’s values from double-precision values.
  • AaronMatrix_integer_output_setRowIndexed_integer, for indexed setting of a single row’s values from integers.
  • AaronMatrix_integer_output_setRowIndexed_numeric, for indexed setting of a single row’s values from double-precision values

Taking the single-column setter as an example:

AaronMatrix_integer_output_setCol_integer(
    ptr, /* void* */
    c, /* size_t */
    in, /* Rcpp::IntegerVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

AaronMatrix_integer_output_setCol_numeric(
    ptr, /* void* */
    c, /* size_t */
    in, /* Rcpp::NumericVector::iterator* */
    first, /* size_t */
    last /* size_t */
);

3.4 Defining getter methods

All single-element and single-row/column getters should be supported:

  • AaronMatrix_character_output_get
  • AaronMatrix_character_output_getRow
  • AaronMatrix_character_output_getCol

For numeric types, convertible getters should also be supported:

  • AaronMatrix_character_output_get
  • AaronMatrix_character_output_getRow_integer
  • AaronMatrix_character_output_getRow_numeric
  • AaronMatrix_character_output_getCol_integer
  • AaronMatrix_character_output_getCol_numeric

4 Ensuring discoverability

We use the R_RegisterCCallable() function from the R API to register the above functions (see here for an explanation). This ensures that they can be found by beachmat when an AaronMatrix instance is encountered. Note that the functions must be defined with C-style linkage in order for this procedure to work properly, hence the use of extern "C" in the extensions test package.

Needless to say, the NAMESPACE should contain an appropriate useDynLib command. This means that shared library will be loaded along with the package, allowing beachmat to access the registered routines within. However, the supportCppAccess method and output flags do not need to be exported, as these will be directly recovered from the package’s namespace.

5 Testing

We suggest using the beachtest package to test correct input and output via external linkage to a custom matrix representation. When using the testthat framework, this can be added to setup.R:

testpkg <- system.file("testpkg", package="beachmat")
devtools::install(testpkg, quick=TRUE)
library(beachtest)

It is simple to write test scripts using functions like check_read_all and check_write_all to quickly verify that linkage works correctly. Developers are again referred to the extensions test package for a working example.