combi package: vignette

Introduction

This package implements a novel data integration model for sample-wise integration of different views. It accounts for compositionality and employs a non-parametric mean-variance trend for sequence count data. The fitted model can be plotted directly, allowing exploratory visualization of the variability shared across the views.

Installation

The package can be installed and loaded using the following commands:

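# Install the release version from Bioconductor
library(BiocManager)
BiocManager::install("combi", update = FALSE)
# Alternatively, install the development version from GitHub
library(devtools)
install_github("CenterForStatistics-UGent/combi")
# Load the package and report its version
suppressPackageStartupMessages(library(combi))
cat("combi package version", as.character(packageVersion("combi")), "\n")
## combi package version 1.5.0
# Load the example data used in the rest of this vignette
data(Zhang)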

Unconstrained integration

For an unconstrained ordination, a named list of datasets with overlapping samples must be supplied. Each dataset can currently be supplied as a raw data matrix (with features in the columns), or as a phyloseq, SummarizedExperiment or ExpressionSet object. In addition, the distribution to use for each view (“quasi” for quasi-likelihood fitting, “gaussian” for normal data) and whether the view is compositional (TRUE/FALSE) should be supplied.
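As an illustration, the call below sketches such a fit on the Zhang data. It assumes that `data(Zhang)` provides the objects `zhangMicrobio` and `zhangMetabo`, and that `combi()` accepts the arguments `distributions`, `compositional` and `logTransformGaussian`; consult `?combi` for the installed version before relying on these names.

```r
library(combi)
data(Zhang)

# Named list of views: the names label the views in printed and plotted output
microMetaboInt <- combi(
    list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
    distributions = c("quasi", "gaussian"), # one entry per view, in list order
    compositional = c(TRUE, FALSE),         # only the microbiome counts are compositional
    logTransformGaussian = FALSE            # assumption: no extra log-transform of the gaussian view
)
```

The vectors `distributions` and `compositional` are matched to the views by position, so their order should follow the order of the list.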

Printing the fitted object gives basic information about the ordination:

## Unconstrained combi ordination of 2 dimensions on 2 views with 42 samples.
## Views and number of features were:
##  microbiome: 130
##  metabolomics: 174
##  Importance parameters of dimensions 1 to 2 are 117 and 44.9

A simple plot function is available for the result. To set sample colours and shapes according to sample variables, a data frame of these variables should also be supplied.
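A minimal sketch of both plots follows. The argument names `samDf` and `samCol`, the data frame `zhangMetavars` and its column `"ABX"` are assumptions about the package and the example data; see `?plot.combi` for the actual interface.

```r
library(combi)
data(Zhang)

# Refit only if the unconstrained model is not already in the workspace
if (!exists("microMetaboInt")) {
    microMetaboInt <- combi(
        list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
        distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
        logTransformGaussian = FALSE)
}

# Default plot of samples and most important features
plot(microMetaboInt)

# Colour the samples by a sample variable taken from the supplied data frame
colouredPlot <- plot(microMetaboInt, samDf = zhangMetavars, samCol = "ABX")
colouredPlot
```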

By default, only the most important features (furthest away from the origin) are shown. To show all features, one can resort to point cloud plots or density plots as follows:
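The two alternatives might be requested as sketched here, assuming that the plot method has a `featurePlot` argument accepting `"points"` and `"density"` (an assumption to verify in `?plot.combi`).

```r
library(combi)
data(Zhang)

# Refit only if the unconstrained model is not already in the workspace
if (!exists("microMetaboInt")) {
    microMetaboInt <- combi(
        list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
        distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
        logTransformGaussian = FALSE)
}

# Point cloud: every feature drawn as a point
cloudPlot <- plot(microMetaboInt, samDf = zhangMetavars, samCol = "ABX",
    featurePlot = "points")
cloudPlot

# Density plot: feature positions summarised as density contours
densPlot <- plot(microMetaboInt, samDf = zhangMetavars, samCol = "ABX",
    featurePlot = "density")
densPlot
```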

The drawback is that now no feature labels are shown.

Adding projections

As an aid to the interpretation of compositional views, links between features can be plotted and projected onto samples. Both the features and the samples can be identified either by name or by approximate coordinates.
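The sketch below shows one possible way to add such a link. It assumes that the plot method can return its coordinates through `returnCoords = TRUE`, that `addLink()` takes `links`, `Views` and `samples` arguments, and that the two taxon names exist among the plotted microbiome features; all of these should be checked against `?addLink` and the data at hand.

```r
library(combi)
data(Zhang)

# Refit only if the unconstrained model is not already in the workspace
if (!exists("microMetaboInt")) {
    microMetaboInt <- combi(
        list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
        distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
        logTransformGaussian = FALSE)
}

# Build the plot and keep the coordinates needed to draw links afterwards
mmPlot <- plot(microMetaboInt, samDf = zhangMetavars, samCol = "ABX",
    returnCoords = TRUE, featNum = 10)

# Features given by name (assumed present), sample given by approximate coordinates
addLink(mmPlot, links = cbind("Staphylococcus_819c11", "OTU929ffc"),
    Views = 1, samples = c(0, 1))
```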

Coordinates

Finally, the coordinates can be extracted for use in third-party software.
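A short sketch of the extraction, assuming an `extractCoords()` function with a `Dim` argument that selects the dimensions to return:

```r
library(combi)
data(Zhang)

# Refit only if the unconstrained model is not already in the workspace
if (!exists("microMetaboInt")) {
    microMetaboInt <- combi(
        list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
        distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
        logTransformGaussian = FALSE)
}

# Coordinates of the first two dimensions
coords <- extractCoords(microMetaboInt, Dim = c(1, 2))

# Inspect the top-level structure before exporting anything
str(coords, max.level = 1)
```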

Constrained integration

For a constrained ordination, a data frame of sample variables should also be supplied.
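The constrained fit could look as follows. The `covariates` argument and the data frame `zhangMetavars` are assumptions about the package interface and the example data; the other arguments mirror the unconstrained case.

```r
library(combi)
data(Zhang)

# Same views as in the unconstrained fit, now with sample variables as covariates
microMetaboIntConstr <- combi(
    list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
    distributions = c("quasi", "gaussian"),
    compositional = c(TRUE, FALSE),
    logTransformGaussian = FALSE,
    covariates = zhangMetavars # one row per sample
)
```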

Here too, printing the object gives a quick overview:

## Constrained combi ordination of 2 dimensions on 2 views with 42 samples.
## Views and number of features were:
##  microbiome: 130
##  metabolomics: 174
##  Number of sample variables included was 4,
## for which 6 parameters were estimated per dimension.
##  Importance parameters of dimensions 1 to 2 are 34.2 and 21.4

The constrained ordination can be plotted as well.
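As a sketch, the same plotting call as for the unconstrained model is assumed to apply to the constrained one (`samDf`, `samCol` and the column `"ABX"` are again assumptions):

```r
library(combi)
data(Zhang)

# Refit only if the constrained model is not already in the workspace
if (!exists("microMetaboIntConstr")) {
    microMetaboIntConstr <- combi(
        list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
        distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
        logTransformGaussian = FALSE, covariates = zhangMetavars)
}

# Plot of the constrained ordination, samples coloured by a sample variable
constrPlot <- plot(microMetaboIntConstr, samDf = zhangMetavars, samCol = "ABX")
constrPlot
```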

Diagnostics

Convergence of the iterative algorithm can be assessed as follows:
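A minimal sketch, assuming the package exports a `convPlot()` function that takes the fitted model as its first argument:

```r
library(combi)
data(Zhang)

# Refit only if the unconstrained model is not already in the workspace
if (!exists("microMetaboInt")) {
    microMetaboInt <- combi(
        list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
        distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
        logTransformGaussian = FALSE)
}

# Convergence plot with the default settings
convPlot(microMetaboInt)
```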

The influence of the different views can be investigated as follows:
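One possible call is sketched here. It assumes an `inflPlot()` function with `samples` and `plotType` arguments; restricting the plot to the first 20 samples is an arbitrary choice for readability.

```r
library(combi)
data(Zhang)

# Refit only if the unconstrained model is not already in the workspace
if (!exists("microMetaboInt")) {
    microMetaboInt <- combi(
        list(microbiome = zhangMicrobio, metabolomics = zhangMetabo),
        distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
        logTransformGaussian = FALSE)
}

# Influence of each view, shown as boxplots for a subset of samples
inflPlot(microMetaboInt, samples = 1:20, plotType = "boxplot")
```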

Session info

This vignette was generated with the following version of R:

sessionInfo()
## R version 4.1.0 beta (2021-05-03 r80259)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] combi_1.5.0
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-152                bitops_1.0-7               
##   [3] matrixStats_0.58.0          phyloseq_1.37.0            
##   [5] progress_1.2.2              GenomeInfoDb_1.29.0        
##   [7] numDeriv_2016.8-1.1         tools_4.1.0                
##   [9] bslib_0.2.5.1               vegan_2.5-7                
##  [11] utf8_1.2.1                  R6_2.5.0                   
##  [13] mgcv_1.8-35                 DBI_1.1.1                  
##  [15] BiocGenerics_0.39.0         colorspace_2.0-1           
##  [17] permute_0.9-5               rhdf5filters_1.5.0         
##  [19] ade4_1.7-16                 tidyselect_1.1.1           
##  [21] prettyunits_1.1.1           compiler_4.1.0             
##  [23] quantreg_5.85               Biobase_2.53.0             
##  [25] formatR_1.9                 SparseM_1.81               
##  [27] alabama_2015.3-1            isoband_0.2.4              
##  [29] DelayedArray_0.19.0         labeling_0.4.2             
##  [31] sass_0.4.0                  scales_1.1.1               
##  [33] quadprog_1.5-8              stringr_1.4.0              
##  [35] digest_0.6.27               rmarkdown_2.8              
##  [37] XVector_0.33.0              pkgconfig_2.0.3            
##  [39] htmltools_0.5.1.1           MatrixGenerics_1.5.0       
##  [41] highr_0.9                   limma_3.49.0               
##  [43] rlang_0.4.11                farver_2.1.0               
##  [45] jquerylib_0.1.4             generics_0.1.0             
##  [47] jsonlite_1.7.2              dplyr_1.0.6                
##  [49] RCurl_1.98-1.3              magrittr_2.0.1             
##  [51] GenomeInfoDbData_1.2.6      biomformat_1.21.0          
##  [53] Matrix_1.3-3                Rcpp_1.0.6                 
##  [55] munsell_0.5.0               S4Vectors_0.31.0           
##  [57] Rhdf5lib_1.15.0             fansi_0.4.2                
##  [59] ape_5.5                     lifecycle_1.0.0            
##  [61] stringi_1.6.2               yaml_2.2.1                 
##  [63] nleqslv_3.3.2               MASS_7.3-54                
##  [65] SummarizedExperiment_1.23.0 zlibbioc_1.39.0            
##  [67] BB_2019.10-1                rhdf5_2.37.0               
##  [69] plyr_1.8.6                  grid_4.1.0                 
##  [71] parallel_4.1.0              crayon_1.4.1               
##  [73] lattice_0.20-44             Biostrings_2.61.0          
##  [75] splines_4.1.0               tensor_1.5                 
##  [77] multtest_2.49.0             hms_1.1.0                  
##  [79] knitr_1.33                  pillar_1.6.1               
##  [81] igraph_1.2.6                GenomicRanges_1.45.0       
##  [83] reshape2_1.4.4              codetools_0.2-18           
##  [85] stats4_4.1.0                glue_1.4.2                 
##  [87] evaluate_0.14               cobs_1.3-4                 
##  [89] data.table_1.14.0           vctrs_0.3.8                
##  [91] foreach_1.5.1               MatrixModels_0.5-0         
##  [93] gtable_0.3.0                purrr_0.3.4                
##  [95] assertthat_0.2.1            ggplot2_3.3.3              
##  [97] xfun_0.23                   survival_3.2-11            
##  [99] tibble_3.1.2                conquer_1.0.2              
## [101] iterators_1.0.13            IRanges_2.27.0             
## [103] cluster_2.1.2               ellipsis_0.3.2