1 Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SingleCellMultiModal")

1.1 Load

library(SingleCellMultiModal)
library(MultiAssayExperiment)

2 scNMT

The dataset was graciously provided by Argelaguet et al. (2019).

Scripts used to process the raw data were written and maintained by Argelaguet and colleagues and reside on GitHub: https://github.com/rargelaguet/scnmt_gastrulation

For more information on the protocol, see Clark et al. (2018).

2.1 Downloading datasets

To list the files available for a dataset without downloading them, call scNMT() with dry.run = TRUE (mode = "*" matches every data type):

scNMT("mouse_gastrulation", mode = "*", version = "1.0.0", dry.run = TRUE)
## snapshotDate(): 2020-10-27
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3738      acc_cgi      7 Mb     matrix     2020-09-03             <NA>
## 2  EH3739     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3740      acc_DHS    0.3 Mb     matrix     2020-09-03             <NA>
## 4  EH3741 acc_genebody   49.6 Mb     matrix     2020-09-03             <NA>
## 5  EH3742     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3743 acc_promoter   27.2 Mb     matrix     2020-09-03             <NA>
## 7  EH3745      met_cgi    4.6 Mb     matrix     2020-09-03             <NA>
## 8  EH3746     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3747      met_DHS    0.1 Mb     matrix     2020-09-03             <NA>
## 10 EH3748 met_genebody   26.8 Mb     matrix     2020-09-03             <NA>
## 11 EH3749     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3750 met_promoter   11.5 Mb     matrix     2020-09-03             <NA>
## 13 EH3751          rna   18.6 Mb     matrix     2020-09-03             <NA>

The same listing is returned when mode and dry.run are omitted, which shows that these are the default settings:

scNMT("mouse_gastrulation", version = "1.0.0")
## snapshotDate(): 2020-10-27
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3738      acc_cgi      7 Mb     matrix     2020-09-03             <NA>
## 2  EH3739     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3740      acc_DHS    0.3 Mb     matrix     2020-09-03             <NA>
## 4  EH3741 acc_genebody   49.6 Mb     matrix     2020-09-03             <NA>
## 5  EH3742     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3743 acc_promoter   27.2 Mb     matrix     2020-09-03             <NA>
## 7  EH3745      met_cgi    4.6 Mb     matrix     2020-09-03             <NA>
## 8  EH3746     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3747      met_DHS    0.1 Mb     matrix     2020-09-03             <NA>
## 10 EH3748 met_genebody   26.8 Mb     matrix     2020-09-03             <NA>
## 11 EH3749     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3750 met_promoter   11.5 Mb     matrix     2020-09-03             <NA>
## 13 EH3751          rna   18.6 Mb     matrix     2020-09-03             <NA>

2.2 Data versions

Argelaguet and colleagues have provided a more recent release of the ‘mouse_gastrulation’ dataset. It additionally includes cells that did not pass the quality metrics applied to the version 1.0.0 dataset.

Use the version argument to indicate the newer dataset version (2.0.0):

scNMT("mouse_gastrulation", version = "2.0.0", dry.run = TRUE)
## snapshotDate(): 2020-10-27
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3753      acc_cgi   21.1 Mb     matrix     2020-09-03             <NA>
## 2  EH3754     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3755      acc_DHS   16.2 Mb     matrix     2020-09-03             <NA>
## 4  EH3756 acc_genebody   60.1 Mb     matrix     2020-09-03             <NA>
## 5  EH3757     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3758 acc_promoter   33.8 Mb     matrix     2020-09-03             <NA>
## 7  EH3760      met_cgi   12.1 Mb     matrix     2020-09-03             <NA>
## 8  EH3761     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3762      met_DHS    3.9 Mb     matrix     2020-09-03             <NA>
## 10 EH3763 met_genebody   33.9 Mb     matrix     2020-09-03             <NA>
## 11 EH3764     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3765 met_promoter   18.7 Mb     matrix     2020-09-03             <NA>
## 13 EH3766          rna   43.5 Mb     matrix     2020-09-03             <NA>
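
Before switching dry.run to FALSE, it can help to estimate how much data each version would download. The sketch below is only illustrative: it hard-codes the file_size values (in Mb) from the two listings above into plain vectors rather than parsing anything returned by scNMT(), and all variable names are made up for this example.

```r
# File sizes in Mb, copied by hand from the dry-run listings above
# (hypothetical vectors; not produced by scNMT() itself).
modes <- c("acc_cgi", "acc_CTCF", "acc_DHS", "acc_genebody", "acc_p300",
    "acc_promoter", "met_cgi", "met_CTCF", "met_DHS", "met_genebody",
    "met_p300", "met_promoter", "rna")
size_v1 <- c(7, 1.2, 0.3, 49.6, 0.2, 27.2, 4.6, 0.1, 0.1, 26.8, 0.1,
    11.5, 18.6)
size_v2 <- c(21.1, 1.2, 16.2, 60.1, 0.2, 33.8, 12.1, 0.1, 3.9, 33.9, 0.1,
    18.7, 43.5)
names(size_v1) <- names(size_v2) <- modes

# Total download size per version, in Mb
totals <- c(v1.0.0 = sum(size_v1), v2.0.0 = sum(size_v2))
totals

# Modes whose files grew the most between the two versions
growth <- sort(size_v2 - size_v1, decreasing = TRUE)
head(growth, 3)
```

With the sizes listed above, version 2.0.0 totals roughly 245 Mb against roughly 147 Mb for version 1.0.0, and the rna file accounts for the largest increase.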

2.3 Actual Data

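Set dry.run = FALSE to download the selected files and load them into a MultiAssayExperiment. Here, wildcard patterns in mode select the DHS, CpG island (cgi), and gene body assays: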

nmt <- scNMT("mouse_gastrulation", mode = c("*_DHS", "*_cgi", "*_genebody"),
    version = "1.0.0", dry.run = FALSE)
nmt
## A MultiAssayExperiment object of 6 listed
##  experiments with user-defined names and respective classes.
##  Containing an ExperimentList class object of length 6:
##  [1] acc_DHS: matrix with 290 rows and 826 columns
##  [2] met_DHS: matrix with 66 rows and 826 columns
##  [3] acc_cgi: matrix with 4459 rows and 826 columns
##  [4] met_cgi: matrix with 5536 rows and 826 columns
##  [5] acc_genebody: matrix with 17139 rows and 826 columns
##  [6] met_genebody: matrix with 15837 rows and 826 columns
## Functionality:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save all data to files
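
The mode argument in the call above uses wildcard patterns such as "*_DHS". As a rough illustration of which modes such patterns pick out, the sketch below applies base R glob matching (utils::glob2rx) to the mode names from the dry-run listing. Whether scNMT() resolves patterns in exactly this way internally is an assumption; the vectors here are typed in by hand for the example.

```r
# Mode names as shown in the dry-run listing (typed in by hand).
modes <- c("acc_cgi", "acc_CTCF", "acc_DHS", "acc_genebody", "acc_p300",
    "acc_promoter", "met_cgi", "met_CTCF", "met_DHS", "met_genebody",
    "met_p300", "met_promoter", "rna")

# Wildcard patterns as passed to the 'mode' argument above.
patterns <- c("*_DHS", "*_cgi", "*_genebody")

# Translate each glob to a regular expression and collect the matches.
selected <- unlist(lapply(patterns, function(p) {
    grep(utils::glob2rx(p), modes, value = TRUE)
}))
selected
```

The six names selected this way correspond to the six experiments listed in the MultiAssayExperiment printed above.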

2.4 Exploring the data structure

Check the row (feature) names of each assay:

rownames(nmt)
## CharacterList of length 6
## [["acc_DHS"]] ESC_DHS_118970 ESC_DHS_118919 ... ESC_DHS_68996 ESC_DHS_109494
## [["met_DHS"]] ESC_DHS_20778 ESC_DHS_14504 ... ESC_DHS_72133 ESC_DHS_72129
## [["acc_cgi"]] CGI_5278 CGI_6058 CGI_10627 ... CGI_7832 CGI_11329 CGI_10964
## [["met_cgi"]] CGI_3481 CGI_8941 CGI_956 CGI_9461 ... CGI_2867 CGI_3499 CGI_365
## [["acc_genebody"]] ENSMUSG00000036181 ENSMUSG00000071862 ... ENSMUSG00000025576
## [["met_genebody"]] ENSMUSG00000059334 ENSMUSG00000024026 ... ENSMUSG00000078302

Inspect the sampleMap, which links each assay column (colname) to its cell (primary):

sampleMap(nmt)
## DataFrame with 4956 rows and 3 columns
##             assay                primary                colname
##          <factor>            <character>            <character>
## 1    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 2    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 3    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 4    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 5    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## ...           ...                    ...                    ...
## 4952      acc_DHS       PS_VE_Plate9_G05       PS_VE_Plate9_G05
## 4953      acc_DHS       PS_VE_Plate9_G08       PS_VE_Plate9_G08
## 4954      acc_DHS       PS_VE_Plate9_G09       PS_VE_Plate9_G09
## 4955      acc_DHS       PS_VE_Plate9_G12       PS_VE_Plate9_G12
## 4956      acc_DHS       PS_VE_Plate9_H08       PS_VE_Plate9_H08
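
The 4956 rows follow from the object's shape: 6 assays with 826 columns each give one map row per assay and cell pair. The toy sketch below rebuilds that long layout in base R for three cell names taken from the output above. It is a stand-in built with expand.grid() to show the bookkeeping, not the way MultiAssayExperiment constructs its sampleMap.

```r
# Assay names from the MultiAssayExperiment shown earlier.
assay_names <- c("acc_DHS", "met_DHS", "acc_cgi", "met_cgi",
    "acc_genebody", "met_genebody")

# Three cell names copied from the sampleMap output (toy subset).
cells <- c("PS_VE_Plate9_G05", "PS_VE_Plate9_G08", "PS_VE_Plate9_G09")

# One row per (assay, cell) pair; here 'primary' and 'colname' coincide,
# as they do in the rows printed above.
toy_map <- expand.grid(colname = cells, assay = assay_names,
    stringsAsFactors = FALSE)
toy_map$primary <- toy_map$colname
toy_map <- toy_map[, c("assay", "primary", "colname")]

nrow(toy_map)                       # 6 assays x 3 cells
length(assay_names) * 826           # 6 assays x 826 cells, as in the real map
```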

2.5 Chromatin Accessibility

See the accessibility levels (as proportions) for DNase hypersensitive sites (DHS):

head(assay(nmt, "acc_DHS"))[, 1:4]
##                E4.5-5.5_new_Plate1_A02 E4.5-5.5_new_Plate1_A04
## ESC_DHS_118970              0.66666667                      NA
## ESC_DHS_118919              0.76190476                      NA
## ESC_DHS_66330               0.81818182               0.7142857
## ESC_DHS_43318                       NA               0.8000000
## ESC_DHS_6229                0.85714286               0.8000000
## ESC_DHS_9413                0.06666667               0.6800000
##                E4.5-5.5_new_Plate1_A07 E4.5-5.5_new_Plate1_A08
## ESC_DHS_118970                      NA               0.2631579
## ESC_DHS_118919               0.3636364               0.8421053
## ESC_DHS_66330                0.7391304               0.6086957
## ESC_DHS_43318                0.5000000               0.8888889
## ESC_DHS_6229                 0.3333333               0.7142857
## ESC_DHS_9413                 0.2142857               0.5217391
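
Many entries are NA because a given site is not covered in every cell. A minimal sketch of how such sparsity could be summarised is shown below. To stay self-contained it retypes the 6 x 4 excerpt printed above into a small matrix named acc (a hypothetical name); with the real data the same calls would be applied to assay(nmt, "acc_DHS").

```r
# The 6 x 4 excerpt printed above, retyped by hand as a toy matrix.
acc <- matrix(
    c(0.66666667, NA,        NA,        0.2631579,
      0.76190476, NA,        0.3636364, 0.8421053,
      0.81818182, 0.7142857, 0.7391304, 0.6086957,
      NA,         0.8000000, 0.5000000, 0.8888889,
      0.85714286, 0.8000000, 0.3333333, 0.7142857,
      0.06666667, 0.6800000, 0.2142857, 0.5217391),
    nrow = 6, byrow = TRUE,
    dimnames = list(
        c("ESC_DHS_118970", "ESC_DHS_118919", "ESC_DHS_66330",
          "ESC_DHS_43318", "ESC_DHS_6229", "ESC_DHS_9413"),
        c("E4.5-5.5_new_Plate1_A02", "E4.5-5.5_new_Plate1_A04",
          "E4.5-5.5_new_Plate1_A07", "E4.5-5.5_new_Plate1_A08")))

# Fraction of sites without a measurement, per cell.
na_per_cell <- colMeans(is.na(acc))
na_per_cell

# Mean accessibility per site, ignoring cells without coverage.
site_mean <- rowMeans(acc, na.rm = TRUE)
site_mean
```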

2.6 DNA Methylation

See the DNA methylation levels (as proportions) for DNase hypersensitive sites:

head(assay(nmt, "met_DHS"))[, 1:4]
##                E4.5-5.5_new_Plate1_A02 E4.5-5.5_new_Plate1_A04
## ESC_DHS_20778                0.8000000                      NA
## ESC_DHS_14504                0.8000000                     0.8
## ESC_DHS_112143                      NA                     0.4
## ESC_DHS_34593                0.6666667                     0.6
## ESC_DHS_20747                0.4000000                     0.6
## ESC_DHS_33671                       NA                     0.6
##                E4.5-5.5_new_Plate1_A07 E4.5-5.5_new_Plate1_A08
## ESC_DHS_20778                0.8571429               0.8000000
## ESC_DHS_14504                0.8000000               0.6000000
## ESC_DHS_112143               0.5714286               0.5000000
## ESC_DHS_34593                0.7142857               0.8000000
## ESC_DHS_20747                       NA               0.6000000
## ESC_DHS_33671                0.8333333               0.6666667
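
Downstream methods often need features measured in a minimum number of cells. The sketch below shows one way such a filter might look, again on a hand-typed copy of the excerpt printed above. The names met and min_cells are hypothetical, and the threshold is chosen only for illustration.

```r
# The 6 x 4 methylation excerpt printed above, retyped as a toy matrix.
met <- matrix(
    c(0.8000000, NA,  0.8571429, 0.8000000,
      0.8000000, 0.8, 0.8000000, 0.6000000,
      NA,        0.4, 0.5714286, 0.5000000,
      0.6666667, 0.6, 0.7142857, 0.8000000,
      0.4000000, 0.6, NA,        0.6000000,
      NA,        0.6, 0.8333333, 0.6666667),
    nrow = 6, byrow = TRUE,
    dimnames = list(
        c("ESC_DHS_20778", "ESC_DHS_14504", "ESC_DHS_112143",
          "ESC_DHS_34593", "ESC_DHS_20747", "ESC_DHS_33671"),
        c("E4.5-5.5_new_Plate1_A02", "E4.5-5.5_new_Plate1_A04",
          "E4.5-5.5_new_Plate1_A07", "E4.5-5.5_new_Plate1_A08")))

# Keep sites observed in at least 'min_cells' cells
# (here: all four cells of the excerpt; an illustrative threshold).
min_cells <- ncol(met)
keep <- rowSums(!is.na(met)) >= min_cells
met_complete <- met[keep, , drop = FALSE]
rownames(met_complete)
```

Lowering min_cells retains more sites at the cost of more missing values per site.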

For protocol information, see the references below.

3 sessionInfo

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] rhdf5_2.34.0                scater_1.18.3              
##  [3] ggplot2_3.3.3               scran_1.18.3               
##  [5] SingleCellExperiment_1.12.0 SingleCellMultiModal_1.2.4 
##  [7] MultiAssayExperiment_1.16.0 SummarizedExperiment_1.20.0
##  [9] Biobase_2.50.0              GenomicRanges_1.42.0       
## [11] GenomeInfoDb_1.26.2         IRanges_2.24.1             
## [13] S4Vectors_0.28.1            BiocGenerics_0.36.0        
## [15] MatrixGenerics_1.2.0        matrixStats_0.57.0         
## [17] BiocStyle_2.18.1           
## 
## loaded via a namespace (and not attached):
##   [1] SpatialExperiment_1.0.0       ggbeeswarm_0.6.0             
##   [3] colorspace_2.0-0              HCAMatrixBrowser_1.0.1       
##   [5] ellipsis_0.3.1                scuttle_1.0.4                
##   [7] bluster_1.0.0                 futile.logger_1.4.3          
##   [9] XVector_0.30.0                BiocNeighbors_1.8.2          
##  [11] farver_2.0.3                  bit64_4.0.5                  
##  [13] RSpectra_0.16-0               interactiveDisplayBase_1.28.0
##  [15] AnnotationDbi_1.52.0          codetools_0.2-18             
##  [17] sparseMatrixStats_1.2.0       cachem_1.0.1                 
##  [19] knitr_1.31                    jsonlite_1.7.2               
##  [21] dbplyr_2.0.0                  uwot_0.1.10                  
##  [23] HDF5Array_1.18.0              shiny_1.6.0                  
##  [25] BiocManager_1.30.10           compiler_4.0.3               
##  [27] httr_1.4.2                    dqrng_0.2.1                  
##  [29] assertthat_0.2.1              Matrix_1.3-2                 
##  [31] fastmap_1.1.0                 limma_3.46.0                 
##  [33] later_1.1.0.1                 BiocSingular_1.6.0           
##  [35] formatR_1.7                   htmltools_0.5.1.1            
##  [37] tools_4.0.3                   rsvd_1.0.3                   
##  [39] igraph_1.2.6                  gtable_0.3.0                 
##  [41] glue_1.4.2                    GenomeInfoDbData_1.2.4       
##  [43] dplyr_1.0.3                   rappdirs_0.3.2               
##  [45] Rcpp_1.0.6                    rapiclient_0.1.3             
##  [47] rhdf5filters_1.2.0            vctrs_0.3.6                  
##  [49] ExperimentHub_1.16.0          DelayedMatrixStats_1.12.2    
##  [51] AnVIL_1.2.0                   xfun_0.20                    
##  [53] stringr_1.4.0                 beachmat_2.6.4               
##  [55] mime_0.9                      lifecycle_0.2.0              
##  [57] irlba_2.3.3                   statmod_1.4.35               
##  [59] AnnotationHub_2.22.0          edgeR_3.32.1                 
##  [61] zlibbioc_1.36.0               scales_1.1.1                 
##  [63] promises_1.1.1                lambda.r_1.2.4               
##  [65] yaml_2.2.1                    curl_4.3                     
##  [67] memoise_2.0.0                 gridExtra_2.3                
##  [69] UpSetR_1.4.0                  stringi_1.5.3                
##  [71] RSQLite_2.2.3                 highr_0.8                    
##  [73] BiocVersion_3.12.0            BiocParallel_1.24.1          
##  [75] rlang_0.4.10                  pkgconfig_2.0.3              
##  [77] bitops_1.0-6                  evaluate_0.14                
##  [79] lattice_0.20-41               Rhdf5lib_1.12.1              
##  [81] purrr_0.3.4                   labeling_0.4.2               
##  [83] cowplot_1.1.1                 bit_4.0.4                    
##  [85] tidyselect_1.1.0              RcppAnnoy_0.0.18             
##  [87] plyr_1.8.6                    magrittr_2.0.1               
##  [89] bookdown_0.21                 R6_2.5.0                     
##  [91] magick_2.6.0                  generics_0.1.0               
##  [93] DelayedArray_0.16.1           DBI_1.1.1                    
##  [95] pillar_1.4.7                  withr_2.4.1                  
##  [97] RCurl_1.98-1.2                tibble_3.0.5                 
##  [99] crayon_1.3.4                  futile.options_1.0.1         
## [101] BiocFileCache_1.14.0          rmarkdown_2.6                
## [103] viridis_0.5.1                 locfit_1.5-9.4               
## [105] grid_4.0.3                    blob_1.2.1                   
## [107] digest_0.6.27                 xtable_1.8-4                 
## [109] httpuv_1.5.5                  munsell_0.5.0                
## [111] beeswarm_0.2.3                viridisLite_0.3.0            
## [113] vipor_0.4.5

References

Argelaguet, Ricard, Stephen J Clark, Hisham Mohammed, L Carine Stapel, Christel Krueger, Chantriolnt-Andreas Kapourani, Ivan Imaz-Rosshandler, et al. 2019. “Multi-Omics Profiling of Mouse Gastrulation at Single-Cell Resolution.” Nature 576 (7787): 487–91.

Clark, Stephen J, Ricard Argelaguet, Chantriolnt-Andreas Kapourani, Thomas M Stubbs, Heather J Lee, Celia Alda-Catalinas, Felix Krueger, et al. 2018. “scNMT-seq Enables Joint Profiling of Chromatin Accessibility DNA Methylation and Transcription in Single Cells.” Nature Communications 9 (1): 781.