1 Introduction

2 SingleCellExperiment class

scfind is built on top of Bioconductor’s SingleCellExperiment class. It operates on SingleCellExperiment objects and writes all of its results back to the object.

3 scfind Input

If you already have a SingleCellExperiment object, then proceed to the next chapter.

If you have a matrix or a data frame containing expression data, you first need to create a SingleCellExperiment object containing your data. For illustration we will use an example expression matrix provided with scfind. The dataset (yan) contains FPKM gene expression values for 90 cells derived from human embryos. The authors (Yan et al.) defined the developmental stage of every cell in the original publication (ann data frame). We will use these stages in projection later.

library(SingleCellExperiment)
library(scfind)

head(ann)
##                 cell_type1
## Oocyte..1.RPKM.     zygote
## Oocyte..2.RPKM.     zygote
## Oocyte..3.RPKM.     zygote
## Zygote..1.RPKM.     zygote
## Zygote..2.RPKM.     zygote
## Zygote..3.RPKM.     zygote
yan[1:3, 1:3]
##          Oocyte..1.RPKM. Oocyte..2.RPKM. Oocyte..3.RPKM.
## C9orf152             0.0             0.0             0.0
## RPS11             1219.9          1021.1           931.6
## ELMO2                7.0            12.2             9.3

Note that the cell type information has to be stored in the cell_type1 column of the colData slot of the SingleCellExperiment object.
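Because the annotation describes cells, it needs one row per column of the expression matrix, in the same order. The following base R sketch shows one way to check this before building the object. The names expr and cell_ann and all values are hypothetical stand-ins for yan and ann, not data shipped with scfind.

```r
# Toy stand-ins for the vignette's `yan` matrix and `ann` data frame
# (hypothetical values: three genes by four cells).
expr <- matrix(
  c(0, 1219.9, 7.0,
    0, 1021.1, 12.2,
    0, 931.6, 9.3,
    2.5, 800.0, 4.1),
  nrow = 3,
  dimnames = list(c("C9orf152", "RPS11", "ELMO2"),
                  c("cell1", "cell2", "cell3", "cell4"))
)
cell_ann <- data.frame(
  cell_type1 = c("zygote", "zygote", "2cell", "2cell"),
  row.names = c("cell1", "cell2", "cell3", "cell4"),
  stringsAsFactors = FALSE
)

# The annotation must carry a cell_type1 column ...
stopifnot("cell_type1" %in% colnames(cell_ann))
# ... and one row per cell, in the same order as the matrix columns.
stopifnot(identical(rownames(cell_ann), colnames(expr)))

# Number of cells per annotated type.
table(cell_ann$cell_type1)
```

If the second check fails, reordering the annotation with cell_ann[colnames(expr), , drop = FALSE] is usually enough, provided every cell has an annotation row.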

Now let’s create a SingleCellExperiment object of the yan dataset:

sce <- SingleCellExperiment(assays = list(normcounts = as.matrix(yan)), colData = ann)
# this is needed to calculate dropout rate for feature selection
# important: normcounts have the same zeros as raw counts (fpkm)
counts(sce) <- normcounts(sce)
logcounts(sce) <- log2(normcounts(sce) + 1)
# use gene names as feature symbols
rowData(sce)$feature_symbol <- rownames(sce)
isSpike(sce, "ERCC") <- grepl("^ERCC-", rownames(sce))
## Warning: 'isSpike<-' is deprecated.
## Use 'isSpike<-' instead.
## See help("Deprecated")
## Warning: 'spikeNames' is deprecated.
## See help("Deprecated")
## Warning: 'isSpike' is deprecated.
## See help("Deprecated")
# remove features with duplicated names
sce <- sce[!duplicated(rownames(sce)), ]
sce
## class: SingleCellExperiment 
## dim: 20214 90 
## metadata(0):
## assays(3): normcounts counts logcounts
## rownames(20214): C9orf152 RPS11 ... CTSC AQP7
## rowData names(1): feature_symbol
## colnames(90): Oocyte..1.RPKM. Oocyte..2.RPKM. ...
##   Late.blastocyst..3..Cell.7.RPKM. Late.blastocyst..3..Cell.8.RPKM.
## colData names(1): cell_type1
## reducedDimNames(0):
## spikeNames(1): ERCC
## altExpNames(0):
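To make the individual preparation steps easier to follow, here is a minimal base R sketch that applies the same operations (log2 transform, spike-in flagging, removal of duplicated feature names) to a toy matrix. The matrix and its gene names are hypothetical, and the dropout-rate line only illustrates why zeros must be preserved; it is not scfind's own feature selection code.

```r
# Toy matrix (hypothetical values) with a duplicated gene name and one
# ERCC spike-in row.
expr <- matrix(
  c(0, 3, 7, 1, 15, 0, 31, 0, 2, 5, 0, 4),
  nrow = 4,
  dimnames = list(c("GENE1", "GENE2", "GENE1", "ERCC-00002"),
                  c("cell1", "cell2", "cell3"))
)

# log2(x + 1) maps zeros to zero, so dropouts stay identifiable.
log_expr <- log2(expr + 1)

# Flag spike-ins by name prefix, as in the grepl() call above.
is_spike <- grepl("^ERCC-", rownames(expr))

# Keep only the first occurrence of each feature name.
keep <- !duplicated(rownames(expr))
expr_unique <- expr[keep, , drop = FALSE]

# Per-gene dropout rate: fraction of cells with zero expression.
dropout_rate <- rowMeans(expr_unique == 0)
```

Note that duplicated() keeps the first row for each name and drops later ones without merging their values, so the order of rows in the input decides which measurement survives.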

6 sessionInfo()

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] scfind_1.8.0                SingleCellExperiment_1.8.0 
##  [3] SummarizedExperiment_1.16.0 DelayedArray_0.12.0        
##  [5] BiocParallel_1.20.0         matrixStats_0.55.0         
##  [7] Biobase_2.46.0              GenomicRanges_1.38.0       
##  [9] GenomeInfoDb_1.22.0         IRanges_2.20.0             
## [11] S4Vectors_0.24.0            BiocGenerics_0.32.0        
## [13] knitr_1.25                  BiocStyle_2.14.0           
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2             plyr_1.8.4             pillar_1.4.2          
##  [4] compiler_3.6.1         BiocManager_1.30.9     XVector_0.26.0        
##  [7] bitops_1.0-6           tools_3.6.1            zlibbioc_1.32.0       
## [10] digest_0.6.22          bit_1.1-14             tibble_2.1.3          
## [13] evaluate_0.14          lattice_0.20-38        pkgconfig_2.0.3       
## [16] rlang_0.4.1            Matrix_1.2-17          yaml_2.2.0            
## [19] xfun_0.10              GenomeInfoDbData_1.2.2 stringr_1.4.0         
## [22] dplyr_0.8.3            tidyselect_0.2.5       grid_3.6.1            
## [25] glue_1.3.1             R6_2.4.0               hash_2.2.6.1          
## [28] rmarkdown_1.16         bookdown_0.14          reshape2_1.4.3        
## [31] purrr_0.3.3            magrittr_1.5           codetools_0.2-16      
## [34] htmltools_0.4.0        assertthat_0.2.1       stringi_1.4.3         
## [37] RCurl_1.95-4.12        crayon_1.3.4