The CELLxGENE data portal (https://cellxgene.cziscience.com/) provides a graphical user interface to collections of single-cell sequence data processed in standard ways to ‘count matrix’ summaries. The cellxgenedp package provides an alternative, R-based inteface, allowing flexible data discovery, viewing, and retrieval.
cellxgenedp 1.12.0
NOTE: The interface to CELLxGENE has changed; versions of cellxgenedp prior to 1.4.1 / 1.5.2 will cease to work when CELLxGENE removes the previous interface. See the vignette section ‘API changes’ for additional details.
This package is available in Bioconductor version 3.15 and later. The following code installs cellxgenedp from Bioconductor
if (!"BiocManager" %in% rownames(installed.packages()))
    install.packages("BiocManager", repos = "https://CRAN.R-project.org")
BiocManager::install("cellxgenedp")Alternatively, install the ‘development’ version from GitHub (see GitHub.io for current documentation)
if (!"remotes" %in% rownames(installed.packages()))
    install.packages("remotes", repos = "https://CRAN.R-project.org")
remotes::install_github("mtmorgan/cellxgenedp")To also install additional packages required for this vignette, use
pkgs <- c("tidyr", "zellkonverter", "SingleCellExperiment", "HDF5Array")
required_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
BiocManager::install(required_pkgs)Load the package into your current R session. We make extensive use of the dplyr packages, and at the end of the vignette use SingleCellExperiment and zellkonverter, so load those as well.
library(zellkonverter)
library(SingleCellExperiment) # load early to avoid masking dplyr::count()
library(dplyr)
library(cellxgenedp)cxg() Provides a ‘shiny’ interfaceThe following sections outline how to use the cellxgenedp package
in an R script; most functionality is also available in the cxg()
shiny application, providing an easy way to identify, download, and
visualize one or several datasets. Start the app
cxg()choose a project on the first tab, and a dataset for visualization, or one or more datasets for download!
Retrieve metadata about resources available at the cellxgene data
portal using db():
db <- db()## Collections ■                                  1% | ETA:  2m## Collections ■■■                                5% | ETA:  1m## Collections ■■■■                              10% | ETA:  1m## Collections ■■■■■                             14% | ETA:  1m## Collections ■■■■■■■                           19% | ETA:  1m## Collections ■■■■■■■■                          22% | ETA:  1m
## Collections ■■■■■■■■                          22% | ETA:  1m## Collections ■■■■■■■■■                         26% | ETA:  1m## Collections ■■■■■■■■■■                        31% | ETA:  1m## Collections ■■■■■■■■■■■                       34% | ETA:  1m## Collections ■■■■■■■■■■■■■                     39% | ETA: 49s## Collections ■■■■■■■■■■■■■■                    45% | ETA: 42s## Collections ■■■■■■■■■■■■■■■                   48% | ETA: 40s## Collections ■■■■■■■■■■■■■■■■                  51% | ETA: 39s## Collections ■■■■■■■■■■■■■■■■■                 53% | ETA: 38s## Collections ■■■■■■■■■■■■■■■■■■                57% | ETA: 35s## Collections ■■■■■■■■■■■■■■■■■■■■              62% | ETA: 30s## Collections ■■■■■■■■■■■■■■■■■■■■■             67% | ETA: 26s## Collections ■■■■■■■■■■■■■■■■■■■■■■            70% | ETA: 23s## Collections ■■■■■■■■■■■■■■■■■■■■■■■■          75% | ETA: 19s## Collections ■■■■■■■■■■■■■■■■■■■■■■■■■         79% | ETA: 17s## Collections ■■■■■■■■■■■■■■■■■■■■■■■■■■        82% | ETA: 14s## Collections ■■■■■■■■■■■■■■■■■■■■■■■■■■        85% | ETA: 12s## Collections ■■■■■■■■■■■■■■■■■■■■■■■■■■■       87% | ETA: 11s## Collections ■■■■■■■■■■■■■■■■■■■■■■■■■■■■      91% | ETA:  8s## Collections ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■     95% | ETA:  4s## Collections ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■  100% | ETA:  0sPrinting the db object provides a brief overview of the available
data, as well as hints, in the form of functions like collections(),
for further exploration.
db## cellxgene_db
## number of collections(): 298
## number of datasets(): 1789
## number of files(): 1815The portal organizes data hierarchically, with ‘collections’ (research studies, approximately), ‘datasets’, and ‘files’. Discover data using the corresponding functions.
collections(db)## # A tibble: 298 × 18
##    collection_id    collection_version_id collection_url consortia contact_email
##    <chr>            <chr>                 <chr>          <list>    <chr>        
##  1 e02201d7-f49f-4… f540732d-8fe8-4af1-8… https://cellx… <chr [1]> richard.smit…
##  2 ad2149fc-19c5-4… c31775d0-37ad-4d0a-9… https://cellx… <chr [2]> li.wang@ucsf…
##  3 33d19f34-87f5-4… 46f90bbb-80e9-49da-b… https://cellx… <chr [1]> fabian.theis…
##  4 5900dda8-2dc3-4… 673d6b45-e5dd-4c34-8… https://cellx… <chr [1]> ruichen@bcm.…
##  5 28e9d721-6816-4… 18e6e9fc-46d4-4c17-9… https://cellx… <lgl [1]> Tom_Mariani@…
##  6 e9cf4e8d-05ed-4… 6ebe6c8e-34d7-471b-8… https://cellx… <chr [1]> lungdrcho@sn…
##  7 7d7cabfd-1d1f-4… 0cf01a7c-c64f-4747-8… https://cellx… <chr [2]> jimmie.ye@uc…
##  8 b52eb423-5d0d-4… 81c94ceb-9410-43ab-9… https://cellx… <chr [2]> st9@sanger.a…
##  9 77446b76-1c2d-4… 07ea4287-932d-4873-a… https://cellx… <chr [1]> hnakshat@iu.…
## 10 74e10dc4-cbb2-4… 07dbcc54-9d4b-4609-9… https://cellx… <chr [1]> charlotte.sc…
## # ℹ 288 more rows
## # ℹ 13 more variables: contact_name <chr>, curator_name <chr>,
## #   description <chr>, doi <chr>, links <list>, name <chr>,
## #   publisher_metadata <list>, revising_in <lgl>, revision_of <lgl>,
## #   visibility <chr>, created_at <date>, published_at <date>, revised_at <date>datasets(db)## # A tibble: 1,789 × 33
##    dataset_id   dataset_version_id collection_id donor_id assay  batch_condition
##    <chr>        <chr>              <chr>         <list>   <list> <list>         
##  1 82f6af6d-53… 6649d5f0-3226-441… e02201d7-f49… <chr>    <list> <lgl [1]>      
##  2 39ed7d98-67… 1b43bc92-9c17-4b9… e02201d7-f49… <chr>    <list> <lgl [1]>      
##  3 aa369999-a0… 1c08baea-56fd-434… ad2149fc-19c… <chr>    <list> <lgl [1]>      
##  4 4dd00779-7f… b37b1d6d-2f95-4b0… 33d19f34-87f… <chr>    <list> <lgl [1]>      
##  5 963a08cb-d9… 9fd6a2ab-c2f0-450… 5900dda8-2dc… <chr>    <list> <lgl [1]>      
##  6 6e00ccf7-07… a77fc250-40b8-442… 28e9d721-681… <chr>    <list> <lgl [1]>      
##  7 ef1a4b05-54… 993c6e92-2ae9-4d7… e9cf4e8d-05e… <chr>    <list> <lgl [1]>      
##  8 01ad3cd7-39… 1ee5d935-ec19-4b5… 7d7cabfd-1d1… <chr>    <list> <lgl [1]>      
##  9 f75f2ff4-28… 646f0e6b-419c-4f8… b52eb423-5d0… <chr>    <list> <lgl [1]>      
## 10 ed852810-a0… b96ec6ff-0394-4d8… b52eb423-5d0… <chr>    <list> <lgl [1]>      
## # ℹ 1,779 more rows
## # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
## #   default_embedding <chr>, development_stage <list>, disease <list>,
## #   embeddings <list>, explorer_url <chr>, feature_biotype <list>,
## #   feature_count <int>, feature_reference <list>, is_primary_data <list>,
## #   mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
## #   raw_data_location <chr>, schema_version <chr>, …files(db)## # A tibble: 1,815 × 4
##    dataset_id                              filesize filetype      url           
##    <chr>                                      <dbl> <chr>         <chr>         
##  1 82f6af6d-5313-439a-9936-5e844be49a70   158537379 H5AD          https://datas…
##  2 39ed7d98-676d-4b8d-9d0a-0f3b60914ead   238784948 H5AD          https://datas…
##  3 aa369999-a034-4ad9-b459-8a93c86ea77b 43498695556 ATAC_FRAGMENT https://datas…
##  4 aa369999-a034-4ad9-b459-8a93c86ea77b     6057171 ATAC_INDEX    https://datas…
##  5 aa369999-a034-4ad9-b459-8a93c86ea77b  2794417546 H5AD          https://datas…
##  6 4dd00779-7f73-4f50-89bb-e2d3c6b71b18   224364348 H5AD          https://datas…
##  7 963a08cb-d995-4862-9614-f9d38a7d7540 39907225440 ATAC_FRAGMENT https://datas…
##  8 963a08cb-d995-4862-9614-f9d38a7d7540     5843958 ATAC_INDEX    https://datas…
##  9 963a08cb-d995-4862-9614-f9d38a7d7540  3090854453 H5AD          https://datas…
## 10 6e00ccf7-0749-46ef-a999-dba785630d52    48266501 H5AD          https://datas…
## # ℹ 1,805 more rowsEach of these resources has a unique primary identifier (e.g.,
file_id) as well as an identifier describing the relationship of the
resource to other components of the database (e.g.,
dataset_id). These identifiers can be used to ‘join’ information
across tables.
facets() provides information on ‘levels’ present in specific columnsNotice that some columns are ‘lists’ rather than atomic vectors like ‘character’ or ‘integer’.
datasets(db) |>
    select(where(is.list))## # A tibble: 1,789 × 16
##    donor_id   assay      batch_condition cell_type   development_stage disease
##    <list>     <list>     <list>          <list>      <list>            <list> 
##  1 <chr [1]>  <list [1]> <lgl [1]>       <list [3]>  <list [1]>        <list> 
##  2 <chr [1]>  <list [1]> <lgl [1]>       <list [4]>  <list [1]>        <list> 
##  3 <chr [27]> <list [1]> <lgl [1]>       <list [29]> <list [20]>       <list> 
##  4 <chr [12]> <list [2]> <lgl [1]>       <list [23]> <list [1]>        <list> 
##  5 <chr [14]> <list [1]> <lgl [1]>       <list [28]> <list [10]>       <list> 
##  6 <chr [2]>  <list [1]> <lgl [1]>       <list [11]> <list [1]>        <list> 
##  7 <chr [2]>  <list [1]> <lgl [1]>       <list [3]>  <list [2]>        <list> 
##  8 <chr [80]> <list [1]> <lgl [1]>       <list [11]> <list [43]>       <list> 
##  9 <chr [14]> <list [2]> <lgl [1]>       <list [1]>  <list [4]>        <list> 
## 10 <chr [14]> <list [2]> <lgl [1]>       <list [13]> <list [4]>        <list> 
## # ℹ 1,779 more rows
## # ℹ 10 more variables: embeddings <list>, feature_biotype <list>,
## #   feature_reference <list>, is_primary_data <list>, organism <list>,
## #   self_reported_ethnicity <list>, sex <list>, spatial <list>,
## #   suspension_type <list>, tissue <list>This indicates that at least some of the datasets had more than one
type of assay, cell_type, etc. The facets() function provides a
convenient way of discovering possible levels of each column, e.g.,
assay, organism, self_reported_ethnicity, or sex, and the
number of datasets with each label.
facets(db, "assay")## # A tibble: 45 × 4
##    facet label                             ontology_term_id     n
##    <chr> <chr>                             <chr>            <int>
##  1 assay 10x 3' v3                         EFO:0009922        855
##  2 assay 10x 3' v2                         EFO:0009899        381
##  3 assay Visium Spatial Gene Expression V1 EFO:0022857        317
##  4 assay Slide-seqV2                       EFO:0030062        240
##  5 assay 10x 5' v1                         EFO:0011025        120
##  6 assay Smart-seq2                        EFO:0008931         78
##  7 assay 10x multiome                      EFO:0030059         73
##  8 assay 10x 5' v2                         EFO:0009900         62
##  9 assay sci-RNA-seq3                      EFO:0030028         49
## 10 assay Drop-seq                          EFO:0008722         27
## # ℹ 35 more rowsfacets(db, "self_reported_ethnicity")## # A tibble: 40 × 4
##    facet                   label                          ontology_term_id     n
##    <chr>                   <chr>                          <chr>            <int>
##  1 self_reported_ethnicity unknown                        unknown            759
##  2 self_reported_ethnicity European                       HANCESTRO:0005     700
##  3 self_reported_ethnicity na                             na                 433
##  4 self_reported_ethnicity Asian                          HANCESTRO:0008     219
##  5 self_reported_ethnicity African American               HANCESTRO:0568     116
##  6 self_reported_ethnicity Hispanic or Latin American     HANCESTRO:0014     116
##  7 self_reported_ethnicity Native American,Hispanic or L… HANCESTRO:0013,…    50
##  8 self_reported_ethnicity African American or Afro-Cari… HANCESTRO:0016      43
##  9 self_reported_ethnicity South Asian                    HANCESTRO:0006      31
## 10 self_reported_ethnicity African                        HANCESTRO:0010      28
## # ℹ 30 more rowsfacets(db, "sex")## # A tibble: 3 × 4
##   facet label   ontology_term_id     n
##   <chr> <chr>   <chr>            <int>
## 1 sex   male    PATO:0000384      1282
## 2 sex   female  PATO:0000383      1091
## 3 sex   unknown unknown            297Suppose we were interested in finding datasets from the 10x 3’ v3
assay (ontology_term_id of EFO:0009922) containing individuals of
African American ethnicity, and female sex. Use the facets_filter()
utility function to filter data sets as needed
african_american_female <-
    datasets(db) |>
    filter(
        facets_filter(assay, "ontology_term_id", "EFO:0009922"),
        facets_filter(self_reported_ethnicity, "label", "African American"),
        facets_filter(sex, "label", "female")
    )Use nrow(african_american_female) to find the number of datasets
satisfying our criteria. It looks like there are up to
african_american_female |>
    summarise(total_cell_count = sum(cell_count))## # A tibble: 1 × 1
##   total_cell_count
##              <int>
## 1         24078487cells sequenced (each dataset may contain cells from several
ethnicities, as well as males or individuals of unknown gender, so we
do not know the actual number of cells available without downloading
files). Use left_join to identify the corresponding collections:
## collections
left_join(
    african_american_female |> select(collection_id) |> distinct(),
    collections(db),
    by = "collection_id"
)## # A tibble: 35 × 18
##    collection_id    collection_version_id collection_url consortia contact_email
##    <chr>            <chr>                 <chr>          <list>    <chr>        
##  1 21bbfaec-6958-4… 87e9c5ee-a0d0-4c1e-9… https://cellx… <lgl [1]> Thomas.pranz…
##  2 cee845e3-ec04-4… 1f502a5f-c16a-453c-9… https://cellx… <chr [1]> yuw1@chop.edu
##  3 c9706a92-0e5f-4… 3367da8a-774d-4bb1-b… https://cellx… <chr [1]> hnakshat@iup…
##  4 ec691f5f-0aac-4… 89fad7a4-9a83-4c7c-a… https://cellx… <chr [1]> df2396@cumc.…
##  5 72d37bc9-76cc-4… 554c82b5-8c37-416b-b… https://cellx… <chr [1]> m.sepp@zmbh.…
##  6 6b701826-37bb-4… 9d92543e-3093-4962-9… https://cellx… <chr [1]> astreets@ber…
##  7 f17b9205-f61f-4… d72779ea-069d-462b-a… https://cellx… <chr [1]> genevieve.ko…
##  8 61e422dd-c9cd-4… fe0b55e0-1496-439e-a… https://cellx… <lgl [1]> arul@med.umi…
##  9 de379e5f-52d0-4… 58cccb76-e600-48c7-9… https://cellx… <chr [1]> barbara.treu…
## 10 58e85c2f-d52e-4… e7d46cb4-1f8e-4fb2-b… https://cellx… <lgl [1]> efthymios.mo…
## # ℹ 25 more rows
## # ℹ 13 more variables: contact_name <chr>, curator_name <chr>,
## #   description <chr>, doi <chr>, links <list>, name <chr>,
## #   publisher_metadata <list>, revising_in <lgl>, revision_of <lgl>,
## #   visibility <chr>, created_at <date>, published_at <date>, revised_at <date>Many collections include publication information and other external
data. This information is available in the return value of
collections(), but the helper function publisher_metadata(),
authors(), and links() may facilitate access.
Suppose one is interested in the publication “A single-cell atlas of the healthy breast tissues reveals clinically relevant clusters of breast epithelial cells”. Discover it in the collections
title_of_interest <- paste(
    "A single-cell atlas of the healthy breast tissues reveals clinically",
    "relevant clusters of breast epithelial cells"
)
collection_of_interest <-
    collections(db) |>
    dplyr::filter(startsWith(name, title_of_interest))
collection_of_interest |>
    glimpse()## Rows: 1
## Columns: 18
## $ collection_id         <chr> "c9706a92-0e5f-46c1-96d8-20e42467f287"
## $ collection_version_id <chr> "3367da8a-774d-4bb1-b5b8-5dc3e5dbbed7"
## $ collection_url        <chr> "https://cellxgene.cziscience.com/collections/c9…
## $ consortia             <list> "CZI Cell Science"
## $ contact_email         <chr> "hnakshat@iupui.edu"
## $ contact_name          <chr> "Harikrishna Nakshatri"
## $ curator_name          <chr> "Jennifer Yu-Sheng Chien"
## $ description           <chr> "Single-cell RNA sequencing (scRNA-seq) is an ev…
## $ doi                   <chr> "10.1016/j.xcrm.2021.100219"
## $ links                 <list> [["", "RAW_DATA", "https://explore.data.humancel…
## $ name                  <chr> "A single-cell atlas of the healthy breast tiss…
## $ publisher_metadata    <list> [[["Bhat-Nakshatri", "Poornima"], ["Gao", "Hongy…
## $ revising_in           <lgl> NA
## $ revision_of           <lgl> NA
## $ visibility            <chr> "PUBLIC"
## $ created_at            <date> 2025-03-30
## $ published_at          <date> 2021-03-25
## $ revised_at            <date> 2025-04-14Use the collection_id to extract publisher metadata (including a DOI
if available) and author information
collection_id_of_interest <- pull(collection_of_interest, "collection_id")
publisher_metadata(db) |>
    filter(collection_id == collection_id_of_interest) |>
    glimpse()## Rows: 1
## Columns: 9
## $ collection_id   <chr> "c9706a92-0e5f-46c1-96d8-20e42467f287"
## $ name            <chr> "A single-cell atlas of the healthy breast tissues rev…
## $ is_preprint     <lgl> FALSE
## $ journal         <chr> "Cell Reports Medicine"
## $ published_year  <int> 2021
## $ published_month <int> 3
## $ published_day   <int> 1
## $ published_at    <date> 2021-03-01
## $ doi             <chr> NAauthors(db) |>
    filter(collection_id == collection_id_of_interest)## # A tibble: 12 × 4
##    collection_id                        family         given       consortium
##    <chr>                                <chr>          <chr>       <chr>     
##  1 c9706a92-0e5f-46c1-96d8-20e42467f287 Bhat-Nakshatri Poornima    <NA>      
##  2 c9706a92-0e5f-46c1-96d8-20e42467f287 Gao            Hongyu      <NA>      
##  3 c9706a92-0e5f-46c1-96d8-20e42467f287 Sheng          Liu         <NA>      
##  4 c9706a92-0e5f-46c1-96d8-20e42467f287 McGuire        Patrick C.  <NA>      
##  5 c9706a92-0e5f-46c1-96d8-20e42467f287 Xuei           Xiaoling    <NA>      
##  6 c9706a92-0e5f-46c1-96d8-20e42467f287 Wan            Jun         <NA>      
##  7 c9706a92-0e5f-46c1-96d8-20e42467f287 Liu            Yunlong     <NA>      
##  8 c9706a92-0e5f-46c1-96d8-20e42467f287 Althouse       Sandra K.   <NA>      
##  9 c9706a92-0e5f-46c1-96d8-20e42467f287 Colter         Austyn      <NA>      
## 10 c9706a92-0e5f-46c1-96d8-20e42467f287 Sandusky       George      <NA>      
## 11 c9706a92-0e5f-46c1-96d8-20e42467f287 Storniolo      Anna Maria  <NA>      
## 12 c9706a92-0e5f-46c1-96d8-20e42467f287 Nakshatri      Harikrishna <NA>Collections may have links to additional external data, in this case a
DOI and two links to RAW_DATA.
external_links <- links(db)
external_links## # A tibble: 998 × 4
##    collection_id                        link_name link_type   link_url          
##    <chr>                                <chr>     <chr>       <chr>             
##  1 e02201d7-f49f-401f-baf0-1eb1406546c0 phs001272 RAW_DATA    https://www.ncbi.…
##  2 ad2149fc-19c5-41de-8cfe-44710fbada73 <NA>      RAW_DATA    https://assets.ne…
##  3 ad2149fc-19c5-41de-8cfe-44710fbada73 <NA>      DATA_SOURCE https://cell.ucsf…
##  4 ad2149fc-19c5-41de-8cfe-44710fbada73 <NA>      PROTOCOL    https://github.co…
##  5 33d19f34-87f5-455b-8ca5-9023a2e5453d <NA>      OTHER       https://explore.d…
##  6 33d19f34-87f5-455b-8ca5-9023a2e5453d <NA>      OTHER       https://singlecel…
##  7 33d19f34-87f5-455b-8ca5-9023a2e5453d <NA>      OTHER       https://cells.ucs…
##  8 5900dda8-2dc3-4770-b604-084eac1c2c82 <NA>      RAW_DATA    https://explore.d…
##  9 5900dda8-2dc3-4770-b604-084eac1c2c82 GSE268630 RAW_DATA    https://www.ncbi.…
## 10 28e9d721-6816-48a2-8d0b-43bf0b0c0ebc <NA>      PROTOCOL    https://www.proto…
## # ℹ 988 more rowsexternal_links |>
    count(link_type)## # A tibble: 5 × 2
##   link_type       n
##   <chr>       <int>
## 1 DATA_SOURCE    69
## 2 LAB_WEBSITE    48
## 3 OTHER         430
## 4 PROTOCOL       55
## 5 RAW_DATA      396external_links |>
    filter(collection_id == collection_id_of_interest)## # A tibble: 2 × 4
##   collection_id                        link_name link_type link_url             
##   <chr>                                <chr>     <chr>     <chr>                
## 1 c9706a92-0e5f-46c1-96d8-20e42467f287 <NA>      RAW_DATA  https://explore.data…
## 2 c9706a92-0e5f-46c1-96d8-20e42467f287 <NA>      RAW_DATA  https://www.ncbi.nlm…Conversely, knowledge of a DOI, etc., can be used to discover details of the corresponding collection.
doi_of_interest <- "https://doi.org/10.1016/j.stem.2018.12.011"
links(db) |>
    filter(link_url == doi_of_interest) |>
    left_join(collections(db), by = "collection_id") |>
    glimpse()## Rows: 1
## Columns: 21
## $ collection_id         <chr> "b1a879f6-5638-48d3-8f64-f6592c1b1561"
## $ link_name             <chr> "PSC-ATO protocol"
## $ link_type             <chr> "PROTOCOL"
## $ link_url              <chr> "https://doi.org/10.1016/j.stem.2018.12.011"
## $ collection_version_id <chr> "5cb111e3-d2c7-4308-896d-9fd91c0d37f1"
## $ collection_url        <chr> "https://cellxgene.cziscience.com/collections/b1…
## $ consortia             <list> <"CZI Cell Science", "Wellcome HCA Strategic Sci…
## $ contact_email         <chr> "st9@sanger.ac.uk"
## $ contact_name          <chr> "Sarah Teichmann"
## $ curator_name          <chr> "Jennifer Yu-Sheng Chien"
## $ description           <chr> "Single-cell genomics studies have decoded the i…
## $ doi                   <chr> "10.1126/science.abo0510"
## $ links                 <list> [["scVI Models", "DATA_SOURCE", "https://develop…
## $ name                  <chr> "Mapping the developing human immune system acro…
## $ publisher_metadata    <list> [[["Suo", "Chenqu"], ["Dann", "Emma"], ["Goh", "…
## $ revising_in           <lgl> NA
## $ revision_of           <lgl> NA
## $ visibility            <chr> "PUBLIC"
## $ created_at            <date> 2025-04-01
## $ published_at          <date> 2022-10-04
## $ revised_at            <date> 2025-04-14cellxgeneVisualization is straight-forward once dataset_id is available. For
example, to visualize the first dataset in african_american_female,
use
african_american_female |>
    ## use criteria to identify a single dataset (here just the
    ## 'first' dataset), then visualize
    slice(1) |>
    datasets_visualize()Visualization is an interactive process, so datasets_visualize()
will only open up to 5 browser tabs per call.
Datasets usually contain H5AD (files produced by the python AnnData
module), and Rds (serialized files produced by the R Seurat
package). The Rds files may be unreadable if the version of Seurat
used to create the file is different from the version used to read the
file. We therefore focus on the H5AD files.
For illustration, we find all files associated with studies with African American females
download one of our selected files.
selected_files <-
    left_join(
        african_american_female |> select(dataset_id),
        files(db),
        by = "dataset_id"
    )And then choose a single dataset and its H5AD file for download
local_file <-
    selected_files |>
    filter(
        dataset_id == "de985818-285f-4f59-9dbd-d74968fddba3",
        filetype == "H5AD"
    ) |>
    files_download(dry.run = FALSE)
basename(local_file)## [1] "51ba9fdd-3b84-404b-91df-ee630823d716.h5ad"These are downloaded to a local cache (use the internal function
cellxgenedp:::.cellxgenedb_cache_path() for the location of the
cache), so the process is only time-consuming the first time.
H5AD files can be converted to R / Bioconductor objects using
the zellkonverter package.
h5ad <- readH5AD(local_file, use_hdf5 = TRUE, reader = "R")
h5ad## class: SingleCellExperiment 
## dim: 33159 31696 
## metadata(5): citation default_embedding schema_reference schema_version
##   title
## assays(1): X
## rownames(33159): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(6): feature_is_filtered feature_name ... feature_length
##   feature_type
## colnames(31696): CMGpool_AAACCCAAGGACAACC CMGpool_AAACCCACAATCTCTT ...
##   K109064_TTTGTTGGTTGCATCA K109064_TTTGTTGGTTGGACCC
## colData names(45): mapped_reference_annotation donor_id ...
##   development_stage observation_joinid
## reducedDimNames(3): X_pca X_tsne X_umap
## mainExpName: NULL
## altExpNames(0):The SingleCellExperiment object is a matrix-like object with rows
corresponding to genes and columns to cells. Thus we can easily
explore the cells present in the data.
h5ad |>
    colData(h5ad) |>
    as_tibble() |>
    count(sex, donor_id)## # A tibble: 7 × 3
##   sex    donor_id                     n
##   <fct>  <fct>                    <int>
## 1 female D1                        2303
## 2 female D2                         864
## 3 female D3                        2517
## 4 female D4                        1771
## 5 female D5                        2244
## 6 female D11                       7454
## 7 female pooled [D9,D7,D8,D10,D6] 14543The Orchestrating Single-Cell Analysis with Bioconductor
online resource provides an excellent introduction to analysis and
visualization of single-cell data in R / Bioconductor. Extensive
opportunities for working with AnnData objects in R but using the
native python interface are briefly described in, e.g., ?AnnData2SCE
help page of zellkonverter.
The hca package provides programmatic access to the Human Cell Atlas data portal, allowing retrieval of primary as well as derived single-cell data files.
Data access provided by CELLxGENE has changed to a new ‘Discover’ API. The main functionality of the cellxgenedp package has not changed, but specific columns have been removed, replaced or added, as follows:
collections()
access_type, data_submission_policy_versionupdated_at replaced with revised_atcollection_version_id, collection_url, doi,
revising_in, revision_ofdatasets()
is_valid, processing_status, published, revision,
created_atdataset_deployments replaced with explorer_url, name
replaced with title, updated_at replaced with revised_atdataset_version_id, batch_condition,
x_approximate_distributionfiles()
file_id, filename, s3_uri, user_submitted,
created_at, updated_atfilesize, url## R version 4.5.0 RC (2025-04-04 r88126 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] cellxgenedp_1.12.0          dplyr_1.1.4                
##  [3] SingleCellExperiment_1.30.1 SummarizedExperiment_1.38.1
##  [5] Biobase_2.68.0              GenomicRanges_1.60.0       
##  [7] GenomeInfoDb_1.44.0         IRanges_2.42.0             
##  [9] S4Vectors_0.46.0            BiocGenerics_0.54.0        
## [11] generics_0.1.4              MatrixGenerics_1.20.0      
## [13] matrixStats_1.5.0           zellkonverter_1.18.0       
## [15] BiocStyle_2.36.0           
## 
## loaded via a namespace (and not attached):
##  [1] dir.expiry_1.16.0       xfun_0.52               bslib_0.9.0            
##  [4] htmlwidgets_1.6.4       rhdf5_2.52.0            lattice_0.22-7         
##  [7] rhdf5filters_1.20.0     vctrs_0.6.5             rjsoncons_1.3.2        
## [10] tools_4.5.0             curl_6.2.2              parallel_4.5.0         
## [13] tibble_3.2.1            pkgconfig_2.0.3         Matrix_1.7-3           
## [16] lifecycle_1.0.4         GenomeInfoDbData_1.2.14 compiler_4.5.0         
## [19] httpuv_1.6.16           htmltools_0.5.8.1       sass_0.4.10            
## [22] yaml_2.3.10             later_1.4.2             pillar_1.10.2          
## [25] crayon_1.5.3            jquerylib_0.1.4         DT_0.33                
## [28] DelayedArray_0.34.1     cachem_1.1.0            abind_1.4-8            
## [31] mime_0.13               basilisk_1.20.0         tidyselect_1.2.1       
## [34] digest_0.6.37           bookdown_0.43           fastmap_1.2.0          
## [37] grid_4.5.0              cli_3.6.5               SparseArray_1.8.0      
## [40] magrittr_2.0.3          S4Arrays_1.8.0          h5mread_1.0.1          
## [43] utf8_1.2.5              withr_3.0.2             promises_1.3.2         
## [46] filelock_1.0.3          UCSC.utils_1.4.0        rmarkdown_2.29         
## [49] XVector_0.48.0          httr_1.4.7              reticulate_1.42.0      
## [52] png_0.1-8               HDF5Array_1.36.0        shiny_1.10.0           
## [55] evaluate_1.0.3          knitr_1.50              basilisk.utils_1.20.0  
## [58] rlang_1.1.6             Rcpp_1.0.14             xtable_1.8-4           
## [61] glue_1.8.0              BiocManager_1.30.25     jsonlite_2.0.0         
## [64] Rhdf5lib_1.30.0         R6_2.6.1