Contents

library(gemma.R)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(SummarizedExperiment)
library(pheatmap)
library(viridis)

1 About Gemma

Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles. Gemma contains data from thousands of public studies, referencing thousands of published papers. Every dataset in Gemma has passed a rigorous curation process that re-annotates the expression platform at the sequence level, which allows for more consistent cross-platform comparisons and meta-analyses.

For detailed information on the curation process, read this page or the latest publication.

2 Package cheat sheet

3 Installation instructions

3.1 Bioconductor

You can install gemma.R through Bioconductor with the following code:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("gemma.R")

4 Searching for datasets of interest in Gemma

The package includes various functions to search for datasets fitting desired criteria.

All datasets belonging to a taxon of interest can be accessed by using get_taxon_datasets while the function search_datasets can be used to further limit the results by a specified query containing key words, experiment names or ontology term URIs

get_taxon_datasets(taxon = 'human') %>%  # all human datasets 
    select(experiment.ShortName, experiment.Name,taxon.Name) %>% head
   experiment.ShortName
1:              GSE2018
2:              GSE4036
3:              GSE3489
4:              GSE1923
5:               GSE361
6:               GSE492
                                                                          experiment.Name
1:                                                            Human Lung Transplant - BAL
2:                                                                perro-affy-human-186940
3: Patterns of gene dysregulation in the frontal cortex of patients with HIV encephalitis
4: Identification of PDGF-dependent patterns of gene expression in U87 glioblastoma cells
5:                                                   Mammary epithelial cell transduction
6:                               Effect of prostaglandin analogs on aqueous humor outflow
   taxon.Name
1:      human
2:      human
3:      human
4:      human
5:      human
6:      human
search_datasets('bipolar',taxon = 'human') %>% # human datasets mentioning bipolar
    select(experiment.ShortName, experiment.Name,taxon.Name) %>% head
   experiment.ShortName
1:              GSE5389
2:              GSE4030
3:              GSE5388
4:              GSE7036
5:   McLean Hippocampus
6:           McLean_PFC
                                                                                                           experiment.Name
1:           Adult postmortem brain tissue (orbitofrontal cortex) from subjects with bipolar disorder and healthy controls
2:                                                                                                 bunge-affy-arabi-162779
3: Adult postmortem brain tissue (dorsolateral prefrontal cortex) from subjects with bipolar disorder and healthy controls
4:                                               Expression profiling in monozygotic twins discordant for bipolar disorder
5:                                                                                                      McLean Hippocampus
6:                                                                                                              McLean_PFC
   taxon.Name
1:      human
2:      human
3:      human
4:      human
5:      human
6:      human
search_datasets('http://purl.obolibrary.org/obo/DOID_3312', # ontology term URI for the bipolar disorder
                taxon = 'human') %>% 
    select(experiment.ShortName, experiment.Name,taxon.Name) %>% head
   experiment.ShortName
1:              GSE5389
2:              GSE5388
3:              GSE7036
4:   McLean Hippocampus
5:           McLean_PFC
6:     stanley_feinberg
                                                                                                           experiment.Name
1:           Adult postmortem brain tissue (orbitofrontal cortex) from subjects with bipolar disorder and healthy controls
2: Adult postmortem brain tissue (dorsolateral prefrontal cortex) from subjects with bipolar disorder and healthy controls
3:                                               Expression profiling in monozygotic twins discordant for bipolar disorder
4:                                                                                                      McLean Hippocampus
5:                                                                                                              McLean_PFC
6:                                                                     Stanley consortium collection Cerebellum - Feinberg
   taxon.Name
1:      human
2:      human
3:      human
4:      human
5:      human
6:      human

Note that a single call of these functions will only return 20 results by default and a 100 results maximum, controlled by the limit argument. In order to get all available results, offset argument should be used to make multiple calls.

To see how many available results are there, you can look at the attributes of the output objects where additional information from the API response is appended.

# a quick call with a limit of 1 to get the result count
result <- get_taxon_datasets(taxon = 'human',limit = 1) 
print(attributes(result)$totalElements)
[1] 5979

Since the maximum limit is 100 getting all results available will require multiple calls.

count = attributes(result)$totalElements
all_results <- lapply(seq(0, count, 100), function(offset){
    get_taxon_datasets(taxon = 'human',limit = 100, offset = offset)
}) %>% do.call(rbind,.) %>% 
    select(experiment.ShortName, experiment.Name,taxon.Name) %>% 
    head()
   experiment.ShortName
1:              GSE2018
2:              GSE4036
3:              GSE3489
4:              GSE1923
5:               GSE361
6:               GSE492
                                                                          experiment.Name
1:                                                            Human Lung Transplant - BAL
2:                                                                perro-affy-human-186940
3: Patterns of gene dysregulation in the frontal cortex of patients with HIV encephalitis
4: Identification of PDGF-dependent patterns of gene expression in U87 glioblastoma cells
5:                                                   Mammary epithelial cell transduction
6:                               Effect of prostaglandin analogs on aqueous humor outflow
   taxon.Name
1:      human
2:      human
3:      human
4:      human
5:      human
6:      human

See larger queries section for more details. To keep this vignette simpler we will keep using the first 20 results returned by default in examples below.

Information provided about the datasets by these functions include details about the quality and design of the study that can be used to judge if it is suitable for your use case. For instance geeq.batchEffect column will be set to -1 if Gemma’s preprocessing has detected batch effects that were unable to be resolved by batch correction and experiment.SampleCount will include the number of samples used in the experiment. More information about these and other fields can be found at the function documentation.

get_taxon_datasets(taxon = 'human') %>% # get human datasets
    filter(geeq.batchEffect !=-1 & experiment.SampleCount > 4) %>% # filter for datasets without batch effects and with a sample count more than 4
    select(experiment.ShortName, experiment.Name,taxon.Name) %>% head
   experiment.ShortName
1:              GSE2018
2:              GSE4036
3:              GSE3489
4:              GSE1923
5:               GSE361
6:               GSE492
                                                                          experiment.Name
1:                                                            Human Lung Transplant - BAL
2:                                                                perro-affy-human-186940
3: Patterns of gene dysregulation in the frontal cortex of patients with HIV encephalitis
4: Identification of PDGF-dependent patterns of gene expression in U87 glioblastoma cells
5:                                                   Mammary epithelial cell transduction
6:                               Effect of prostaglandin analogs on aqueous humor outflow
   taxon.Name
1:      human
2:      human
3:      human
4:      human
5:      human
6:      human

Gemma uses multiple ontologies when annotating datasets and using the term URIs instead of free text to search can lead to more specific results. search_annotations function allows searching for annotation terms that might be relevant to your query.

search_annotations('bipolar') %>% 
    head
   category.Name                         category.URI         value.Name
1:          <NA>                                 <NA>            Bipolar
2:       disease http://www.ebi.ac.uk/efo/EFO_0000408 bipolar I disorder
3:       disease http://www.ebi.ac.uk/efo/EFO_0000408   Bipolar Disorder
4:       disease http://www.ebi.ac.uk/efo/EFO_0000408 bipolar II disoder
5:       disease http://www.ebi.ac.uk/efo/EFO_0000408  Bipolar depressed
6:          <NA>                                 <NA>   Bipolar Disorder
                                   value.URI
1:                                      <NA>
2: http://purl.obolibrary.org/obo/DOID_14042
3:  http://purl.obolibrary.org/obo/DOID_3312
4:      http://www.ebi.ac.uk/efo/EFO_0009964
5:                                      <NA>
6:                                      <NA>

5 Downloading expression data

Upon identifying datasets of interest, more information about specific ones can be requested. In this example we will be using GSE46416 which includes samples taken from healthy donors along with manic/euthymic phase bipolar disorder patients.

The data associated with specific experiments can be accessed by using get_datasets_by_ids

get_datasets_by_ids("GSE46416") %>%
    select(experiment.ShortName, experiment.Name, experiment.ID, experiment.Description)
   experiment.ShortName
1:             GSE46416
                                                   experiment.Name
1: State- and trait-specific gene expression in euthymia and mania
   experiment.ID
1:          8997
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       experiment.Description
1: Gene expression profiles of bipolar disorder (BD) patients were assessed during both a manic and a euthymic phase and compared both intra-individually, and with the gene expression profiles of controls.\nLast Updated (by provider): Sep 05 2014\nContributors:  Christian C Witt Benedikt Brors Dilafruz Juraeva Jens Treutlein Carsten Sticht Stephanie H Witt Jana Strohmaier Helene Dukal Josef Frank Franziska Degenhardt Markus M Nöthen Sven Cichon Maren Lang Marcella Rietschel Sandra Meier Manuel Mattheisen

To access the expression data in a convenient form, you can use get_dataset_object. It is a high-level wrapper that combines various endpoint calls to return lists of annotated SummarizedExperiment or ExpressionSet objects that are compatible with other Bioconductor packages or a tidyverse-friendly long form tibble for downstream analyses. These include the expression matrix along with the experimental design, and ensure the sample names match between both when transforming/subsetting data.

dat <- get_dataset_object("GSE46416",
                          type = 'se') # SummarizedExperiment is the default output type

Note that the tidy format is less memory efficient but allows easy visualization and exploration with ggplot2 and the rest of the tidyverse.

To show how subsetting works, we’ll keep the “manic phase” data and the reference_subject_roles, which refers to the control samples in Gemma datasets.

# Check the levels of the disease factor
dat[[1]]$disease %>% unique()
[1] "euthymic phase,Bipolar Disorder" "reference subject role"         
[3] "bipolar disorder,manic phase"   
# Subset patients during manic phase and controls
manic <- dat[[1]][, dat[[1]]$disease == "bipolar disorder,manic phase" | 
        dat[[1]]$disease == "reference subject role"]
manic
class: SummarizedExperiment 
dim: 21410 21 
metadata(8): title abstract ... GemmaSuitabilityScore taxon
assays(1): counts
rownames(21410): 2315430 2315554 ... 7385683 7385696
rowData names(4): Probe GeneSymbol GeneName NCBIid
colnames(21): Control, 15 Control, 8 ... Control, 2_DE40 Bipolar
  disorder patient manic phase, 37
colData names(3): factorValues block disease

Let’s take a look at sample to sample correlation in our subset.

# Get Expression matrix
manicExpr <- assay(manic, "counts")


manicExpr %>% 
    cor %>% 
    pheatmap(col =viridis(10),border_color = NA,angle_col = 45,fontsize = 7)
Sample to sample correlations of bipolar patients during manic phase and controls.

Figure 1: Sample to sample correlations of bipolar patients during manic phase and controls

You can also use get_dataset_expression to only get the expression matrix, and get_dataset_design to get the experimental design matrix.

6 Platform Annotations

Expression data in Gemma comes with annotations for the gene each expression profile corresponds to. Using the get_platform_annotations function, these annotations can be retrieved independently of the expression data, along with additional annotations such as Gene Ontology terms.

Examples:

head(get_platform_annotations('GPL96'))
     ProbeName   GeneSymbols                                     GeneNames
1: 211750_x_at TUBA1C|TUBA1A             tubulin alpha 1c|tubulin alpha 1a
2:   216678_at                                                            
3:   216345_at        ZSWIM8            zinc finger SWIM-type containing 8
4:   207273_at                                                            
5: 216025_x_at        CYP2C9 cytochrome P450 family 2 subfamily C member 9
6: 218191_s_at        LMBRD1                     LMBR1 domain containing 1
                                                                                                                                                                                                                                                                                                                                                                      GOTerms
1:                                                                                                                          GO:0005737|GO:0005525|GO:0051301|GO:0005515|GO:0000226|GO:0005634|GO:0007017|GO:0000278|GO:0005874|GO:0005200|GO:0005198|GO:0005881|GO:0030705|GO:0015630|GO:0031982|GO:0070062|GO:0050807|GO:0036464|GO:0042802|GO:0055037|GO:0005829|GO:0031594
2:                                                                                                                                                                                                                                                                                                                                                                           
3:                                                                                                                                                                                                                                                                                    GO:0008270|GO:0043161|GO:0016567|GO:0031463|GO:0005829|GO:0031462|GO:1990756|GO:2000627
4:                                                                                                                                                                                                                                                                                                                                                                           
5: GO:0005737|GO:0052741|GO:0008405|GO:0008404|GO:0020037|GO:0008203|GO:0008202|GO:0043231|GO:0008392|GO:0034875|GO:0070330|GO:0019373|GO:0006805|GO:0005506|GO:0005789|GO:0097267|GO:0005886|GO:0032787|GO:0042178|GO:0004497|GO:0046456|GO:0008210|GO:0008395|GO:0016712|GO:0019627|GO:0042759|GO:0006082|GO:0070989|GO:0043603|GO:0018676|GO:0018675|GO:0016491|GO:0016098
6:                                                                                                                                                GO:0031419|GO:0072583|GO:0005789|GO:0005515|GO:0005886|GO:0005765|GO:0030136|GO:0005774|GO:0005158|GO:0007369|GO:0045334|GO:0061462|GO:0140318|GO:0043231|GO:0038016|GO:0032050|GO:0035612|GO:0072665|GO:0016020|GO:0016021
        GemmaIDs    NCBIids
1: 360802|172797 84790|7846
2:                         
3:        235733      23053
4:                         
5:         32964       1559
6:        303717      55788
head(get_platform_annotations('Generic_human'))
      ProbeName  GeneSymbols
1:        LCN10        LCN10
2:     STAG3L5P     STAG3L5P
3: LOC101059976 LOC101059976
4:         GAB3         GAB3
5: LOC100287155 LOC100287155
6:       RASSF2       RASSF2
                                                                 GeneNames
1:                                                            lipocalin 10
2:                                     stromal antigen 3-like 5 pseudogene
3: arf-GAP with GTPase, ANK repeat and PH domain-containing protein 2-like
4:                                       GRB2 associated binding protein 3
5:                                       hypothetical protein LOC100287155
6:                                  Ras association domain family member 2
                                                                                                                                                                                                                                                                                                    GOTerms
1:                                                                                                                                                                                                                                                                                    GO:0005576|GO:0036094
2:                                                                                                                                                                                                                                                                                                         
3:                                                                                                                                                                                                                                                                                                         
4:                                                                                                                                                                                                                                                              GO:0005737|GO:0030225|GO:0035591|GO:0007165
5:                                                                                                                                                                                                                                                                                                         
6: GO:0045670|GO:0005737|GO:0005515|GO:0005634|GO:0050821|GO:0005654|GO:0005794|GO:1901222|GO:1901223|GO:0046849|GO:0032991|GO:0048872|GO:0046330|GO:0001501|GO:0000776|GO:0005886|GO:0031954|GO:0007049|GO:0045667|GO:0004672|GO:0043065|GO:0045860|GO:0007165|GO:0033137|GO:0001503|GO:0005829|GO:0038168
   GemmaIDs   NCBIids
1:   441399    414332
2:  8799043 101735302
3:  8779607 101059976
4:   389635    139716
5:  8090381 100287155
6:   201914      9770

If you are interested in a particular gene, you can see which platforms include it using get_gene_probes. Note that functions to search gene work best with unambigious identifiers rather than symbols.

# lists genes in gemma matching the symbol or identifier
get_genes('Eno2')
   gene.Symbol       gene.Ensembl gene.NCBI                      gene.Name
1:        ENO2    ENSG00000111674      2026                      enolase 2
2:        Eno2 ENSMUSG00000004267     13807      enolase 2, gamma neuronal
3:        Eno2 ENSRNOG00000013141     24334                      enolase 2
4:        ENO2               <NA>    856579 phosphopyruvate hydratase ENO2
5:        eno2 ENSDARG00000014287    402874                      enolase 2
   gene.MFX.Rank taxon.Name         taxon.Scientific taxon.ID taxon.NCBI
1:     0.9284287      human             Homo sapiens        1       9606
2:     0.8634338      mouse             Mus musculus        2      10090
3:     0.8698729        rat        Rattus norvegicus        3      10116
4:     0.6226795      yeast Saccharomyces cerevisiae       11       4932
5:     0.8956117  zebrafish              Danio rerio       12       7955
   taxon.Database.Name taxon.Database.ID
1:                hg38                87
2:                mm10                81
3:                 rn6                86
4:                <NA>                NA
5:                <NA>                NA
# ncbi id for human ENO2
probes <- get_gene_probes(2026)

# remove the description for brevity of output
head(probes[,.SD, .SDcols = !colnames(probes) %in% c('mapping.Description','platform.Description')])
   mapping.Name platform.ShortName
1:        20016            GPL3093
2:        20024            GPL3092
3:        20024       lymphochip-2
4:         1639             GPL962
5:        35850         NHGRI-6.5k
6:    201313_at              GPL96
                                              platform.Name platform.ID
1:                                                    LC-25         211
2:                                                    LC-19         212
3:                                           Lymphochip 37k         229
4:                                               CHUGAI 41K          36
5:                                               NHGRI-6.5k         150
6: Affymetrix GeneChip Human Genome U133 Array Set HG-U133A           1
   platform.Type platform.Troubled taxon.Name taxon.Scientific taxon.ID
1:      TWOCOLOR             FALSE      human     Homo sapiens        1
2:      TWOCOLOR             FALSE      human     Homo sapiens        1
3:      TWOCOLOR             FALSE      human     Homo sapiens        1
4:      TWOCOLOR             FALSE      human     Homo sapiens        1
5:      TWOCOLOR             FALSE      human     Homo sapiens        1
6:      ONECOLOR             FALSE      human     Homo sapiens        1
   taxon.NCBI taxon.Database.Name taxon.Database.ID
1:       9606                hg38                87
2:       9606                hg38                87
3:       9606                hg38                87
4:       9606                hg38                87
5:       9606                hg38                87
6:       9606                hg38                87

7 Differential expression analyses

Gemma contains precomputed differential expression analyses for most of its datasets. Analyses can involve more than one factor, such as “sex” as well as “disease”. Some datasets contain more than one analysis to account for different factors and their interactions. The results are stored as resultSets, each corresponding to one factor (or their interaction). You can access them using get_differential_expression_values. From here on, we can explore and visualize the data to find the most differentially-expressed genes

Note that get_differential_expression_values can return multiple differentials per study if a study has multiple factors to contrast. Since GSE46416 only has one extracting the first element of the returned list is all we need.

dif_exp <- get_differential_expression_values('GSE46416')
(dif_exp[[1]])
         Probe    NCBIid GeneSymbol
    1: 2982730      4018        LPA
    2: 2787851    166752      FREM3
    3: 2477558                     
    4: 2910917                     
    5: 3983537    140886     PABPC5
   ---                             
21957: 3301011     64318      NOC3L
21958: 2461654 100130249  LINC02961
21959: 2360346      1141     CHRNB2
21960: 2391172      7293    TNFRSF4
21961: 2525718                     
                                            GeneName  pvalue corrected_pvalue
    1:                                lipoprotein(a) 0.91850           0.9521
    2:          FRAS1 related extracellular matrix 3 0.58500           0.7348
    3:                                               0.39650           0.5931
    4:                                               0.65060           0.7815
    5:         poly(A) binding protein cytoplasmic 5 0.10640           0.3227
   ---                                                                       
21957:           NOC3 like DNA replication regulator 0.06358           0.2647
21958:   long intergenic non-protein coding RNA 2961 0.25470           0.4735
21959: cholinergic receptor nicotinic beta 2 subunit 0.01213           0.1288
21960:             TNF receptor superfamily member 4 0.13140           0.3516
21961:                                               0.75260           0.8463
          rank contrast_113004_log2fc contrast_113004_tstat
    1: 0.96470               -0.02471              -0.35870
    2: 0.79600                0.18950               1.01200
    3: 0.66850                0.09179               1.13900
    4: 0.83250                0.15640               0.67390
    5: 0.32990                0.18450               2.03200
   ---                                                     
21957: 0.24020                0.14810               0.71780
21958: 0.53800               -0.17630              -1.53600
21959: 0.09417                0.10290               1.12300
21960: 0.37370               -0.00863              -0.06482
21961: 0.88930               -0.11050              -0.59600
       contrast_113004_pvalue contrast_113005_log2fc contrast_113005_tstat
    1:                0.72210               -0.02495               -0.3622
    2:                0.31910                0.14030                0.7495
    3:                0.26330                0.10160                1.2600
    4:                0.50520                0.20960                0.9032
    5:                0.05047                0.16030                1.7660
   ---                                                                    
21957:                0.47810               -0.33430               -1.6210
21958:                0.13440               -0.02347               -0.2045
21959:                0.26980                0.28670                3.1280
21960:                0.94870                0.23120                1.7370
21961:                0.55530               -0.13160               -0.7101
       contrast_113005_pvalue
    1:               0.719600
    2:               0.459000
    3:               0.216600
    4:               0.373100
    5:               0.086940
   ---                       
21957:               0.114800
21958:               0.839300
21959:               0.003721
21960:               0.091960
21961:               0.482700

By default the columns names of the output correspond to contrast IDs. To see what conditions these IDs correspond to we can either use get_dataset_differential_expression_analyses to get the metadata about differentials of a given dataset, or setreadableContrasts argument of get_differential_expression_values to TRUE. The former approach is usually better for a large scale systematic analysis while the latter is easier to read in an interactive session.

get_dataset_differential_expression_analyses function returns structured metadata about the differentials.

(contrasts <- get_dataset_differential_expression_analyses('GSE46416'))
   result.ID contrast.id experiment.ID baseline.category
1:    550248      113004          8997           disease
2:    550248      113005          8997           disease
                   baseline.categoryURI   baseline.factorValue
1: http://www.ebi.ac.uk/efo/EFO_0000408 reference subject role
2: http://www.ebi.ac.uk/efo/EFO_0000408 reference subject role
                      baseline.factorValueURI         experimental.factorValue
1: http://purl.obolibrary.org/obo/OBI_0000220    bipolar disorder, manic phase
2: http://purl.obolibrary.org/obo/OBI_0000220 euthymic phase, Bipolar Disorder
                experimental.factorValueURI subsetFactor.subset
1: http://purl.obolibrary.org/obo/DOID_3312               FALSE
2:                                     <NA>               FALSE
   subsetFactor.factorValue subsetFactor.factorValueURI
1:                     <NA>                        <NA>
2:                     <NA>                        <NA>
   subsetFactor.description subsetFactor.category subsetFactor.categoryURI
1:                     <NA>                    NA                       NA
2:                     <NA>                    NA                       NA
   subsetFactor.measurement subsetFactor.type probes.Analyzed genes.Analyzed
1:                       NA                NA           21961          18959
2:                       NA                NA           21961          18959

contrast.id column corresponds to the column names in the output of get_differential_expression_values while result.ID corresponds to the name of the differential in the output object. Using them together will let one to access differentially expressed gene counts for each condition contrast

# using result.ID and contrast.ID of the output above, we can access specific
# results. Note that one study may have multiple contrast objects
seq_len(nrow(contrasts)) %>% sapply(function(i){
    result_set = dif_exp[[as.character(contrasts[i,]$result.ID)]]
    p_values = result_set[[glue::glue("contrast_{contrasts[i,]$contrast.id}_pvalue")]]
    
    # multiple testing correction
    sum(p.adjust(p_values,method = 'BH') < 0.05)
}) -> dif_exp_genes

data.frame(result.ID = contrasts$result.ID,
           contrast.id = contrasts$contrast.id,
           baseline.factorValue = contrasts$baseline.factorValue,
           experimental.factorValue = contrasts$experimental.factorValue,
           n_diff = dif_exp_genes)
  result.ID contrast.id   baseline.factorValue         experimental.factorValue
1    550248      113004 reference subject role    bipolar disorder, manic phase
2    550248      113005 reference subject role euthymic phase, Bipolar Disorder
  n_diff
1      3
2   1389

Alternatively we, since we are only looking at one dataset and one contrast manually, we can simply use readableContrasts.

(de <- get_differential_expression_values("GSE46416",readableContrasts = TRUE)[[1]])
         Probe    NCBIid GeneSymbol
    1: 2982730      4018        LPA
    2: 2787851    166752      FREM3
    3: 2477558                     
    4: 2910917                     
    5: 3983537    140886     PABPC5
   ---                             
21957: 3301011     64318      NOC3L
21958: 2461654 100130249  LINC02961
21959: 2360346      1141     CHRNB2
21960: 2391172      7293    TNFRSF4
21961: 2525718                     
                                            GeneName  pvalue corrected_pvalue
    1:                                lipoprotein(a) 0.91850           0.9521
    2:          FRAS1 related extracellular matrix 3 0.58500           0.7348
    3:                                               0.39650           0.5931
    4:                                               0.65060           0.7815
    5:         poly(A) binding protein cytoplasmic 5 0.10640           0.3227
   ---                                                                       
21957:           NOC3 like DNA replication regulator 0.06358           0.2647
21958:   long intergenic non-protein coding RNA 2961 0.25470           0.4735
21959: cholinergic receptor nicotinic beta 2 subunit 0.01213           0.1288
21960:             TNF receptor superfamily member 4 0.13140           0.3516
21961:                                               0.75260           0.8463
          rank contrast_bipolar disorder, manic phase_logFoldChange
    1: 0.96470                                             -0.02471
    2: 0.79600                                              0.18950
    3: 0.66850                                              0.09179
    4: 0.83250                                              0.15640
    5: 0.32990                                              0.18450
   ---                                                             
21957: 0.24020                                              0.14810
21958: 0.53800                                             -0.17630
21959: 0.09417                                              0.10290
21960: 0.37370                                             -0.00863
21961: 0.88930                                             -0.11050
       contrast_bipolar disorder, manic phase_tstat
    1:                                     -0.35870
    2:                                      1.01200
    3:                                      1.13900
    4:                                      0.67390
    5:                                      2.03200
   ---                                             
21957:                                      0.71780
21958:                                     -1.53600
21959:                                      1.12300
21960:                                     -0.06482
21961:                                     -0.59600
       contrast_bipolar disorder, manic phase_pvalue
    1:                                       0.72210
    2:                                       0.31910
    3:                                       0.26330
    4:                                       0.50520
    5:                                       0.05047
   ---                                              
21957:                                       0.47810
21958:                                       0.13440
21959:                                       0.26980
21960:                                       0.94870
21961:                                       0.55530
       contrast_euthymic phase, Bipolar Disorder_logFoldChange
    1:                                                -0.02495
    2:                                                 0.14030
    3:                                                 0.10160
    4:                                                 0.20960
    5:                                                 0.16030
   ---                                                        
21957:                                                -0.33430
21958:                                                -0.02347
21959:                                                 0.28670
21960:                                                 0.23120
21961:                                                -0.13160
       contrast_euthymic phase, Bipolar Disorder_tstat
    1:                                         -0.3622
    2:                                          0.7495
    3:                                          1.2600
    4:                                          0.9032
    5:                                          1.7660
   ---                                                
21957:                                         -1.6210
21958:                                         -0.2045
21959:                                          3.1280
21960:                                          1.7370
21961:                                         -0.7101
       contrast_euthymic phase, Bipolar Disorder_pvalue
    1:                                         0.719600
    2:                                         0.459000
    3:                                         0.216600
    4:                                         0.373100
    5:                                         0.086940
   ---                                                 
21957:                                         0.114800
21958:                                         0.839300
21959:                                         0.003721
21960:                                         0.091960
21961:                                         0.482700
# Classify probes for plotting
de$diffexpr <- "No"
de$diffexpr[de$`contrast_bipolar disorder, manic phase_logFoldChange` > 1.0 & 
        de$`contrast_bipolar disorder, manic phase_pvalue` < 0.05] <- "Up"
de$diffexpr[de$`contrast_bipolar disorder, manic phase_logFoldChange` < -1.0 & 
        de$`contrast_bipolar disorder, manic phase_pvalue` < 0.05] <- "Down"

# Upregulated probes
filter(de, diffexpr == "Up") %>%
    arrange(`contrast_bipolar disorder, manic phase_pvalue`) %>%
    select(Probe, GeneSymbol, `contrast_bipolar disorder, manic phase_pvalue`, 
        `contrast_bipolar disorder, manic phase_logFoldChange`) %>%
    head(10)
      Probe GeneSymbol contrast_bipolar disorder, manic phase_pvalue
 1: 2319550       RBP7                                     8.612e-05
 2: 2548699     CYP1B1                                     1.027e-04
 3: 3907190       SLPI                                     3.326e-04
 4: 3629103      PCLAF                                     5.183e-04
 5: 3545525      SLIRP                                     5.646e-04
 6: 3146433      COX6C                                     9.204e-04
 7: 2538349                                                1.253e-03
 8: 2899102       H3C3                                     1.269e-03
 9: 3635198     BCL2A1                                     1.800e-03
10: 2633191      GPR15                                     2.410e-03
    contrast_bipolar disorder, manic phase_logFoldChange
 1:                                                1.074
 2:                                                1.322
 3:                                                1.056
 4:                                                1.278
 5:                                                1.349
 6:                                                1.467
 7:                                                1.073
 8:                                                1.026
 9:                                                1.080
10:                                                1.205
# Downregulated probes
filter(de, diffexpr == "Down") %>%
    arrange(`contrast_bipolar disorder, manic phase_pvalue`) %>%
    select(Probe, GeneSymbol, `contrast_bipolar disorder, manic phase_pvalue`, 
        `contrast_bipolar disorder, manic phase_logFoldChange`) %>%
    head(10)
      Probe GeneSymbol contrast_bipolar disorder, manic phase_pvalue
 1: 2775390                                                2.095e-06
 2: 3760268                                                1.153e-05
 3: 3124344                                                1.389e-04
 4: 3673179                                                1.581e-04
 5: 3245871      WDFY4                                     1.681e-04
 6: 3022689   SND1-IT1                                     2.267e-04
 7: 2679014                                                2.982e-04
 8: 4019758                                                3.553e-04
 9: 3336402      RBM14                                     3.606e-04
10: 2880955                                                3.740e-04
    contrast_bipolar disorder, manic phase_logFoldChange
 1:                                               -1.556
 2:                                               -1.851
 3:                                               -1.037
 4:                                               -1.034
 5:                                               -1.157
 6:                                               -1.220
 7:                                               -1.175
 8:                                               -1.405
 9:                                               -1.071
10:                                               -1.522
# Add gene symbols as labels to DE genes
de$delabel <- ""
de$delabel[de$diffexpr != "No"] <- de$GeneSymbol[de$diffexpr != "No"]

# Volcano plot for bipolar patients vs controls
ggplot(
    data = de,
    aes(
        x = `contrast_bipolar disorder, manic phase_logFoldChange`,
        y = -log10(`contrast_bipolar disorder, manic phase_pvalue`),
        color = diffexpr,
        label = delabel
    )
) +
    geom_point() +
    geom_hline(yintercept = -log10(0.05), col = "gray45", linetype = "dashed") +
    geom_vline(xintercept = c(-1.0, 1.0), col = "gray45", linetype = "dashed") +
    labs(x = "log2(FoldChange)", y = "-log10(p-value)") +
    scale_color_manual(values = c("blue", "black", "red")) +
    geom_text_repel(show.legend = FALSE) +
    theme_minimal()
Differentially-expressed genes in bipolar patients during manic phase versus controls.

Figure 2: Differentially-expressed genes in bipolar patients during manic phase versus controls

8 Larger queries

To query large amounts of data, the API has a pagination system which uses the limit and offset parameters. To avoid overloading the server, calls are limited to a maximum of 100 entries, so the offset allows you to get the next batch of entries in the next call(s).

To see how many available results are there, you can look at the attributes of the output objects where additional information from the API response is appended.

platform_count = attributes(get_platforms_by_ids(limit = 1))$totalElements
print(platform_count)
[1] 681

After which you can use offset to access all available platforms.

lapply(seq(0,platform_count,100), function(offset){
    get_platforms_by_ids(limit = 100, offset = offset) %>%
        select(platform.ID, platform.ShortName, taxon.Name)
}) %>% do.call(rbind,.)
     platform.ID   platform.ShortName taxon.Name
  1:           1                GPL96      human
  2:           2              GPL1355        rat
  3:           3              GPL1261      mouse
  4:           4               GPL570      human
  5:           5                GPL81      mouse
 ---                                            
677:        1314       Rosetta_Merged      human
678:        1315    RG-U34_ABC_Merged        rat
679:        1316             RAE230AB        rat
680:        1317             GPL30172      mouse
681:        1318 Agilent_8x60K_Merged      mouse

Many endpoints only support a single identifier:

get_dataset_annotations(c("GSE35974", "GSE46416"))
Error: Please specify one valid identifier for dataset.

In these cases, you will have to loop over all the identifiers you wish to query and send separate requests.

lapply(c("GSE35974", "GSE12649"), function(dataset) {
    get_dataset_annotations(dataset) %>% 
        mutate(experiment.shortName = dataset) %>%
        select(experiment.shortName, class.Name, term.Name)
}) %>% do.call(rbind,.)
    experiment.shortName     class.Name              term.Name
 1:             GSE35974        disease       Bipolar Disorder
 2:             GSE35974 biological sex                 female
 3:             GSE35974  organism part             cerebellum
 4:             GSE35974        disease          schizophrenia
 5:             GSE35974 biological sex                   male
 6:             GSE35974        disease      mental depression
 7:             GSE12649        disease       Bipolar Disorder
 8:             GSE12649        disease          schizophrenia
 9:             GSE12649  organism part reference subject role
10:             GSE12649  organism part      prefrontal cortex

9 Output options

9.1 Raw data

By default, Gemma API does some parsing on the raw API results to make it easier to work with inside of R. In the process, it drops some typically unused values. If you wish to fetch everything, use raw = TRUE. Instead of a data table, you’ll usually be served a list that represents the underlying JSON response.

get_gene_locations("DYRK1A")
   chromosome strand nucleotide length taxon.Name  taxon.Scientific taxon.ID
1:         11      +   33890705 118714        rat Rattus norvegicus        3
2:         21      +   37365572 160785      human      Homo sapiens        1
3:         16      +   94370636 125741      mouse      Mus musculus        2
   taxon.NCBI taxon.Database.Name taxon.Database.ID
1:      10116                 rn6                86
2:       9606                hg38                87
3:      10090                mm10                81
get_gene_locations("DYRK1A", raw = TRUE)
[[1]]
[[1]]$id
[1] 84782783

[[1]]$nucleotide
[1] 33890705

[[1]]$nucleotideLength
[1] 118714

[[1]]$strand
[1] "+"

[[1]]$bin
[1] 105

[[1]]$chromosome
[1] "11"

[[1]]$taxon
[[1]]$taxon$id
[1] 3

[[1]]$taxon$scientificName
[1] "Rattus norvegicus"

[[1]]$taxon$commonName
[1] "rat"

[[1]]$taxon$ncbiId
[1] 10116

[[1]]$taxon$externalDatabase
[[1]]$taxon$externalDatabase$id
[1] 86

[[1]]$taxon$externalDatabase$name
[1] "rn6"

[[1]]$taxon$externalDatabase$description
[1] "RGSC Rnor_6.0"

[[1]]$taxon$externalDatabase$uri
[1] "https://genome.ucsc.edu/cgi-bin/hgTracks?db=rn6"

[[1]]$taxon$externalDatabase$releaseVersion
NULL

[[1]]$taxon$externalDatabase$releaseUrl
NULL

[[1]]$taxon$externalDatabase$lastUpdated
NULL

[[1]]$taxon$externalDatabase$externalDatabases
[[1]]$taxon$externalDatabase$externalDatabases[[1]]
[[1]]$taxon$externalDatabase$externalDatabases[[1]]$id
[1] 116

[[1]]$taxon$externalDatabase$externalDatabases[[1]]$name
[1] "rn6 RNA-Seq annotations"

[[1]]$taxon$externalDatabase$externalDatabases[[1]]$description
[1] "Annotations provided by NCBI Genome and used by the RNA-Seq pipeline for rat data."

[[1]]$taxon$externalDatabase$externalDatabases[[1]]$uri
[1] "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/"

[[1]]$taxon$externalDatabase$externalDatabases[[1]]$releaseVersion
[1] "106"

[[1]]$taxon$externalDatabase$externalDatabases[[1]]$releaseUrl
[1] "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Rattus_norvegicus/106"

[[1]]$taxon$externalDatabase$externalDatabases[[1]]$lastUpdated
[1] "2022-10-24T07:00:00.000+00:00"

[[1]]$taxon$externalDatabase$externalDatabases[[1]]$externalDatabases
list()


[[1]]$taxon$externalDatabase$externalDatabases[[2]]
[[1]]$taxon$externalDatabase$externalDatabases[[2]]$id
[1] 100

[[1]]$taxon$externalDatabase$externalDatabases[[2]]$name
[1] "rn4 annotations"

[[1]]$taxon$externalDatabase$externalDatabases[[2]]$description
NULL

[[1]]$taxon$externalDatabase$externalDatabases[[2]]$uri
[1] "https://hgdownload.cse.ucsc.edu/goldenpath/rn4/database/"

[[1]]$taxon$externalDatabase$externalDatabases[[2]]$releaseVersion
NULL

[[1]]$taxon$externalDatabase$externalDatabases[[2]]$releaseUrl
NULL

[[1]]$taxon$externalDatabase$externalDatabases[[2]]$lastUpdated
NULL

[[1]]$taxon$externalDatabase$externalDatabases[[2]]$externalDatabases
list()






[[2]]
[[2]]$id
[1] 84362978

[[2]]$nucleotide
[1] 37365572

[[2]]$nucleotideLength
[1] 160785

[[2]]$strand
[1] "+"

[[2]]$bin
[1] 108

[[2]]$chromosome
[1] "21"

[[2]]$taxon
[[2]]$taxon$id
[1] 1

[[2]]$taxon$scientificName
[1] "Homo sapiens"

[[2]]$taxon$commonName
[1] "human"

[[2]]$taxon$ncbiId
[1] 9606

[[2]]$taxon$externalDatabase
[[2]]$taxon$externalDatabase$id
[1] 87

[[2]]$taxon$externalDatabase$name
[1] "hg38"

[[2]]$taxon$externalDatabase$description
[1] "Genome Reference Consortium Human GRCh38.p13 (GCA_000001405.28)"

[[2]]$taxon$externalDatabase$uri
[1] "https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38"

[[2]]$taxon$externalDatabase$releaseVersion
[1] "GRCh38.p13"

[[2]]$taxon$externalDatabase$releaseUrl
NULL

[[2]]$taxon$externalDatabase$lastUpdated
[1] "2022-06-30T07:00:00.000+00:00"

[[2]]$taxon$externalDatabase$externalDatabases
[[2]]$taxon$externalDatabase$externalDatabases[[1]]
[[2]]$taxon$externalDatabase$externalDatabases[[1]]$id
[1] 94

[[2]]$taxon$externalDatabase$externalDatabases[[1]]$name
[1] "hg38 annotations"

[[2]]$taxon$externalDatabase$externalDatabases[[1]]$description
NULL

[[2]]$taxon$externalDatabase$externalDatabases[[1]]$uri
[1] "https://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/"

[[2]]$taxon$externalDatabase$externalDatabases[[1]]$releaseVersion
[1] "GRCh38.p13"

[[2]]$taxon$externalDatabase$externalDatabases[[1]]$releaseUrl
NULL

[[2]]$taxon$externalDatabase$externalDatabases[[1]]$lastUpdated
[1] "2022-06-30T07:00:00.000+00:00"

[[2]]$taxon$externalDatabase$externalDatabases[[1]]$externalDatabases
list()


[[2]]$taxon$externalDatabase$externalDatabases[[2]]
[[2]]$taxon$externalDatabase$externalDatabases[[2]]$id
[1] 124

[[2]]$taxon$externalDatabase$externalDatabases[[2]]$name
[1] "hg38 RNA-Seq annotations"

[[2]]$taxon$externalDatabase$externalDatabases[[2]]$description
[1] "Annotations provided by NCBI Genome and used by the RNA-Seq pipeline for human data."

[[2]]$taxon$externalDatabase$externalDatabases[[2]]$uri
[1] "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/"

[[2]]$taxon$externalDatabase$externalDatabases[[2]]$releaseVersion
[1] "110"

[[2]]$taxon$externalDatabase$externalDatabases[[2]]$releaseUrl
[1] "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Homo_sapiens/110/"

[[2]]$taxon$externalDatabase$externalDatabases[[2]]$lastUpdated
[1] "2023-01-17T20:27:55.059+00:00"

[[2]]$taxon$externalDatabase$externalDatabases[[2]]$externalDatabases
list()






[[3]]
[[3]]$id
[1] 84610778

[[3]]$nucleotide
[1] 94370636

[[3]]$nucleotideLength
[1] 125741

[[3]]$strand
[1] "+"

[[3]]$bin
[1] 20

[[3]]$chromosome
[1] "16"

[[3]]$taxon
[[3]]$taxon$id
[1] 2

[[3]]$taxon$scientificName
[1] "Mus musculus"

[[3]]$taxon$commonName
[1] "mouse"

[[3]]$taxon$ncbiId
[1] 10090

[[3]]$taxon$externalDatabase
[[3]]$taxon$externalDatabase$id
[1] 81

[[3]]$taxon$externalDatabase$name
[1] "mm10"

[[3]]$taxon$externalDatabase$description
[1] "Genome Reference Consortium GRCm38, which includes approximately 2.6 Gb of sequence, is considered to be \"essentially complete\". The assembly includes chromosomes 1-19, X, Y, M (mitochondrial DNA) and chr*_random (unlocalized) and chrUn_* (unplaced clone contigs)."

[[3]]$taxon$externalDatabase$uri
[1] "https://genome.ucsc.edu/cgi-bin/hgTracks?db=mm10"

[[3]]$taxon$externalDatabase$releaseVersion
NULL

[[3]]$taxon$externalDatabase$releaseUrl
NULL

[[3]]$taxon$externalDatabase$lastUpdated
[1] "2022-06-30T07:00:00.000+00:00"

[[3]]$taxon$externalDatabase$externalDatabases
[[3]]$taxon$externalDatabase$externalDatabases[[1]]
[[3]]$taxon$externalDatabase$externalDatabases[[1]]$id
[1] 97

[[3]]$taxon$externalDatabase$externalDatabases[[1]]$name
[1] "mm10 annotations"

[[3]]$taxon$externalDatabase$externalDatabases[[1]]$description
NULL

[[3]]$taxon$externalDatabase$externalDatabases[[1]]$uri
[1] "https://hgdownload.cse.ucsc.edu/goldenpath/mm10/database/"

[[3]]$taxon$externalDatabase$externalDatabases[[1]]$releaseVersion
NULL

[[3]]$taxon$externalDatabase$externalDatabases[[1]]$releaseUrl
NULL

[[3]]$taxon$externalDatabase$externalDatabases[[1]]$lastUpdated
[1] "2022-06-30T07:00:00.000+00:00"

[[3]]$taxon$externalDatabase$externalDatabases[[1]]$externalDatabases
list()


[[3]]$taxon$externalDatabase$externalDatabases[[2]]
[[3]]$taxon$externalDatabase$externalDatabases[[2]]$id
[1] 115

[[3]]$taxon$externalDatabase$externalDatabases[[2]]$name
[1] "mm10 RNA-Seq annotations"

[[3]]$taxon$externalDatabase$externalDatabases[[2]]$description
[1] "Annotations provided by NCBI Genome and used by the RNA-Seq pipeline for mouse data."

[[3]]$taxon$externalDatabase$externalDatabases[[2]]$uri
[1] "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/"

[[3]]$taxon$externalDatabase$externalDatabases[[2]]$releaseVersion
[1] "109"

[[3]]$taxon$externalDatabase$externalDatabases[[2]]$releaseUrl
[1] "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Mus_musculus/109/"

[[3]]$taxon$externalDatabase$externalDatabases[[2]]$lastUpdated
[1] "2023-01-17T20:30:25.219+00:00"

[[3]]$taxon$externalDatabase$externalDatabases[[2]]$externalDatabases
list()






attr(,"call")
[1] "https://gemma.msl.ubc.ca/rest/v2/genes/DYRK1A/locations"

9.2 File outputs

Sometimes, you may wish to save results to a file for future inspection. You can do this simply by providing a filename to the file parameter. The extension for this file will be one of three options:

  1. .json, if you requested results with raw=TRUE
  2. .csv if the results have no nested data tables
  3. .rds otherwise

You can also specify whether or not the new fetched results are allowed to overwrite an existing file by specifying the overwrite = TRUE parameter.

9.3 Memoise data

To speed up results, you can remember past results so future queries can proceed virtually instantly. This is enabled through the memoise package. To enable memoisation, simply set memoised = TRUE in the function call whenever you want to refer to the cache, both to save data for future calls and use the saved data for repeated calls.

By default this will create a cache in your local filesystem.

You can see the path to your cache by using rappdirs::user_cache_dir(appname = "gemmaR"). If you want to use a different directory, use options(gemma.cache = 'path').

If you’re done with your fetching and want to ensure no space is being used for cached results, or if you just want to ensure you’re getting up-to-date data from Gemma, you can clear the cache using forget_gemma_memoised.

9.4 Changing defaults

We’ve seen how to change raw = TRUE, overwrite = TRUE and memoised = TRUE in individual function calls. It’s possible that you want to always use the functions these ways without specifying the option every time. You can do this by simply changing the default, which is visible in the function definition. See below for examples.

options(gemma.memoised = TRUE) # always refer to cache
options(gemma.overwrite = TRUE) # always overwrite when saving files
options(gemma.raw = TRUE) # always receive results as-is from Gemma

10 Session info

sessionInfo()
R version 4.3.0 RC (2023-04-13 r84269)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] viridis_0.6.2               viridisLite_0.4.1          
 [3] pheatmap_1.0.12             SummarizedExperiment_1.30.0
 [5] Biobase_2.60.0              GenomicRanges_1.52.0       
 [7] GenomeInfoDb_1.36.0         IRanges_2.34.0             
 [9] S4Vectors_0.38.0            BiocGenerics_0.46.0        
[11] MatrixGenerics_1.12.0       matrixStats_0.63.0         
[13] ggrepel_0.9.3               ggplot2_3.4.2              
[15] dplyr_1.1.2                 gemma.R_1.2.0              
[17] BiocStyle_2.28.0           

loaded via a namespace (and not attached):
 [1] gtable_0.3.3            xfun_0.39               bslib_0.4.2            
 [4] lattice_0.21-8          vctrs_0.6.2             tools_4.3.0            
 [7] bitops_1.0-7            generics_0.1.3          curl_5.0.0             
[10] tibble_3.2.1            fansi_1.0.4             highr_0.10             
[13] pkgconfig_2.0.3         Matrix_1.5-4            data.table_1.14.8      
[16] RColorBrewer_1.1-3      lifecycle_1.0.3         GenomeInfoDbData_1.2.10
[19] farver_2.1.1            stringr_1.5.0           compiler_4.3.0         
[22] munsell_0.5.0           htmltools_0.5.5         sass_0.4.5             
[25] RCurl_1.98-1.12         yaml_2.3.7              pillar_1.9.0           
[28] jquerylib_0.1.4         cachem_1.0.7            DelayedArray_0.26.0    
[31] magick_2.7.4            tidyselect_1.2.0        digest_0.6.31          
[34] stringi_1.7.12          purrr_1.0.1             bookdown_0.33          
[37] labeling_0.4.2          fastmap_1.1.1           grid_4.3.0             
[40] colorspace_2.1-0        cli_3.6.1               magrittr_2.0.3         
[43] utf8_1.2.3              withr_2.5.0             scales_1.2.1           
[46] rappdirs_0.3.3          bit64_4.0.5             timechange_0.2.0       
[49] lubridate_1.9.2         rmarkdown_2.21          XVector_0.40.0         
[52] httr_1.4.5              bit_4.0.5               gridExtra_2.3          
[55] memoise_2.0.1           evaluate_0.20           knitr_1.42             
[58] rlang_1.1.0             Rcpp_1.0.10             glue_1.6.2             
[61] BiocManager_1.30.20     jsonlite_1.8.4          R6_2.5.1               
[64] zlibbioc_1.46.0