The MetaGxOvarian package is a compendium of ovarian cancer datasets. It is publicly available and can be installed from Bioconductor in R version 3.6.0 or higher.
To install the MetaGxOvarian package from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MetaGxOvarian")
First, load the MetaGxOvarian package into the workspace and retrieve the expression datasets with the following commands:
library(MetaGxOvarian)
esets <- MetaGxOvarian::loadOvarianEsets()[[1]]
This loads 26 expression datasets. Users can adjust the function's parameters to exclude datasets that do not meet certain criteria. Some example parameters are shown below:
keepCommonOnly: Retain only genes that are common across all platforms loaded (default = FALSE)
minSampleSize: Retain studies with a minimum sample size (default = 0)
minNumberGenes: Retain studies with a minimum number of genes (default = 0)
minNumberEvents: Retain studies with a minimum number of survival events (default = 0)
removeDuplicates: Remove duplicate samples (default = TRUE)
Next, we count the samples in each dataset and tabulate the totals:
numSamples <- vapply(seq_along(esets), FUN=function(i, esets) {
  length(sampleNames(esets[[i]]))
}, numeric(1), esets=esets)
SampleNumberSummaryAll <- data.frame(NumberOfSamples = numSamples,
                                     row.names = names(esets))
total <- sum(SampleNumberSummaryAll[,"NumberOfSamples"])
SampleNumberSummaryAll <- rbind(SampleNumberSummaryAll, total)
rownames(SampleNumberSummaryAll)[nrow(SampleNumberSummaryAll)] <- "Total"
library(xtable)  # provides xtable() for printing the summary as a LaTeX table
xtable(SampleNumberSummaryAll, digits = 2)
## % latex table generated in R 4.4.0 by xtable 1.8-4 package
## % Thu May 2 10:39:38 2024
## \begin{table}[ht]
## \centering
## \begin{tabular}{rr}
## \hline
## & NumberOfSamples \\
## \hline
## E.MTAB.386 & 129.00 \\
## GSE2109 & 202.00 \\
## GSE6008 & 101.00 \\
## GSE6822 & 62.00 \\
## GSE8842 & 83.00 \\
## GSE9891 & 276.00 \\
## GSE12418 & 54.00 \\
## GSE12470 & 49.00 \\
## GSE13876 & 157.00 \\
## GSE14764 & 79.00 \\
## GSE17260 & 110.00 \\
## GSE18520 & 59.00 \\
## GSE20565 & 135.00 \\
## GSE26193 & 14.00 \\
## GSE26712 & 191.00 \\
## GSE30009 & 103.00 \\
## GSE30161 & 58.00 \\
## GSE32062 & 257.00 \\
## GSE32063 & 40.00 \\
## GSE44104 & 47.00 \\
## GSE49997 & 204.00 \\
## GSE51088 & 172.00 \\
## PMID15897565 & 63.00 \\
## PMID17290060 & 117.00 \\
## PMID19318476 & 42.00 \\
## TCGAOVARIAN & 536.00 \\
## Total & 3340.00 \\
## \hline
## \end{tabular}
## \end{table}
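The filtering parameters listed earlier can be passed directly to loadOvarianEsets(). The following is a minimal sketch rather than a recommended configuration: the threshold values (50 samples, 15 survival events) are arbitrary illustrations, the variable names filterArgs and esetsFiltered are hypothetical, and the call is guarded because it needs the MetaGxOvarian package and may require network access to retrieve the data.

```r
# Filtering arguments named in the parameter list above.
# The threshold values are arbitrary and chosen only for illustration.
filterArgs <- list(
  minSampleSize    = 50,    # drop studies with fewer than 50 samples
  minNumberEvents  = 15,    # drop studies with fewer than 15 survival events
  keepCommonOnly   = TRUE,  # keep only genes shared across the loaded platforms
  removeDuplicates = TRUE   # the documented default, stated explicitly
)

# Run the load only when the package is installed.
if (requireNamespace("MetaGxOvarian", quietly = TRUE)) {
  # As in the earlier call, the first element of the result holds the datasets.
  esetsFiltered <- do.call(MetaGxOvarian::loadOvarianEsets, filterArgs)[[1]]
  print(length(esetsFiltered))  # number of datasets that pass the filters
}
```

Keeping the arguments in a named list makes it easy to record which criteria produced a given set of datasets.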
We can also obtain a summary of the phenotype data (pData) for each expression dataset. Here, we assess the proportion of samples in each dataset that have a non-missing value for a specific pData variable.
pDataID <- c("sample_type", "histological_type", "primarysite", "summarygrade",
             "summarystage", "tumorstage", "grade",
             "age_at_initial_pathologic_diagnosis", "pltx", "tax",
             "neo", "days_to_tumor_recurrence", "recurrence_status",
             "days_to_death", "vital_status")
pDataPercentSummaryTable <- NULL
pDataSummaryNumbersTable <- NULL
pDataSummaryNumbersList <- lapply(esets, function(x)
  vapply(pDataID, function(y) sum(!is.na(pData(x)[, y])), numeric(1)))
pDataPercentSummaryList <- lapply(esets, function(x)
  vapply(pDataID, function(y)
    sum(!is.na(pData(x)[, y])) / nrow(pData(x)), numeric(1)) * 100)
pDataSummaryNumbersTable <- sapply(pDataSummaryNumbersList, function(x) x)
pDataPercentSummaryTable <- sapply(pDataPercentSummaryList, function(x) x)
rownames(pDataSummaryNumbersTable) <- pDataID
rownames(pDataPercentSummaryTable) <- pDataID
colnames(pDataSummaryNumbersTable) <- names(esets)
colnames(pDataPercentSummaryTable) <- names(esets)
pDataSummaryNumbersTable <- rbind(pDataSummaryNumbersTable, total)
rownames(pDataSummaryNumbersTable)[nrow(pDataSummaryNumbersTable)] <- "Total"
# Generate a heatmap representation of the pData
pDataPercentSummaryTable <- t(pDataPercentSummaryTable)
pDataPercentSummaryTable <- cbind(Name = (rownames(pDataPercentSummaryTable)),
                                  pDataPercentSummaryTable)
nba <- pDataPercentSummaryTable
gradient_colors <- c("#ffffff", "#ffffd9", "#edf8b1", "#c7e9b4", "#7fcdbb",
                     "#41b6c4", "#1d91c0", "#225ea8", "#253494", "#081d58")
library(lattice)
nbamat <- as.matrix(nba)
rownames(nbamat) <- nbamat[, 1]
nbamat <- nbamat[, -1]
Interval <- as.numeric(c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100))
levelplot(nbamat, col.regions = gradient_colors,
          main = "Available Clinical Annotation",
          scales = list(x = list(rot = 90, cex = 0.5),
                        y = list(cex = 0.5), key = list(cex = 0.2)),
          at = seq(from = 0, to = 100, length = 10),
          cex = 0.2, ylab = "", xlab = "", lattice.options = list(),
          colorkey = list(at = as.numeric(factor(c(seq(from = 0, to = 100, by = 10)))),
                          labels = as.character(c("0", "10%", "20%", "30%", "40%", "50%",
                                                  "60%", "70%", "80%", "90%", "100%"),
                                                cex = 0.2, font = 1, col = "brown", height = 1,
                                                width = 1.4),
                          col = (gradient_colors)))
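The availability percentages computed above can be checked on a small example. The sketch below uses a made-up phenotype table (toyPheno, a hypothetical stand-in for the result of pData() on one dataset) and computes, for each variable, the percentage of samples with a non-missing value. It uses colMeans() as a compact alternative to the vapply() loop; it is not the code used to produce the heatmap.

```r
# Toy phenotype table standing in for pData(eset); all values are made up.
toyPheno <- data.frame(
  sample_type  = c("tumor", "tumor", "tumor", "healthy"),
  grade        = c(3, NA, 2, NA),
  vital_status = c("deceased", "living", NA, "living"),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix, so the column means of
# its negation give the fraction of non-missing values per variable.
percentAvailable <- colMeans(!is.na(toyPheno)) * 100
print(percentAvailable)
```

Here sample_type is fully annotated (100%), grade is available for half of the samples (50%), and vital_status for three of four (75%).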
sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] lattice_0.22-6 MetaGxOvarian_1.25.0
## [3] SummarizedExperiment_1.35.0 GenomicRanges_1.57.0
## [5] GenomeInfoDb_1.41.0 IRanges_2.39.0
## [7] S4Vectors_0.43.0 MatrixGenerics_1.17.0
## [9] matrixStats_1.3.0 ExperimentHub_2.13.0
## [11] AnnotationHub_3.13.0 BiocFileCache_2.13.0
## [13] dbplyr_2.5.0 Biobase_2.65.0
## [15] BiocGenerics_0.51.0 xtable_1.8-4
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.45.0 impute_1.79.0 xfun_0.43
## [4] vctrs_0.6.5 tools_4.4.0 generics_0.1.3
## [7] curl_5.2.1 tibble_3.2.1 fansi_1.0.6
## [10] AnnotationDbi_1.67.0 RSQLite_2.3.6 highr_0.10
## [13] blob_1.2.4 pkgconfig_2.0.3 Matrix_1.7-0
## [16] lifecycle_1.0.4 GenomeInfoDbData_1.2.12 compiler_4.4.0
## [19] Biostrings_2.73.0 yaml_2.3.8 pillar_1.9.0
## [22] crayon_1.5.2 cachem_1.0.8 DelayedArray_0.31.0
## [25] abind_1.4-5 mime_0.12 tidyselect_1.2.1
## [28] purrr_1.0.2 dplyr_1.1.4 BiocVersion_3.20.0
## [31] fastmap_1.1.1 grid_4.4.0 SparseArray_1.5.0
## [34] cli_3.6.2 magrittr_2.0.3 S4Arrays_1.5.0
## [37] utf8_1.2.4 withr_3.0.0 filelock_1.0.3
## [40] UCSC.utils_1.1.0 rappdirs_0.3.3 bit64_4.0.5
## [43] XVector_0.45.0 httr_1.4.7 bit_4.0.5
## [46] png_0.1-8 memoise_2.0.1 evaluate_0.23
## [49] knitr_1.46 rlang_1.1.3 glue_1.7.0
## [52] DBI_1.2.2 BiocManager_1.30.22 jsonlite_1.8.8
## [55] R6_2.5.1 zlibbioc_1.51.0