MetaGxPancreas: A Package for Pancreatic Cancer Gene Expression Analysis

Installing the Package

The MetaGxPancreas package is a compendium of pancreatic cancer datasets. The package is publicly available and can be installed from Bioconductor into R version 3.6.0 or higher. Currently, the phenoData for the datasets consists of overall survival status and overall survival time. This survival information is available for 11 of the 15 datasets.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MetaGxPancreas")

Loading Datasets

First we load the MetaGxPancreas package into the workspace.

library(MetaGxPancreas)

## Loading required package: SummarizedExperiment

## Loading required package: MatrixGenerics

## Loading required package: matrixStats

## 
## Attaching package: 'MatrixGenerics'

## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars

## Loading required package: GenomicRanges

## Loading required package: stats4

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table,
##     tapply, union, unique, unsplit, which.max, which.min

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following object is masked from 'package:utils':
## 
##     findMatches

## The following objects are masked from 'package:base':
## 
##     I, expand.grid, unname

## Loading required package: IRanges

## Loading required package: GenomeInfoDb

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians

## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians

## Loading required package: ExperimentHub

## Loading required package: AnnotationHub

## Loading required package: BiocFileCache

## Loading required package: dbplyr

## 
## Attaching package: 'AnnotationHub'

## The following object is masked from 'package:Biobase':
## 
##     cache

pancreasData <- loadPancreasDatasets()

## Filtered out duplicated samples: ICGC_0400, ICGC_0402, GSM388116, GSM388118, GSM388120, GSM388145, GSM299238, GSM299239, GSM299240

duplicates <- pancreasData$duplicates
SEs <- pancreasData$SEs

This will load 15 expression datasets. Users can modify the parameters of the function to exclude datasets, samples, or genes that do not meet certain criteria. Some example parameters are shown below:

removeDuplicates: remove patients with a Spearman correlation greater than or equal to 0.98 with other patient expression profiles (default TRUE)
quantileCutoff: a numeric between 0 and 1; genes with a standard deviation below this quantile are removed (default 0)
rescale: apply centering and scaling to the expression sets (default FALSE)
minNumberGenes: an integer; expression sets with fewer genes than this number are removed (default 0)
minSampleSize: an integer specifying the minimum number of patients required in an SE (default 0)
minNumberEvents: an integer specifying how many survival events a dataset must contain to be kept (default 0)
removeSeqSubset: currently only removes the ICGSSEQ dataset, as it contains the same patients as the ICGS microarray dataset (default TRUE)
keepCommonOnly: remove probes not common to all datasets (default FALSE)
imputeMissing: impute missing expression values via knn
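
To make two of these criteria concrete, the sketch below applies a removeDuplicates-style check and a quantileCutoff-style filter to a small simulated matrix. It is only an illustration under stated assumptions: the matrix `expr`, the sample names, and the helper variables are hypothetical, and the code is not the package's internal implementation.

```r
# Toy illustration only: NOT the internal code of loadPancreasDatasets().
set.seed(1)

# Simulated expression matrix: 50 genes (rows) x 5 samples (columns)
expr <- matrix(rnorm(50 * 5), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("s", 1:5)))

# Make sample s5 a near copy of s1 so that it looks like a duplicate
expr[, "s5"] <- expr[, "s1"] + rnorm(50, sd = 1e-3)

# removeDuplicates-style check: pairwise Spearman correlation between samples
rho <- cor(expr, method = "spearman")
rho[upper.tri(rho, diag = TRUE)] <- NA  # keep each pair once (lower triangle)
dupPairs <- which(rho >= 0.98, arr.ind = TRUE)
dupSamples <- unique(rownames(rho)[dupPairs[, "row"]])

# quantileCutoff-style filter: drop genes whose SD is below the given quantile
quantileCutoff <- 0.25
geneSd <- apply(expr, 1, sd)
keepGenes <- geneSd >= quantile(geneSd, probs = quantileCutoff)

# Apply both filters
filtered <- expr[keepGenes, !(colnames(expr) %in% dupSamples), drop = FALSE]
dim(filtered)
```

Here `dupSamples` holds the later member of each highly correlated pair, so one sample of the pair is retained; whether the package keeps or drops both members of a pair is not stated above and should be checked against its documentation.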

Obtaining Sample Counts in Datasets

To obtain the number of samples per dataset, run the following:

# Count samples (columns) in each SummarizedExperiment
numSamples <- vapply(SEs, function(SE) ncol(SE), FUN.VALUE=numeric(1))

sampleNumberByDataset <- data.frame(numSamples=numSamples,
                                    row.names=names(SEs))

totalNumSamples <- sum(sampleNumberByDataset$numSamples)
sampleNumberByDataset <- rbind(sampleNumberByDataset, totalNumSamples)
rownames(sampleNumberByDataset)[nrow(sampleNumberByDataset)] <- 'Total'

knitr::kable(sampleNumberByDataset)

	X0
Total	0
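
The contents of this table depend on which datasets were actually loaded in the session. As a quick way to check the counting logic independently of the download, the sketch below runs the same steps on a hypothetical list of plain matrices (`toySEs`, an assumption made for illustration) with samples in columns.

```r
# Hypothetical stand-in for SEs: plain matrices with samples in columns
toySEs <- list(datasetA = matrix(0, nrow = 3, ncol = 4),
               datasetB = matrix(0, nrow = 3, ncol = 6))

# One sample count per dataset
numSamples <- vapply(toySEs, ncol, FUN.VALUE = integer(1))
counts <- data.frame(numSamples = numSamples, row.names = names(toySEs))

# Append a row holding the total across datasets
counts <- rbind(counts, sum(counts$numSamples))
rownames(counts)[nrow(counts)] <- "Total"
counts
```

With real data, each row should carry the name of a loaded dataset, followed by a final Total row.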

SessionInfo

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] MetaGxPancreas_1.26.0       ExperimentHub_2.14.0       
##  [3] AnnotationHub_3.14.0        BiocFileCache_2.14.0       
##  [5] dbplyr_2.5.0                SummarizedExperiment_1.36.0
##  [7] Biobase_2.66.0              GenomicRanges_1.58.0       
##  [9] GenomeInfoDb_1.42.0         IRanges_2.40.0             
## [11] S4Vectors_0.44.0            BiocGenerics_0.52.0        
## [13] MatrixGenerics_1.18.0       matrixStats_1.4.1          
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.46.0         impute_1.80.0           xfun_0.48              
##  [4] lattice_0.22-6          vctrs_0.6.5             tools_4.4.1            
##  [7] generics_0.1.3          curl_5.2.3              tibble_3.2.1           
## [10] fansi_1.0.6             AnnotationDbi_1.68.0    RSQLite_2.3.7          
## [13] blob_1.2.4              pkgconfig_2.0.3         Matrix_1.7-1           
## [16] lifecycle_1.0.4         GenomeInfoDbData_1.2.13 compiler_4.4.1         
## [19] Biostrings_2.74.0       yaml_2.3.10             pillar_1.9.0           
## [22] crayon_1.5.3            DelayedArray_0.32.0     cachem_1.1.0           
## [25] abind_1.4-8             mime_0.12               tidyselect_1.2.1       
## [28] dplyr_1.1.4             purrr_1.0.2             BiocVersion_3.20.0     
## [31] fastmap_1.2.0           grid_4.4.1              cli_3.6.3              
## [34] SparseArray_1.6.0       magrittr_2.0.3          S4Arrays_1.6.0         
## [37] utf8_1.2.4              withr_3.0.2             filelock_1.0.3         
## [40] UCSC.utils_1.2.0        rappdirs_0.3.3          bit64_4.5.2            
## [43] XVector_0.46.0          httr_1.4.7              bit_4.5.0              
## [46] png_0.1-8               memoise_2.0.1           evaluate_1.0.1         
## [49] knitr_1.48              rlang_1.1.4             glue_1.8.0             
## [52] DBI_1.2.3               BiocManager_1.30.25     jsonlite_1.8.9         
## [55] R6_2.5.1                zlibbioc_1.52.0