This package provides a unified approach to programming with Bioconductor components to address problems in cancer genomics. Central concerns are:
The NCI Thesaurus project distributes an OBO representation of oncotree. We can use this through the ontoProc (devel branch only) and ontologyPlot packages. Code for visualizing the location of ‘Glioblastoma’ in the context of its ‘siblings’ in the ontology follows.
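A minimal sketch of such a visualization is given here. It is an illustration under stated assumptions, not necessarily the original code: the names getOncotreeOnto and siblings_TAG, the justSibs argument, and the ontoTags slot are assumed from the ontoProc interface and may differ between package versions. onto_plot comes from ontologyPlot.

```r
library(ontoProc)
library(ontologyPlot)

# ASSUMPTION: getOncotreeOnto() returns the oncotree ontology as an
# ontology_index object; the accessor name may vary across ontoProc versions.
oto = getOncotreeOnto()

# Tag of the term whose name is exactly "Glioblastoma" (first match only)
glioTag = head(names(oto$name)[oto$name == "Glioblastoma"], 1)

# ASSUMPTION: siblings_TAG() returns a term set holding the siblings of the
# given tag; the justSibs setting used here is a guess at the intended view.
st = siblings_TAG(glioTag, oto, justSibs = FALSE)

# Draw the selected terms in the context of the ontology
onto_plot(oto, slot(st, "ontoTags"), fontsize = 50)
```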
In conjunction with restfulSE, which handles aspects of the interface to BigQuery, this package provides tools for working with the PanCancer atlas project data.
A key feature distinguishing the PanCancer atlas project from TCGA is the availability of data from normal tissue and from metastatic or recurrent tumor samples. Sample type codes are used to distinguish the different sources:
##   SampleTypeLetterCode                                      SampleType
## 1                  TAM                           Additional Metastatic
## 2                  TAP                        Additional - New Primary
## 3                   TR                           Recurrent Solid Tumor
## 4                   TB Primary Blood Derived Cancer - Peripheral Blood
## 5                   TM                                      Metastatic
## 6                   NT                             Solid Tissue Normal
## 7                   TP                             Primary solid Tumor
The following code will run if you have a valid setting for the environment variable CGC_BILLING, to allow BiocOncoTK::pancan_BQ() to generate a proper BigQueryConnection.
library(BiocOncoTK)
if (nchar(Sys.getenv("CGC_BILLING")) > 0) {
  pcbq = pancan_BQ()  # basic connection
  BRCA_mir = restfulSE::pancan_SE(pcbq)
}
The result is
> BRCA_mir
class: SummarizedExperiment
dim: 743 1068
metadata(0):
assays(1): assay
rownames(743): hsa-miR-30d-3p hsa-miR-486-3p ... hsa-miR-525-3p
hsa-miR-892b
rowData names(0):
colnames(1068): TCGA-LD-A7W6 TCGA-BH-A18I ... TCGA-E9-A1N9 TCGA-B6-A0X0
colData names(746): bcr_patient_uuid bcr_patient_barcode ...
bilirubin_upper_limit days_to_last_known_alive
To shift attention to the normal tissue samples provided, restrict the query to sample type code NT to find
class: SummarizedExperiment
dim: 743 90
metadata(0):
assays(1): assay
rownames(743): hsa-miR-7641 hsa-miR-135a-5p ... hsa-miR-1323
hsa-miR-520d-5p
rowData names(0):
colnames(90): TCGA-BH-A18P TCGA-BH-A18S ... TCGA-E9-A1N6 TCGA-E9-A1N9
colData names(746): bcr_patient_uuid bcr_patient_barcode ...
bilirubin_upper_limit days_to_last_known_alive
The intersection of the colnames from the two SummarizedExperiments thus formed (patients contributing both solid tumor and matched normal) has length 89.
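The normal-tissue query and the overlap count described above can be sketched as follows. This is a minimal sketch rather than the exact call behind the output shown: the argument name assaySampleTypeCode is an assumption about the restfulSE::pancan_SE interface, and the names BRCA_mir_nt, have_billing and n_matched are hypothetical. As with the earlier code, the BigQuery calls run only when CGC_BILLING is set.

```r
library(BiocOncoTK)

# Only attempt the BigQuery calls when a billing project is configured.
have_billing = nchar(Sys.getenv("CGC_BILLING")) > 0
n_matched = NA_integer_

if (have_billing) {
  pcbq = pancan_BQ()
  # Same call as earlier in the text, with no sample type argument given
  BRCA_mir = restfulSE::pancan_SE(pcbq)
  # ASSUMPTION: assaySampleTypeCode selects the sample type; "NT" is the
  # code for Solid Tissue Normal in the table above.
  BRCA_mir_nt = restfulSE::pancan_SE(pcbq, assaySampleTypeCode = "NT")
  # Patients contributing both a tumor sample and a normal sample
  n_matched = length(intersect(colnames(BRCA_mir), colnames(BRCA_mir_nt)))
}
n_matched
```

When no billing project is configured, n_matched stays NA; with a working connection it should correspond to the intersection length reported above.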
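You need to know which sample types have been assayed for the tumor type of interest. The following query counts the available samples by sample type code, here for GBM.
library(dplyr)  # provides tbl, filter, group_by, summarise and %>%
# pcbq is the connection created by pancan_BQ() above
pcbq %>% tbl(pancan_longname("rnaseq")) %>% filter(Study=="GBM") %>%
  group_by(SampleTypeLetterCode) %>% summarise(n=n())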
To get RNA-seq on recurrent GBM samples:
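A minimal sketch of such a query is given here, under stated assumptions. The assay table, feature and value arguments follow the pancan_SE calls used elsewhere in this document; the argument names colDFilterValue, tumorFieldValue and assaySampleTypeCode are assumptions about the restfulSE::pancan_SE interface, and GBM_rna_tr is a hypothetical object name.

```r
library(BiocOncoTK)

# Only attempt the BigQuery calls when a billing project is configured.
have_billing = nchar(Sys.getenv("CGC_BILLING")) > 0
GBM_rna_tr = NULL

if (have_billing) {
  pcbq = pancan_BQ()
  GBM_rna_tr = restfulSE::pancan_SE(pcbq,
    colDFilterValue = "GBM",     # ASSUMPTION: restricts clinical records to GBM
    tumorFieldValue = "GBM",     # ASSUMPTION: restricts assay records to GBM
    assaySampleTypeCode = "TR",  # ASSUMPTION: TR = Recurrent Solid Tumor
    assayDataTableName = pancan_longname("rnaseq"),
    assayFeatureName = "Entrez",
    assayValueFieldName = "normalized_count")
}
```

The TR code comes from the sample type table above; whether recurrent samples are present for a given tumor type can be checked with the counting query shown earlier.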
Suppose we want to work with the mRNA, RPPA, 27k/450k merged methylation and miRNA data together. We can invoke pancan_SE again, specifying the appropriate tables and fields.
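library(restfulSE)  # provides pancan_SE
# mRNA expression, indexed by Entrez gene identifier
BRCA_mrna = pancan_SE(pcbq,
  assayDataTableName = pancan_longname("rnaseq"),
  assayFeatureName = "Entrez",
  assayValueFieldName = "normalized_count")
# protein abundance from RPPA
BRCA_rppa = pancan_SE(pcbq,
  assayDataTableName = pancan_longname("RPPA"),
  assayFeatureName = "Protein",
  assayValueFieldName = "Value")
# methylation; pancan_longname("27k") yields more than one name, the second is used
BRCA_meth = pancan_SE(pcbq,
  assayDataTableName = pancan_longname("27k")[2],
  assayFeatureName = "ID",
  assayValueFieldName = "Beta")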
After obtaining the clinical data for BRCA with
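library(dplyr)
library(magrittr)
library(S4Vectors)  # provides DataFrame
clinBRCA = pcbq %>% tbl(pancan_longname("clinical")) %>%
  filter(acronym=="BRCA") %>% as.data.frame()
# use the second column as row names, so that rows can be matched to assay column names
rownames(clinBRCA) = clinBRCA[,2]
clinDF = DataFrame(clinBRCA)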
we use
library(MultiAssayExperiment)
brcaMAE = MultiAssayExperiment(
  ExperimentList(rnaseq=BRCA_mrna, meth=BRCA_meth, rppa=BRCA_rppa,
    mirna=BRCA_mir), colData=clinDF)
to generate brcaMAE. No assay data are present in this object, but data are retrieved on request.
> brcaMAE
A MultiAssayExperiment object of 4 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 4:
[1] rnaseq: SummarizedExperiment with 20531 rows and 1097 columns
[2] meth: SummarizedExperiment with 22601 rows and 1067 columns
[3] rppa: SummarizedExperiment with 259 rows and 873 columns
[4] mirna: SummarizedExperiment with 743 rows and 1068 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
It is convenient to check for sample availability for the different assays using upsetSamples in MultiAssayExperiment.
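The availability check can be tried without a BigQuery connection on a small synthetic object. The sketch below is illustrative only: toy_se, patients, toyMAE and n_per_assay are hypothetical names, and the random values stand in for real assay data.

```r
library(MultiAssayExperiment)
library(SummarizedExperiment)

set.seed(1)
# Hypothetical helper: a small SummarizedExperiment holding random values
# for the given patient identifiers.
toy_se = function(ids, nfeat = 5) {
  m = matrix(rnorm(nfeat * length(ids)), nrow = nfeat,
             dimnames = list(paste0("f", seq_len(nfeat)), ids))
  SummarizedExperiment(assays = list(assay = m))
}

patients = paste0("P", 1:6)
# Three assays covering different, overlapping subsets of the patients
toyMAE = MultiAssayExperiment(
  ExperimentList(rnaseq = toy_se(patients[1:5]),
                 mirna  = toy_se(patients[2:6]),
                 rppa   = toy_se(patients[3:4])),
  colData = DataFrame(id = patients, row.names = patients))

# Number of samples recorded for each assay in the sample map
n_per_assay = table(sampleMap(toyMAE)$assay)
n_per_assay

# UpSet plot of patient overlap across assays (drawn only if UpSetR is installed)
if (requireNamespace("UpSetR", quietly = TRUE)) upsetSamples(toyMAE)
```

For the object built above, the analogous call would be upsetSamples(brcaMAE).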