Intra-tumor heterogeneity (ITH) is now considered a key factor in therapeutic failure and drug resistance, and it has attracted increasing attention in cancer research. Here, we present MesKit, an R package for characterizing cancer genomic ITH and inferring the evolutionary history of tumors. MesKit provides a wide range of analyses, including ITH evaluation, enrichment, signature, and clonal evolution analysis, by implementing well-established computational and statistical methods. The source code and documentation are freely available on GitHub (https://github.com/Niinleslie/MesKit). We also developed a Shiny application to make analysis and visualization easier.
In the R console, enter citation("MesKit").
MesKit: A Tool Kit for Dissecting Cancer Evolution of Multi-region Tumor Biopsies through Somatic Alterations (In production)
To analyze with MesKit, you need to provide:
MAF file (*.maf / *.maf.gz). Required
Note: Tumor_Sample_Barcode should be consistent across all input files.
Mutation Annotation Format (MAF) files are tab-delimited text files with aggregated mutation information from VCF files. The input MAF file may be gzip-compressed. Allowed values of the Variant_Classification column are listed on the Mutation Annotation Format page.
MesKit requires the following fields in the MAF file.
Mandatory fields:
Hugo_Symbol, Chromosome, Start_Position, End_Position, Variant_Classification, Variant_Type, Reference_Allele, Tumor_Seq_Allele2, Ref_allele_depth, Alt_allele_depth, VAF, Tumor_Sample_Barcode
Note:
Tumor_Sample_Barcode of each sample should be unique.
VAF (variant allele frequency) can be on the scale 0-1 or 0-100.
Example MAF file
##   Hugo_Symbol Chromosome Start_Position End_Position Variant_Classification
## 1      KLHL17          1         899515       899515                 Silent
## 2       MFSD6          2      191301342    191301342      Missense_Mutation
## 3    KIAA0319          6       24596837     24596837      Missense_Mutation
##   Variant_Type Reference_Allele Tumor_Seq_Allele2 Ref_allele_depth
## 1          SNP                C                 T               85
## 2          SNP                G                 A               53
## 3          SNP                C                 T                6
##   Alt_allele_depth   VAF Tumor_Sample_Barcode
## 1                1 0.012             V402_P_1
## 2                0 0.000             V750_P_2
## 3                0 0.000            V750_BM_3
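Before handing a file to MesKit, it can help to confirm that the mandatory columns are present and to check which VAF scale is in use. The following is a minimal base R sketch, not part of MesKit. The helper names check_maf_fields and vaf_to_fraction are hypothetical, and the rule "any value above 1 means a 0-100 scale" is an assumption made for illustration; it cannot distinguish a 0-100 column in which every value happens to be at most 1.

```r
# Mandatory MAF columns, as listed above.
required_fields <- c(
  "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
  "Variant_Classification", "Variant_Type", "Reference_Allele",
  "Tumor_Seq_Allele2", "Ref_allele_depth", "Alt_allele_depth",
  "VAF", "Tumor_Sample_Barcode"
)

# Hypothetical helper: return the mandatory columns missing from a data frame.
check_maf_fields <- function(maf_df) {
  setdiff(required_fields, colnames(maf_df))
}

# Hypothetical helper: bring VAF values to the 0-1 scale.
# Assumption: if any value exceeds 1, the column is on the 0-100 scale.
vaf_to_fraction <- function(vaf) {
  if (any(vaf > 1, na.rm = TRUE)) vaf / 100 else vaf
}

# Toy table built from the example rows above.
maf_df <- data.frame(
  Hugo_Symbol = c("KLHL17", "MFSD6", "KIAA0319"),
  Chromosome = c("1", "2", "6"),
  Start_Position = c(899515, 191301342, 24596837),
  End_Position = c(899515, 191301342, 24596837),
  Variant_Classification = c("Silent", "Missense_Mutation", "Missense_Mutation"),
  Variant_Type = "SNP",
  Reference_Allele = c("C", "G", "C"),
  Tumor_Seq_Allele2 = c("T", "A", "T"),
  Ref_allele_depth = c(85, 53, 6),
  Alt_allele_depth = c(1, 0, 0),
  VAF = c(0.012, 0, 0),
  Tumor_Sample_Barcode = c("V402_P_1", "V750_P_2", "V750_BM_3"),
  stringsAsFactors = FALSE
)

missing_fields <- check_maf_fields(maf_df)   # empty when all columns are present
vaf_fraction <- vaf_to_fraction(maf_df$VAF)  # already on the 0-1 scale here
```

For your own data, the same checks could be run on a table loaded with read.delim() before calling readMaf.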
The clinical data file contains clinical information about each patient and their tumor samples. Its mandatory fields are Tumor_Sample_Barcode, Tumor_ID, Patient_ID, and Tumor_Sample_Label.
Example clinical data file
##   Tumor_Sample_Barcode Tumor_ID Patient_ID Tumor_Sample_Label
## 1             V402_P_1        P       V402                P_1
## 2             V402_P_2        P       V402                P_2
## 3             V402_P_3        P       V402                P_3
## 4             V402_P_4        P       V402                P_4
## 5            V402_BM_1       BM       V402               BM_1
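Because Tumor_Sample_Barcode links the input files together, a quick cross-check of the barcodes can catch mismatches early. This is a base R sketch on toy data, not a MesKit function; the variable names are illustrative only.

```r
# Toy inputs; in practice these would be read from your own MAF and clinical files.
maf_barcodes <- c("V402_P_1", "V402_P_2", "V402_BM_1", "V402_P_1")
clin_df <- data.frame(
  Tumor_Sample_Barcode = c("V402_P_1", "V402_P_2", "V402_P_3", "V402_BM_1"),
  Tumor_ID = c("P", "P", "P", "BM"),
  Patient_ID = "V402",
  Tumor_Sample_Label = c("P_1", "P_2", "P_3", "BM_1"),
  stringsAsFactors = FALSE
)

# Barcodes used in the MAF but absent from the clinical file.
not_in_clinical <- setdiff(unique(maf_barcodes), clin_df$Tumor_Sample_Barcode)

# Barcodes listed more than once in the clinical file.
is_dup <- duplicated(clin_df$Tumor_Sample_Barcode)
duplicated_barcodes <- clin_df$Tumor_Sample_Barcode[is_dup]
```

Both results are empty character vectors when the toy inputs are consistent, as they are here.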
By default, there are six mandatory fields in the input CCF file: Patient_ID, Tumor_Sample_Barcode, Chromosome, Start_Position, CCF, and CCF_Std/CCF_CI_High (required when identifying clonal/subclonal mutations). The Chromosome field of your MAF file and CCF file should be in the same format (both numeric or both starting with "chr"). Notably, Reference_Allele and Tumor_Seq_Allele2 are also required if the CCF file includes INDELs.
Example CCF file
##   Patient_ID Tumor_Sample_Barcode Chromosome Start_Position   CCF CCF_Std
## 1       V402             V402_P_1          1         899515 0.031   0.126
## 2       V402             V402_P_1          1         982996 0.031   0.117
## 3       V402             V402_P_1          1        2452742 0.125   0.239
## 4       V402             V402_P_1          1        6203883 0.422   0.750
## 5       V402             V402_P_1          1       11106655 0.094   0.324
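The requirement that the MAF and CCF files use the same chromosome naming style can be checked with a few lines of base R. This is only a sketch; same_chr_style is a hypothetical helper, not a MesKit function.

```r
# Hypothetical helper: TRUE when every label in both vectors uses the same
# style, i.e. all start with "chr" or none do.
same_chr_style <- function(maf_chrom, ccf_chrom) {
  prefixed <- grepl("^chr", as.character(c(maf_chrom, ccf_chrom)))
  length(unique(prefixed)) == 1
}

# Both files use plain numbers.
consistent <- same_chr_style(c(1, 2, 6), c("1", "1"))

# The MAF uses the "chr" prefix but the CCF file does not.
mixed <- same_chr_style(c("chr1", "chr2"), c("1", "1"))
```

Here consistent is TRUE and mixed is FALSE. A mixed result suggests renaming the chromosomes in one of the files before running readMaf.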
The segmentation file is a tab-delimited file with the following columns:
Patient_ID - patient ID
Tumor_Sample_Barcode - tumor sample barcode of samples
Chromosome - chromosome name or ID
Start_Position - genomic start position of segments (1-indexed)
End_Position - genomic end position of segments (1-indexed)
SegmentMean/CopyNumber - segment mean value or absolute integer copy number
Minor_CN - copy number of minor allele. Optional
Major_CN - copy number of major allele. Optional
Tumor_Sample_Label - the specific label of each tumor sample. Optional
Note: Positions are in base pair units.
Example Segmentation file
##   Patient_ID Tumor_Sample_Barcode Chromosome Start_Position End_Position
## 1       V402             V402_P_1          1              1      1650882
## 2       V402             V402_P_1          1        1650883     33159352
## 3       V402             V402_P_1          1       33159353     33610373
## 4       V402             V402_P_1          1       33610374     88509894
## 5       V402             V402_P_1          1       88509895     89462108
##   CopyNumber Minor_CN Major_CN Tumor_Sample_Label
## 1          2        1        1                P_1
## 2          1        0        1                P_1
## 3          3        0        3                P_1
## 4          2        1        1                P_1
## 5          2        0        2                P_1
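A few sanity checks on a segmentation table can be sketched in base R using the example rows above. None of this is a MesKit function. The check that Minor_CN plus Major_CN equals CopyNumber is an assumption about how allele-specific copy numbers are usually reported, and the overlap check as written only applies to a single sample and chromosome.

```r
# Example segments for one sample on chromosome 1, copied from the table above.
seg <- data.frame(
  Patient_ID = "V402",
  Tumor_Sample_Barcode = "V402_P_1",
  Chromosome = "1",
  Start_Position = c(1, 1650883, 33159353, 33610374, 88509895),
  End_Position = c(1650882, 33159352, 33610373, 88509894, 89462108),
  CopyNumber = c(2, 1, 3, 2, 2),
  Minor_CN = c(1, 0, 0, 1, 0),
  Major_CN = c(1, 1, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Each segment should start no later than it ends.
valid_coords <- all(seg$Start_Position <= seg$End_Position)

# Assumption: total copy number is the sum of the two allele-specific values.
cn_consistent <- all(seg$Minor_CN + seg$Major_CN == seg$CopyNumber)

# After sorting by start, each segment should end before the next one begins.
seg <- seg[order(seg$Start_Position), ]
no_overlap <- all(head(seg$End_Position, -1) < tail(seg$Start_Position, -1))
```

All three flags are TRUE for the example rows.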
The readMaf function creates Maf/MafList objects by reading MAF files, clinical files and cancer cell fraction (CCF) data (optional but recommended). The refBuild parameter sets the human reference genome version ("hg18", "hg19" or "hg38"). Set use.indel.ccf = TRUE when your ccfFile contains INDELs in addition to SNVs.
maf.File <- system.file("extdata", "CRC_HZ.maf", package = "MesKit")
ccf.File <- system.file("extdata", "CRC_HZ.ccf.tsv", package = "MesKit")
clin.File <- system.file("extdata", "CRC_HZ.clin.txt", package = "MesKit")
# Maf object with CCF information
maf <- readMaf(mafFile = maf.File,
               ccfFile = ccf.File,
               clinicalFile = clin.File,
               refBuild = "hg19")
To explore the genomic alterations that arise during cancer progression using a multi-region sequencing approach, MesKit provides the classifyMut function to categorize mutations. The classification is based on either the shared pattern or the clonal status (CCF data is required) of mutations, as specified by the class option. Additionally, classByTumor can be used to reveal the mutational profile within tumors.
# Driver genes of CRC collected from IntOGen (https://www.intogen.org/search) (v.2020.2)
driverGene.file <- system.file("extdata", "IntOGen-DriverGenes_COREAD.tsv", package = "MesKit")
driverGene <- as.character(read.table(driverGene.file)$V1)
# Classify the mutations of patient V402 with class = "SP"
mut.class <- classifyMut(maf, class = "SP", patient.id = "V402")
head(mut.class)
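To make the idea of a shared-pattern classification concrete, the following base R sketch labels each mutation by how many samples carry it. It is a conceptual illustration on toy data, not MesKit's implementation. The labels "Public", "Shared" and "Private" and the column name Mutation are assumptions chosen for the example.

```r
# Toy long-format table: one row per (mutation, sample) in which it was detected.
calls <- data.frame(
  Mutation = c("m1", "m1", "m1", "m2", "m2", "m3"),
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S1", "S2", "S3"),
  stringsAsFactors = FALSE
)
n_samples <- length(unique(calls$Tumor_Sample_Barcode))

# Number of distinct samples carrying each mutation (named integer vector).
n_carriers <- vapply(
  split(calls$Tumor_Sample_Barcode, calls$Mutation),
  function(x) length(unique(x)),
  integer(1)
)

# Assumed labels: found in every sample, in exactly one sample, or in between.
pattern <- ifelse(n_carriers == n_samples, "Public",
                  ifelse(n_carriers == 1, "Private", "Shared"))
```

In this toy table, m1 is present in all three samples, m2 in two, and m3 in one, so they receive the three different labels.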
The plotMutProfile function visualizes the mutational profile of tumor samples.