We use most of the functions described above to show how to perform inference with various algorithms; readers should first read those sections of the vignette for an explanation of how the functions work. The aCML dataset is used as a test case for all algorithms, even though it should be processed by algorithms that infer ensemble-level progression models.

To replicate the plots of the original paper where the aCML dataset was first analyzed with CAPRI, we can change the colors assigned to each type of event with the function change.color.

dataset = change.color(aCML, 'Ins/Del', 'dodgerblue4')
dataset = change.color(dataset, 'Missense point', '#7FC97F')
as.colors(dataset)
##          Ins/Del   Missense point Nonsense Ins/Del   Nonsense point 
##    "dodgerblue4"        "#7FC97F"        "#FDC086"        "#fab3d8"

0.0.0.1 Data consolidation.

All TRONCO algorithms require an input dataset where every event has a probability strictly between 0 and 1, and where all events are distinguishable (i.e., no two events have the same signature across samples). The tool provides a function that returns the lists of events that do not satisfy these constraints.

consolidate.data(dataset)
## $indistinguishable
## list()
## 
## $zeroes
## list()
## 
## $ones
## list()

The aCML data has none of the above issues (the call returns empty lists); if this were not the case, the data manipulation functions could be used to edit the TRONCO object.
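To make the three constraints concrete, the following is a minimal base R sketch of the same kind of checks on a toy binary matrix. It is not TRONCO's implementation of consolidate.data; the matrix `toy` and the event names A to D are made up for illustration.

```r
# Toy binary matrix (hypothetical): rows are samples, columns are events.
toy <- matrix(
    c(1, 0, 1, 1,
      0, 0, 1, 0,
      1, 0, 1, 1,
      0, 0, 1, 0),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("s", 1:4), c("A", "B", "C", "D")))

# Marginal frequency of each event.
freq <- colMeans(toy)

# Events never observed, and events observed in every sample.
zeroes <- names(freq)[freq == 0]
ones   <- names(freq)[freq == 1]

# Events sharing exactly the same signature across samples.
signatures <- apply(toy, 2, paste, collapse = "")
groups <- split(names(signatures), signatures)
indistinguishable <- unname(Filter(function(g) length(g) > 1, groups))

print(list(indistinguishable = indistinguishable,
           zeroes = zeroes,
           ones = ones))
```

In this toy example B is never observed, C is observed in every sample, and A and D have the same signature, so each of the three lists is non-empty.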

0.1 CAPRI

In what follows, we show how CAPRI works by replicating the aCML case study presented in CAPRI’s original paper. Regardless of which types of mutations we include, we select only the genes mutated in at least 5% of the patients; thus we first use as.alterations to obtain gene-level frequencies, and then apply a frequency filter to them.

alterations = events.selection(as.alterations(aCML), filter.freq = .05)
## *** Aggregating events of type(s) { Ins/Del, Missense point, Nonsense Ins/Del, Nonsense point }
## in a unique event with label " Alteration ".
## Dropping event types Ins/Del, Missense point, Nonsense Ins/Del, Nonsense point for 23 genes.
## .......................
## *** Binding events for 2 datasets.
## *** Events selection: #events =  23 , #types =  1 Filters freq|in|out = { TRUE ,  FALSE ,  FALSE }
## Minimum event frequency:  0.05  ( 3  alterations out of  64  samples).
## .......................
## Selected  7  events.
## 
## Selected  7  events, returning.
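The reason for aggregating to gene level before filtering can be sketched in base R on made-up data. This is only an illustration of the idea, not how as.alterations or events.selection are implemented; the genes G1 and G2, the event labels, and the 50 samples are all hypothetical.

```r
# Hypothetical event-level data: 50 samples, columns labelled "gene:type".
n <- 50
events <- matrix(0, nrow = n, ncol = 3,
    dimnames = list(paste0("s", seq_len(n)),
                    c("G1:Missense", "G1:Nonsense", "G2:Missense")))
events[1:2, "G1:Missense"] <- 1
events[3:4, "G1:Nonsense"] <- 1
events[5,   "G2:Missense"] <- 1

# Collapse all event types of a gene into one "alteration" column:
# a sample is altered if it carries any event for that gene.
genes <- sub(":.*$", "", colnames(events))
alterations <- sapply(unique(genes), function(g) {
    as.numeric(rowSums(events[, genes == g, drop = FALSE]) > 0)
})

# Keep the genes altered in at least 5% of the samples.
freq <- colMeans(alterations)
selected <- names(freq)[freq >= 0.05]

print(freq)
print(selected)
```

Here no single event reaches the 5% threshold, but gene G1 does once its two event types are merged, which is why the filter is applied to the gene-level profile.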

To proceed with the example, we create the dataset to be used for inferring the model. From the original dataset we select all the genes that are mutated in at least 5% of the samples, which we obtain from the alteration profiles; we also force the inclusion of all the events for the genes involved in a hypothesis (those listed in the variable gene.hypotheses, a list based on support found in the literature for potential aCML patterns).

gene.hypotheses = c('KRAS', 'NRAS', 'IDH1', 'IDH2', 'TET2', 'SF3B1', 'ASXL1')
aCML.clean = events.selection(aCML,
    filter.in.names=c(as.genes(alterations), gene.hypotheses))
## *** Events selection: #events =  31 , #types =  4 Filters freq|in|out = { FALSE ,  TRUE ,  FALSE }
## [filter.in] Genes hold:  TET2, EZH2, CBL, ASXL1, SETBP1  ...  [ 10 / 14  found].
## Selected  17  events, returning.
aCML.clean = annotate.description(aCML.clean, 
    'CAPRI - Bionformatics aCML data (selected events)')

We show a new oncoprint of this latest dataset where we annotate the genes in gene.hypotheses in order to identify them. The sample names are also shown.

oncoprint(aCML.clean, gene.annot = list(priors = gene.hypotheses), sample.id = TRUE)
## *** Oncoprint for "CAPRI - Bionformatics aCML data (selected events)"
## with attributes: stage = FALSE, hits = TRUE
## Sorting samples ordering to enhance exclusivity patterns.
## Annotating genes with RColorBrewer color palette Set1 .
## Setting automatic row font (exponential scaling): 10.7
## Setting automatic samples font half of row font: 5.3

Figure 1: Data selected for aCML reconstruction annotated with the events which are part of a pattern that we will input to CAPRI

0.1.1 Testable hypotheses via logical formulas (i.e., patterns)

CAPRI is the only algorithm in TRONCO that supports hypothesis testing of causal structures expressed as logical formulas with AND, OR and XOR operators. A made-up example formula could be

(APC:Mutation XOR APC:Deletion) OR CTNNB1:Mutation

where APC mutations and deletions are in disjunctive relation with CTNNB1 mutations; this is done to test if those events could confer equivalent fitness in terms of ensemble-level progression – see the original CAPRI paper and the PiCnIc pipeline for detailed explanations.

Every formula is transformed into a CAPRI pattern. For every hypothesis it is possible to specify the target events against which it should be tested; e.g., one might test the above formula against PIK3CA mutations, but not ATM ones. If no target is specified, a pattern is tested against all other events in the dataset except those that constitute the pattern itself. A pattern tested against one other event is called a hypothesis.
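As an illustration of what evaluating such a formula means, the invented formula above can be computed per sample with base R logical operators. This is a sketch on made-up observations, not TRONCO's internal representation of a pattern; the vectors `apc_mut`, `apc_del` and `ctnnb1` are hypothetical.

```r
# Hypothetical observations for five samples; TRUE marks an observed event.
apc_mut <- c(TRUE,  FALSE, TRUE,  FALSE, FALSE)
apc_del <- c(FALSE, TRUE,  TRUE,  FALSE, FALSE)
ctnnb1  <- c(FALSE, FALSE, FALSE, TRUE,  FALSE)

# (APC:Mutation XOR APC:Deletion) OR CTNNB1:Mutation, evaluated per sample.
pattern <- xor(apc_mut, apc_del) | ctnnb1

print(pattern)
```

The third sample carries both APC events, so the XOR term is false for it, and since it has no CTNNB1 mutation the whole formula is false there.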

0.1.1.1 Adding custom hypotheses.

We add the hypotheses that are described in CAPRI’s manuscript; we start with hard exclusivity (XOR) for NRAS/KRAS mutations,

NRAS:Missense point XOR KRAS:Missense point

tested against all the events in the dataset (default pattern.effect = *)

aCML.hypo = hypothesis.add(aCML.clean, 'NRAS xor KRAS', XOR('NRAS', 'KRAS'))

When a pattern is included, a new column is created in the dataset, whose signature is given by evaluating the formula that constitutes the pattern. We call this operation the lifting of a pattern, and it must not create inconsistencies in the data; i.e., it must not duplicate any of the other columns. TRONCO checks this; for instance, when we try to include a soft exclusivity (OR) pattern for the above genes, we get an error (not shown).

aCML.hypo = hypothesis.add(aCML.hypo, 'NRAS or KRAS',  OR('NRAS', 'KRAS'))

Notice that TRONCO functions can be used to inspect the alterations of these genes and understand why the OR signature is equivalent to the XOR one: no samples harbour both mutations.

oncoprint(events.selection(aCML.hypo,
    filter.in.names = c('KRAS', 'NRAS')),
    font.row = 8,
    ann.hits = FALSE)
## *** Events selection: #events =  18 , #types =  4 Filters freq|in|out = { FALSE ,  TRUE ,  FALSE }
## [filter.in] Genes hold:  KRAS, NRAS  ...  [ 2 / 2  found].
## Selected  2  events, returning.
## *** Oncoprint for ""
## with attributes: stage = FALSE, hits = FALSE
## Sorting samples ordering to enhance exclusivity patterns.

Figure 2: Oncoprint output to show the perfect (hard) exclusivity among NRAS/KRAS mutations in aCML
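The duplicate-column situation can be reproduced on toy data with base R. The sketch below assumes made-up NRAS/KRAS signatures with no co-occurring mutations, and a hypothetical helper `lift` that refuses to add a column whose signature already exists; it only mimics the kind of check described above and is not TRONCO's code.

```r
# Hypothetical signatures for six samples; no sample carries both mutations.
nras <- c(1, 0, 0, 0, 1, 0)
kras <- c(0, 1, 0, 0, 0, 0)
data <- cbind(NRAS = nras, KRAS = kras)

# Signatures obtained by evaluating the two formulas per sample.
xor_sig <- as.numeric(xor(nras == 1, kras == 1))
or_sig  <- as.numeric(nras == 1 | kras == 1)

# Hypothetical helper: add a lifted column unless it duplicates an existing one.
lift <- function(data, name, signature) {
    is_dup <- apply(data, 2, function(col) all(col == signature))
    if (any(is_dup)) {
        stop("pattern duplicates column: ", colnames(data)[which(is_dup)[1]])
    }
    cbind(data, matrix(signature, ncol = 1, dimnames = list(NULL, name)))
}

data <- lift(data, "NRAS xor KRAS", xor_sig)

# Lifting the OR pattern fails, because its signature equals the XOR one.
result <- tryCatch(lift(data, "NRAS or KRAS", or_sig),
                   error = function(e) conditionMessage(e))
print(result)
```

With perfectly exclusive events the two signatures cannot differ, so only one of the two patterns can be lifted without duplicating a column.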

We repeat the same analysis for the other hypotheses and, for the same reasons, include only the hard exclusivity pattern. In this case we add a two-level pattern

SF3B1:Missense point XOR (ASXL1:Ins/Del XOR ASXL1:Nonsense point)

since ASXL1 is mutated in two different ways, and no samples harbour both mutation types.

aCML.hypo = hypothesis.add(aCML.hypo, 'SF3B1 xor ASXL1', XOR('SF3B1', XOR('ASXL1')),
    '*')

Finally, we do the same for genes TET2 and IDH2. In this case, three events are present for the gene TET2: Ins/Del, Missense point and Nonsense point. Since we do not specify any subset of these events, all TET2 alterations are used. Since the events present a perfect hard exclusivity, their patterns will be included as a XOR.

as.events(aCML.hypo, genes = 'TET2') 
##         type             event 
## gene 4  "Ins/Del"        "TET2"
## gene 32 "Missense point" "TET2"
## gene 88 "Nonsense point" "TET2"
aCML.hypo = hypothesis.add(aCML.hypo,
    'TET2 xor IDH2',
    XOR('TET2', 'IDH2'),
    '*')
aCML.hypo = hypothesis.add(aCML.hypo,
    'TET2 or IDH2',
    OR('TET2', 'IDH2'),
    '*')

This corresponds to the following pattern

(TET2:Ins/Del) XOR (TET2:Missense point) XOR (TET2:Nonsense point) XOR (IDH2:Missense point)

which we can visualize via an oncoprint.

oncoprint(events.selection(aCML.hypo,
    filter.in.names = c('TET2', 'IDH2')),
    font.row = 8,
    ann.hits = FALSE)
## *** Events selection: #events =  21 , #types =  4 Filters freq|in|out = { FALSE ,  TRUE ,  FALSE }
## [filter.in] Genes hold:  TET2, IDH2  ...  [ 2 / 2  found].
## Selected  4  events, returning.
## *** Oncoprint for ""
## with attributes: stage = FALSE, hits = FALSE
## Sorting samples ordering to enhance exclusivity patterns.
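A chained XOR over more than two events can be read either as "exactly one event is present" or as "an odd number of events is present". The base R sketch below, on made-up and perfectly exclusive signatures, shows that under hard exclusivity both readings coincide with each other and with OR. It makes no claim about which reading TRONCO implements; the matrix `m` and its column names are hypothetical.

```r
# Hypothetical, perfectly exclusive signatures for five samples.
m <- cbind(
    TET2_indel    = c(1, 0, 0, 0, 0),
    TET2_missense = c(0, 1, 0, 0, 0),
    TET2_nonsense = c(0, 0, 1, 0, 0),
    IDH2_missense = c(0, 0, 0, 1, 0))

# Number of events observed in each sample.
hits <- rowSums(m)

exactly_one <- as.numeric(hits == 1)  # XOR read as "exactly one event"
parity      <- hits %% 2              # XOR read as "odd number of events"
any_hit     <- as.numeric(hits > 0)   # OR

print(rbind(exactly_one, parity, any_hit))
```

The three signatures differ only when some sample carries two or more of the events, which an oncoprint such as the one above makes easy to spot.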