1 Introduction

OncoSimulR is an individual- or clone-based forward-time genetic simulator for biallelic markers (wildtype vs. mutated) in asexually reproducing populations without spatial structure (perfect mixing). Its design emphasizes flexible specification of fitness and mutator effects.

OncoSimulR was originally developed to simulate tumor progression with emphasis on allowing users to set restrictions in the accumulation of mutations as specified, for example, by Oncogenetic Trees (OT: Desper et al., 1999; Szabo & Boucher, 2008) or Conjunctive Bayesian Networks (CBN: Beerenwinkel, Eriksson, et al., 2007; Gerstung et al., 2009; Gerstung, Eriksson, et al., 2011), with the possibility of adding passenger mutations to the simulations and allowing for several types of sampling.

Since then, OncoSimulR has been vastly extended to allow you to specify other types of restrictions in the accumulation of mutations, such as the XOR models of Korsunsky et al. (2014) or the “semimonotone” model of Farahani & Lagergren (2013). Moreover, fitness effects that depend on the order in which mutations appear can also be incorporated, involving arbitrary numbers of genes. Order effects are very different from “restrictions in the order of accumulation of mutations”: restrictions constrain which mutations can accumulate depending on which others are already present, whereas with order effects, described for cancer by Ortmann and collaborators (Ortmann et al., 2015), the effect of having both mutations “A” and “B” differs depending on whether “A” appeared before or after “B” (the actual case involves genes JAK2 and TET2).
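The difference between order effects and order restrictions can be made concrete with a small sketch. The Python code below is not OncoSimulR's interface; the function names, the selection coefficients, and the multiplicative `(1 + s)` combination of effects are assumptions chosen for illustration, and the restriction is treated as a simple yes/no check.

```python
def order_effect_fitness(history, order_effects):
    """Fitness of a clone given the order in which its mutations appeared.

    history: tuple of gene names, earliest mutation first.
    order_effects: maps (earlier, later) gene pairs to a selection
        coefficient s (hypothetical values, combined as products of 1 + s).
    """
    position = {gene: i for i, gene in enumerate(history)}
    fitness = 1.0
    for (earlier, later), s in order_effects.items():
        both_present = earlier in position and later in position
        if both_present and position[earlier] < position[later]:
            fitness *= 1.0 + s
    return fitness


def satisfies_restriction(history, requires):
    """True if every mutation appeared after all mutations it depends on."""
    seen = set()
    for gene in history:
        if not requires.get(gene, set()) <= seen:
            return False
        seen.add(gene)
    return True


# Order effect: the same final genotype {A, B} has two different fitnesses.
order_effects = {("A", "B"): 0.3, ("B", "A"): -0.1}
fitness_a_then_b = order_effect_fitness(("A", "B"), order_effects)
fitness_b_then_a = order_effect_fitness(("B", "A"), order_effects)

# Order restriction: B depends on A, so only one path to {A, B} conforms.
requires = {"B": {"A"}}
a_then_b_allowed = satisfies_restriction(("A", "B"), requires)
b_then_a_allowed = satisfies_restriction(("B", "A"), requires)
```

With order effects both paths to the genotype with “A” and “B” are possible and end with different fitness; with a restriction, only the path that respects the dependency conforms to the model.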

More generally, OncoSimulR now also allows you to specify arbitrary epistatic interactions between arbitrary collections of genes and to model, for example, synthetic mortality or synthetic viability (again, involving an arbitrary number of genes, some of which might also depend on other genes, or show order effects with other genes). Moreover, it is possible to specify the above interactions in terms of modules, not genes. This idea is discussed in, for example, Raphael & Vandin (2015) and Gerstung, Eriksson, et al. (2011): the restrictions encoded in, say, CBNs or OTs can be considered to apply not to genes, but to modules, where each module is a set of genes that performs a specific biological function (and the intersection between modules is the empty set). Modules, then, play the role of a “union operation” over the set of genes in a module. In addition, arbitrary numbers of genes without interactions (and with fitness effects coming from any distribution you might want) are also possible.
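The “union operation” played by modules can be sketched as follows. This is a minimal Python illustration, not OncoSimulR's interface: the module names, gene names, and helper functions are hypothetical. A module counts as altered as soon as any one of its genes is mutated, and modules are required to be disjoint.

```python
def check_disjoint(modules):
    """Raise ValueError if any gene belongs to more than one module."""
    seen = set()
    for name, genes in modules.items():
        overlap = seen & genes
        if overlap:
            raise ValueError(f"module {name} shares genes: {sorted(overlap)}")
        seen |= genes


def mutated_modules(genotype, modules):
    """Names of the modules with at least one mutated gene (the 'union')."""
    mutated = set(genotype)
    return {name for name, genes in modules.items() if genes & mutated}


# Hypothetical modules: each is a set of genes with a shared function.
modules = {
    "M1": {"g1", "g2"},
    "M2": {"g3"},
    "M3": {"g4", "g5", "g6"},
}
check_disjoint(modules)

one_hit = mutated_modules({"g1"}, modules)
two_hits_same_module = mutated_modules({"g1", "g2"}, modules)
hits_in_two_modules = mutated_modules({"g2", "g5"}, modules)
```

Restrictions, epistasis, or order effects would then be evaluated on the set of altered modules rather than on the individual genes, so mutating `g1`, `g2`, or both leads to the same module-level state.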

You can also directly specify the mapping between genotypes and fitness and, thus, you can simulate on fitness landscapes of arbitrary complexity.

It is now (released initially in this repo as the freq-dep-fitness branch in February 2019) also possible to simulate scenarios with frequency-dependent fitness, where the fitness of one or more genotypes depends on the relative or absolute frequencies of other genotypes, as in game theory and adaptive dynamics. This makes it possible to model predation and parasitism, cooperation and mutualism, and commensalism. It also allows you to model therapeutic interventions (where fitness changes at specified time points, as a function of the total population size, or as a function of arbitrary user-defined variables); in particular, it is possible to emulate adaptive therapy (Hansen & Read, 2020a, 2020b).
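As a rough illustration of what frequency-dependent fitness means, the Python sketch below computes the fitness of two genotypes from their current relative frequencies. It does not use OncoSimulR's syntax; the genotype labels `WT` and `M`, the function names, and the coefficients are made up for the example.

```python
def relative_frequencies(counts):
    """Convert absolute cell counts per genotype into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("population is empty")
    return {genotype: n / total for genotype, n in counts.items()}


def frequency_dependent_fitness(counts):
    """Fitness of each genotype as a function of the population composition.

    Hypothetical model: the wildtype benefits from the presence of the
    mutant, and the mutant does well when rare but poorly when common.
    """
    freq = relative_frequencies(counts)
    return {
        "WT": 1.0 + 0.5 * freq["M"],
        "M": 1.5 - 1.0 * freq["M"],
    }


# The same genotypes have different fitness in different populations.
mutant_rare = frequency_dependent_fitness({"WT": 75, "M": 25})
mutant_common = frequency_dependent_fitness({"WT": 25, "M": 75})
```

In this toy model the mutant has higher fitness than the wildtype when it is rare and lower fitness when it is common, so fitness is not a fixed property of a genotype but changes as the population composition changes.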

Simulations can start from arbitrary initial population compositions and it is also possible to simulate multiple species. Thus, simulations that involve both ecological and evolutionary processes are possible.

Mutator/antimutator genes, genes that alter the mutation rate of other genes (Gerrish et al., 2007; Tomlinson et al., 1996), can also be simulated with OncoSimulR and specified with most of the mechanisms above (you can have, for instance, interactions between mutator genes). And, regardless of whether mutator/antimutator genes are present, different genes can have different mutation rates.

Simulations can be stopped as a function of total population size, number of mutated driver genes, or number of time periods. Simulations can also be stopped with a stochastic detection mechanism where the probability of detecting a tumor increases with total population size. Simulations return the number of cells of every genotype/clone at each of the sampling periods, and we can take samples from these results with single-cell or whole-tumor resolution, adding noise if we want. If we ask for them, simulations also store and return the genealogical relationships of all clones generated during the simulation.
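A stochastic detection mechanism of this kind can be sketched in a few lines of Python. The text above only states that the detection probability increases with total population size; the saturating exponential form used here, and the parameter names `baseline` and `c`, are assumptions made for illustration and should not be read as the package's definition.

```python
import math
import random


def detection_probability(pop_size, baseline, c):
    """Probability of detecting the tumor at one check (assumed form).

    Zero at or below `baseline`, then increasing towards 1 with
    population size at a speed controlled by `c`.
    """
    if pop_size <= baseline:
        return 0.0
    return 1.0 - math.exp(-c * (pop_size - baseline))


def is_detected(pop_size, baseline, c, rng):
    """Draw one Bernoulli detection event at the current population size."""
    return rng.random() < detection_probability(pop_size, baseline, c)


# Hypothetical parameters: nothing is detectable below 1000 cells.
baseline, c = 1000, 1e-4
sizes = [500, 1000, 5000, 20000, 100000]
probabilities = [detection_probability(n, baseline, c) for n in sizes]

rng = random.Random(1)  # fixed seed so that the example is reproducible
stop_simulation = is_detected(20000, baseline, c, rng)
```

A simulation using this rule would check for detection at regular intervals and stop the first time `is_detected` returns true, so larger tumors tend to be detected earlier but detection is never deterministic.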

The models implemented so far are all continuous-time models, which are simulated using the BNB algorithm of Mather et al. (2012). The core of the code is implemented in C++, which provides fast execution. To help with simulation studies, code to simulate random graphs of the kind often seen in CBNs, OTs, etc., is also available. Finally, OncoSimulR can also generate random fitness landscapes, represent fitness landscapes, and compute statistics of evolutionary predictability.

Funding

Supported by: grant BFU2015-67302-R (MINECO/FEDER, EU) funded by MCIN/AEI/10.13039/501100011033 and by ERDF A way of making Europe to R. Diaz-Uriarte; grant PID2019-111256RB-I00 funded by MCIN/AEI/10.13039/501100011033 to R. Diaz-Uriarte; “Beca de Colaboración” at the Universidad Autónoma de Madrid from Spanish Ministry of Education, 2017-18, to S. Sánchez Carrillo; Comunidad de Madrid’s PEJ16/MED/AI-1709 and PEJ-2019-AI/BMD-13961 to R. Diaz-Uriarte.