Quantifying local malignant adaptation in tissue‐specific evolutionary trajectories by harnessing cancer’s repeatability at the genetic level

Abstract Cancer is a potentially lethal disease, in which patients with nearly identical genetic backgrounds can develop a similar pathology through distinct combinations of genetic alterations. We aimed to reconstruct the evolutionary process underlying tumour initiation, using the combination of convergence and discrepancies observed across 2,742 cancer genomes from nine tumour types. We developed a framework using the repeatability of cancer development to score the local malignant adaptation (LMA) of genetic clones, as their potential to malignantly progress and invade their environment of origin. Using this framework, we found that premalignant skin and colorectal lesions appeared specifically adapted to their local environment, yet insufficiently for full cancerous transformation. We found that metastatic clones were more adapted to the site of origin than to the invaded tissue, suggesting that genetics may be more important for local progression than for the invasion of distant organs. In addition, we used network analyses to investigate evolutionary properties at the system‐level, highlighting that different dynamics of malignant progression can be modelled by such a framework in tumour‐type‐specific fashion. We find that occurrence‐based methods can be used to specifically recapitulate the process of cancer initiation and progression, as well as to evaluate the adaptation of genetic clones to given environments. The repeatability observed in the evolution of most tumour types could therefore be harnessed to better predict the trajectories likely to be taken by tumours and preneoplastic lesions in the future.

| 1063 TOKUTOMI eT al. recurrent alterations observed in different tissue types (Greaves & Maley, 2012). The hypothesis that malignant transformation requires multiple alterations provides an explanation to why cancer incidence increases with age (Armitage & Doll, 1954) and why progression to cancer occurs via intermediate benign stages (Fearon & Vogelstein, 1990;Vogelstein et al., 2013). This is further corroborated by the observation that most solid adult tumours harbour multiple "driver" alterations, that is those likely to functionally impact cell behaviour and push it towards malignancy (Zack et al., 2013). Yet, the dynamics and stochastic nature of malignant transformation are still insufficiently understood: clinicians still lack efficient diagnostic tools to accurately predict if and when benign lesions will progress to cancer (Martinez et al., 2016), which patients at risk will develop tumours and what their genetic characteristics will be (Lässig, Mustonen, & Walczak, 2017).
The heterogeneity observed between tumours of the same type demonstrates that multiple evolutionary trajectories can converge towards malignancies with similar phenotypic characteristics (Yates & Campbell, 2012). Yet, how much impact individual driver alterations have on this adaptation or how they interact is still largely unknown.
In addition, although metastatic cancer still represents a major clinical challenge, the role of the genetic makeup of the primary site in the adaptation to a novel distant site is also unclear. While reconstructing the genetic history of individual patients through sequencing becomes easier (Gerlinger et al., 2012;Nik-Zainal et al., 2012;Yates et al., 2017), the generic process of carcinogenesis underlying each tumour type remains elusive. The availability of public data sets describing the genetic characteristics of multiple specimen of various tumour types however represents an opportunity to decipher oncogenesis as a stochastic process. All tumours characterised so far are outcomes of their type-specific evolutionary process. This is akin to Gould's contingency definition (Gould, 1990), as if "replaying the tape" of somatic evolution: recording all malignant developments starting from the mostly identical genetic backgrounds of different human individuals (Rosenberg et al., 2002).
Here, we investigate these recorded outcomes to infer the evolutionary landscape of each tumour type, mapping the evolutionary trajectories that can lead normal cells to cancerous transformation.
We make the following assumptions (a) cancer is of monoclonal origin, whereby the genetic background of the initial clone is detectable in all subsequent generations; (b) although different evolutionary trajectories can lead to such a clone, they all are similarly capable of invading their organ-specific environment of origin. We thus estimate different evolutionary parameters to investigate the contribution of all drivers within genetic clones and calculate a local malignant adaptation (LMA) score resulting from their combination in nine tumour types. Similar to a fitness definition in evolutionary biology, our score can be understood as a measure of adaptation to a disease-specific evolutionary context, leading to harmful overproliferation and domination of the local environment. Our model highlights differences across tumour types regarding the interactivity between driver alterations and predicts that premalignant skin and colorectal lesions are adapted to their environment, yet not as much as invasive tumours. We find that genetic landscapes of local adaptation do not explain the location of distant metastases, suggesting that adaptation to metastatic sites does not rely on genetics as much as tumorigenesis does. Finally, we suggest that networks can be used to represent stepwise genetic progression, providing useful tools to stochastically reconstruct the contingencies of the oncogenesis process and study its systemic properties.

| TCGA data
We downloaded data for 2,742 samples from The Cancer Genome

| Naevi & melanoma data
The mutational and copy number data for 37 primary melanomas with paired precursor lesions were downloaded from Shain et al. (2015). Naevi, blue naevi and intermediate benign annotations were considered benign lesions, while intermediate malignant, melanoma in situ, melanoma (all stages), desmoplastic melanoma and blue naevus-like melanoma annotations were considered malignant. In the few cases in which multiple malignant lesions were found for a patient, the least advanced was selected. Mutations were considered clonal when the normalised MAF values in the reported sequencing data were >0.8. To detect CNAs, we first calculated the mean and standard deviation in segment mean (i.e., logR ratios for expected chromosomal copies in a segment) for all segments from the reported normal samples. Gains and losses for the four CN drivers retained for the melanoma landscape (CDK4_gain, CDKN2A_loss, MDM2_gain, PTEN_loss) were calculated based on the reported segment mean for the relevant segment of each sample deviated significantly from the normal segment mean distribution, using a one-tailed test with the pnorm R function. It is worth noting that, unlike whole-exome TCGA data, these data are mostly obtained through targeted sequencing of a 293-gene panel and may therefore miss information on given drivers.

| Colorectal adenoma and carcinoma data
We retrieved whole-genome sequencing data from 31 samples from 9 adenomas and 72 samples from 11 carcinomas (Cross et al. 2018). Mutations were called with platypus (Rimmer et al., 2014), copy number data and ploidy were obtained using the cloneHD software (Fischer, Vázquez-García, Illingworth, & Mustonen, 2014). Mutation clonality was estimated using the EstimateClonality package (McGranahan et al., 2015) on each of the 31 samples as for the TCGA data. Mutations were considered clonal in an entire adenoma when they were predicted as clonal in ≥75% of the related samples.

| MET500 data
Genomic data were retrieved from Robinson et al. (2017) and via the MET500 website (https://met500.path.med.umich.edu/). Sample annotation was manually curated to attribute a tumour type to the primary site and metastatic site. All curated annotations are reported in Supporting Information Tables 2 and 3. Sample purity was collected from the Supp. Information of the original publication; sample ploidy was calculated as the mean copy number weighed by number of targeted exons. Mutation clonality was estimated using the EstimateClonality package.

| Clonality and driver alterations
Mutation clonality was estimated thanks to the EstimateClonality package (McGranahan et al., 2015); clonal mutations were defined as those for which the 95% confidence interval of the cancer cell fraction (CCF) included 1. Gain and losses of each gene in each sample were defined relatively, if the copy number deviated by 0.6 or more from the ploidy given by ASCAT in the segment that included the gene of interest. Segments with <10 probes were filtered out. Driver alterations for each tumour type were retrieved from the IntOGen website (Gonzalez-Perez et al., 2013;Rubio-Perez et al., 2015). Due to the lack of an appropriate clonality estimation method, all copy number alterations were considered clonal. Final matrices of clonal mutations per sample are available via github. To limit the number of potential combinations, only the 50 most frequent driver alterations that occurred in >0.5% of a cohort were selected. Due to their proximity on the chromosome, CDKN2A loss and CDKN2B loss were merged into a single event, which avoids later biases when investigating co-occurrence. This left 34 drivers in GBM,47 in KIRC and 50 in all other tumour types.

| Quantification of evolutionary parallelism
We represented each data set as a gene per patient matrix, using 0 to denote the absence of a clonal alteration and 1 for the presence of a specific clonal alteration in a specific patient. When multiple point mutations occur in the same gene in a patient, they are reported with a value of 1 encompassing all mutations. For each tumour type, 10 randomised matrices were generated by randomly reassigning the mutations of each patient. For instance, upon simulation, a patient with 300 clonal mutations out of 17,000 potential genes will still have 300 clonal mutations albeit in different genes than the originally observed alterations. The Jaccard Index between two samples is then computed using the following formula: Where A and B are the number of alterations in each sample, and A∩B the number of overlapping alterations. All pairwise indices are computed for each matrix. Indices from the 10 simulated matrices are then pooled as randomised controls for a given tumour type.

| Genetic parameters of local malignant adaptation
The selective advantage of each driver alteration was defined as the ratio between expectations and observations in a given tumour type, considering mutations and CNAs separately. The expected number of mutations occurring in a gene was calculated by dividing the total number of mutations by the total number of genes in which at least 1 mutation was observed, weighted by each gene's length in base pairs, as given by the median transcript length for the gene in Ensembl (Zerbino et al., 2018). Only clonal mutations were considered. CNA selective advantage was calculated separately for gains and losses, by the ratio of expected occurrences given all events to the actual occurrences of each CNA, giving equal probability to all genes without weighing for length. The selective advantage SA d of each driver d is thus centred around 1 with a 0 lower boundary.
We hypothesised that drivers that tend to occur with few other drivers had more impact on malignant progression, and thus more "self-sufficient," that those occurring with more additional drivers.
Malignant epistatic interactions reflect the fact that two alterations can depend on each other, either positively or negatively, to mediate the potential of a clone to become malignant and invasive in its local environment. We quantified these interactions using the ratio of observed co-occurrences of driver alterations to expectations of observing each alteration in each sample. We relied on hypergeometric probabilities, based on the specific number of alterations per sample and the number of occurrences of each alteration in the cohort. See Appendix S1 for full details and examples.

| Scoring local malignant adaptation: models of parameter combination
As the prevalence and interplay of each parameter are unknown, we designed different models that correspond to different combinations of the three LMA parameters investigated. For simplicity, all models rely on summing the contribution of each driver alteration to the adaptation of each individual sample, given its alteration load and tumour-type-specific context. Parameters in a model are then assigned specific weights, which are optimised so as to minimise the variation of the overall score between samples in each tumour type.
We separated the three parameters into two types of LMA components: intrinsic (selective advantage, self-sufficiency) and interactive (malignant epistatic interactions). In any given tumour type, the intrinsic component is driver-specific, while the interactive component is context-dependent and relies on which other driver alterations are also present in a sample. We designed four models based on the combination of two criteria: (a) whether selective advantage and self-sufficiency are separate or combined; (b) whether the malignant epistatic score of a driver is given by the mean of all its interactions with the other drivers in the sample, or by the product of these interactions. This thus corresponds to four possible designs in total, which are labelled separate_mean, separate_prod, combined_mean, combined_prod (Supporting Information Table S1). The former two models therefore comprise the three parameters weighed individually then summed, while the latter two models only comprise two individually weighted parameters, with the intrinsic component being given for each driver d by SS d multiplied by SA d . Such a multiplicative relationship is more relevant than an additive one, as SA d (range: 0.3-208.5) has higher variability than SS d (range: 0.3-1.8).

| Weight inference
To assign weights to the components of each model, we used 37 possible empirical values ranging from 0.01 to 100, symmetrically mirrored around 1. The complete list is (0.010, 0.011, 0.012, 0.014, 0.017, 0.020, 0.025, 0.033, 0.05, 0.10, 0.11, 0.12, 0.14, 0.17, 0.20, 0.25, 0.33, 0.5, 1, 2, 3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100). For all four models, we performed LMA score calculations using all possible combinations of values for each of the model's parameters. For instance, using the combined_prod model with a 0.2 factor for the intrinsic component and a 3 factor for the interactive one, the LMA score for a sample s with N drivers is given by the following equation: where E ij is the malignant epistatic score for the interaction of drivers i and j. SA i , SS i and E ij depend on the context given by the tumour type of s. We calculated the standard deviation of the LMA scores from the related samples normalised by their median. We used an objective function aiming to minimise the standard deviation in normalised LMA while assessing all weight combinations, so as to match our initial assumption that most tumour-initiating clones in a given environment have a similar score. As different tumour types can involve different development mechanisms, the weights were optimised in tissue-specific fashion. The weight combination yielding the lowest standard deviation in normalised LMA was thus selected in each tumour type.

| Network representation
Network images were produced using the cytoscape software (Shannon et al., 2003).

| Cancer evolution is highly repeatable
In order to calculate the adaptation of a genetic clone to its local environment, we defined landscapes of LMA specific to each of 9 cancer types, based on the occurrence of their most frequent driver alterations. Our LMA score aims at scoring the potential of a genetic clone to develop malignantly in a given environment. To accurately reflect the genetic background of the clones that ultimately adapted to each environment, we only focused on clonal driver alterations (i.e., those present in all cells of a tumour). We investigated the repeatability of cancer evolution in each tumour type using a Jaccard Index based at the gene level, considering all sample-specific sets F I G U R E 1 Repeatability of cancer evolution. (a) Jaccard distances between the clonal genetic makeup of samples in the TCGA (left, vivid colours) and randomised controls (right, light colours) for each of the nine tumour types. (b) Jaccard distances in the TCGA data normalised via division by the 95th percentile of each corresponding randomised data. Horizontal red line highlights a value of 1 (no difference). The percentage of observations exceeding the simulated 95th percentile is reported left of each distribution. Boxes represent the middle quartiles; black horizontal bars represent the median of each distribution; whiskers extend up to 1.5 times the interquartile range (box height) away from the box. Outliers (beyond the whiskers) are not displayed Normalised Jaccard Index of clonal alterations as the genotypes of similarly adapted phenotypes (Bailey, Rodrigue, & Kassen, 2015;Yeaman, Gerstein, Hodgins, & Whitlock, 2018). Results were compared to randomised distributions of mutations that followed the same mutational load per patient (see Methods). As expected given the recurrence of driver mutations, our results highlight a high parallelism within each tumour type (Figure 1a). On average, the similarity between samples from the real distribution of alterations was 1.4-5.1 times superior to the 95th percentile of those observed in our randomised controls ( Figure 1b). The glioblastoma (GBM) and lung squamous cell cancer (LUSC) sets displayed particularly high parallelism, with 96% of the indices being higher than the 95th percentile of the randomised control.

| Genetic landscapes of local malignant adaptation: parameter definition
We identified three types of genetics-based factors that could quantify the adaptation of cells to a determined environment: se- and was defined in our case as the ratio of observed occurrence of each clonal alteration (driver or not) compared to its expected occurrence given its number of clonal observations, weighted by gene size (in base pairs). We compared our selection score to the corrected dN/dS measure obtained by Zapata et al. (2018) in a recent publication. We find that both scores are moderately, yet significantly correlated (p < 0.001, Supporting Information Figure S1). The observed variability can furthermore be explained by the fact that their analysis was based on pan-cancer data while our genetic landscapes are tumour-type specific, and the fact that we focus solely on clonal alterations.
Driver self-sufficiency is however a novel measure, reflecting how many additional drivers are needed on average to induce malignant development. Differences could be observed across tumour types, with for instance a high discrepancy in the number of additional drivers in colorectal adenocarcinomas and a relatively homogeneous distribution in lung squamous cancers (Figure 2b,c and Supporting Information Figure S2).
Both selective advantage and self-sufficiency measures are specific to single driver alterations in a given tumour type. We found they were significantly correlated, yet with a very high variability (R 2 = 0.04, p < 0.001, Supporting Information Figure S2). This indicates that they can provide distinct information on the impact of each alteration and that mutations highly selected for may still require numerous other alterations in order to induce full cancerous transformation.
We defined malignant epistatic interactions as the ratio of co-occurrence between two genes given their respective frequencies and the number of mutations per sample in a cohort, using hypergeometric probabilities (see Methods). In this work, these interactions differ slightly from the traditional concept of epistasis and are entirely focused on cancerous development: the interaction between two genes can favour or hamper malignant transformation on the long term, without immediately impacting selective advantage during somatic development. The most negative interaction was the one found between known antagonists BRAF and NRAS in melanoma (Curtin et al., 2005), while the most positive interactions included those between AURKA gain, APC and TP53 mutations in colorectal cancer (Figure 2d). Across cancers, CNAs tended to frequently cooccur with each other and TP53 mutations (Supporting Information Figure S4), in agreement with the role of TP53 in promoting genome instability, which then accelerates CNA acquisition (Martinez et al., 2018;Sansregret, Vanhaesebroeck, & Swanton, 2018).

| Differences among and within cancer types
We measured the percentage of the LMA scores accountable to the intrinsic component for each sample of each tumour type These observations reflect the extensive heterogeneity recurrently observed both inter-and intra-tumour at the genetic and phenotypic levels, which can be mirrored by our occurrence-based framework to calculate LMA.

| Premalignant lesions are specifically adapted to the local landscape
We analysed two published cohorts of premalignant lesions linked to melanomas and colorectal carcinomas (CRC), to understand whether these precursors differed from fully-formed tumours in our measure of adaptation. We first analysed 31 precursor/melanoma pairs (Shain et al., 2015), including 23 benign naevi. We observed that the LMA score of the melanomas was consistently and significantly higher than the one of their paired benign precursor (p = 0.001 paired Wilcoxon test, p = 0.01 unpaired, Figure 4a).
When we calculated the LMA scores of the 23 naevi and 9 adenomas in other tumour-type-specific landscapes, they appeared more adapted, respectively, to the melanoma and CRC landscapes than to the other landscapes on average (p < 0.001 and p = 0.005, paired Wilcoxon test, Figure 4c,d and Supporting Information Figure   S7). Our model is thus able to detect that premalignant lesions are specifically adapted to their local environment, further suggesting that additional evolutionary steps are still necessary to acquire locally invasive capacity.

| Metastasis does not rely on genetic adaptation to the distant site's landscape
We then applied our methods to metastatic samples, in order to shed light on whether the genetic basis of adaptation to an environment was as relevant for its metastatic colonisation as for local invasion.
We used the MET500 data set of 500 metastatic samples (Robinson et al., 2017)  For the 35 samples with existing landscapes for the primary and metastatic sites, we observed that LMA scores were consistently higher in the primary site than in the metastatic one (p < 0.001, paired Wilcoxon test, Figure 5a and Supporting Information Figure   S8). This suggests that metastatic clones are more genetically adapted to the environment in which they originated than to the one they colonised. As for premalignant lesions, we analysed the LMA scores of each sample in all the other tumour types. LMA in the landscape of the primary site was significantly higher than the average LMA in the other 8 landscapes (p < 0.001, paired Wilcoxon test, Figure 4b, Supporting Information Figure S9), while we did not observe any difference when considering the adaptation in the metastatic site's landscape (Figure 5c and Supporting Information Figure S10). This suggests that these genetic clones were specifically adapted to their environment of origin. However, their genetic makeup did not provide them with the potential to specifically adapt to their metastatic site as well as local primary tumours.

| TCGA landscapes as evolutionary graphs
Our results demonstrate that our LMA model can quantify genetic adaptation to specific somatic evolutionary contexts. Our method can further be combined to a network approach to understand adaptation dynamics as individual driver alterations accrue in a clone We took advantage of this framework to investigate how malignant adaptation to the local environment evolves in the nine investigated TCGA tumour types, by following how genetic clones progress through the graph by acquiring new alterations. We observed differences in LMA dynamics depending on the properties of each tumour type. We see that in many cases, LMA increases linearly on average with each novel driver, as can be expected from our approach based on summing the contextualised contributions to adaptation of each driver in a clone. However, tumour types in which our model predicted a high prevalence of malignant epistatic interactions (GBM, BRCA) display a strong deviation from linearity ( Figure 6b). This is also reflected in how fast the maximum LMA score in the network is reached, with some tumour types displaying a log-like distribution with decreasing improvement with each additional alteration, while the maximum LMA of GBM increases exponentially ( Figure 6c).
Interestingly, the maximum LMA for BLCA, BRCA and COAD is reached early and decreases after, respectively, 10, 12 and 13 alterations, as would be expected under diminishing returns epistasis (Chou, Chiu, Delaney, Segrè, & Marx, 2011). This highlights that our model can produce non-linear relationships between LMA and the number of drivers, depending on the nature of malignant epistatic interactions among driver alterations.
Finally, we investigated if these network representations could represent a basis on which to predict the evolutionary trajectories potentially leading to cancer. In all tumour types, we recorded all trajectories that reached a LMA score superior or equal to the median score observed in the corresponding TCGA cohort. The first node meeting such a criterion in a trajectory was considered final and its offspring nodes were not investigated. These trajectories thus represent all combinations of genes that likely lead to sufficient genetic adaptation for tumorigenesis. The number of drivers in these combinations was superior to the one observed in the actual sample, which was expected given that their LMA score had to be equal or higher than the data set's median (Figure 6d,e). The mean number of required alterations is however strongly correlated to the mean number of clonal drivers observed in each tumour type (R 2 = 0.91, p < 0.001, Figure 6f). This suggests that networks of contingencybased adaptation metrics can thus provide a framework to both represent and study precancerous progression under a novel angle, while recapitulating the specificities of different tumour types. Their use can furthermore identify system-level properties of the evolutionary context that funnels malignant somatic evolution in different organs and environments.

| D ISCUSS I ON
Here, we developed a methodology to investigate the progression of normal cells towards oncogenesis, by way of measuring the adaptation of genetic clones to a given environment. Our method relies on the contingency and repeatability of cancer development, observed when multiple same-type cancers occur with distinct evolutionary trajectories in different human individuals. We built and optimised simple models to quantify LMA based on the presence of recurrent genetic alterations in nine cohorts corresponding to nine tumour types. This approach is similar to fitness landscapes that aim to map the space in which phenotypes evolve and adapt in evolutionary biology. We then applied our method to independent data sets of premalignant and metastatic lesions to analyse the impact of genetics in the adaptation to and the colonisation of different environments.
Our exploratory model aimed at simplicity and still suffers from several limitations. First of all, our model is solely based on genetics and ignores how a cell's context and phenotypic state can dictate its response to specific genetic insults (Sieber, Tomlinson, & Tomlinson, 2005), which can impact tumorigenesis (Morel et al., 2017;Puisieux, Pommier, Morel, & Lavial, 2018) or lead to non-genetic selection (Shaffer et al., 2017). It also ignores the contribution of epigenetic alterations as potential drivers; the inclusion of which will require a more general understanding on the recurrent epigenetic alterations functionally linked to tumour formation (Timp & Feinberg, 2013 (Ortmann et al., 2015). As occurrence-driven definition of epistatic interactions requires large numbers of observations, animal models in which driver combinations can be induced and followed over time could provide valuable controls for pre-identified targets (Rogers et al., 2018). Aside from genetic alterations, our model does not include the interactive adaptation relationship between a preinvasive cell and its environment, the interplay between both being very likely to modify selective pressures as potential tumour-initiating cells develop (Bissell & Radisky, 2001;Rozhok, Salstrom, & DeGregori, 2016;. Finally, our work focuses on a single clone being responsible for initial local invasion, while it is possible that this process can involve multiple clones (Casasent et al., 2018). Such polyclonal invasion would however likely involve a recent common ancestor. Addressing these drawbacks in the future will allow to better stochastically model malignant progression.
Despite these limitations, our model fits the hypothesis that carcinogenesis is likely a stepwise process. Using similar data on per sample genomic alterations from three complementary data sets on primary tumours, premalignant and metastatic lesions, we were able to investigate the genetic determinants of different stages of cancer progression. Our analyses highlighted that premalignant lesions were specifically adapted to their environment, yet insufficiently to promote local malignant invasion. This thus suggests that cancer arises when benign lesions acquire further driver alterations, at least in melanomas and colorectal carcinomas. In addition, the analysis of metastatic lesions suggested that genetics contributed to the formation of the primary tumours but were not a defining factor in the adaptation to the metastatic site. Metastasis to a specific site requires convergent evolution for clones from divergent backgrounds to adapt to the new environment (Cunningham, Brown, Vincent, & Gatenby, 2015). This adaptation may however not depend on novel genetic alterations, potentially relying on cellular plasticity (Varga & Greten, 2017) or epigenetic changes (Roe et al., 2017), or may even involve completely different genetic determinants than ab initio oncogenesis in the same site.
By combining our LMA score to evolutionary approaches, we were furthermore able to create a framework to simulate genomic progression towards a malignant phenotype. This framework could capture the specificities of different tumour-type-specific evolutionary dynamics and suggests potential applications to forecast and control tumour evolution. Such an approach can help predict the most probable (and fastest) evolutionary route(s) linking premalignant lesions to cancer progression, given their genetic makeup. Our method however only scores malignant potential, which may differ from proliferative advantage in the tissue of origin. Interestingly, recent work on the somatic evolution of normal oesophagus suggests that alterations providing a competitive growth advantage in the normal tissue did not necessarily yield malignant phenotypes, as exemplified by the prevalence of NOTCH1 mutations in normal, but not in cancerous tissue (Martincorena et al., 2018). Combined with our findings, this would imply that to accurately simulate (and predict the outcome of) the evolutionary process underlying cancer initiation, the growth advantage and potential for malignant transformation provided by somatic genomic alterations should be assessed and integrated separately.
Although our work focused on clonal mutations and tumour initiation, clonal evolution continues as cancer develops, with multiple clones co-existing and evolving in parallel (Martinez et al., 2013). This results in genetic heterogeneity likely to foster resistance and hamper accurate diagnosis. Multi-region sampling of individual tumours however also highlighted the repeatability of evolutionary patterns after initiation (Gerlinger et al., 2014), which can be used to predict the evolution of tumours once they become invasive (Caravagna et al., 2018). There is therefore a critical need for large, centralised public data sets, with accurate clinical annotation and the alterations recurrently selected by each treatment type, to help reconstruct treatmentspecific fitness landscapes. Adapting our methodology to represent these landscapes as graphs could serve as a basis to predict the nodes (i.e., clones) most likely to be generated by undirected genetic drift of the current population, then selected by different therapeutic pressures. By identifying genotypes associated to high fitness in a treatment-specific landscape, and low fitness in another, this could in turn provide opportunities to channel cancer evolution towards extinction, via sequential treatment regimens inducing evolutionary bottlenecks (Dhawan et al., 2017;Goldie, Coldman, & Gudauskas, 1982;Nichol et al., 2019). As the data available on cancer genomics rapidly increase in quality and quantity, the accuracy of evolutionary frameworks aiming at forecasting cancer evolution will inevitably improve, in time providing compelling tools to guide clinical decisions and optimise therapy.

ACK N OWLED G EM ENTS
The results published here are in part based upon data gener-

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
PM designed the experiments. NT and PM analysed the data. CML and SS organised the collaboration. CML, AP, SS and PM supervised the work. PM wrote the article. All authors reviewed the article.

DATA ACCE SS I B I LIT Y
The code and data used for this analysis are available on github: https://github.com/pierremartinez/ConFitLand.