April 2, 2026
The decision-making gap in oncology trial design
If outcomes could be predicted for a specific population during a trial's design phase, many of the hardest decisions in oncology development would become more tractable.
- What outcomes should be expected for the population defined by the protocol?
- How do those expectations change as eligibility criteria shift?
- Which comparator is most appropriate for a given setting?
- What is the likelihood that a trial will meet its endpoint?
These are the decisions that help sponsors determine whether a study is feasible and likely to succeed.
In Part 1, we described why answering these questions has become increasingly difficult. Patient populations are fragmenting into narrower biomarker-defined subgroups. Standards of care are evolving faster than trials can read out. Together, these two forces create a third problem: the evidence required to make design assumptions is distributed across sources that were not built to be used together.
For sponsors, this is not an abstract challenge. These decisions must still be made, often with incomplete and conflicting evidence.
When data matching breaks down
The BREAKWATER Phase III trial illustrates what this looks like in practice. The standard-of-care arm enrolled metastatic CRC patients with BRAF V600E mutations, a subgroup representing only 8 to 12 percent of the mCRC population. When we searched our available real-world and trial datasets for patients who fully matched BREAKWATER's eligibility criteria, we found five patients. A data-matching approach would not provide a reliable basis for estimating outcomes in this setting. Yet decisions about trial design still need to be made.
This is not an isolated case. As oncology trials target increasingly specific populations, the number of directly comparable patients declines sharply. Real-world data remains an important source of patient-level detail, but assembling usable cohorts can cost millions of dollars and take significant time, and the resulting data is observational and subject to confounding. Published clinical trials provide reliable estimates of outcomes, but only for the populations that were studied. Applying those results to a new trial with different eligibility criteria requires assumptions that are difficult to validate.
The result is a gap between the existing evidence and the decisions that must be made. Study teams are asked to commit to trial designs, sample sizes, and comparators without a clear, data-driven estimate of the expected outcomes in the population they intend to study.
The solution is to treat outcome prediction as a modeling problem rather than a data-matching problem.
From data matching to modeling
To address this, we developed a pan-cancer foundation model trained on detailed clinical and genomic data from approximately 300,000 tumor biopsies. The model learns the joint distribution of clinical variables, genomic alterations, treatment histories, and outcomes across indications, enabling it to generate patient-level cohorts matching specified eligibility criteria and predict outcomes under a given treatment regimen.
Patient-level data alone is not sufficient for clinical decision-making; predictions must be consistent with what has been observed in randomized trials. We therefore apply a calibration procedure that anchors the model's predictions to population-level results from published clinical trials, while preserving the patient-level relationships learned from the data. We refer to this combined approach as Fusion of Recent Evidence and Subject Histories (FRESH) modeling. Full technical details, including validation against published trial results and out-of-sample predictions across oncology indications, are described in our latest whitepaper.
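The calibration idea can be illustrated with a simplified sketch. The snippet below reweights a simulated cohort so that its survival curve matches published trial survival at fixed timepoints, while each simulated patient keeps their own event time. The interval-reweighting scheme, the no-censoring assumption, and all numbers are illustrative stand-ins, not the actual FRESH procedure.

```python
import numpy as np

def calibrate_weights(sim_times, timepoints, target_surv):
    """Reweight simulated event times so the cohort-level survival
    curve matches published survival at fixed timepoints.
    Assumes no censoring in the simulated cohort (illustrative only)."""
    edges = np.concatenate([[0.0], timepoints, [np.inf]])
    surv = np.concatenate([[1.0], target_surv, [0.0]])
    target_mass = surv[:-1] - surv[1:]          # probability mass per interval
    bins = np.digitize(sim_times, edges[1:-1])  # interval index per patient
    emp_mass = np.bincount(bins, minlength=len(target_mass)) / len(sim_times)
    w = target_mass[bins] / emp_mass[bins]      # per-patient weights
    return w / w.sum()

# toy example: simulated OS times (months) vs. hypothetical published targets
rng = np.random.default_rng(0)
sim_times = rng.exponential(scale=14.0, size=10_000)
timepoints = np.array([6.0, 12.0, 18.0])
target_surv = np.array([0.75, 0.50, 0.30])  # invented survival probabilities
w = calibrate_weights(sim_times, timepoints, target_surv)

# the weighted cohort survival now matches the targets,
# while individual event times are unchanged
for t, s in zip(timepoints, target_surv):
    print(round(float(w[sim_times > t].sum()), 3), s)
```

In a real implementation the calibration would need to handle censoring and preserve covariate-outcome relationships, but the core move is the same: adjust at the cohort level, leave patient-level heterogeneity intact.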
Trial-calibrated digital twins
The output of this process is a set of trial-calibrated digital twins: simulated individuals whose predicted disease courses under a specified standard-of-care regimen reflect what has been observed in real trials while retaining the heterogeneity of real-world data. This means the model retains the granularity needed to ask patient-level questions about narrow subgroups, but its answers are anchored to gold-standard randomized evidence at the cohort level.

The BREAKWATER result illustrates this directly. We predicted the BREAKWATER control arm’s overall survival curve, matching the observed results at the median and at 6-, 12-, and 18-month time points, without using BREAKWATER data. The foundation model generalized from a broader pool of nearly 800 patients with BRAF V600E mutations across indications and regimens, and calibrated to prior unselected mCRC trials. A perfect historical match was not required. What was required was a model.

Answering hard trial design questions during the design phase
Once you have calibrated, patient-level predictions, key aspects of trial design can be treated as a series of concrete “what if” questions before committing to a protocol.
In precision trial simulation, teams can define a trial cohort based on inclusion and exclusion criteria, biomarker profile, line of therapy, and standard-of-care regimen, and then ask: what outcomes should we expect? How do event rates change if we tighten from all KRAS mutants to KRAS G12C only, or enrich for higher PD-L1 expression? What happens to sample size requirements if we switch comparators? Instead of requiring months of bespoke evidence synthesis for each question, these become on-demand queries against a calibrated model.
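As a sketch of what such on-demand queries could look like, the snippet below filters a hypothetical simulated cohort by eligibility criteria and reports subgroup size, median OS, and 12-month event rate. The field names, distributions, and subgroup definitions are invented for the example, not the actual model output.

```python
import numpy as np

# hypothetical simulated cohort: one entry per digital twin
rng = np.random.default_rng(1)
n = 50_000
cohort = {
    "kras": rng.choice(["G12C", "G12D", "other", "wild-type"], size=n,
                       p=[0.12, 0.15, 0.13, 0.60]),
    "pdl1": rng.uniform(0, 100, size=n),               # PD-L1 expression, %
    "os_months": rng.exponential(scale=16.0, size=n),  # predicted OS
}

def query(mask, horizon=12.0):
    """Median OS and event rate at `horizon` months for a subgroup."""
    os = cohort["os_months"][mask]
    return {
        "n": int(mask.sum()),
        "median_os": float(np.median(os)),
        "event_rate": float((os <= horizon).mean()),
    }

all_kras = np.isin(cohort["kras"], ["G12C", "G12D", "other"])
g12c_only = cohort["kras"] == "G12C"
print(query(all_kras))
print(query(g12c_only))
# tightening eligibility shrinks the subgroup, but the model-based
# estimate remains available because it does not require exact matches
```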
In single-arm studies, which are common in early-phase oncology and in narrow biomarker-defined populations, sponsors often lack a reliable comparator for go/no-go decisions. Trial-calibrated digital twins can provide a rigorous, calibrated benchmark to evaluate single-arm results against expected standard-of-care outcomes. In the near term, this supports internal decision-making by giving development teams a structured way to contextualize efficacy signals before committing to larger, more expensive randomized trials.
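One simple way to use such a benchmark, sketched below with invented numbers, is to bootstrap the digital-twin cohort at the single-arm trial's sample size and ask how often standard of care alone would produce the observed result. The rates, sample size, and endpoint here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical digital-twin benchmark: 1 = alive at 12 months under SOC
twin_12m_os = rng.binomial(1, 0.45, size=5_000)

# observed single-arm result (illustrative)
observed_rate, observed_n = 0.58, 60

# bootstrap the benchmark at the trial's sample size to get a
# reference distribution for the 12-month OS rate under SOC
boots = np.array([
    rng.choice(twin_12m_os, size=observed_n, replace=True).mean()
    for _ in range(10_000)
])
exceed = (boots >= observed_rate).mean()  # how often SOC alone matches it

print(f"benchmark SOC 12-month OS rate: {twin_12m_os.mean():.2f}")
print(f"P(SOC benchmark >= observed {observed_rate}): {exceed:.3f}")
```

A small exceedance probability suggests the single-arm signal is unlikely under standard of care alone; the same comparison could be run on any endpoint the twins predict.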
For comparative effectiveness, sponsors often need to understand how different standard-of-care regimens perform in a specific patient population to inform portfolio positioning and commercial strategy. The model can compare on-market therapies across clinically defined subgroups, including narrow biomarker-defined cohorts, without requiring a head-to-head trial.
Validated across indications, out of sample
We have validated this approach in non-small cell lung cancer and metastatic colorectal cancer. In NSCLC, predictions calibrated to published clinical trial results reproduce observed overall survival at the population level and across most key subgroups, including PD-L1 expression, histology, and genomic alterations such as STK11, KEAP1, and KRAS. In the POSEIDON trial, predicted survival curves aligned with observed outcomes at the population level and across these subgroups, despite calibration being performed only at the cohort level.
The BREAKWATER result described in the previous section represents a fully out-of-sample prediction across indications. These results demonstrate the methodology's capabilities, not its coverage limits. The foundation model's pre-training data already spans breast, pancreatic, and prostate cancer in addition to NSCLC and mCRC.
Reducing the data burden, not adding to it
These results highlight a limitation of traditional approaches. Data matching depends on finding patients who exactly meet a set of criteria. As those criteria become more specific, the number of usable patients declines, often to the point where estimates are unstable or unavailable. A modeling approach allows information to be shared across related patients, indications, and treatment contexts, enabling predictions in populations where direct matches are limited.
This also has implications for how data is used in development programs. Rather than requiring repeated, bespoke real-world data acquisition efforts for each new question, the model can generate calibrated predictions using existing data. Additional data collection can then be focused on areas of greatest incremental value.
In the next post, we will examine how this approach can be applied in practice to evaluate trial scenarios, compare biomarker strategies, and test assumptions before protocol finalization.
