
Why predicting outcomes has become the hardest part of oncology trial design

By
Unlearn

March 17, 2026

Oncology drug development has always involved making decisions with incomplete information. What has changed over the past decade is not just the scale of that uncertainty but the stakes attached to it, driven by two converging forces: patient populations are fragmenting into ever-narrower biomarker-defined subgroups, and standards of care are evolving faster than trials can read out.

A therapy may be evaluated in patients with a specific mutation (for example, BRAF V600E or KRAS G12C), a PD-L1 expression threshold, and/or a particular treatment history and line of therapy. At the same time, standards of care evolve rapidly as new targeted therapies, immunotherapies, antibody-drug conjugates, and other modalities enter clinical practice. By the time a Phase III trial reads out, the standard of care may have already shifted.

Together, these forces put development teams in a difficult situation. Expected outcomes shape nearly every aspect of trial design: sample size, endpoint selection, eligibility criteria. Yet the evidence available to estimate those outcomes was built for broader populations and for standards of care that no longer match current practice. Get the assumptions wrong, and the consequences are severe: an underpowered trial, an unnecessary protocol amendment, or years of follow-up spent on a study that was never going to succeed. Every one of those outcomes traces back to a decision made on faulty assumptions. And if outcome prediction is where the decision goes wrong, it follows that improving outcome prediction is where the most leverage lies.
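To make the stakes concrete, here is a minimal sketch (not from the article, and not Unlearn's method) of how sensitive a survival trial's size is to outcome assumptions. It uses Schoenfeld's standard approximation for the number of events a log-rank test needs. The hazard ratios chosen are purely illustrative: if the comparator arm performs better than assumed, so the effective hazard ratio drifts from 0.70 to 0.80, the required event count roughly 2.5x.

```python
from math import ceil, log
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate number of events needed for a two-sided log-rank
    test with 1:1 allocation (Schoenfeld's formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)

# Planned effect vs. the assumed standard of care (illustrative numbers)
print(schoenfeld_events(0.70))  # -> 247 events

# Same design, but the real-world comparator does better than assumed
print(schoenfeld_events(0.80))  # -> 631 events
```

A trial sized for the first scenario would be badly underpowered in the second, which is exactly the failure mode the paragraph above describes.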

What you really want is the granularity of patient-level data to evaluate specific design choices, combined with the recency of population-level trial findings, even for brand-new treatments. Those two sources exist, but they were never designed to work together. Published clinical trial reports provide reliable outcome estimates (OS curves, PFS medians, event rates) but only as population averages for historical cohorts. Real-world datasets provide patient-level granularity, but they are observational, often confounded, and rarely aligned with the exact population and standard-of-care context a new trial will actually enroll.

Reconciling the two sources is slow, expensive, and increasingly difficult to scale as cohorts narrow, and no off-the-shelf solution exists to do it reliably.

The data problem in oncology trial design

For many development teams, pulling together the data needed to predict outcomes and support a single trial design can take months of work across clinical, statistical, and data science groups. It’s an approach that doesn’t scale when teams want to explore many possible cohorts, comparators, and endpoints.

Accessing real-world data in particular comes at a high cost. Acquiring datasets that reflect the relevant patient population and standard-of-care context can cost hundreds of thousands of dollars (or more), and in many cases those data are not commercially available at all. Even when the data can be obtained, cleaning, harmonizing, and analyzing them is time-consuming work that must be repeated from scratch every time a team wants to evaluate a new cohort, comparator, or endpoint assumption.

Why the margin for error is shrinking

A Phase III oncology trial may require hundreds of patients, dozens of sites, and years of follow-up, with total budgets often reaching tens of millions of dollars or more.

Regulatory trends are adding further pressure: sponsors are increasingly expected to have confirmatory trials already enrolling at the time accelerated approval is granted, which means committing to expensive, high-stakes designs earlier and with less time to stress-test assumptions.

As oncology populations fragment and therapies evolve, it is becoming clear that clinical development needs a new set of tools. Uncertainty, cost, feasibility, and regulatory pressure are all converging on the same bottleneck: the ability to predict outcomes reliably for a specific trial population before committing to a design. Relying on manual evidence synthesis alone is no longer enough, which is why many teams are beginning to treat outcome prediction itself as a modeling problem to be solved in a more systematic way.

In Part 2 of this series, we'll walk through what it looks like to treat outcome prediction as a modeling problem and how bridging patient-level and population-level data opens up new possibilities for trial design.
