Different Clinical Data, Different Purpose


Randomized controlled trials (RCTs) are often inefficient. To address this, clinical trial sponsors are turning to external data sources to supplement their trials and make them more efficient. This post describes three types of external data sources (historical trial data, patient registries, and real-world data, or RWD) and the FDA guidance for using each.

Given the strengths and weaknesses of each type of data, this post provides a framework for how each is best used to assess the effectiveness of drugs. Historical trial data and patient registries can supplement the control arms of RCTs to test new drugs. RWD support studies that generate real-world evidence (RWE) to evaluate already-approved drugs.

Read further:

The randomized controlled trial (RCT) is the gold standard for assessing whether a new drug is effective. In an RCT, subjects are recruited and randomly assigned to one of two groups: the control group, which receives a placebo or standard-of-care, or the experimental group, which receives the drug. 

Although the RCT is the standard approach to testing new drugs, it is often inefficient. To have enough statistical power to show that a drug is effective, a trial must recruit large numbers of subjects for both the control and experimental arms. Recruitment is often challenging: there may not be enough subjects who have a particular disease (e.g. a rare disease) [1] or who are willing to receive a placebo [2].
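
As a rough illustration of why recruitment numbers grow so quickly, the sketch below uses a textbook normal-approximation formula for a two-arm trial with a continuous endpoint. The formula, the function name subjects_per_arm, and the example effect sizes are assumptions for illustration. They are not taken from the sources cited in this post, and a real trial would use a sample size calculation tailored to its design.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects needed in EACH arm of a two-arm trial.

    effect_size is the standardized difference between arms (difference in
    means divided by the common standard deviation). Assumes a two-sided
    test, equal allocation, and a normal approximation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical effect sizes, chosen only to show the trend.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: {subjects_per_arm(d)} subjects per arm")
```

Under these assumptions, the smaller the expected effect, the more subjects each arm needs, which is one reason sponsors look for ways to supplement the control arm with external data.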

To make clinical trials more efficient, biopharmaceutical companies are leveraging external data sources to supplement data from RCTs. One way to supplement is to run an externally controlled trial. In an RCT, control and experimental subjects come from the same population (figure 1). However, in an externally controlled trial, experimental subjects come from one population while control subjects come from another population outside of the ongoing trial (figure 2). These control subjects belong to what is commonly called the external control arm.

To populate an external control arm, clinical trial sponsors are looking to data sources containing different kinds of patient information. In this post, we lay out three major types.

Historical clinical trial data

What is it?

Historical clinical trial data are data collected from patients in past clinical trials. The data are collected under extremely controlled conditions in well-defined populations and, in most cases, also include some subjects who received a placebo.

What information do these data provide? 

Historical clinical trial data provide information about each patient's medical history and the measurements recorded during an earlier trial.

For example, a historical dataset for Alzheimer's trial subjects would contain:

-outcomes from cognitive tests (e.g. ADAS-Cog, MMSE)

-clinical measurements (e.g. weight, heart rate)

-laboratory test results (e.g. measurements for glucose and other biomarkers)

Pros and cons

+Given that these data are extremely controlled and well-defined, data quality is high.

-Data are not diverse; thus they do not give the full picture of how effective the treatment is in different populations outside of the clinical trial.

FDA guidance

As described in an earlier blog post, an important issue to consider when using historical trial data for external control arms is the risk of introducing bias (i.e., the external control population differs from the treatment population in ways that distort the estimate of how effective a treatment is). The FDA likewise stresses the importance of minimizing bias and carefully designing these types of externally controlled trials.

To highlight some major points in FDA's guidance on using external controls (including historical controls): 

In general, the FDA recommends a conservative approach for using an externally controlled trial. To run an externally controlled trial, one should be confident that alternative trial designs can't be used. Additionally, the progression of the disease/condition to be treated needs to be well-documented and predictable.  

The FDA has a set of criteria for when historical controls can be used: 

-when the study endpoint is objective

-when the treatment is assumed to be highly effective

-when the covariates influencing the outcome of the disease are well characterized

-when the control and study groups are similar in all known relevant baseline, treatment (minus the study drug), and observational variables [3]
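
One common way to check the last criterion is to compare baseline covariates between the treated group and the external control group using a standardized mean difference (SMD). The sketch below is an illustrative assumption rather than a method prescribed by the FDA guidance: the covariate values are made up, and the 0.1 threshold is a widely used rule of thumb, not a regulatory requirement.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated: list[float], external: list[float]) -> float:
    """Difference in group means, scaled by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(external)) / 2)
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means coincide.
        return 0.0 if mean(treated) == mean(external) else float("inf")
    return (mean(treated) - mean(external)) / pooled_sd


def imbalanced_covariates(treated: dict, external: dict, threshold: float = 0.1) -> dict:
    """Return the covariates whose absolute SMD exceeds the threshold."""
    flagged = {}
    for name in treated:
        smd = standardized_mean_difference(treated[name], external[name])
        if abs(smd) > threshold:
            flagged[name] = smd
    return flagged


# Hypothetical baseline values for an Alzheimer's study (not real data).
treated_group = {"age": [70, 72, 68, 75, 71], "mmse": [22, 24, 21, 23, 22]}
external_controls = {"age": [71, 73, 69, 74, 70], "mmse": [26, 27, 25, 28, 26]}

flagged = imbalanced_covariates(treated_group, external_controls)
for name, smd in flagged.items():
    print(f"{name}: SMD = {smd:.2f} (groups differ at baseline)")
```

A covariate flagged this way suggests the external controls are not comparable to the study group on that variable, which is the kind of difference that can bias an externally controlled comparison.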

Patient registry

What is it?

In broad strokes, a patient registry collects information about patients with one of the following: 

-a particular disease

-a condition (or, a risk factor) that predisposes them to the occurrence of a health-related event

-prior exposure to substances known or suspected to cause adverse health effects [4]

Patient registries are technically a type of real-world data, but this post will distinguish them as separate from data directly extracted from a broad-based EHR system.

What information do these data provide?

Patient registries provide numerous kinds of information because their purpose is broad in scope. 

They can play a role in:

-collecting disease information

-advancing research hypotheses

-recruiting patients for clinical trials

-observing population behavior patterns

-monitoring health care and outcomes

-studying best practices in care or treatment [5]

Patient registries are generally unrestricted in their goals, less structured, and less controlled [5]. Because of this diversity, there are many kinds of patient registries of varying quality. Registries are usually standardized, but each one may be standardized differently from the others.

For example, a patient registry can be an observational study whose data are more standardized than electronic medical records (EMRs) or other data from routine care.

Pros and cons

+Higher quality than EMRs from real-world data.

+More diverse than historical trial data -- patient registries are less controlled and standardized than historical trial data, and inclusion criteria are much broader.

-Data are not as rich, and quality is lower than that of historical trial data.

FDA guidance

The FDA considers patient registries to be part of real-world data (RWD), which are described in the next section.

Real-World Data (RWD)

What is it?

According to the FDA, real-world data (RWD) are "data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources."

Some examples of RWD include:

-electronic health records (EHRs) and electronic medical records (EMRs)

-medical claims and billing data

-data from product and disease registries

-patient-generated data, including from in-home-use settings

-data from mobile devices (e.g. health or fitness-tracking apps)

RWD are used to generate real-world evidence (RWE): clinical evidence about the usage and potential benefits or risks of a medical product. According to the FDA, RWE comes from kinds of trials or studies other than RCTs. [6]

What information do these data provide? 

RWD show how an existing drug works in patients outside of an RCT. In an RCT, a drug is tested in a limited, highly controlled patient population, so the results may not reflect how all patients respond to the same drug. Thus, RWD are collected to capture how patients with diverse characteristics (e.g. different ages, genders, races and ethnicities, disease severity, and/or co-morbid conditions) respond to a specific drug post-approval. [7]

Pros and cons

+Diverse data provide a more comprehensive picture of how effective specific treatments are, outside of an RCT.

-Because data represent diverse populations and come from many different sources that are uncontrolled, data quality is low.

FDA guidance

The FDA has primarily used RWD to evaluate drug safety and, only in certain cases, to assess effectiveness. Some key issues in using RWD are the reliability of the data (e.g. how data are collected and data quality) and their relevance.

However, because there has been more interest in using RWD/RWE for evaluating drug effectiveness, the FDA has created a framework for evaluating the potential use of RWE for two purposes: 

-to help support the approval of a new indication for a drug already approved 

-to help support or satisfy drug post-approval study requirements 

These two purposes encompass a variety of uses:

-changes to labeling about drug product effectiveness, including adding or modifying an indication, such as a change in dose, dose regimen, or route of administration

-adding a new population

-adding comparative or safety information [6] 

It is important to emphasize that this framework was created to guide usage of RWD/RWE for evaluating the effectiveness of drugs that have already been approved by the FDA. 

Summary about data sources

To summarize the pros and cons of these data sources, the figure below compares the sources on metrics of data quality and diversity. As shown, historical trial data are high in quality, low in diversity; patient registries are diverse and lower in quality than trial data; and RWD are the most diverse and have the lowest quality.

As mentioned before, clinical trial sponsors are leveraging different data sources to supplement clinical trials. The above figure is intended to highlight an important point: because these data sources have different strengths and weaknesses, they should be used for different purposes. 

To distill what data serve what purpose, we can look at this table: 

To summarize this table:  

-historical trial data and patient registries supplement the control arms of RCTs to test new drugs

-RWD supplement studies for RWE to evaluate existing/already-approved drugs

In the next post, we will explore a new kind of external control arm called an intelligent control arm, as well as its use cases and benefits. 


  1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964003/
  2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5327530/
  3. https://www.fda.gov/media/71349/download
  4. https://www.ncbi.nlm.nih.gov/books/NBK208643/
  5. https://premier-research.com/registry-and-natural-history-studies-vital-contrasting-roles-in-clinical-research/
  6. https://www.fda.gov/media/120060/download
  7. https://www.clinicalleader.com/doc/using-real-world-data-to-enhance-clinical-trials-0001
