National Cancer Institute
Post Date: Aug 26, 2019
Levels of evidence refers to the ranking system used by the PDQ editorial boards to indicate the strength of evidence obtained from cited studies. Get detailed information about how to weigh the strength of the evidence obtained in cancer treatment studies in this information summary.
Levels of Evidence: Adult and Pediatric Treatment Studies
A variety of endpoints may be measured and reported from clinical studies in oncology. These may include total mortality (or survival from the initiation of therapy), cause-specific mortality, quality of life, or indirect surrogates of these outcomes, such as event-free survival, disease-free survival, progression-free survival, or tumor response rate. Endpoints may also be determined within study designs of varying strength, ranging from the gold standard (the randomized, double-blinded, controlled clinical trial) to case series experiences from nonconsecutive patients. The PDQ editorial boards use a formal ranking system of levels of evidence to help the reader judge the strength of evidence linked to the reported results of a therapeutic strategy. For any given therapy, results can be ranked on each of the following two scales: (1) strength of the study design and (2) strength of the endpoints. Together, the two rankings give an idea of the overall level of evidence. Depending on perspective, different expert panels, professional organizations, or individual physicians may use different cut points of overall strength of evidence in formulating therapeutic guidelines or in taking action; however, a formal description of the level of evidence provides a uniform framework for the data, leading to specific recommendations.
The PDQ Adult Treatment Editorial Board and the PDQ Pediatric Treatment Editorial Board add information on levels of evidence, described below, to the PDQ Adult Cancer Treatment Summaries and the PDQ Pediatric Cancer Treatment Summaries when appropriate.
Strength of Study Design
The various types of study design are described below in descending order of strength:
- Level 1: Randomized, controlled, clinical trials (RCTs).
- Level 1i: Double-blinded.
- Level 1ii: Nonblinded treatment delivery.
The randomized, double-blinded, controlled, clinical trial (1i) is the gold standard of study design. To achieve this ranking, the study allocation must be blinded to the physician both before and after the randomization and the treatment assignment take place. This design provides protection from allocation bias by the investigator and from bias in assessment of outcomes by both the investigator and the patient. Unfortunately, most clinical trials in oncology cannot be double-blinded after treatment allocation because procedures or toxic effects often vary substantially among study allocations in ways that are obvious to both the health care professional and the patient. In most cases, however, it should be possible to blind the investigator and the patient until the randomization has been made. If blinding of the therapy delivered cannot be accomplished, a rank of 1ii is assigned.
Meta-analyses of randomized studies offer a quantitative synthesis of previously conducted studies. The strength of evidence from a meta-analysis is based on the quality of the conduct of individual studies. Moreover, meta-analyses can magnify small systematic errors in individual studies. A study comparing the results of single, large, randomized trials to those of meta-analyses of smaller trials published earlier on the same topics showed only fair agreement (kappa statistic, 0.35). Outcomes of the large, randomized, controlled trials were not predicted accurately by the meta-analysis 35% of the time. Meta-analyses performed by different investigators to address the same clinical issue can reach contradictory conclusions. Therefore, meta-analyses of randomized studies are placed in the same category of strength of evidence as are randomized studies, not at a higher level.
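The kappa statistic quoted above measures agreement beyond what chance alone would produce. A minimal sketch of Cohen's kappa for two yes/no classifications follows; the function name and the counts are hypothetical illustrations and are not data from the cited study.

```python
def cohens_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two methods that each make a yes/no call.

    The four arguments are the cells of a 2x2 agreement table.
    """
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n           # raw agreement
    a_yes = (both_yes + a_yes_b_no) / n           # marginal "yes" rate, method A
    b_yes = (both_yes + a_no_b_yes) / n           # marginal "yes" rate, method B
    # Agreement expected if the two methods were independent.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 50 paired conclusions, 35 of which agree.
kappa = cohens_kappa(both_yes=20, a_yes_b_no=10, a_no_b_yes=5, both_no=15)
print(f"raw agreement = {35 / 50:.2f}, kappa = {kappa:.2f}")
```

In this made-up table the two methods agree 70% of the time, yet kappa is only about 0.40, because half of that agreement would be expected by chance.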
Subset analyses of randomized studies are subject to errors inherent in multiplicity (i.e., statistically significant results to be expected as a result of random variation of measured effects in multiple subsets). Therefore, subset analyses do not represent the same strength of evidence as the overall analysis of a randomized trial as designed unless explicit prospective hypotheses are made for the analyzed subset. Otherwise, subset analyses should be placed in the next lower category of study design (nonrandomized, controlled, clinical trials).
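The multiplicity problem can be made concrete with a short calculation. The sketch below assumes that the subsets are tested independently, that the treatment has no true effect in any of them, and that each test uses a significance threshold of 0.05; these are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def chance_of_false_positive(num_subsets, alpha=0.05):
    """Probability that at least one of `num_subsets` independent tests is
    'significant' at level `alpha` when no true effect exists."""
    return 1 - (1 - alpha) ** num_subsets


for k in (1, 5, 10, 20):
    print(f"{k:2d} subsets: {chance_of_false_positive(k):.2f}")
```

Under these assumptions, examining 10 subsets gives roughly a 40% chance of at least one spurious significant result, which is why an unplanned subset finding carries less weight than the trial's prespecified overall analysis.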
- Level 2: Nonrandomized, controlled, clinical trials.
This category includes trials in which treatment allocation was made by birth date, chart number, day of clinic appointment, bed availability, or any other strategy that would make the allocation known to the investigator before informed consent is obtained from the patient. An imbalance can occur in treatment allocation under such circumstances. For the reasons given above, subset analyses within randomized trials often fall into this category of evidence.
- Level 3: Case series or other observational study designs.
- Level 3i: Population-based, consecutive series.
- Level 3ii: Consecutive cases (not population-based).
- Level 3iii: Nonconsecutive cases or other observational study designs (e.g., cohort or case-control studies).
These clinical experiences are the weakest form of study design, but they may be the only available or practical information in support of a therapeutic strategy, especially in the case of rare diseases or when the evolution of the therapy predates the common use of randomized study designs in medical practice. They may also provide the only practical design when treatments in study arms are radically different (e.g., amputation vs. limb-sparing surgery). Nevertheless, they always raise issues of patient selection and comparability with other populations. In descending order of generalizability to other populations, these designs are population-based studies that have a definable population, consecutive series that are not population-based, and nonconsecutive cases. Some study designs (e.g., cohort and case-control studies) have internal-control study subjects, while others do not (e.g., case-only series with no internal comparison group or case-only series that are compared with historical controls).
Even large, population-based, observational studies with internal controls that compare therapeutic strategies in oncology should be interpreted with extreme caution. In a study that directly compared observational with RCT results, investigators performed a systematic MEDLINE search for observational studies published from 2000 to 2016 using data from the Surveillance, Epidemiology, and End Results (SEER) Program, SEER-Medicare, or the National Cancer Database that compared treatment regimens for any diagnosis of cancer. The investigators matched 350 treatment comparisons to 121 RCTs that made the same comparison. They found no significant correlation between the hazard ratios (HRs) of the observational studies and the matching RCTs (concordance correlation coefficient, 0.08; 95% confidence interval [CI], -0.07 to +0.23). Only 40% of matched studies agreed with respect to treatment effects (kappa, 0.037), and only 62% of the HRs in observational studies were within the 95% CI of the matched randomized trial. None of these correlations exceeded what would be expected by chance, and correlations did not improve in the studies that used the most sophisticated statistical methods of analysis, including propensity score weighting, instrumental variable adjustment, or sensitivity analysis. Of note, among the 70 observational studies that were ranked as rigorous and reported overall survival, 35 reported a positive result while the matching RCT reported either no difference or an effect in the opposite direction.
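Two of the measures reported above can be sketched in a few lines. The code below implements Lin's concordance correlation coefficient, which is a standard definition of that statistic, and the fraction of observational estimates that fall inside the matched trial's confidence interval. It is an illustration only: the function names are hypothetical, and whether the cited study computed its coefficient on raw or log-transformed HRs is not stated here.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired values.

    Unlike Pearson correlation, it also penalizes a systematic shift
    between the two sets of measurements.
    """
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("coefficient is undefined for identical constant series")
    return 2 * cov / denom


def fraction_within_ci(estimates, intervals):
    """Share of estimates lying inside their matched (low, high) interval."""
    hits = sum(low <= est <= high for est, (low, high) in zip(estimates, intervals))
    return hits / len(estimates)


# Hypothetical paired values, not data from the cited study.
print(concordance_correlation([1, 2, 3], [1, 2, 3]))   # perfect concordance
print(concordance_correlation([1, 2, 3], [2, 3, 4]))   # same trend, shifted
print(fraction_within_ci([0.8, 1.1, 0.6], [(0.7, 0.9), (0.8, 1.0), (0.5, 0.7)]))
```

The second call shows why this coefficient is stricter than a simple correlation: the two series move together perfectly, but the constant offset pulls the value well below 1.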
References
- LeLorier J, Grégoire G, Benhaddad A, et al.: Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 337 (8): 536-42, 1997.
- Bailar JC: The promise and problems of meta-analysis. N Engl J Med 337 (8): 559-61, 1997.
- Soni PD, Hartman HE, Dess RT, et al.: Comparison of Population-Based Observational Studies With Randomized Trials in Oncology. J Clin Oncol 37 (14): 1209-1216, 2019.
Strength of Endpoints
Commonly measured endpoints for adult and pediatric cancer treatment studies are listed below in descending order of strength:
- Endpoint A: Total mortality (or overall survival from a defined time).
This outcome is arguably the most important one to patients and is also the most easily defined and least subject to investigator bias.
- Endpoint B: Cause-specific mortality (or cause-specific survival from a defined time).
Although this may be of the most biologic importance in a disease-specific intervention, it is a more subjective endpoint than total mortality and more subject to investigator bias in its determination. This endpoint may also miss important effects of therapy that may actually shorten overall survival.
- Endpoint C: Carefully assessed quality of life.
This is an extremely important endpoint to patients. Careful documentation of this endpoint within a strong study design is therefore sufficient for most physicians to incorporate a treatment into their practices.
- Endpoint D: Indirect surrogates.
- Endpoint Di: Event-free survival.
- Endpoint Dii: Disease-free survival.
- Endpoint Diii: Progression-free survival.
- Endpoint Div: Tumor response rate.
These endpoints may be subject to investigator interpretation. More importantly, they may, but do not automatically, translate into direct patient benefit such as survival or quality of life. Nevertheless, it is rational in many circumstances to use a treatment that improves these surrogate endpoints while awaiting a more definitive endpoint to support its use.
Because studies or clinical experiences are ranked both by strength of design and importance of endpoint, a given study would have a two-tiered ranking (e.g., 1iiA for a nonblinded randomized study showing a favorable outcome in overall survival and 3iiiDiv for a phase II trial of selected patients with response rate as the outcome). In addition, all recommendations must take into account other issues that cannot be so easily quantified, such as toxicity, width of confidence intervals of observations, trial size, quality assurance in the trial, and cost. Nevertheless, the PDQ ranking system provides an ordinal categorization of strength of evidence as a starting point for discussions of study results.
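The two-tiered codes can be decoded mechanically. The sketch below assumes that the labels follow the order of the two lists in this summary (designs 1i, 1ii, 2, 3i, 3ii, 3iii; endpoints A, B, C, and Di through Div for the surrogates), which is consistent with the examples 1iiA and 3iiiDiv. The function and table names are hypothetical and are not part of any PDQ tool.

```python
import re

# Both tables are listed in descending order of strength.
DESIGN = {
    "1i": "Randomized controlled trial, double-blinded",
    "1ii": "Randomized controlled trial, nonblinded treatment delivery",
    "2": "Nonrandomized controlled trial",
    "3i": "Population-based, consecutive series",
    "3ii": "Consecutive cases (not population-based)",
    "3iii": "Nonconsecutive cases or other observational study designs",
}
ENDPOINT = {
    "A": "Total mortality",
    "B": "Cause-specific mortality",
    "C": "Carefully assessed quality of life",
    "Di": "Event-free survival",
    "Dii": "Disease-free survival",
    "Diii": "Progression-free survival",
    "Div": "Tumor response rate",
}


def _alternation(codes):
    # Longest codes first so that "1ii" is tried before "1i".
    return "|".join(sorted(codes, key=len, reverse=True))


_CODE = re.compile(f"({_alternation(DESIGN)})({_alternation(ENDPOINT)})")


def parse_level(code):
    """Split a code such as '1iiA' into (design, endpoint) labels."""
    match = _CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a recognized level-of-evidence code: {code!r}")
    return match.group(1), match.group(2)


def ordinal_rank(code):
    """Return (design_rank, endpoint_rank); 0 is the strongest on each scale."""
    design, endpoint = parse_level(code)
    return list(DESIGN).index(design), list(ENDPOINT).index(endpoint)


for example in ("1iiA", "3iiiDiv"):
    design, endpoint = parse_level(example)
    print(example, "->", DESIGN[design], "/", ENDPOINT[endpoint])
```

The rank is returned as a pair rather than a single score because the summary treats design and endpoint as two separate ordinal scales and leaves the weighing of one against the other to the reader.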
Changes to This Summary (08/27/2019)
The PDQ cancer information summaries are reviewed regularly and updated as new information becomes available. This section describes the latest changes made to this summary as of the date above.
Strength of Study Design
Revised text to state that some study designs have internal-control study subjects, while others do not.
Added text to state that even large, population-based, observational studies with internal controls that compare therapeutic strategies in oncology should be interpreted with extreme caution; in a study that directly compared observational with randomized clinical trial (RCT) results, investigators performed a systematic MEDLINE search for observational studies published from 2000 to 2016 using data from the Surveillance, Epidemiology, and End Results (SEER) Program, SEER-Medicare, or the National Cancer Database that compared treatment regimens for any diagnosis of cancer (cited Soni et al. as reference 3). Also added that the investigators matched 350 treatment comparisons to 121 RCTs that made the same comparisons and found no significant correlation between the hazard ratios of the observational studies and the matching RCTs. Of note, among 70 observational studies that were ranked as rigorous and reported overall survival, 35 reported a positive result while the matching RCT reported either no difference or an effect in the opposite direction.
This summary is written and maintained by the PDQ Adult Treatment Editorial Board, which is editorially independent of NCI. The summary reflects an independent review of the literature and does not represent a policy statement of NCI or NIH. More information about summary policies and the role of the PDQ Editorial Boards in maintaining the PDQ summaries can be found on the About This PDQ Summary and PDQ® - NCI's Comprehensive Cancer Database pages.
About This PDQ Summary
Purpose of This Summary
This PDQ cancer information summary for health professionals provides comprehensive, peer-reviewed, evidence-based information about the formal ranking system used by the PDQ Editorial Boards to assess evidence supporting the use of specific interventions or approaches. It is intended as a resource to inform and assist clinicians in the care of their patients. It does not provide formal guidelines or recommendations for making health care decisions.
Reviewers and Updates
This summary is reviewed regularly and updated as necessary by the PDQ Adult Treatment Editorial Board, which is editorially independent of the National Cancer Institute (NCI). The summary reflects an independent review of the literature and does not represent a policy statement of NCI or the National Institutes of Health (NIH).
Board members review recently published articles each month to determine whether an article should:
- be discussed at a meeting,
- be cited with text, or
- replace or update an existing article that is already cited.
Changes to the summaries are made through a consensus process in which Board members evaluate the strength of the evidence in the published articles and determine how the article should be included in the summary.
Any comments or questions about the summary content should be submitted to Cancer.gov through the NCI website's Email Us. Do not contact the individual Board Members with questions or comments about the summaries. Board members will not respond to individual inquiries.
Levels of Evidence
Some of the reference citations in this summary are accompanied by a level-of-evidence designation. These designations are intended to help readers assess the strength of the evidence supporting the use of specific interventions or approaches. The PDQ Adult Treatment Editorial Board uses a formal evidence ranking system in developing its level-of-evidence designations.
Permission to Use This Summary
PDQ is a registered trademark. Although the content of PDQ documents can be used freely as text, it cannot be identified as an NCI PDQ cancer information summary unless it is presented in its entirety and is regularly updated. However, an author would be permitted to write a sentence such as “NCI’s PDQ cancer information summary about breast cancer prevention states the risks succinctly: [include excerpt from the summary].”
The preferred citation for this PDQ summary is:
PDQ® Adult Treatment Editorial Board. PDQ Levels of Evidence for Adult and Pediatric Cancer Treatment Studies. Bethesda, MD: National Cancer Institute. Updated
Based on the strength of the available evidence, treatment options may be described as either “standard” or “under clinical evaluation.” These classifications should not be used as a basis for insurance reimbursement determinations. More information on insurance coverage is available on Cancer.gov on the Managing Cancer Care page.