Published on 15/11/2025
Estimands and Intercurrent Events: Framing the Question So the Answer Stands Up to Review
Estimands 101: Define the Decision Question Before You Touch a Dataset
Estimands are the backbone of modern clinical inference: they make explicit what effect a trial seeks to estimate, for which population, under what handling of intercurrent events, and using which summary measure. The ICH E9(R1) addendum anchors this thinking and is widely recognized by authorities such as the ICH assembly, the U.S. FDA, the EMA, Japan’s PMDA, and Australia’s TGA. A compliant estimand specification contains four parts: (1) treatment condition(s); (2) population (e.g., all randomized eligible participants); (3) variable/endpoint (definition and timing); and (4) intercurrent-event strategy together with the population-level summary (e.g., mean difference, risk ratio, hazard ratio). By nailing these choices down up front, you keep design, conduct, analysis, and interpretation synchronized.

Why sponsors stumble. Historically, protocols named endpoints and analysis models but left ambiguous what to do when real-world events occurred: discontinuation, rescue medication, switching therapies, death, or prohibited concomitants. The estimand framework removes that ambiguity, replacing “we will handle appropriately” with a transparent, auditable plan that statisticians can implement and regulators can inspect.

Tie to clinical objectives. Start from the decision the evidence must support. If prescribers and payers need to know the expected treatment effect in routine use, even when patients switch or use rescue, a treatment-policy strategy is coherent. If the scientific question is mechanistic (what would happen if no rescue were used), a hypothetical strategy is apt. If taking rescue is itself a clinically meaningful failure, a composite strategy suits. If benefit is only relevant while patients persist on assigned therapy, a while-on-treatment strategy fits. And if the question concerns a subgroup defined by potential intercurrent events, a principal-stratum strategy may be considered (with care).

Population and timing matter. Pre-specify the population (all randomized vs a biomarker-positive subset), the time horizon (e.g., change from baseline at Week 24, 52-week risk, or time to composite failure), and any windows or grace periods. In time-to-event frameworks, clarify the origin (randomization vs first dose), competing risks, and whether events after switching count.

Inspectability by design. Estimands are not stand-alone prose. They must map to design choices (sample size, randomization strata), data flows (capturing intercurrent-event metadata), and analysis code (SAP, programs, and outputs). File point-in-time configuration snapshots for forms, edit checks, and visit windows so reviewers can reconstruct what was in force when decisions and events occurred.

What qualifies as an intercurrent event? Any post-randomization occurrence that affects either the interpretation or the existence of the endpoint: treatment discontinuation, rescue/alternative treatments, dose changes beyond protocol, prohibited concomitants, surgical interventions, death, COVID-19 disruptions, or even device replacements in device trials. The key is pre-specification and consistent capture.

Strategy catalogue. The five strategies above (treatment policy, hypothetical, composite, while-on-treatment, principal stratum) form the catalogue from which each pre-specified intercurrent event is assigned a handling rule.

Choosing coherently. The same trial can host different estimands for different stakeholders. For example, a chronic-disease study might pair (1) a treatment-policy primary estimand for labeling with (2) a hypothetical supportive estimand to explore the pharmacologic effect absent rescue. Pre-plan multiplicity and the reporting hierarchy to avoid confusion.

Time-to-event specifics. In survival analyses, specify whether post-discontinuation events count, whether switching triggers censoring (and if so, whether to use methods such as inverse probability of censoring weighting, rank-preserving structural failure time models, or two-stage estimators), and how competing risks are handled.
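Stepping back from survival specifics, the four-part specification and five handling strategies above can be captured in one machine-checkable record, which helps keep protocol prose, data flows, and analysis code synchronized. The sketch below is purely illustrative: the `Estimand` class, its field names, and the example values are hypothetical, not drawn from any standard or library.

```python
from dataclasses import dataclass
from enum import Enum

class ICEStrategy(Enum):
    # The five intercurrent-event handling strategies named in ICH E9(R1)
    TREATMENT_POLICY = "treatment policy"
    HYPOTHETICAL = "hypothetical"
    COMPOSITE = "composite"
    WHILE_ON_TREATMENT = "while on treatment"
    PRINCIPAL_STRATUM = "principal stratum"

@dataclass(frozen=True)
class Estimand:
    # Hypothetical container mirroring the four-part specification
    treatment: str        # (1) treatment condition(s)
    population: str       # (2) e.g., "all randomized eligible participants"
    endpoint: str         # (3) variable definition and timing
    ice_strategies: dict  # (4a) intercurrent event -> ICEStrategy
    summary_measure: str  # (4b) population-level summary

primary = Estimand(
    treatment="Drug X 10 mg vs placebo",
    population="All randomized participants (ITT)",
    endpoint="Change from baseline in HbA1c at Week 24",
    ice_strategies={
        "rescue medication": ICEStrategy.TREATMENT_POLICY,
        "treatment discontinuation": ICEStrategy.TREATMENT_POLICY,
        "death": ICEStrategy.COMPOSITE,
    },
    summary_measure="Difference in least-squares means",
)

# Every pre-specified intercurrent event must map to exactly one strategy
assert all(isinstance(s, ICEStrategy) for s in primary.ice_strategies.values())
```

A record like this makes it easy to verify, before the SAP is finalized, that every event in the intercurrent-event taxonomy has exactly one pre-specified handling rule per estimand.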
Non-proportional hazards are common when switching or rescue occurs late; consider estimands based on restricted mean survival time (RMST) or milestone survival.

Responder and composite scales. For binary responder endpoints, define responder criteria that incorporate intercurrent events (e.g., responder only if the target biomarker threshold is met and no rescue is taken). For ordinal scales, specify how the event maps to categories (e.g., “worst category” upon rescue or surgery). Keep the mapping clinically motivated and symmetric across arms.

Device and procedure nuance. In device trials, intercurrent events include explant, revision, or crossover procedures. Clarify whether outcomes post-revision belong to the original strategy (treatment policy) or require a composite failure rule.

Document the taxonomy. Build a codebook of intercurrent events in the Data Management Plan: definitions, triggers, capture methods (EDC, IRT, eCOA, medical review), timestamp requirements (local time + UTC offset), and links to analysis flags. This turns real-world messiness into analysable, inspectable structure.

Protocol precision. Place formal estimand statements in the objectives/endpoints section, cross-referencing the intercurrent-event taxonomy. For each estimand, specify the population, endpoint definition/timing, strategy, and summary measure. Write in testable terms (“treatment-policy estimand for mean change at Week 24, ITT population, difference in least-squares means”).

Sample size implications. Estimands influence variance and event rates. A treatment-policy approach may increase variability due to heterogeneous post-event behavior and thus require a larger N. Composite strategies can increase event rates (improving power) but may dilute clinical meaning if the composite is dominated by soft components. Survival estimands that censor at switching usually change the effective information and mandate simulation to understand power under plausible switching patterns.

CRF and system design.
Build data capture around estimands: fields and edit checks for rescue, discontinuation, switching, prohibited concomitants, surgery, and reasons. In IRT/IAM, capture emergency unblinding with timestamps and rationale. Ensure audit trails record who/what/when/why with local time and UTC offset so an inspector can reconstruct the sequence of events relative to visit windows and endpoint timing.

Programming & derivations. In ADaM, create explicit flags/variables to represent intercurrent events (e.g., ASEVNT types, ONTRTFL, SWITCHDT, RESCUEFL). For hypothetical strategies, implement multiple imputation or model-based approaches aligned with the SAP, carrying through stratification and covariates. For while-on-treatment strategies, define rules for truncation and document how missingness before truncation is handled.

Missing data vs intercurrent events. Distinguish the two. An intercurrent event handled by the estimand strategy is not “missing”: it is part of the definition. Only data absent relative to the chosen strategy are “missing.” In treatment-policy estimands, post-event observations are observed data; in hypothetical estimands, data after the event are typically missing by design and require assumption-driven imputation or modeling. State the assumed mechanisms (MAR/MNAR) and conduct tipping-point analyses that vary plausible departures.

Switching and advanced methods. If censoring at switching is used, pre-specify methods to mitigate bias from informative censoring. Options include inverse probability of censoring weighting (with robust variance), rank-preserving structural failure time models, and two-stage estimators. Simulate operating characteristics under realistic switching distributions and covariate patterns; retain code, seeds, and versions for inspection.

Blinded Data Review (BDR). Schedule a BDR to verify definitional logic without breaking the blind: check rescue/switch flags, discontinuation codes, and time anchors; confirm consistency across EDC, IRT, safety, and adjudication data. Document what may be corrected (e.g., inconsistent dates) and what may not be altered (e.g., outcomes themselves), with approvals and audit trails.

Transparency in the SAP. For each estimand, the SAP should name the model (e.g., ANCOVA, MMRM, stratified Cox, RMST), the covariates, and the exact handling of the intercurrent events and their flags. For hypothetical approaches, specify imputation models (variables, visit structure, number of imputations, delta adjustments). For composite and while-on-treatment strategies, define censoring, failure, and windows precisely. Include a mapping table from the estimand prose → ADaM variables/flags → TFL shells.

Communications and labeling. Plan how estimands appear in the CSR and labeling language. Use phrasing that reflects the strategy (“among all randomized participants regardless of rescue,” “among participants while persisting on assigned treatment,” “in a hypothetical scenario without rescue therapy”). Clarity here reduces misinterpretation in HTA/payer submissions.

Regulatory Evidence Pack: Common Pitfalls, Sensitivity Arsenal, and a One-Page Checklist

What reviewers will ask for quickly.

Sensitivity analyses: organized, not opportunistic. Pre-define a reference (primary) analysis and a structured set of sensitivity analyses that probe departures from the assumptions relevant to the chosen strategy. Examples: (a) hypothetical imputation with alternative deltas; (b) treatment-policy MMRM vs MI; (c) survival analyses using RMST or milestone survival; (d) IPCW with different covariate sets; (e) per-protocol supportive estimation under while-on-treatment strategies. Present results with clear directionality (how assumptions shift estimates) rather than a laundry list of p-values.

Frequent failure modes and durable fixes.

Program-level KPIs to demonstrate control.

Study-ready checklist (single page).

Bottom line. Estimands convert an implicit research question into explicit, inspectable intent. When you choose coherent strategies for intercurrent events, capture them faithfully, and implement analyses that match the prose, with sensitivity analyses that reveal how assumptions matter, your results will be reproducible, intelligible, and persuasive to assessors at the FDA, the EMA, the PMDA, and the TGA, consistent across the ICH community, and aligned with the WHO public-health perspective.
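To ground the programming discussion, here is a simplified sketch of a per-observation intercurrent-event flag derivation. ONTRTFL, SWITCHDT, and RESCUEFL echo the ADaM-style names used above; the function itself, the HYPOMISS flag, and the date logic are hypothetical illustrations, not a standard derivation.

```python
from datetime import date
from typing import Optional

def derive_ice_flags(adt: date,
                     trtsdt: date,
                     trtedt: Optional[date],
                     rescue_start: Optional[date],
                     switch_date: Optional[date]) -> dict:
    """Derive intercurrent-event flags for one observation (simplified sketch).

    adt            -- analysis date of the observation
    trtsdt/trtedt  -- first/last dose dates of randomized treatment
    rescue_start   -- start date of rescue medication, if any
    switch_date    -- date of switch to alternative therapy, if any
    """
    on_trt = trtsdt <= adt and (trtedt is None or adt <= trtedt)
    rescued = rescue_start is not None and rescue_start <= adt
    switched = switch_date is not None and switch_date <= adt
    return {
        "ONTRTFL": "Y" if on_trt else "N",
        "RESCUEFL": "Y" if rescued else "N",
        "SWITCHFL": "Y" if switched else "N",
        "SWITCHDT": switch_date,
        # HYPOMISS is a hypothetical helper flag: under a "no rescue"
        # hypothetical strategy, observed values after rescue are set aside
        # and later multiply imputed; under treatment policy they remain
        # observed data.
        "HYPOMISS": "Y" if rescued else "N",
    }

flags = derive_ice_flags(adt=date(2025, 6, 1),
                         trtsdt=date(2025, 1, 15),
                         trtedt=None,
                         rescue_start=date(2025, 5, 10),
                         switch_date=None)
# e.g., flags["ONTRTFL"] == "Y" and flags["RESCUEFL"] == "Y"
```

Deriving all flags in one auditable function (with its inputs traceable to EDC/IRT fields and its outputs to SAP analysis rules) is one way to make the prose → ADaM → TFL mapping inspectable.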