Published on 16/11/2025
Handling Missing Data in Clinical Trials: Strategy, Methods, and Sensitivity that Regulators Trust
Start with the Question: Estimands, Missingness Taxonomy, and Prevention by Design
Missing data are inevitable—but unplanned handling can distort treatment effects and credibility. A defensible approach starts with the estimand (per ICH E9(R1)) and then distinguishes missingness from intercurrent events. If the primary estimand is a treatment policy effect, post-rescue observations are data, not missing; if the estimand is hypothetical without rescue, data after rescue are missing by design and require principled assumptions. Aligning the strategy to the estimand up front determines which observations count as missing and, therefore, which assumptions must be defended.
Taxonomy—know what you’re assuming. Missingness mechanisms shape valid methods:
- MCAR (Missing Completely At Random): missingness independent of observed and unobserved values. Rare in practice; justifies simple methods but seldom believable.
- MAR (Missing At Random): missingness depends only on observed data (e.g., baseline severity, prior outcomes). Supports Multiple Imputation (MI) and likelihood-based models when covariates are rich.
- MNAR (Missing Not At Random): missingness depends on unobserved outcomes (e.g., participants stop visits when not improving). Requires sensitivity analyses (pattern-mixture, selection models, delta adjustments, or reference-based imputation).
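The practical consequence of the taxonomy above can be shown in a few lines. The following is a minimal numpy sketch with illustrative numbers (not from any real trial): dropout depends only on an observed baseline covariate (MAR), so a naive complete-case mean is biased, while a model that conditions on that covariate recovers the full-data mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Outcome depends on baseline severity; sicker participants improve less.
baseline = rng.normal(0.0, 1.0, n)
outcome = 2.0 - 0.8 * baseline + rng.normal(0.0, 1.0, n)

# MAR dropout: missingness probability depends only on the OBSERVED baseline.
p_miss = 1.0 / (1.0 + np.exp(-(baseline - 0.5)))   # sicker -> more dropout
observed = rng.uniform(size=n) > p_miss

full_mean = outcome.mean()
cc_mean = outcome[observed].mean()   # complete-case mean: biased under MAR

# Regression adjustment: fit on observed data, predict for everyone.
X = np.column_stack([np.ones(observed.sum()), baseline[observed]])
beta, *_ = np.linalg.lstsq(X, outcome[observed], rcond=None)
adj_mean = (beta[0] + beta[1] * baseline).mean()   # approximately unbiased

print(round(full_mean, 2), round(cc_mean, 2), round(adj_mean, 2))
```

Under MCAR the complete-case mean would also be unbiased; under MNAR neither estimate can be trusted without sensitivity analyses, which is exactly why the mechanism must be argued, not assumed.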
Prevention beats repair. Statistical sophistication cannot substitute for good operations. Build prevention into protocol and systems:
- Visit windows with grace periods and proactive rescheduling; centralized monitoring to flag missed visits in near real time.
- eCOA diaries with gentle prompts and display of “time-last-synced”; server receipt time stored with local time and UTC offset.
- Home health or tele-visits for hard-to-reach participants; kit delivery aligned to windows via IRT.
- Minimal respondent burden for PROs; permissible partial completion with validated scoring rules.
Data capture for analysis later. To support principled MAR/MNAR handling, capture reasons for missingness (e.g., AE, lack of efficacy, relocation), treatment discontinuation flags, rescue/switch timestamps, and adherence measures. In EDC/eSource, enforce structured reasons rather than free text; enable audit trails that record who/what/when/why with time zone context. These details enable realistic models and transparent sensitivity analyses in the SAP.
Define roles in the SAP. The SAP must pre-specify the primary method (and its assumptions) plus a structured sensitivity suite that probes plausible departures. Include: variables in imputation models, number of imputations, how intercurrent events map to missingness (for hypothetical estimands), and how PRO item-level missingness is handled. The programming specifications should then translate this into ADaM variables and TFL shells so reviewers can trace intent → code → outputs at agencies like the FDA and EMA.
Choose and Justify Your Primary Method: Likelihood, MI, and Domain-Specific Nuances
Likelihood-based longitudinal models (often MAR). For continuous repeated measures (e.g., change from baseline), MMRM (mixed models for repeated measures) analyzes observed data without imputing missing post-baseline values, assuming MAR conditional on included covariates. Pre-specify covariance structure (e.g., unstructured, compound symmetry), transformations, and visit window rules. Include diagnostics (residuals, influence) and a plan for robust variance if assumptions are strained.
Multiple Imputation (MI) under MAR. MI creates multiple completed datasets using models that condition on observed data; analyses are combined using Rubin’s rules. Specify:
- Imputation model variables (treatment, visit, baseline, region, prior outcome, prognostic covariates).
- Imputation type (multivariate normal; fully conditional specification for mixed types; or predictive mean matching).
- Number of imputations m (a common rule of thumb sets m at least equal to the percentage of missing information; often 20–50).
- Alignment with the analysis model (same estimand scale, transformations, and interaction terms).
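Once the m completed-data analyses are run, they are combined with Rubin's rules. A minimal sketch (the function name and toy numbers are illustrative, not from any real analysis):

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine per-imputation results via Rubin's rules.

    estimates, variances: length-m sequences of point estimates and their
    squared standard errors from the m completed-data analyses.
    Returns (pooled estimate, total variance, fraction of missing information).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                # pooled point estimate
    u_bar = u.mean()                # within-imputation variance
    b = q.var(ddof=1)               # between-imputation variance
    t = u_bar + (1 + 1 / m) * b     # total variance
    fmi = (1 + 1 / m) * b / t       # large-sample fraction of missing info
    return q_bar, t, fmi

# Toy example: five imputations of a treatment effect
est, tot_var, fmi = pool_rubin([1.9, 2.1, 2.0, 2.2, 1.8],
                               [0.04, 0.05, 0.04, 0.05, 0.04])
print(round(est, 2), round(tot_var, 4), round(fmi, 2))  # 2.0 0.074 0.41
```

The between-imputation variance B and the resulting fraction of missing information (FMI) are exactly the diagnostics the SAP should require alongside the pooled estimate.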
Reference-based imputation (MNAR-leaning, common in withdrawal). When participants discontinue due to lack of efficacy, assume post-dropout behavior resembles a reference group. Popular options:
- J2R (Jump-to-Reference): upon treatment discontinuation, outcomes follow the control arm trajectory.
- CIR (Copy Increments in Reference): preserve within-participant change at dropout but add future increments from control.
- CR (Copy Reference): replicate the entire reference distribution after dropout.
These methods require careful justification in the SAP and alignment with the estimand (often hypothetical). Perform simulation or at least scenario analyses to show operating characteristics and impact on power.
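The trajectory logic of J2R and CIR can be sketched at the mean level. This is a simplification: production reference-based MI draws imputations from a fitted multivariate model rather than copying mean profiles, and the function names and trajectories below are illustrative only.

```python
import numpy as np

def j2r(observed, dropout_visit, ref_mean):
    """Jump-to-Reference: after dropout, the mean profile equals the
    reference (control) arm's mean trajectory."""
    out = np.asarray(observed, dtype=float).copy()
    out[dropout_visit:] = ref_mean[dropout_visit:]
    return out

def cir(observed, dropout_visit, ref_mean):
    """Copy-Increments-in-Reference: keep the participant's level at
    dropout, then add the reference arm's visit-to-visit increments."""
    out = np.asarray(observed, dtype=float).copy()
    increments = np.diff(ref_mean)
    for k in range(dropout_visit, len(out)):
        out[k] = out[k - 1] + increments[k - 1]
    return out

ref = np.array([0.0, 1.0, 1.5, 1.8])        # control-arm mean trajectory
obs = np.array([0.0, 2.5, np.nan, np.nan])  # treated participant, last seen at visit 1

print(j2r(obs, 2, ref))  # control values from visit 2 onward
print(cir(obs, 2, ref))  # keeps the 2.5 advantage, adds control increments
```

Note how J2R discards the participant's pre-dropout advantage while CIR preserves it, which is why the choice between them must be argued clinically, not just statistically.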
Delta-adjusted MI (pattern-mixture flavor). Apply offsets (δ) to imputed treatment-arm responses to reflect plausible shortfalls among dropouts (e.g., −0.5 SD). Provide a grid of δ values in sensitivity analyses and interpret the δ at which conclusions would change (tipping point).
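A first-order tipping-point scan can be sketched as follows. The numbers are hypothetical and the first-order shortcut (shifting the pooled effect by the missing fraction times δ, rather than re-running MI per δ) is an assumption made for brevity; in practice each δ gets its own imputation-and-pooling run.

```python
def tipping_point(base_effect, se, frac_missing_trt, deltas, z=1.96):
    """Scan a delta grid. To first order, shifting every imputed
    treatment-arm value by delta shifts the pooled effect by
    frac_missing_trt * delta. Return the first delta at which the
    lower confidence bound reaches zero (the tipping point)."""
    for d in deltas:
        effect = base_effect + frac_missing_trt * d
        lower = effect - z * se
        if lower <= 0:
            return d, effect, lower
    return None

# Hypothetical inputs: effect 2.0, SE 0.6, 20% missing in the treatment arm
d, effect, lower = tipping_point(2.0, 0.6, 0.20, [-0.5 * i for i in range(21)])
print(round(d, 1), round(effect, 2))  # -4.5 1.1
```

Interpretation is the point: if conclusions only change at a δ far beyond clinical plausibility (here, a 4.5-point shortfall among dropouts), the primary result is robust.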
Selection models. Model the joint distribution of outcomes and response indicators via a missingness (selection) model. Useful when scientific rationale suggests a mechanism for dropout given unobserved data; more technical and often reserved for sensitivity.
Binary/ordinal endpoints. Use logistic models or MI with appropriate link functions; for ordinal scales, consider proportional odds with partial proportional alternatives tested in sensitivity. For responder analyses, define how missing assessments are treated (e.g., non-responder imputation under a composite estimand vs MI under hypothetical).
Time-to-event outcomes. Missingness manifests as informative censoring. Methods include:
- IPCW (Inverse Probability of Censoring Weights) with robust variance; model censoring using observed covariates and time-varying factors.
- Joint models (e.g., longitudinal biomarker + survival) to capture informative dropout.
- RMST (Restricted Mean Survival Time) as an estimand robust to proportional-hazards violations common when rescue/switching occurs.
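The core IPCW idea—reweighting the uncensored by the inverse probability of remaining uncensored—can be isolated in a simulation. This sketch uses simulated numbers and the true censoring probabilities by construction; a real analysis would estimate them from a fitted censoring model (often logistic or Cox with time-varying covariates).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

x = rng.normal(size=n)                    # observed covariate
y = 1.0 + 0.7 * x + rng.normal(size=n)    # outcome of interest

# Censoring depends on the observed covariate, so a naive analysis of the
# uncensored subset is biased.
p_unc = 1.0 / (1.0 + np.exp(-(1.0 - x)))  # P(uncensored | x)
uncensored = rng.uniform(size=n) < p_unc

naive = y[uncensored].mean()                 # biased
w = 1.0 / p_unc[uncensored]                  # inverse-probability weights
ipcw = np.average(y[uncensored], weights=w)  # approximately unbiased

print(round(y.mean(), 2), round(naive, 2), round(ipcw, 2))
```

In practice, pre-specify weight truncation and a robust (sandwich) variance, since extreme weights inflate standard errors.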
PK/PD and count outcomes. For negative binomial models (e.g., exacerbations), justify how exposure time and missing follow-up are handled (offset terms, IPCW). For PK/PD, prespecify rules for below-LLOQ values (set to LLOQ/2, model-based approaches) and window adherence.
PRO/COA instruments. Handle item-level missingness using validated scoring algorithms (e.g., ≥50% items present), or MI at item level when permitted. Distinguish instrument missingness from visit missingness; pre-specify rules in the SAP and TFL shells.
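A typical half-rule prorated score looks like the following sketch. The function name and the 50% threshold are illustrative; the actual rule and scaling must come from the instrument's validated scoring manual.

```python
import numpy as np

def prorated_score(items, min_frac=0.5):
    """Half-rule sketch: if at least min_frac of items are answered,
    score = mean of answered items * total number of items;
    otherwise the instrument score is missing."""
    items = np.asarray(items, dtype=float)
    answered = ~np.isnan(items)
    if answered.mean() < min_frac:
        return np.nan
    return items[answered].mean() * items.size

print(prorated_score([3, 4, np.nan, 5]))            # 16.0 (3 of 4 answered)
print(prorated_score([3, np.nan, np.nan, np.nan]))  # nan  (below 50% rule)
```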
Sensitivity Suites that Convince: Structured Probes of Assumptions, Not Fishing Expeditions
Principles for sensitivity analysis. Sensitivity must be pre-specified, structured, and decision-oriented. For each estimand, identify the key assumptions (MAR, trajectory similarity after dropout, censoring independence) and design a small number of targeted probes that vary those assumptions in clinically interpretable ways. Report directionality (how estimates move) and the threshold at which decisions would differ.
Recommended toolkit (choose per estimand/endpoint).
- Longitudinal continuous (treatment policy or hypothetical):
- Primary: MMRM (MAR) or MI(MAR) aligned with analysis model.
- Sensitivity A: Reference-based MI (J2R/CIR/CR) for discontinuations due to lack of efficacy.
- Sensitivity B: Delta-adjusted MI grid (e.g., δ spanning fractions of the outcome SD, such as −0.25 to −0.75 SD).
- Sensitivity C: Pattern-mixture with dropout-time strata.
- Sensitivity D: Robust variants (Huber-White SEs; rank-based ANCOVA).
- Binary responder:
- Primary (composite): non-responder assignment after rescue/discontinuation; remaining missing assessments handled via MI under MAR.
- Sensitivity: tipping-point table that converts increasing fractions of missing to non-responders in the treatment arm; or δ-adjusted logit-space MI.
- Time-to-event:
- Primary: stratified Cox/log-rank with planned censoring rules.
- Sensitivity A: IPCW for informative censoring using time-dependent covariates.
- Sensitivity B: RMST at clinically relevant horizons (e.g., 12, 24 months).
- Sensitivity C: Landmark/milestone survival or weighted log-rank for non-PH.
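The binary tipping-point probe in the toolkit above can be sketched as a simple table: recode increasing fractions of missing treatment-arm participants as non-responders and watch the risk difference shrink. The arm sizes and counts are hypothetical, and control-arm missingness is ignored here for brevity.

```python
def responder_tipping_table(n_trt, resp_trt, miss_trt, n_ctl, resp_ctl):
    """For each fraction of missing treatment-arm participants recoded as
    non-responders (the rest counted as responders), report the risk
    difference vs control."""
    rows = []
    p_ctl = resp_ctl / n_ctl
    for pct in range(0, 101, 25):
        extra_resp = round(miss_trt * (100 - pct) / 100)
        p_trt = (resp_trt + extra_resp) / n_trt
        rows.append((pct, round(p_trt - p_ctl, 3)))
    return rows

# Hypothetical trial: 100 treated (40 observed responders, 10 missing),
# 100 controls (30 responders)
for pct, rd in responder_tipping_table(100, 40, 10, 100, 30):
    print(pct, rd)
```

If the risk difference stays clinically meaningful even when 100% of missing treatment-arm participants are counted as failures, the responder conclusion is robust to that worst-case assumption.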
Visualizations that surface risk. Provide missingness heatmaps by visit and arm, Kaplan–Meier curves of time to first missed visit, spaghetti plots of trajectories for completers vs dropouts, and funnel plots of site-level missingness. These help DSMBs and regulators judge whether assumptions like MAR are plausible.
Blinded Data Review (BDR) boundaries. Before unblinding, verify the completeness and logic of missingness indicators, rescue/switch flags, and reasons. Correct inconsistencies (date logic, duplicate visits) under documented governance; do not modify values in ways that depend on suspected arm effects. Store BDR approvals with local time + UTC offset in the TMF.
Programming discipline. Implement imputation/analysis pipelines as version-controlled code with seeds recorded. For MI, fix random seeds and store imputation diagnostics (convergence, fraction of missing information). For reference-based MI, archive the reference selection rules and mapping to ADaM flags. Double-program high-stakes outputs.
Subgroup and multiplicity connections. Sensitivity is not multiplicity control, but claims over subgroups should not rest solely on sensitivity outputs. Keep the hierarchy clear in the SAP: primary inference first, sensitivity for robustness, exploratory subgroup signals labeled as such unless multiplicity control is in place.
Communicating results. Present a compact table: primary estimate/CI/p-value; sensitivity estimates; concise narrative of how the estimate shifts across assumptions and whether conclusions change. Avoid “kitchen-sink” appendices without synthesis; regulators value clarity over volume.
Proving Control on Inspection Day: Evidence, KPIs, Pitfalls, and a One-Page Checklist
Rapid-pull evidence pack (what reviewers ask first).
- Protocol/SAP sections showing estimand ↔ missingness strategy mapping; pre-specified primary and sensitivity methods.
- Programming specs and code (with versions and random seeds) for MI/MMRM/IPCW/reference-based methods; imputation diagnostics.
- ADaM metadata defining missingness flags, rescue/switch variables, and PRO scoring rules; traceability to SDTM and source.
- Configuration snapshots (eCRF forms, visit windows, dictionary versions) at UAT, release, and lock; audit trails with local time and UTC offset for key edits.
- Visual QA: missingness heatmaps, event accrual vs plan, site-level missingness outliers, and BDR minutes.
- Operating characteristics or scenario analyses demonstrating the impact of plausible MNAR departures.
KPIs that reveal control (examples).
- Visit completeness at primary endpoint by arm (target ≥90% or justified); trend over time.
- Reason-coded missingness completion (≥95% of missing visits have structured reasons).
- eCOA adherence (median latency to sync; % days with entries during critical windows).
- Fraction of Missing Information (FMI) for key parameters; alert if FMI exceeds planned bounds.
- Reproducibility (independent rerun match rate for MI/MMRM outputs = 100% within tolerance).
Common pitfalls—and durable fixes.
- Estimand–method mismatch (e.g., treatment-policy estimand analyzed as hypothetical). → Realign SAP; map intercurrent events explicitly.
- MAR asserted, not supported. → Enrich imputation models with prognostic covariates; provide visuals/reasons; run MNAR sensitivity.
- Non-prespecified sensitivity introduced post hoc. → Keep a pre-declared suite; label truly post hoc as exploratory and avoid over-reliance.
- Imputation/analysis model incongruence (different covariates/transformations). → Harmonize models; document any intentional differences.
- Seed or version opacity. → Lock seeds and package versions; archive scripts; include checksums in the TMF.
- PRO scoring not documented. → Reference instrument manuals; pre-specify partial-completion rules; verify implementation.
- Informative censoring ignored in survival. → Add IPCW/joint modeling/RMST sensitivity; document impact.
Study-ready checklist (single page).
- Primary estimand(s) defined; missingness mapped to intercurrent-event strategies.
- Primary method specified (MMRM/MI/IPCW/etc.) with assumptions, variables, and diagnostics; number of imputations and seeds set.
- Structured sensitivity suite (reference-based MI, delta grids, pattern-mixture, IPCW/RMST) pre-declared and tied to clinical narratives.
- EDC/eCOA capture reasons for missingness; rescue/switch/discontinuation timestamps recorded with local time + UTC offset.
- BDR scope and approvals filed; corrections traceable via audit trails; configuration snapshots archived.
- ADaM variables/flags documented; PRO item-level rules specified; TFL shells aligned to estimands.
- Reproducibility plan executed (double-program key outputs); code, versions, and checksums archived in the TMF.
- Dashboard monitoring of completeness, FMI, and eCOA adherence; CAPA triggers defined for threshold breaches.
Bottom line. Missing data strategy is not an afterthought; it is part of trial design and evidence integrity. When you anchor methods to the estimand, prevent avoidable gaps, choose models whose assumptions you can defend, and run a concise, pre-specified sensitivity suite, your results will withstand scrutiny across the FDA, EMA, PMDA, and TGA, within the ICH community, and in line with the WHO mission for trustworthy, reproducible health evidence.