Published on 16/11/2025
Operationalizing CDISC SDTM and ADaM for Reliable, Regulator-Ready Evidence
Why Standards Matter: Regulators, Quality, and the End-to-End Story
Clinical data standards translate a complex trial into structured, analyzable, and reviewable evidence. The Clinical Data Interchange Standards Consortium (CDISC) suite—primarily SDTM (Study Data Tabulation Model) for tabulation and ADaM (Analysis Data Model) for analysis—enables consistent review and re-use across programs and agencies. When implemented well, standards shorten review timelines, reduce ambiguity, and strengthen the chain from protocol intent → collection → tabulation → analysis → reporting that reviewers must be able to reconstruct.
Global expectations.
- U.S. FDA: study data standards and the Study Data Technical Conformance Guide emphasize SDTM/ADaM and define.xml to facilitate reproducibility.
- EMA: centralized procedures routinely receive CDISC data; reviewer guides improve traceability during assessment.
- PMDA: publishes a data standards catalog and expects conformance and clear metadata/traceability.
- TGA: aligns with ICH and eCTD practices; CDISC submissions ease review.
- ICH: quality-by-design and estimand thinking encourage structured data that preserve decision-critical assumptions.
- WHO: public-health priorities are best served by transparent, interoperable data that can be re-examined years later.
Principles first, mappings second. Standards are not a “file format”; they are a quality system. Anchor implementation to Critical-to-Quality (CtQ) factors: consent integrity, eligibility precision, endpoint timing/method fidelity, IP/device integrity, safety clocks, and traceable data lineage. For each CtQ domain, define: system of record, transformation rules, conformance checks, and metadata sufficient for re-analysis.
Metadata are the product. The most valuable deliverable is not the XPT file—it is the metadata that make it intelligible. That includes define.xml 2.1 (dataset/variable/value-level metadata, derivations, computational algorithms), the Study Data Reviewer’s Guide (cSDRG), the Analysis Data Reviewer’s Guide (ADRG), annotated CRFs (aCRF), and a Study Data Standardization Plan-style overview. Together they explain why a value looks the way it does and how to reproduce a table or figure.
Controlled terminology and identifiers. Fix versions for NCI-sourced controlled terminology (codelists) and dictionary versions (e.g., MedDRA, WHODrug). Consistently preserve the core keys STUDYID and USUBJID, domain keys such as --SEQ, and analysis keys such as ASEQ. Store timestamps in ISO 8601 with local time and UTC offset to end disputes across time zones and daylight saving transitions.
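The timestamp convention above can be sketched in a few lines. This is an illustrative helper (the function name and the example offset are assumptions, not part of any standard): a timezone-aware datetime serialized with Python's `isoformat` already carries the local time and UTC offset.

```python
from datetime import datetime, timezone, timedelta

def to_iso8601_with_offset(dt: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 with its UTC offset,
    so the local clock time and the offset travel together."""
    if dt.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware to carry a UTC offset")
    return dt.isoformat(timespec="seconds")

# A site in UTC+09:00 recording a sample-collection time:
local = datetime(2025, 3, 15, 9, 30, 0, tzinfo=timezone(timedelta(hours=9)))
print(to_iso8601_with_offset(local))  # 2025-03-15T09:30:00+09:00
```

Rejecting naive (offset-free) datetimes at the boundary is the point: a timestamp without an offset is exactly the kind of value that triggers "which day?" disputes later.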
Risk-proportionate design. Not every nuance requires a custom domain. Prefer canonical SDTM domains and ADaM structures; use supplemental qualifiers (SUPP--) or RELREC only when necessary. Avoid over-customization that increases validation noise without clinical value. This proportionate approach is consistent with expectations across FDA/EMA/PMDA/TGA and with the ICH ethos.
Getting SDTM Right: Domains, Timing, and Transparent Traceability
Domain strategy. SDTM organizes data into Events (e.g., AE, MH, DS), Interventions (CM, EX, PR, SU), Findings (LB, VS, EG, QS), Findings About (FA), and Special-Purpose (DM, CO, SE, SV) domains. Device and imaging add specialized structures (e.g., SDTM-MD, MI). Start by mapping each eCRF/eSource page to its canonical domain. Only introduce custom domains when the SDTM Implementation Guide cannot accommodate the concept with a standard pattern.
Keys and relationships. Implement stable keys and declared links:
- Uniqueness: Enforce STUDYID + USUBJID + domain-level keys (e.g., --SEQ) as unique.
- RELREC: Use the Relationships table to connect parent↔child records (e.g., an AE that leads to a dose change in EX), rather than inventing bespoke variables.
- Findings About: Use FA to annotate characteristics about events/findings (e.g., severity gradings, lesion size attributes) where appropriate, instead of proliferating custom variables.
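The uniqueness rule is easy to automate. A minimal sketch, assuming key tuples have already been extracted from an AE dataset (the sample values are hypothetical):

```python
from collections import Counter

# Hypothetical (STUDYID, USUBJID, AESEQ) key tuples; a real pipeline
# would read these from the SDTM AE dataset.
ae_keys = [
    ("S1", "S1-001", 1),
    ("S1", "S1-001", 2),
    ("S1", "S1-002", 1),
]

# Enforce uniqueness of STUDYID + USUBJID + --SEQ before downstream use.
dups = [k for k, n in Counter(ae_keys).items() if n > 1]
assert not dups, f"duplicate key records: {dups}"
print("key uniqueness check passed")
```

The same pattern extends to any domain: swap in the domain's --SEQ variable and fail loudly at build time rather than during conformance review.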
Timing is everything. SDTM time variables—--DTC (ISO 8601 date/time), --DY (study day), visit variables (VISIT, VISITNUM, --TPT)—must reflect the protocol’s window logic. When partial dates exist, represent them per IG rules and document imputations separately in ADaM (not in SDTM). For decentralized components (eCOA/wearables), propagate device timestamps and synchronization indicators through QS (questionnaires) or custom findings as needed, preserving source system provenance in --ORRESU/--METHOD or SUPP-- with clear descriptions.
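The --DY convention mentioned above has one famous wrinkle: there is no study day 0. A small sketch of the derivation (function name is illustrative):

```python
from datetime import date

def study_day(event: date, rfstdtc: date) -> int:
    """--DY per the SDTM convention: no day 0. The reference start date
    (RFSTDTC) is day 1; days before it are negative."""
    delta = (event - rfstdtc).days
    return delta + 1 if delta >= 0 else delta

print(study_day(date(2025, 1, 10), date(2025, 1, 10)))  # 1  (reference start date)
print(study_day(date(2025, 1, 9), date(2025, 1, 10)))   # -1 (day before)
print(study_day(date(2025, 1, 20), date(2025, 1, 10)))  # 11
```

Centralizing this in one function keeps the no-day-zero rule out of ad-hoc per-domain code, where it is routinely reimplemented inconsistently.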
Supplemental qualifiers without mystery. If a concept does not fit a core variable, place it in SUPPxx with QNAM, QLABEL, QVAL, and QORIG. Avoid burying critical analysis drivers in SUPP; regulators should not have to hunt for primary endpoints. If a SUPP item proves ubiquitous, consider promoting it to a proper variable in a future amendment and explain the transition in the cSDRG.
Controlled terminology and value-level metadata. For variables constrained by codelists (e.g., AE severity, relationship to study drug), use the correct --CAT/--SCAT, codelist names, and versions. When permissible values depend on another variable (e.g., lab test specifics), define Value-Level Metadata (VLM) in define.xml 2.1 so reviewers know exactly which values apply and when.
Traceability forward to ADaM. Bake traceability into SDTM: keep derivation-ready fields and provenance. Use consistent identifiers (--SEQ, --GRPID, device serials/UIDs, accession IDs) so ADaM can reference its source via SRC variables or LINKID patterns. Your cSDRG should narrate mapping choices, unit conversions, and any known conformance “noise” and how it was mitigated.
Quality controls. Run conformance checks early (e.g., against community rule sets) to catch structure errors, unmet codelists, or illegal nulls. Separate conformance failures (fix them) from content issues (justify clinically, explain in cSDRG). Resist hiding problems in reviewer guides; fix the data when feasible.
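Two of the check classes named above—illegal nulls and unmet codelists—can be prototyped in a few lines. This is a deliberately minimal sketch: the codelist subset and the sample records are invented, and production checks would run full community rule sets rather than hand-written loops.

```python
AESEV_CODELIST = {"MILD", "MODERATE", "SEVERE"}  # illustrative subset

records = [
    {"USUBJID": "S1-001", "AESEV": "MILD"},
    {"USUBJID": "S1-002", "AESEV": "SEVER"},  # typo: fails the codelist check
    {"USUBJID": "S1-003", "AESEV": None},     # illegal null
]

issues = []
for rec in records:
    sev = rec["AESEV"]
    if sev is None:
        issues.append((rec["USUBJID"], "AESEV is null"))
    elif sev not in AESEV_CODELIST:
        issues.append((rec["USUBJID"], f"AESEV '{sev}' not in codelist"))

for subj, msg in issues:
    print(subj, msg)
```

The output distinguishes the two failure modes, which matters for triage: structural defects get fixed, while genuine content findings get a clinical justification in the cSDRG.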
Designing ADaM That Analysts and Inspectors Can Trust
Purpose and structures. ADaM provides analysis-ready datasets that are traceable to SDTM and suitable for recreating tables, figures, and listings (TFLs). The main structures are:
- ADSL (Subject-Level Analysis Dataset): one record per subject; analysis populations (e.g., SAFFL, FASFL, ITTFL, PPSFL), key dates (randomization, treatment start/stop), strata, covariates, and study identifiers.
- BDS (Basic Data Structure): one or more records per subject, parameter, and analysis timepoint; used for labs, efficacy endpoints, questionnaires, and vitals (e.g., ADLB, ADEFF, ADVS, ADPRO).
- OCCDS (Occurrence Data Structure): event-oriented analysis, such as ADAE for AEs.
- ADTTE (Time-to-Event): survival/time-to-event endpoints with censoring indicators and analysis times.
Traceability you can follow with a finger. Every derived value should point back to SDTM: capture SRCVAR, SRCSEQ, and where needed SRCDOM. For composite derivations, include algorithm descriptions in define.xml and ADRG, and, when helpful, add helper variables (e.g., pre-derivation components) for transparency. The best test is simple: can a reviewer find the exact SDTM row(s) that produced AVAL without guessing?
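The "find the exact SDTM row" test can itself be mechanized. A sketch using pandas with invented sample rows (a real check would iterate over every SRC-bearing record): resolve each ADaM record's SRCSEQ back to its SDTM source and confirm the values agree.

```python
import pandas as pd

# Illustrative SDTM LB rows and one derived ADaM BDS record pointing back at them.
lb = pd.DataFrame({
    "USUBJID":  ["S1-001", "S1-001"],
    "LBSEQ":    [1, 2],
    "LBSTRESN": [4.1, 4.6],
})
adlb = pd.DataFrame({
    "USUBJID": ["S1-001"],
    "PARAMCD": ["ALT"],
    "AVAL":    [4.6],
    "SRCDOM":  ["LB"],
    "SRCSEQ":  [2],
})

# Reviewer-style check: does each SRCSEQ resolve to exactly one SDTM row,
# and does AVAL match the source value?
merged = adlb.merge(lb, left_on=["USUBJID", "SRCSEQ"],
                    right_on=["USUBJID", "LBSEQ"], how="left")
assert merged["LBSTRESN"].notna().all(), "unresolved SRCSEQ pointer"
assert (merged["AVAL"] == merged["LBSTRESN"]).all(), "AVAL diverges from source"
print("traceability check passed")
```

If AVAL legitimately differs from the source value (unit conversion, scoring), the comparison should instead apply the documented algorithm—which is exactly the transparency the define.xml derivation metadata exists to provide.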
BDS patterns that scale. Use PARAMCD/PARAM to define what is being analyzed; AVAL/AVALC for numeric/categorical values; AVISIT/AVISITN and ADT/ATPT for timing; baseline and change variables (ABLFL, BASE, CHG, PCHG); and analysis flags (ANLzzFL) to subset the dataset exactly as the SAP specifies. Avoid hard-wiring visit windows into code without documenting them in metadata.
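The baseline/change pattern above is mechanical once ABLFL is set. A sketch with invented values, assuming one baseline-flagged record per subject and parameter:

```python
import pandas as pd

adlb = pd.DataFrame({
    "USUBJID": ["S1-001"] * 3,
    "PARAMCD": ["ALT"] * 3,
    "AVISITN": [0, 1, 2],
    "AVAL":    [40.0, 50.0, 30.0],
    "ABLFL":   ["Y", "", ""],
})

# BASE is the AVAL of the baseline-flagged record within subject/parameter;
# CHG and PCHG follow deterministically from it.
base = adlb.loc[adlb["ABLFL"] == "Y", ["USUBJID", "PARAMCD", "AVAL"]]
base = base.rename(columns={"AVAL": "BASE"})
adlb = adlb.merge(base, on=["USUBJID", "PARAMCD"], how="left")
adlb["CHG"] = adlb["AVAL"] - adlb["BASE"]
adlb["PCHG"] = 100 * adlb["CHG"] / adlb["BASE"]
print(adlb[["AVISITN", "AVAL", "BASE", "CHG", "PCHG"]])
```

Deriving BASE by merge rather than by positional lookup makes the baseline definition auditable: change the ABLFL rule and every downstream CHG/PCHG follows automatically.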
ADTTE done right. Define time origin (e.g., randomization), event definitions, censoring rules, and competing events. Variables such as PARAMCD (e.g., OS, PFS), AVAL (time), CNSR, and ADT must align with the SAP. Keep SRC** pointers to SDTM AE/DS/RS (response) as applicable, and document handling of partial dates or intermittent assessments.
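The event-versus-censor split can be captured in one small derivation. This sketch assumes randomization as the time origin and censoring at last contact; the function name and record shape are illustrative, and real censoring rules come from the SAP:

```python
from datetime import date
from typing import Optional

def tte_record(randdt: date, eventdt: Optional[date], lastdt: date) -> dict:
    """OS-style sketch: AVAL in days from randomization;
    CNSR=0 at an observed event, CNSR=1 when censored at last contact."""
    if eventdt is not None:
        return {"AVAL": (eventdt - randdt).days, "CNSR": 0, "ADT": eventdt}
    return {"AVAL": (lastdt - randdt).days, "CNSR": 1, "ADT": lastdt}

event = tte_record(date(2025, 1, 1), date(2025, 4, 1), date(2025, 6, 1))
censored = tte_record(date(2025, 1, 1), None, date(2025, 6, 1))
print(event)     # AVAL=90, CNSR=0
print(censored)  # AVAL=151, CNSR=1
```

Note the AVAL unit (days here) is itself a documented choice; if the SAP analyzes in months, the conversion belongs in the define.xml algorithm, not silently in code.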
Populations and flags. Implement analysis populations explicitly in ADSL with boolean flags (e.g., SAFFL, ITTFL) defined in ADRG. Derive them reproducibly from SDTM (e.g., treatment exposure, randomization status) and avoid “hand edits.” If protocol amendments change criteria, version the flag derivations and explain the historical application.
Algorithms and imputations. Move imputations to ADaM, not SDTM. Document all rules (e.g., windowing, partial date imputation, questionnaire scoring, lab unit conversions) in define.xml 2.1 computational algorithms and ADRG narrative. Prefer deterministic, auditable logic over ad-hoc scripts; store function versions when you package code.
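Partial-date imputation is a good example of logic that must be deterministic and documented. A minimal sketch for ISO 8601 partial dates ('2025' or '2025-03'); the "first"/"last" rule names are invented here, and the actual direction of imputation (e.g., AE start dates bounded by treatment start) comes from the SAP/ADRG:

```python
import calendar

def impute_partial_dtc(dtc: str, rule: str = "first") -> str:
    """Complete an ISO 8601 partial date to a full date, imputing to the
    first or last plausible day. Illustrative only; real rules are
    endpoint-specific and documented in define.xml/ADRG."""
    parts = dtc.split("-")
    if len(parts) == 3:
        return dtc  # already complete
    if len(parts) == 2:  # year and month known
        y, m = int(parts[0]), int(parts[1])
        day = 1 if rule == "first" else calendar.monthrange(y, m)[1]
        return f"{y:04d}-{m:02d}-{day:02d}"
    y = int(parts[0])  # year only
    return f"{y:04d}-01-01" if rule == "first" else f"{y:04d}-12-31"

print(impute_partial_dtc("2025-03"))          # 2025-03-01
print(impute_partial_dtc("2025-02", "last"))  # 2025-02-28
print(impute_partial_dtc("2025", "last"))     # 2025-12-31
```

Keeping imputation in ADaM while SDTM carries the partial value as collected preserves both the source truth and the analysis decision, which is the traceability the section argues for.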
Device, imaging, and DCT considerations. For device endpoints (e.g., wearable summaries), create BDS parameters that reflect clinically validated features (mean daily steps, arrhythmia burden) and retain source provenance. For imaging, include read dates, parameter compliance flags, and link to SDTM MI/FA via SRC**. For eCOA, capture adherence and sync latency as analysis variables when they affect estimands or missing-data handling.
Validation mindset. Cross-check counts vs SDTM, reproduce key SAP outputs independently, and run rule-based validators to catch structure/content issues. Separate “must fix” defects from “explain in ADRG” edge cases. Keep the code and configuration (dictionary versions, controlled terminology) under change control so analyses are re-runnable.
Submission Package, Evidence, and Pitfalls: Making Standards Work on Inspection Day
What reviewers expect to find quickly. A coherent package that lets them regenerate a TFL and verify lineage without interviews:
- Datasets (SDTM, ADaM) in XPT v5 with consistent keys and controlled terminology.
- define.xml 2.1 with dataset, variable, and value-level metadata; derivations/algorithms; codelist versions; where-used for VLM.
- Annotated CRF (aCRF) mapping fields to SDTM variables; show controlled terms and units.
- cSDRG and ADRG narrating assumptions, deviations from IGs, reconciliation issues, and traceability design.
- Program and configuration catalog: analysis code versions, controlled terminology versions, and links to configuration snapshots taken at lock.
Conformance and content checks. Run rule-based conformance (structure, codelists, timing) and document resolutions. For content, reconcile SDTM vs ADaM counts for subjects, visits, and key endpoints; verify population flags; and demonstrate that one or two headline efficacy/safety TFLs can be regenerated from ADaM with the provided programs.
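The subject-level reconciliation described above reduces to a set comparison plus a population count. A sketch with hypothetical subjects (real packages would read the DM and ADSL datasets):

```python
# Hypothetical subject lists; a real check reads SDTM DM and ADaM ADSL.
dm_subjects = {"S1-001", "S1-002", "S1-003"}
adsl = {"S1-001": "Y", "S1-002": "Y", "S1-003": "N"}  # USUBJID -> SAFFL

# Subjects must reconcile exactly between DM and ADSL; any gap between the
# enrolled and safety populations must be explainable in the ADRG.
assert dm_subjects == set(adsl), "subject mismatch between DM and ADSL"
n_safety = sum(1 for flag in adsl.values() if flag == "Y")
print(f"enrolled={len(dm_subjects)}, safety={n_safety}")
```

The same shape of check scales to visits and key endpoints: compare counts per subject/visit between the SDTM source and the ADaM dataset, and log every discrepancy with its explanation.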
Time and provenance discipline. Include local time and UTC offset in timestamps exposed to reviewers; keep point-in-time configuration snapshots (e.g., dictionary versions, visit windows) captured at freeze and lock. These small habits eliminate “which day?” debates and are recognizable across agencies.
Common pitfalls—and durable fixes.
- Hidden derivations in code only → move logic into define.xml algorithms and ADRG; expose helper variables in ADaM when needed.
- Overuse of SUPPxx for analysis drivers → promote to core variables or to FA where appropriate; document transitions in the cSDRG.
- Visit/window ambiguity → standardize windowing rules; reflect them in AVISITN and ANLzzFL; narrate exceptions.
- Unstable keys (changing --SEQ or missing identifiers) → lock generation rules; version mapping; resist re-sequencing late in the study.
- Dictionary/codelist drift → freeze versions with effective dates; justify upgrades; keep side-by-side outputs if recoding occurs.
- Device/imaging provenance gaps → carry UIDs/serials and read dates from SDTM into ADaM; document parameter compliance handling.
Checklist (study-ready CDISC package).
- Clear mapping matrix from eCRF/eSource to SDTM domains; unit conversions and timing rules specified.
- Controlled terminology and dictionary versions fixed; NCI codelist names referenced in define.xml.
- Traceability design implemented (stable keys or SRC** variables) so ADaM can point to exact SDTM rows.
- ADSL complete with analysis population flags; BDS/OCCDS/ADTTE datasets align with SAP; algorithms documented.
- cSDRG/ADRG drafted early and updated through lock; aCRF complete and legible.
- Validation evidence available (conformance reports, double-programming spot checks, reconcile counts).
- Configuration snapshots and program versions archived at freeze and lock; time stamps with local time + UTC offset.
Bottom line. Standards are a means to trustworthy decisions. When SDTM is faithful to sources, ADaM is transparent and reproducible, metadata are rich and current, and time/provenance are unambiguous, your package will feel familiar and reliable to assessors at the FDA, EMA, PMDA, TGA, within the ICH community, and aligned with the public-health mission of the WHO.