Published on 16/11/2025
Clinical Study Report TFLs: Designing, Programming, and Verifying Outputs That Regulators Trust
From Protocol to Pages: What Belongs in CSR TFLs and Why It Matters
Tables, figures, and listings (TFLs) are the visible record of a study’s results in the Clinical Study Report (CSR). They transform protocol objectives and Statistical Analysis Plan (SAP) rules into paginated, reproducible evidence. Global assessors—the U.S. FDA, the EMA, Japan’s PMDA, Australia’s TGA, and the public-health lens of the WHO—all rely on them.

Purpose of each T/F/L type. Tables carry precise numbers (counts, estimates, confidence intervals) with footnotes and denominators; figures communicate patterns (e.g., Kaplan–Meier curves, forest plots, lab shift heatmaps); listings provide record-level transparency (patient narratives, protocol deviations, serious adverse events). Together, they must tell one coherent story: who was studied, what happened, how outcomes were analyzed, and how robust the findings are.

Core sections most programs include.

Estimands drive presentation. If the primary estimand is treatment policy, tables should reflect outcomes regardless of rescue; if while-on-treatment, truncation rules and windows must be explicit. For survival estimands, figures should emphasize events and follow-up time; if non-proportional hazards are anticipated, include RMST/milestone displays alongside the Cox results.

Submission posture. TFLs live alongside the data standards package (SDTM/ADaM/define.xml) and the programming specifications. The line of sight from CSR text → TFLs → ADaM → SDTM/source is as important as the numbers themselves. Inspectors will attempt to regenerate key figures and tables from the analysis datasets; the results must match within documented precision rules.

Blueprint Before Build: Mock Shells, Style Guides, and Traceability Rules

Mock shells are contracts, not sketches. Each shell must define the title, population (ITT/SAF/PP), denominator rules, row/column structure, sorting, precision, handling of zeros/NA, footnote text, abbreviations, and statistical methods (e.g., ANCOVA with baseline as covariate; stratified Cox with specified strata). Link every shell to a unique identifier and to the SAP section that authorizes it.
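The "shell as contract" idea can be made concrete by encoding the required fields in a machine-checkable record. The sketch below is illustrative Python; "ShellSpec" and all of its field names are assumptions of this example, not an industry schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ShellSpec:
    """One TFL shell captured as a contract (illustrative fields only)."""
    shell_id: str               # unique identifier, e.g. "T14.2.1"
    sap_section: str            # SAP section that authorizes this display
    title: str
    population: str             # analysis set: "ITT", "SAF", or "PP"
    denominator_rule: str       # how N is determined for percentages
    precision: dict = field(default_factory=dict)  # decimals per statistic
    footnotes: tuple = ()       # standard footnote texts

# Hypothetical primary efficacy shell, linked to its SAP authorization.
primary_efficacy = ShellSpec(
    shell_id="T14.2.1",
    sap_section="SAP 9.4.1",
    title="Primary Efficacy Analysis (ANCOVA)",
    population="ITT",
    denominator_rule="subjects with baseline and >=1 post-baseline value",
    precision={"lsmean": 2, "p_value": 3},
    footnotes=("ANCOVA with baseline as covariate",),
)

# The record can be validated mechanically before any table is programmed.
assert primary_efficacy.population in {"ITT", "SAF", "PP"}
```

Because the shell is data rather than prose, the same record can drive both the table engine and a pre-run completeness check.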
Precision and rounding. Adopt consistent, pre-declared rules (e.g., means to 1 decimal if SD <10, otherwise 2; proportions to 1 decimal; p-values to 3 decimals with a “<0.001” floor; risks/HRs to 2–3 decimals). Derived values should be rounded only for presentation; internal computations use full precision. State the significant-figure policy for PK and lab measures.

Denominators and analysis sets. The shell must show the analysis population for each display: ITT for efficacy, SAF for safety, PP for supportive analyses. Where denominators vary by visit (e.g., missed windows), show n/N (%) with N explicit per time point. For responder endpoints, define the responder rule and how missing/intercurrent events contribute (e.g., non-responder imputation under a composite estimand).

Controlled terminology and coding versions. Display the MedDRA version for AE coding and WHO-DD for concomitant medications in table footnotes; include the CTCAE version for grading, if used. Ensure the versions match those in define.xml and the protocol/SAP; mismatched versions are a common inspection finding.

Style guide and reuse. A study or program style guide should standardize typography, indentation, column spacing, thousands separators, missing-value glyphs (e.g., “—”), hyphenation, and pagination behaviors (repeated headers, widow/orphan rules). Provide a component library (shell snippets, a footnote library, standard abbreviations) to maximize reuse and reduce errors across studies.

Traceability mapping. Include a mapping for each shell: the ADaM dataset(s) and key variables used (e.g., ADSL for populations, ADLB for lab shifts, ADTTE for time-to-event). For complex derivations, attach a derivation block (pseudo-code) and reference the program modules. The mapping lets a reviewer move seamlessly from a cell value back to the precise analysis variable and derivation logic.

Figures that inform. Standard figure shells include KM curves with risk tables, forest plots for subgroups (with interaction p-values), spaghetti plots for longitudinal outcomes, waterfall plots for tumor burden, and lab shift heatmaps. Define axis scales, censoring marks, confidence-band methods, and color accessibility.
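Pre-declared rounding rules like those above are easy to drift from when each programmer formats cells ad hoc; centralizing them in small formatters keeps every table consistent. A minimal Python sketch, assuming the example rules stated earlier (the function names are mine):

```python
def fmt_mean(mean: float, sd: float) -> str:
    """Means to 1 decimal when SD < 10, otherwise 2 (per the declared rule)."""
    dp = 1 if sd < 10 else 2
    return f"{mean:.{dp}f}"

def fmt_prop(n: int, big_n: int) -> str:
    """n/N (%) with N shown explicitly and the percentage to 1 decimal."""
    pct = 100.0 * n / big_n if big_n else 0.0
    return f"{n}/{big_n} ({pct:.1f}%)"

def fmt_p(p: float) -> str:
    """p-values to 3 decimals with a '<0.001' floor."""
    return "<0.001" if p < 0.001 else f"{p:.3f}"

# Full-precision values go in; rounding happens only at presentation time.
assert fmt_mean(12.34, 4.2) == "12.3"       # SD < 10 -> 1 decimal
assert fmt_prop(12, 48) == "12/48 (25.0%)"  # denominator stays visible
assert fmt_p(0.0004) == "<0.001"            # floored, never "0.000"
```

Because internal computations keep full precision, the same inputs always reproduce the same printed cell, which is exactly what a regulator re-run has to confirm.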
State whether arms are shown for blinded CSR drafts; final CSRs typically include arm labels only after unblinding.

Listings for transparency. Pre-define inclusion criteria for subject listings (e.g., all SAEs with onset relative to first dose, causality, outcome, and MedDRA SOC/PT; all deaths; all discontinuation reasons; all major protocol deviations with impact). Protect privacy by masking direct identifiers and following minimum-necessary principles consistent with data-protection expectations in the U.S./EU/UK and the public-health guidance of the WHO.

From Datasets to Deliverables: Programming, Validation, and Documented Controls

Inputs and lineage. TFLs must be generated from analysis datasets (ADaM), not directly from SDTM, to preserve derivation consistency. Maintain lineage manifests that show the source SDTM domains, the transformation steps, and the ADaM variables feeding each TFL. Ensure that define.xml describes variables, controlled terms, and derivations that match the code and the shells.

Automation that respects control. Use parameterized programs and macro libraries for repeatable structures (subject disposition, AE summaries, lab shifts). Build a table engine that enforces the style guide, pagination, and footnote logic uniformly. Guardrails matter—automate with validation, not instead of it.

Double programming and peer review. For pivotal outputs (the primary efficacy table, the KM curve, the top-level safety table), perform independent double programming by a second statistician/programmer using separate code. Compare at the dataset level and at the presentation level; mismatches must be reconciled with a documented root cause and resolution.

Quality checks that catch real issues.

Reproducibility and versioning. Lock program versions, package/library versions, and random seeds (for simulation-based displays) in a controlled repository. Capture a point-in-time configuration snapshot (ADaM datasets, shells, code, style guide, macro versions) at each data cut and at CSR finalization.
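The double-programming comparison itself can be automated. A minimal Python sketch, assuming results from the two independent programs are gathered as simple key-to-value maps and that the documented precision rule is expressed as a numeric tolerance (all names here are illustrative):

```python
def reconcile(primary: dict, qc: dict, tol: float = 0.0) -> list:
    """Compare two independently programmed result sets cell by cell.

    Returns a list of (key, primary_value, qc_value) mismatches. Numeric
    cells may differ by up to `tol`, the documented precision rule; every
    other difference, including a key present on only one side, is flagged.
    """
    issues = []
    for key in sorted(set(primary) | set(qc)):
        a, b = primary.get(key), qc.get(key)
        if isinstance(a, float) and isinstance(b, float):
            if abs(a - b) > tol:
                issues.append((key, a, b))
        elif a != b:
            issues.append((key, a, b))
    return issues

# Hypothetical main-run vs. QC-run values for one pivotal table.
main_run = {"n_ITT": 120, "lsmean_diff": 1.2345, "p_value": 0.0421}
qc_run   = {"n_ITT": 120, "lsmean_diff": 1.2346, "p_value": 0.0421}

assert reconcile(main_run, qc_run, tol=1e-3) == []         # within tolerance
assert reconcile(main_run, qc_run)[0][0] == "lsmean_diff"  # exact compare flags it
```

Each flagged tuple becomes the starting point for the documented root cause and resolution the text calls for; an empty list is the evidence of a clean reconciliation.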
Archive the artifacts in the TMF to facilitate regulator re-runs at the FDA, EMA, PMDA, and TGA.

Blinding hygiene in production. If CSR drafts are produced before unblinding, generate arm-agnostic TFLs (e.g., Group A/B) and quarantine arm-labeled outputs in a restricted folder accessible only to unblinded roles. Keep access logs and approvals. After unblinding, regenerate only the labels; do not re-compute numbers unless a planned lock/refresh is approved.

Output formats and pagination. CSRs typically require RTF/PDF with consistent pagination, repeating headers, and book-ready styles. Exports for health-technology assessments may need Excel/CSV companions. Ensure that page numbers, section anchors (e.g., the 14.x series), and table/figure captions match the CSR body and the table of contents. Avoid line wrapping that breaks n/N (%) columns or footnote references.

Special domains—common pitfalls.

Change control and auditability. Any post-lock change to shells or programs requires a controlled change record with an impact assessment and approvals from statistics, QA, and clinical leads. Maintain an audit trail of who ran which program and when, with dataset checksums, so the exact state of the outputs can be reconstructed if questions arise.

Inspection-Grade Confidence: Evidence Bundle, Metrics, Pitfalls, and a Practical Checklist

What reviewers ask for first. Prepare a “rapid-pull” index that surfaces within minutes:

Quality indicators worth tracking.

Common failure modes—and durable fixes.

One-page checklist (study-ready TFLs).

Bottom line. CSR TFLs are more than formatted numbers—they are a compliance artifact that encodes your SAP, standards, and quality system. When shells are explicit, mappings are transparent, programs are validated and reproducible, and outputs read consistently across the CSR, reviewers at the FDA, EMA, PMDA, and TGA can navigate quickly. Following the harmonized perspective of the ICH and the public-health mission of the WHO, these practices make your conclusions clearer and your submission stronger.
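The checksum idea behind the audit trail can be sketched briefly: hash every artifact at the data cut, store the manifest, and re-hash later to prove the state is unchanged. Minimal Python using only the standard library (the artifact names and bytes below are placeholders, not a prescribed layout):

```python
import hashlib
import json

def snapshot_manifest(artifacts: dict) -> str:
    """Build a point-in-time manifest: one SHA-256 checksum per artifact.

    `artifacts` maps a logical name (dataset, shell, program) to its raw
    bytes; in practice you would read each file from the controlled
    repository rather than hold bytes in memory.
    """
    manifest = {
        name: hashlib.sha256(content).hexdigest()
        for name, content in sorted(artifacts.items())
    }
    # A stable JSON rendering lets two manifests be compared byte-for-byte.
    return json.dumps(manifest, indent=2, sort_keys=True)

# Identical inputs must reproduce an identical manifest at any later re-run.
cut = snapshot_manifest({
    "ADSL.xpt": b"placeholder dataset bytes",
    "t14_1_1.sas": b"placeholder program bytes",
})
rerun = snapshot_manifest({
    "ADSL.xpt": b"placeholder dataset bytes",
    "t14_1_1.sas": b"placeholder program bytes",
})
assert cut == rerun
```

Any single changed byte in a dataset or program yields a different digest, so comparing the archived manifest against a fresh one pinpoints exactly which artifact drifted after lock.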