Coding, Case Narratives & Follow-Up: A Regulator-Ready System for Clear, Consistent Safety Cases (2025)

Published on 15/11/2025

Make Safety Cases Speak for Themselves—Disciplined Coding, Narratives, and Follow-Up

Why Coding, Narratives, and Follow-Up Decide Whether Your Safety File Holds Up

Every serious safety decision rests on three pillars: the medical coding that turns free text into analyzable signals, the narrative that explains the clinical story in plain, precise terms, and the follow-up that closes knowledge gaps quickly and ethically. When these are engineered as a small, disciplined system, expedited reporting becomes fast and defensible, aggregate reviews become reliable, and inspections become straightforward. When they are improvised, cases fragment across systems, timelines slip, and reviewers cannot reconstruct what happened or why.

Global anchors that keep teams aligned. A proportionate, quality-by-design posture—tightest where it protects participants and the integrity of endpoints—fits with widely recognized principles articulated by the International Council for Harmonisation. Practical orientation on investigator responsibilities, trustworthy records, and subject protection is available from the U.S. Food and Drug Administration’s clinical trial resources and the European Medicines Agency’s pharmacovigilance pages. Ethical touchstones—respect, fairness, and comprehensibility—are emphasized across the World Health Organization’s research safety guidance. Multiregional programs should keep terminology coherent with orientation offered by Japan’s PMDA and Australia’s Therapeutic Goods Administration so that case handling remains consistent across regions.

ALCOA++ as the backbone. Coding choices, narrative text, and follow-up evidence must be attributable, legible, contemporaneous, original, accurate, complete, consistent, enduring, available, and traceable. Operationally, that means immutable timestamps; a single “record of record” for each artifact; version-controlled narratives; and a direct click-through from dashboard dates to the exact attachments (discharge summaries, lab trends, device logs). If a reviewer cannot retrieve the chain (awareness time → coded term → narrative → submission proof) within minutes, the case is not inspection-ready.

Seriousness, severity, expectedness, and causality—make the hand-offs tight. Coding and prose must reflect the same clinical facts: what happened, how serious, whether the team judged it related, and whether it matched the current reference safety information (RSI/label). Investigators give the site judgment; the sponsor ensures consistency and applies policy where uncertainty persists. When opinions differ, the conservative plausible classification should govern expedited routing while both views are archived verbatim.
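
The routing rule in this paragraph can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed implementation: the five-level causality scale, the `CausalityRecord` type, and the `governing_causality` function are hypothetical, and a real program would take its scale and tie-breaking rules from its own SOP.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale (an assumption, not taken from any guideline);
# a higher rank is treated as more conservative for expedited routing.
CAUSALITY_RANK = {
    "not related": 0,
    "unlikely": 1,
    "possible": 2,
    "probable": 3,
    "definite": 4,
}


@dataclass(frozen=True)
class CausalityRecord:
    site_opinion: str     # investigator judgment, archived verbatim
    sponsor_opinion: str  # sponsor judgment, archived verbatim
    governing: str        # normalized value that drives expedited routing


def governing_causality(site_opinion: str, sponsor_opinion: str) -> CausalityRecord:
    """Archive both opinions unchanged and route on the more conservative one."""
    def rank(opinion: str) -> int:
        return CAUSALITY_RANK[opinion.strip().lower()]

    governing = max(site_opinion, sponsor_opinion, key=rank)
    return CausalityRecord(site_opinion, sponsor_opinion, governing.strip().lower())
```

For example, `governing_causality("Unlikely", "Possible")` keeps both strings as entered and sets `governing` to `"possible"`. The sketch does not model whether a view is clinically plausible; that judgment stays with medical review.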

Devices are similar—but not the same. Device cases demand a parallel vocabulary: adverse device effects, malfunctions that could lead to serious harm if repeated, and human-factors context (training, lighting, language, interface). The narrative must capture hardware, software, and use-error hypotheses, while coding must accurately distinguish effect vs. malfunction so recurrence risk is addressed—not just paperwork.

System, not heroics. High-performing programs use: (1) a metric dictionary for coding quality; (2) a narrative template that produces structured, readable prose; (3) a query catalog that turns common gaps into short, targeted requests; (4) dashboards with burn-down clocks; and (5) a monthly five-minute retrieval drill. These are small, reusable tools—not binders—that convert pressure into predictable performance.

Medical Coding That Creates Signal, Not Noise

Start at the source and protect clinical meaning. The reporter’s words anchor the Lowest Level Term (LLT); choose the LLT that preserves nuance (e.g., “near syncope” vs. “syncope”). Verify that the mapped Preferred Term (PT) reflects what clinicians meant. If the chart describes neurally mediated episodes with prodrome and rapid recovery, selecting “Syncope” instead of “Near syncope” may inflate severity in aggregate analyses and misdirect queries.

Use dictionary controls to make results reproducible. Lock the dictionary and version at awareness, store it with the case, and define re-coding rules for when the dictionary updates (e.g., re-code for DSUR tables without overwriting historical transmissions). Maintain a short synonym table for protocol-specific jargon so coders converge (“liver enzymes ↑” → “Alanine aminotransferase increased” / “Aspartate aminotransferase increased”).
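
A minimal sketch of these two controls follows, assuming a simple in-memory model. The `CodedTerm` record, the `SYNONYMS` table, the helper names, and the version strings in the usage note are illustrative assumptions; a validated safety database would provide its own equivalents.

```python
from dataclasses import dataclass, replace

# Hypothetical synonym table: normalized protocol jargon -> candidate terms.
SYNONYMS = {
    "liver enzymes ↑": (
        "Alanine aminotransferase increased",
        "Aspartate aminotransferase increased",
    ),
}


@dataclass(frozen=True)  # frozen: the record stored with the case cannot be mutated
class CodedTerm:
    verbatim: str
    preferred_term: str
    dictionary_version: str  # locked at awareness


def suggest_terms(verbatim: str) -> tuple[str, ...]:
    """Look up protocol jargon after normalizing case and whitespace."""
    key = " ".join(verbatim.lower().split())
    return SYNONYMS.get(key, ())


def recode_for_aggregate(original: CodedTerm, preferred_term: str,
                         dictionary_version: str) -> CodedTerm:
    """Return a new record for aggregate tables; the original is left untouched."""
    return replace(original, preferred_term=preferred_term,
                   dictionary_version=dictionary_version)
```

Re-coding a term locked under version "27.0" for a later table produces a second `CodedTerm` carrying the new version, while the object stored with the historical transmission still reports "27.0".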

Pick the right granularity. One clinical event → one primary PT. Split only when justified by distinct pathologies (e.g., “Sepsis” and “Acute respiratory failure” can coexist). Avoid double-counting symptom clusters that are intrinsically part of a diagnosis (“myocardial infarction” already implies “chest pain”). Where the diagnosis is uncertain, code the most specific sign/symptom supported by evidence; revise when data mature.

Seriousness and severity must be consistent with coding. If the coded PT is “Anaphylactic reaction,” seriousness should reflect life-threatening criteria or need for emergent intervention; if not, consider whether the PT choice overstated the clinical picture. Conversely, a “Severe” intensity does not make a case “serious” absent outcome criteria; keep the constructs distinct.

SMQs and groupings are powerful—use them deliberately. Standardized MedDRA Queries (SMQs) support signal detection and aggregate review (e.g., anaphylaxis, liver injury, torsade de pointes). Configure the safety database to assign SMQ membership at lock and refresh for aggregate outputs so case-level history is preserved while table-level views stay current. Resist over-broad SMQs for expedited decisions; they are for surveillance, not single-case triage.

Laboratory and device data need clear rules. For labs, code the clinical event, not the analytic fact, unless the protocol mandates capturing both (e.g., “Alanine aminotransferase increased” with a derived Hy’s law flag). For devices, separate the effect on the person (e.g., “Burn”) from the malfunction (e.g., “Device overheat”) so root-cause and recurrence risk are visible. Store model, lot/serial, and firmware/software versions as structured fields; narratives should reference them, not repeat them.

Quality controls that actually work. Add pre-lock rules: (1) narrative contains the exact PT string at least once; (2) seriousness criterion is compatible with PT; (3) expectedness mapping references RSI/label with version/date; (4) device cases include malfunction taxonomy when applicable. Route failures to a “red tile” queue for correction before submission.
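
The four pre-lock rules can be expressed as a small validation function. The sketch below assumes a case is available as a plain dictionary; the field names, the failure codes, and the `PTS_EXPECTING_SERIOUSNESS` lookup are hypothetical and would be replaced by the safety database's own schema and medical rules.

```python
# Hypothetical lookup of PTs that should carry at least one seriousness criterion.
PTS_EXPECTING_SERIOUSNESS = {"Anaphylactic reaction"}


def prelock_failures(case: dict) -> list[str]:
    """Return the failed pre-lock rules; an empty list means the case may lock."""
    failures = []
    pt = case.get("pt", "")
    narrative = case.get("narrative", "")

    # Rule 1: the narrative contains the exact PT string at least once.
    if not pt or pt not in narrative:
        failures.append("narrative_missing_pt")

    # Rule 2: seriousness is compatible with the PT.
    if pt in PTS_EXPECTING_SERIOUSNESS and not case.get("seriousness_criteria"):
        failures.append("seriousness_incompatible_with_pt")

    # Rule 3: expectedness references RSI/label with both version and date.
    ref = case.get("expectedness_reference") or {}
    if not (ref.get("version") and ref.get("date")):
        failures.append("expectedness_reference_incomplete")

    # Rule 4: device cases include a malfunction taxonomy code.
    if case.get("is_device_case") and not case.get("malfunction_code"):
        failures.append("device_malfunction_taxonomy_missing")

    return failures
```

Any non-empty result would send the case to the "red tile" queue rather than to submission.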

Common coding pitfalls—and durable fixes.

Inflation by diagnosis. Fix with sign/symptom coding until objective criteria are met; revise when confirmed.
Vague LLTs that hide meaning. Fix with a short synonym list and a “preferred LLTs” quick guide for high-volume terms.
PT/narrative mismatches. Fix by generating narrative shells from coded fields and forcing a pre-lock consistency check.
Duplicate PTs for one event. Fix with a “one event, one primary PT” rule and additional qualifiers in the narrative.

Governance and metrics. Track first-pass coding acceptance, narrative-field consistency rate, PT change frequency after follow-up, SMQ capture rate for predefined risks, and the proportion of device cases with explicit malfunction taxonomy. Each number should click through to cases that illustrate the pattern so training becomes concrete.

Case Narratives That Are Readable, Auditable, and Clinically Persuasive

Write for a careful clinician under time pressure. Narratives should be skimmable and complete. Use short headings and a consistent order so readers find the facts in seconds. Avoid rhetoric; use plain language that respects the participant and reflects clinical thinking.

A proven narrative template.

Context & history: age/sex, condition under study, comorbidities, key baseline labs/vitals, relevant prior therapies.
Exposure timeline: product/device, dose/configuration, route/use, start/stop; for devices, model/firmware/software and most recent maintenance or update.
Onset & course: exact date/time, evolution, interventions, and response.
Objective evidence: labs with units/trends, imaging, ECG method/rate, device logs/photos, returned-unit ID if applicable.
Alternatives considered: disease progression, infection, drug–drug interactions, procedural factors; for devices, human-factors and environment.
Dechallenge/rechallenge: what changed after stop/removal; any re-exposure outcome.
Outcome: recovered/resolved, recovering, not recovered, sequelae, death (with cause if known).
Causality rationale: one sentence—“Onset X after Y; alternatives Z; dechallenge ☐; conclusion → relatedness at least possible/probable.”
Expectedness reference: RSI or label version/date used for decision.
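
One way to keep prose and structured data aligned is to generate the narrative shell from the template headings above. The following sketch assumes the structured fields arrive as a dictionary; the keys, the `narrative_shell` function, and the placeholder text are hypothetical.

```python
# Section keys are hypothetical; headings follow the template order above.
SECTION_ORDER = [
    ("context", "Context & history"),
    ("exposure", "Exposure timeline"),
    ("onset_course", "Onset & course"),
    ("evidence", "Objective evidence"),
    ("alternatives", "Alternatives considered"),
    ("dechallenge", "Dechallenge/rechallenge"),
    ("outcome", "Outcome"),
    ("causality", "Causality rationale"),
    ("expectedness", "Expectedness reference"),
]

PLACEHOLDER = "[TO BE COMPLETED]"


def narrative_shell(fields: dict) -> str:
    """Build a shell with every heading in fixed order; gaps stay visible."""
    lines = []
    for key, heading in SECTION_ORDER:
        text = (fields.get(key) or "").strip() or PLACEHOLDER
        lines.append(f"{heading}: {text}")
    return "\n".join(lines)
```

Missing sections are rendered with an explicit placeholder instead of being dropped, so a reviewer can see at a glance what follow-up still has to supply.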

Style and discipline. Use active voice and concrete verbs (“presented with,” “developed,” “was admitted”). Put numbers next to claims (ALT 865 U/L from a baseline of 38 U/L; QTcF 520 ms (Fridericia) at HR 82 bpm). Do not copy-paste entire charts; cite them (“see discharge summary dated…”). Never infer facts not in evidence. Keep emotion and advocacy out of the prose; clarity persuades better than adjectives.

Blinding and privacy by design. Narratives must not leak allocation. If unblinding is necessary to protect participants, state that it occurred per SOP and record who learned what and why—without disclosing codes in documents visible to blinded teams. Store only the minimum necessary personal data; mask dates where local privacy rules require it while preserving medical meaning (e.g., “Day +3 after Dose 2”).

Translation and readability controls. For multilingual portfolios, maintain a controlled glossary for medical terms and common phrases so translations are consistent and fast. Use short sentences and standard term order so machine-assisted translation performs well without distorting meaning. Require a pre-submission readability check: can a clinical reader understand the causal chain and expectedness judgment in under a minute?

Version control and “what changed and why.” When timelines force interim narratives, label them clearly, then append follow-ups with a two-line header: “Added [labs/imaging]; expectedness unchanged; causality refined from possible → probable.” Never overwrite; always append. This keeps the audit trail self-explanatory.
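
The "never overwrite; always append" rule maps naturally onto an append-only history. This is a minimal in-memory sketch, assuming hypothetical `NarrativeVersion` and `NarrativeHistory` types; a production system would rely on the safety database's audit trail.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class NarrativeVersion:
    number: int
    recorded_at: datetime
    what_changed: str  # the short "what changed and why" header
    text: str


class NarrativeHistory:
    """Append-only store: earlier versions are never edited or removed."""

    def __init__(self) -> None:
        self._versions = []

    def append(self, text: str, what_changed: str,
               recorded_at: Optional[datetime] = None) -> NarrativeVersion:
        version = NarrativeVersion(
            number=len(self._versions) + 1,
            recorded_at=recorded_at or datetime.now(timezone.utc),
            what_changed=what_changed,
            text=text,
        )
        self._versions.append(version)
        return version

    @property
    def versions(self) -> tuple:
        # A tuple copy, so callers cannot alter the stored history.
        return tuple(self._versions)

    @property
    def current(self) -> NarrativeVersion:
        return self._versions[-1]
```

Appending a follow-up with the header "Added labs; expectedness unchanged; causality refined from possible to probable" creates version 2 while version 1 remains retrievable exactly as transmitted.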

Device and diagnostic nuance. Add a short engineering capsule for suspected malfunctions: environment (fluids, EMI), user role/training, task sequence, alarms/messages, and whether recurrence could cause serious harm. Reference the IFU section relevant to the step where the error occurred. Keep conjecture separate from confirmed findings; the point is to communicate risk and next steps, not to litigate blame.

Common narrative pitfalls—and durable fixes.

Prose that contradicts coded fields. Fix by generating the shell from structured data and forcing a pre-lock check.
Timelines without times. Fix by mandating date and clock time for key events (onset, dose, admission, intervention).
Ambiguous expectedness. Fix by citing the exact RSI/label version and date at the point of decision.
Overlong quotes from records. Fix with concise paraphrase and citations to attachments.

Governance and metrics. Track readability pass rate, proportion of cases with explicit one-sentence causality rationale, share of expedited cases with interim → final addenda, and narrative-field consistency. Review a small sample weekly in a 20-minute “case rounds” huddle; turn recurrent defects into template tweaks, not just reminders.

Follow-Up That Closes Risk Quickly Without Burning Goodwill

Targeted queries beat fishing expeditions. Follow-up should seek decision-critical data: elements that change seriousness, causality, expectedness, or reporting route. Build a query catalog mapped to event types: hepatic (ALT/AST/bilirubin, INR, viral serologies, imaging), cardiac (ECG method/rate, troponin, electrolytes, QTc formula), hypersensitivity (tryptase timing, rechallenge), neurologic (onset clock time, imaging modality), device malfunction (model/firmware/software, environment, returned-unit logistics). Keep requests short; give examples of acceptable documents; set due dates; and explain why each item matters.

Clock discipline and buffers. The moment a valid case exists, awareness time is immutable and clocks start. Internal service levels must beat external timelines comfortably, anticipating weekends, holidays, and cross-border time zones. If expectedness is likely to be “unexpected,” pre-stage country routing and translations immediately; do not wait for perfect information before building the packet.
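
The relationship between the external deadline and the internal service level can be made explicit in code. The sketch below is an assumption-laden illustration: the function names are hypothetical, the number of regulatory days and the size of the buffer are parameters to be set from the applicable regulations and SOPs, and it counts calendar days from the awareness timestamp without modeling national day-counting conventions.

```python
from datetime import datetime, timedelta, timezone


def reporting_clock(awareness: datetime, regulatory_days: int, buffer_days: int):
    """Return (external_due, internal_due) in UTC from an immutable awareness time."""
    if awareness.tzinfo is None:
        # Naive timestamps are ambiguous across borders, so they are rejected.
        raise ValueError("awareness time must be timezone-aware")
    awareness_utc = awareness.astimezone(timezone.utc)
    external_due = awareness_utc + timedelta(days=regulatory_days)
    internal_due = external_due - timedelta(days=buffer_days)
    return external_due, internal_due


def hours_remaining(now: datetime, due: datetime) -> float:
    """Burn-down value for a dashboard; negative means the clock has expired."""
    return (due - now).total_seconds() / 3600
```

With an awareness time of 10:00 UTC on 1 January 2025, `regulatory_days=15`, and `buffer_days=3`, the external due time is 10:00 UTC on 16 January and the internal target is 10:00 UTC on 13 January.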

Interim transmissions and corrections. When data are incomplete but the clock demands action, transmit a compliant initial ICSR with an interim narrative, then follow with updates. Use a concise “what changed and why” header; avoid rewriting history. If later facts change the expedited status (e.g., diagnosis revision; RSI update), send a correction or nullification per national rules and file a short memo that explains the reclassification.

Reconciliation that prevents quiet drift. Schedule routine listings that compare subject ID, onset date, PT, seriousness, relatedness, outcome, and expectedness between the safety database and EDC/source. Close discrepancies with audit-trailed notes that explain why the record changed. If the study captures site causality in EDC and sponsor causality in safety, display both clearly and document which governs expedited routing (the most conservative plausible view).
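
A reconciliation listing of this kind reduces to a field-by-field comparison. The sketch below assumes both systems can export a case as a dictionary with matching keys; the field names and the `reconcile` function are hypothetical.

```python
# Fields named in the paragraph above; the key spellings are assumptions.
RECONCILED_FIELDS = (
    "subject_id", "onset_date", "pt", "seriousness",
    "relatedness", "outcome", "expectedness",
)


def reconcile(safety: dict, edc: dict) -> list[dict]:
    """List every field whose value differs between the safety database and EDC."""
    gaps = []
    for field in RECONCILED_FIELDS:
        safety_value, edc_value = safety.get(field), edc.get(field)
        if safety_value != edc_value:
            gaps.append({"field": field, "safety": safety_value, "edc": edc_value})
    return gaps
```

A difference in `relatedness` is not automatically an error when EDC holds the site view and the safety database holds the sponsor view; the listing surfaces it so the documented routing rule can be applied and the reason recorded.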

Vendor and decentralized paths. Ensure eCOA/home-health partners can supply timestamped logs and that identity verification is robust. Courier or home-nursing records belong in the case when they inform onset or seriousness. Where engineering analysis determines device recurrence risk, open a placeholder engineering request within 24 hours of intake and set a short SLA for the analysis so the decision arrives before the national deadline.

Dashboards that change behavior. Display: awareness-to-validity confirmation time; intake-to-entry; entry-to-medical review; narrative-field mismatch rate; expedited clock burn-down; proportion of interim narratives; translation cycle time; duplicate rate; reconciliation gap rate; and click-through to proof of submission. Red/amber states must open a dated action with an owner—contain, correct, communicate—and link to the artifact that proves closure.

KRIs and QTLs that force decisions. Monitor early warnings: spike in “unassessable” causality; missing expectedness version/date; high duplicate rate; late engineering analyses for malfunction cases; portal rejections near deadlines. Convert the most consequential into Quality Tolerance Limits (e.g., “≥5% expedited cases missing explicit expectedness reference/version in any rolling month”; “≥10% narrative-field inconsistency at lock”). Crossing a QTL triggers a formal review and a dated corrective plan.
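
The two example limits can be checked mechanically. This sketch uses the thresholds quoted above; the metric names, the `QTLS` table, and both functions are hypothetical, and each metric carries its own denominator because the two limits are defined over different case populations.

```python
# Limits in percent, taken from the two examples above; names are assumptions.
QTLS = {
    "expectedness_reference_missing": 5,
    "narrative_field_inconsistency": 10,
}


def qtl_breached(defects: int, total: int, limit_percent: int) -> bool:
    """True when defects/total is at or above the limit (integer math, no rounding)."""
    if total <= 0:
        return False  # no cases in the window, so nothing to breach
    return defects * 100 >= limit_percent * total


def breached_limits(observed: dict) -> list[str]:
    """observed maps metric name -> (defects, total) for the review window."""
    breached = []
    for name, limit in QTLS.items():
        defects, total = observed.get(name, (0, 0))
        if qtl_breached(defects, total, limit):
            breached.append(name)
    return breached
```

One expedited case in twenty missing its expectedness reference is exactly 5% and therefore counts as a breach, which would open the formal review and dated corrective plan described above.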

30–60–90-day operating plan. Days 1–30: publish the narrative template, coding synonym list, and query catalog; lock dictionary/version controls; wire dashboards to artifacts; define signature blocks that capture the meaning of approval (medical accuracy, expectedness reference checked, ALCOA++ verified). Days 31–60: pilot in two countries; run weekend drills; measure awareness-to-entry time; tune the translation SLA and engineering response; begin monthly five-minute retrieval drills. Days 61–90: scale to all sites; institute weekly case rounds; set KRIs/QTLs; and close CAPA with design fixes (template/text changes, validation rules), not just retraining.

Ready-to-use checklist (paste into your safety plan or SOP).

Dictionary and version locked at awareness; synonym list active; “one event, one primary PT” rule enforced.
Narrative template used; one-sentence causality rationale present; expectedness reference and version/date cited.
Device cases include malfunction taxonomy, model/lot/firmware/software, and returned-unit logistics.
Pre-lock validation: narrative contains the PT string; seriousness compatible with PT; attachments cited; privacy/blinding controls respected.
Targeted follow-up queries issued with due dates and reasons; interim transmission used when clocks demand it; corrections/nullifications documented with memos.
Routine reconciliation between safety and EDC/source; dual causality (site vs. sponsor) displayed and rule for routing documented.
Dashboards wired to artifacts; red/amber states open dated actions; monthly five-minute retrieval drill passed.
KRIs and QTLs monitored; crossing a limit triggers a formal review and a corrective plan with owners and dates.

Bottom line. Clear coding, disciplined narratives, and purposeful follow-up convert anxious phone calls into cases that speak for themselves. Engineer the system once—templates, controls, dashboards, and drills—and you will protect participants, deliver clean aggregate outputs, meet expedited timelines, and be able to show why every decision made clinical and regulatory sense.