Published on 15/11/2025
Moving and Connecting Clinical Data Without Losing Trust
Strategy & Governance: Deciding What to Move, When to Move It, and Who Owns It
Data migration (one-time or episodic movement of historical/live records) and data integration (ongoing exchanges between systems) are unavoidable in modern trials—protocol amendments, mid-study system upgrades, decentralized components, or multi-vendor ecosystems all create movement. These activities must be risk-proportionate, traceable, and inspectable to protect participants and endpoint credibility. Anchor decisions to Critical-to-Quality (CtQ) factors: consent evidence, eligibility thresholds, primary endpoint timing/method fidelity, investigational product/device chain-of-custody, safety clocks, and
Regulatory footing. Agencies expect control frameworks that are recognizable to the International Council for Harmonisation (ICH) and compatible with the principles of U.S. FDA 21 CFR Part 11 and EU Annex 11 (intended-use validation, audit trails, e-signatures, role-based access). The European Medicines Agency (EMA), Japan’s PMDA, Australia’s TGA, and the WHO, with its public-health lens, all emphasize reproducibility: a reviewer must be able to reconstruct the path from source to analysis without interviews.
System of record & lineage. For every data class, declare the system of record and the reconciliation keys (e.g., USUBJID + local timestamp + UTC offset + accession/UID + kit/logger ID). Document data flow diagrams and lineage (origin → verification → transformations → tabulation/analysis), including refresh cadence and ownership. This avoids conflicting “truths” when multiple systems capture similar facts (e.g., dosing in EDC vs IRT).
When to migrate vs integrate. Choose migration (bulk move + cutover) for system decommissioning or format standardization; choose integration (near-real-time feeds) for ongoing operational decisions (e.g., randomization, central read results, safety case updates). Some programs adopt a hybrid: initial migration plus incremental change data capture (CDC) until lock.
Privacy and blinding. Apply minimum-necessary principles consistent with HIPAA/GDPR/UK-GDPR. Segregate unblinded flows (pharmacy/IRT, unblinded statisticians) from blinded dashboards; audit access to key/kit maps and emergency unblinding records. For cross-border transfers, record lawful bases and Data Transfer Agreements in the vendor file and reference them in the Trial Master File (TMF).
Governance and decision rights. Publish a RACI matrix that names an owner for each concern: data management (lineage and mapping), statistics/programming (analysis impacts), clinical/medical (blinding and context), safety/PV (SAE clocks), quality (validation posture), privacy/security (lawful transfer), vendor management (SLAs). Require change advisory review for CtQ-impacting migrations; record rationales and effective dates in a decision log.
Time discipline as a policy. Enforce ISO 8601 timestamps with local time and UTC offset end-to-end. Synchronize clocks (NTP), capture daylight-saving transitions, and store both “created” and “received” times for asynchronous sources (e.g., eCOA). Most migration/integration disputes are timestamp disputes—solve them by design.
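The timestamp policy above can be sketched in a few lines of Python. This is a minimal illustration, not a validated implementation: the function names `parse_offset_timestamp` and `stamp_record` and the record layout are assumptions made for the example. It refuses timestamps that lack a UTC offset and stores both "created" and "received" times in local and UTC form.

```python
from datetime import datetime, timezone


def parse_offset_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 timestamp; refuse values that carry no UTC offset."""
    ts = datetime.fromisoformat(text)
    if ts.utcoffset() is None:
        raise ValueError(f"timestamp has no UTC offset: {text!r}")
    return ts


def stamp_record(created: str, received: str) -> dict:
    """Keep local time + offset as captured, plus a UTC rendering for ordering."""
    c = parse_offset_timestamp(created)
    r = parse_offset_timestamp(received)
    return {
        "created_local": c.isoformat(),
        "created_utc": c.astimezone(timezone.utc).isoformat(),
        "received_local": r.isoformat(),
        "received_utc": r.astimezone(timezone.utc).isoformat(),
        # Aware datetimes subtract correctly even when the offsets differ.
        "latency_seconds": (r - c).total_seconds(),
    }


# Hypothetical eCOA entry created just before a daylight-saving fall-back
# and received just after it: the wall clock runs "backwards", UTC does not.
example = stamp_record("2025-11-02T01:30:00-04:00", "2025-11-02T01:10:00-05:00")
print(example["latency_seconds"])
```

Without the offsets, the received time in this example would appear to precede the created time; with them, the latency is unambiguous.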
Engineering the Pipeline: Specifications, Mappings, and Controls That Never Guess
Source-to-Target Mapping (STM). Create an STM for each flow: field definitions, units, controlled terminology/codelists, permitted nulls, primary keys, foreign keys, and business rules. Include unit conversions (with factor and precision), time-zone handling, and derivation order. Version the STM; link each rule to protocol/SAP/DMP paragraphs for explainability.
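As one way to make an STM conversion rule executable, the sketch below carries the factor, the precision, and the STM version together and keeps the original value next to the converted one. The `ConversionRule` class, the field names, and the glucose factor are illustrative assumptions; the actual factor and rounding rule must come from the study's own STM.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class ConversionRule:
    """One STM row for a unit conversion (hypothetical structure)."""
    source_unit: str
    target_unit: str
    factor: Decimal      # multiplier applied to the source value
    decimals: int        # reporting precision after conversion
    stm_version: str     # which STM version the rule came from


def convert(value: str, unit: str, rule: ConversionRule) -> dict:
    """Apply the rule, refusing to guess when the incoming unit is unexpected."""
    if unit != rule.source_unit:
        raise ValueError(f"expected {rule.source_unit!r}, got {unit!r}")
    original = Decimal(value)
    quantum = Decimal(1).scaleb(-rule.decimals)  # e.g. decimals=2 -> 0.01
    converted = (original * rule.factor).quantize(quantum, rounding=ROUND_HALF_UP)
    return {
        "orig_value": str(original),
        "orig_unit": unit,
        "std_value": str(converted),
        "std_unit": rule.target_unit,
        "stm_version": rule.stm_version,
    }


# Illustrative rule only: confirm the factor against the documented STM.
glucose = ConversionRule("mg/dL", "mmol/L", Decimal("0.0555"), 2, "STM-v1.0")
print(convert("100", "mg/dL", glucose))
```

Using `Decimal` rather than binary floats keeps the rounding behaviour explicit and repeatable, which matters when a reviewer recomputes a converted value by hand.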
Identifiers and keys. Stability is non-negotiable. Use deterministic keys and record linkages: USUBJID, domain sequence IDs (e.g., --SEQ), accession IDs (labs), DICOM UIDs (imaging), kit/lot/logger IDs (IRT/IP). If source keys are weak, generate surrogate keys and persist a lookup table with checksums so a reviewer can walk the lineage forward and backward.
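A surrogate-key lookup of the kind described above might look like the following sketch. It assumes a deterministic UUID (version 5) derived from the canonical source key and a SHA-256 checksum stored beside it; the namespace URL, the `register` helper, and the in-memory dictionary standing in for a persisted lookup table are all hypothetical.

```python
import hashlib
import uuid

# Hypothetical namespace; a real study would fix and document its own.
STUDY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/study-keys")


def canonical(parts: dict) -> str:
    """Serialise key parts in a fixed order so the same facts always hash alike."""
    return "|".join(f"{name}={parts[name]}" for name in sorted(parts))


def register(lookup: dict, parts: dict) -> str:
    """Return a deterministic surrogate key and persist the walk-back entry."""
    text = canonical(parts)
    surrogate = str(uuid.uuid5(STUDY_NAMESPACE, text))
    checksum = hashlib.sha256(text.encode("utf-8")).hexdigest()
    existing = lookup.get(surrogate)
    if existing is not None and existing["checksum"] != checksum:
        raise RuntimeError(f"surrogate collision for {surrogate}")
    lookup[surrogate] = {"source": dict(parts), "checksum": checksum}
    return surrogate


lookup_table = {}
key = register(lookup_table, {"USUBJID": "STUDY1-001-0042", "ACCESSION": "A123456"})
print(key, lookup_table[key]["source"])
```

Because the key is derived rather than generated at random, re-running the load produces the same surrogate, and the stored source parts let a reviewer walk from the surrogate back to the original identifiers.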
Transformations with provenance. Treat ETL code as GxP configuration: under version control, peer-reviewed, and validated for intended use. For every transformation, capture: algorithm version, inputs/outputs, and a reproducible hash/checksum of the input slice. Preserve original values (and units) alongside converted values where clinically meaningful; never overwrite in a way that destroys auditability.
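The provenance capture described above can be illustrated with a small wrapper that records the algorithm version and checksums of the input and output slices. This is a sketch under stated assumptions: `run_with_provenance`, the JSON canonicalisation, and the example transform are invented for illustration, and a real pipeline would also record who ran it and when.

```python
import hashlib
import json


def slice_checksum(rows: list) -> str:
    """SHA-256 over a canonical JSON rendering; row order is part of the hash."""
    text = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_with_provenance(rows: list, transform, algorithm_version: str):
    """Apply `transform` to copies of the rows and return outputs plus provenance."""
    outputs = [transform(dict(row)) for row in rows]  # copies: inputs stay intact
    provenance = {
        "algorithm_version": algorithm_version,
        "input_sha256": slice_checksum(rows),
        "output_sha256": slice_checksum(outputs),
        "input_rows": len(rows),
        "output_rows": len(outputs),
    }
    return outputs, provenance


def add_weight_kg(row: dict) -> dict:
    """Example transform: add a converted value, never overwrite the original."""
    row["weight_kg"] = round(row["weight_lb"] * 0.45359237, 1)
    return row


source = [{"USUBJID": "STUDY1-001-0042", "weight_lb": 150}]
converted, record = run_with_provenance(source, add_weight_kg, "wt-conv-1.0.0")
print(converted, record["input_sha256"][:12])
```

Re-running the same algorithm version on the same input slice yields the same pair of checksums, which is what lets an archived output be regenerated and compared later.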
Interface contracts & SLAs. Define interface specifications (file/API schema, encoding, frequency, cut-off times, retry strategy, error handling, idempotency, and expected latency). Build reject queues with alerts; do not silently coerce bad data. Publish operational SLAs (e.g., lab results within 24 h; imaging read events within 48 h) aligned to safety and endpoint windows.
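To make "idempotency" and "reject queues" concrete, here is a minimal loader sketch. The message fields (`message_id`, `usubjid`, `result`, `unit`), the in-memory `store` and `rejects` containers, and the validation rules are assumptions for illustration; a real interface specification would define them.

```python
REQUIRED_FIELDS = ("message_id", "usubjid", "result", "unit")


def validate(message: dict) -> list:
    """Return human-readable reasons; an empty list means the message is loadable."""
    reasons = [f"missing field: {name}" for name in REQUIRED_FIELDS
               if message.get(name) in (None, "")]
    result = message.get("result")
    if result not in (None, ""):
        try:
            float(result)
        except (TypeError, ValueError):
            reasons.append(f"result is not numeric: {result!r}")
    return reasons


def load_batch(messages: list, store: dict, rejects: list) -> dict:
    """Load valid messages once; route invalid ones to the reject queue untouched."""
    summary = {"loaded": 0, "skipped_duplicates": 0, "rejected": 0}
    for message in messages:
        reasons = validate(message)
        if reasons:
            rejects.append({"message": message, "reasons": reasons})
            summary["rejected"] += 1
        elif message["message_id"] in store:
            summary["skipped_duplicates"] += 1  # replay is a no-op
        else:
            store[message["message_id"]] = message
            summary["loaded"] += 1
    return summary


batch = [
    {"message_id": "m1", "usubjid": "STUDY1-001-0042", "result": "5.4", "unit": "mmol/L"},
    {"message_id": "m2", "usubjid": "STUDY1-001-0042", "result": "high", "unit": "mmol/L"},
]
store, rejects = {}, []
print(load_batch(batch, store, rejects), load_batch(batch, store, rejects))
```

The second call loads nothing new, which is the property a retry strategy relies on. In this sketch a replayed bad message is queued again; a production reject queue would normally deduplicate on a message identifier or file hash.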
Security & segregation. Enforce named accounts, least-privilege service principals, MFA for consoles, network allowlists, encryption in transit and at rest, and time-boxed elevated privileges for releases (“break glass” with justification and full session logging). Segregate blinded outputs by design—arm-agnostic objects for blinded roles; key/kit map access restricted with logs.
Data quality guardrails. Implement pre-load validation (schema, datatypes, mandatory fields), semantic checks (ranges, plausibility, cross-record consistency), and temporal checks (window compliance using local time + offset). For eCOA/wearables, capture “time-last-synced,” device/app versions, and latency bands; for imaging, log parameter-compliance flags and read queue age; for labs, enforce effective-dated reference ranges.
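A temporal check of the kind mentioned above could be sketched as follows. The convention that study day 1 is the local calendar date of baseline, and the names `window_check`, `target_day`, and `tolerance_days`, are assumptions for the example; the protocol's own visit-window definition takes precedence.

```python
from datetime import datetime


def parse_aware(text: str) -> datetime:
    """Parse ISO 8601 and insist on a UTC offset so elapsed time is unambiguous."""
    ts = datetime.fromisoformat(text)
    if ts.utcoffset() is None:
        raise ValueError(f"timestamp has no UTC offset: {text!r}")
    return ts


def window_check(baseline: str, visit: str, target_day: int, tolerance_days: int) -> dict:
    """Evaluate a visit against its window using local dates, and report true elapsed hours."""
    b = parse_aware(baseline)
    v = parse_aware(visit)
    # Assumed convention: study day counts local calendar dates, baseline = day 1.
    study_day = (v.date() - b.date()).days + 1
    return {
        "study_day": study_day,
        "in_window": abs(study_day - target_day) <= tolerance_days,
        # Offset-aware subtraction is unaffected by the DST change between visits.
        "elapsed_hours": (v - b).total_seconds() / 3600,
    }


# Hypothetical site whose offset moves from -05:00 to -04:00 between the visits.
print(window_check("2025-03-01T09:00:00-05:00", "2025-03-29T08:30:00-04:00", 29, 2))
```

Reporting both the calendar-based study day and the elapsed hours makes it visible when a daylight-saving change, rather than a late visit, explains a discrepancy.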
Standards and harmonization. Map to CDISC SDTM/ADaM consistently and freeze dictionary versions (MedDRA, WHODrug) with effective dates. Use controlled terminology from NCI where applicable. For analysis derivations, document computational algorithms in define.xml and the ADRG; keep SDTM faithful to sources and perform derivations in ADaM for transparency.
Configuration snapshots. At each UAT sign-off and production release, export human- and machine-readable snapshots: EDC form/field catalog, edit-check library, visit windows; IRT rules/unblinding scripts; lab reference ranges; imaging parameter templates; role matrices; and integration mappings. File them in the TMF with effective-from dates to reconstruct “state at the time.”
Testing, Dress Rehearsals, and Cutover: Proving It Before Patients Feel It
Validation approach (Computer Software Assurance [CSA] mindset). Scale rigor to risk while meeting expectations recognizable to FDA/EMA/PMDA/TGA: requirements → risk assessment → design → testing → release → change control → archive. Concentrate depth around CtQs: consent/eligibility, endpoint timing/method, IP/device integrity, and safety clocks. Keep evidence legible and retrievable.
Test design. Use layered testing: (1) unit tests for transformation functions (e.g., unit conversions, window calculators), (2) integration tests with realistic multi-system flows, (3) golden datasets covering edge cases (partial dates, DST transitions, cross-time-zone visits, rare eligibility scenarios, uncommon units), and (4) negative tests for malformed files, missing keys, or blinding-sensitive fields in blinded outputs.
Dual-run and parity. For migrations or major changes, run dual pipelines (old vs new) and compare row counts, key checksums, and clinical parity (e.g., analysis-ready endpoints, allocation, kit accountability). Investigate and document true differences (e.g., bug fixes) vs defects. Maintain side-by-side outputs until acceptance is signed by data management, programming, statistics, medical, and quality.
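A parity comparison between the old and new pipeline outputs can be sketched as below. It covers only the mechanical part (row counts and per-key checksums); clinical parity still needs review by the functions named above. The `parity_report` helper, the choice of SHA-256, and the sample rows are assumptions for illustration.

```python
import hashlib


def row_digest(row: dict, fields: list) -> str:
    """Checksum of the compared fields, in a fixed field order."""
    text = "|".join(f"{name}={row.get(name)}" for name in fields)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def parity_report(old_rows: list, new_rows: list, key: str, fields: list) -> dict:
    """Compare two pipeline outputs by row count and per-key checksum."""
    old = {row[key]: row_digest(row, fields) for row in old_rows}
    new = {row[key]: row_digest(row, fields) for row in new_rows}
    report = {
        "old_count": len(old_rows),
        "new_count": len(new_rows),
        "missing_in_new": sorted(old.keys() - new.keys()),
        "extra_in_new": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
    report["match"] = not (report["missing_in_new"] or report["extra_in_new"]
                           or report["changed"])
    return report


old_out = [{"id": "A1", "arm_kit": "K-100", "dose": 10},
           {"id": "A2", "arm_kit": "K-101", "dose": 20}]
new_out = [{"id": "A1", "arm_kit": "K-100", "dose": 10},
           {"id": "A2", "arm_kit": "K-101", "dose": 25}]
print(parity_report(old_out, new_out, "id", ["arm_kit", "dose"]))
```

Each key listed under `changed`, `missing_in_new`, or `extra_in_new` is a delta that needs a documented explanation (intended fix versus defect) before acceptance.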
Dry runs & mock cutovers. Rehearse end-to-end with production-like volumes and timings. Simulate failure modes (late lab files, IRT outage, imaging backlog, ETL reject spikes). Verify backout plan, disaster recovery steps, and the ability to regenerate prior point-in-time outputs using archived snapshots and hashes. Capture performance metrics (throughput, latency, memory/CPU headroom) and record them with acceptance.
Change control and communications. Classify change requests (cosmetic vs structural/CtQ impacting). For moderate/major changes, require impact assessment on CtQs/estimands, regression scope, and a no-surprises plan for sites (downtime windows, data-entry freezes, job aids). For blinded trials, ensure communications are arm-agnostic; route any unblinding details to restricted queues.
Cutover mechanics. Freeze source systems where necessary, execute final delta loads via CDC, reconcile counts/hashes, run acceptance checklists, and record lock-step timestamps with UTC offsets. Disable old pipelines and privileges, then monitor closely for two cycles. If cutover touches lock-sensitive periods, consider a soft lock with waiver governance to minimize risk.
Acceptance and archive. Capture sign-offs, parity results, defect logs, release notes, and the configuration snapshot IDs. Place certified copies (with provenance, local time + UTC offset, and checksums) in the TMF. These artifacts let inspectors evaluate your controls without vendor engineering.
Running Day Two: Monitoring, Metrics, Evidence—and Pitfalls to Avoid
Operational monitoring. Build dashboards for pipeline health (job success, latency, backlog), data quality (rejects by reason, plausibility/range failures, cross-system mismatches), privacy/blinding hygiene (PHI export attempts, unblinded access logs), and time discipline (share of records with correct local time + UTC offset; NTP sync status). Alert on SLA breaches (e.g., lab turnaround) and on trends that threaten CtQs (e.g., endpoint heaping on the last day of a window).
Key performance indicators (examples).
- Row-count and checksum parity post-migration (target: 100% match or explained deltas).
- Reject-queue aging (target: ≤24 h median; zero >72 h for CtQ domains).
- Latency to availability for CtQ feeds (e.g., SAE to safety database ≤24–48 h, imaging read to EDC ≤48 h).
- Unit/time metadata completeness (≥99% records with unit + local time + UTC offset).
- Blinding/privacy incidents (target: 0 unmitigated; same-day deactivation on role change).
- Configuration snapshot availability without vendor engineering (target: 100%).
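Two of the example KPIs above, reject-queue aging and unit/time metadata completeness, are simple enough to compute directly. The sketch below assumes hypothetical record fields (`age_hours`, `is_ctq`, `unit`, `local_time`, `utc_offset`) and hard-codes the example targets from the list; real thresholds belong in the study's monitoring plan.

```python
from statistics import median


def reject_queue_kpi(items: list) -> dict:
    """Median age of queued rejects and count of CtQ items older than 72 h."""
    ages = [item["age_hours"] for item in items]
    ctq_over_72h = sum(1 for item in items
                       if item["is_ctq"] and item["age_hours"] > 72)
    median_age = median(ages) if ages else 0
    return {
        "median_age_hours": median_age,
        "ctq_over_72h": ctq_over_72h,
        "meets_target": median_age <= 24 and ctq_over_72h == 0,
    }


def metadata_completeness(records: list):
    """Share of records carrying unit, local time, and UTC offset (None if empty)."""
    if not records:
        return None
    complete = sum(1 for r in records
                   if r.get("unit") and r.get("local_time") and r.get("utc_offset"))
    return complete / len(records)


queue = [{"age_hours": 2, "is_ctq": True},
         {"age_hours": 10, "is_ctq": False},
         {"age_hours": 80, "is_ctq": False}]
print(reject_queue_kpi(queue))
```

Keeping the KPI logic this small makes it easy to place under the same version control and review as the ETL code it measures.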
Inspection-ready evidence architecture. Maintain a TMF rapid-pull index for migration/integration that contains: STMs; transformation code versions; validation protocols and results; dual-run parity reports; release notes; configuration snapshots; audit-trail exemplars (who/what/when/why with time zone); interface specs/SLAs; and reconciliation attestations (EDC↔IRT, EDC↔LIMS, EDC↔imaging, EDC↔safety). Reviewers from FDA, EMA, PMDA, and TGA should be able to verify integrity without interviews, consistent with ICH principles and the WHO public-health lens.
Common pitfalls—and durable fixes.
- Time ambiguity → mandate ISO 8601 with local time + UTC offset, NTP sync, and DST documentation in all flows.
- Dictionary/version drift → freeze versions with effective dates; stage upgrades with dual-run; archive both outputs.
- Silent coercions in ETL → surface to reject queues with human-readable reasons; never auto-correct CtQ fields.
- Blind leaks in logs/reports → arm-agnostic dashboards; segregated unblinded queues; access logs for key/kit maps.
- Unit confusion → lock units at source; carry original and converted values; document conversion factors in STM and ADRG.
- Vendor “black boxes” → encode exportable audit trails/config snapshots and SLA’d retrieval into Quality Agreements; rehearse retrieval quarterly.
- “Retrain-only” CAPA → pair with system gates (window calculators, validation rules, eligibility gates) and verify with KPI movement.
One-page checklist (study-ready migration & integration).
- System of record declared per domain; lineage diagrams current; reconciliation keys stable.
- Source-to-Target Mappings versioned; unit/time handling explicit; controlled terminology fixed.
- ETL code version-controlled; transformation provenance (algorithm versions, hashes) stored.
- Interface specs/SLAs signed; reject queues and alerting active; privacy/blinding segregation enforced.
- Validation protocol executed (unit, integration, golden, negative tests); dual-run parity passed; sign-offs filed.
- Cutover plan with backout steps rehearsed; production configuration snapshots captured with effective dates.
- Operational dashboards live; KPIs reviewed; quality tolerance limit (QTL) breach process defined; CAPA loop measured.
- TMF rapid-pull index points to STM, code versions, snapshots, parity reports, audit trails, and reconciliation attestations—inspectable across global agencies.
Bottom line. Migration and integration are not IT chores; they are clinical quality operations. When you design around CtQs, enforce time/units discipline, validate transformations with provenance, dual-run before cutover, and keep evidence retrievable, your pipelines will protect participants, preserve endpoints, and stand up to scrutiny from the FDA, EMA, PMDA, and TGA, consistent with ICH principles and the WHO public-health mission.