Pre-Screening, EHR Mining & Referral Networks: Compliant Pipelines that Turn Real-World Data into Eligible Participants

Published on 16/11/2025

Designing a Privacy-Safe, High-Throughput Funnel from Health Records and Referrals to Informed Consent

Post updated on 06/05/2026

Strategy and governance: build a compliant foundation before touching a single record

High-performing enrollment programs treat pre-screening as an engineered system, not an afterthought. The core idea is simple: locate likely-eligible patients upstream using structured and unstructured data, route those patients to trusted clinicians and navigators, and convert interest into informed consent with minimal friction. To do this well across the USA, UK, and EU, sponsors and sites must anchor the pipeline in privacy law, ethics oversight, and operational discipline. Start by defining a clear boundary between analytics and outreach. Any processing of protected health information (PHI) for recruitment must sit behind IRB/ethics approvals and written agreements with data custodians. For U.S. entities, that means HIPAA-compliant EHR mining governed by limited data sets or waivers, as appropriate, and documented via a data use agreement (DUA) with the covered entity. If a service provider handles PHI, execute a business associate agreement (BAA) that scopes permissible uses, safeguards, and breach reporting.

Ethics oversight is non-negotiable. Draft an IRB-approved prescreening plan that explains the data fields to be queried, the inclusion/exclusion logic, who will see raw records, and how patients will be contacted (letter, portal, clinician call center). Explicitly describe your consent management system strategy—opt-in collection, storage of contact preferences, and proof that marketing prohibitions are respected. When texting or calling in the U.S., incorporate TCPA SMS texting rules and do-not-call lists so outreach remains lawful and reputationally safe. In the EU/UK, mirror this rigor with GDPR-aligned legal bases, minimization, and transparency notices.

Architect the data flow before you write code. Federate queries to hospital EHRs, health information exchanges, and registries through FHIR interoperability endpoints where available; for legacy feeds, rely on HL7v2 or flat-file exports with documented lineage. Unify patient identities across sources using an EMPI patient matching service tuned for healthcare data peculiarities (nicknames, transposed digits, multiple addresses). Where identities must be linked across institutions without sharing direct identifiers, use privacy-preserving record linkage (PPRL), such as cryptographic hashing or Bloom filters, under IRB oversight so you can find the same person without re-identifying them unnecessarily.
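To make the Bloom filter idea concrete, here is a minimal sketch of one common PPRL encoding: each identifier is split into character bigrams, every bigram is hashed with a keyed HMAC into a fixed-size bit set, and two encodings are compared with a Dice coefficient. The function names, the 256-bit filter size, the four hash rounds, and the shared secret are illustrative assumptions, not parameters from any particular PPRL product; a production deployment would need a vetted protocol and key management agreed between partners.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    # Pad so the first and last characters also contribute a bigram.
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 256, num_hashes: int = 4) -> set[int]:
    # Returns the set of bit positions that would be set in the filter.
    # The keyed HMAC means a party without the secret cannot rebuild the encoding.
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hmac.new(secret, f"{k}:{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice_similarity(a: set[int], b: set[int]) -> float:
    # 1.0 means identical encodings; values near 0 mean little overlap.
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical shared key; in practice it would be exchanged out of band.
SECRET = b"example-shared-key"
close = dice_similarity(bloom_encode("Jonathan Smith", SECRET), bloom_encode("Jonathon Smith", SECRET))
far = dice_similarity(bloom_encode("Jonathan Smith", SECRET), bloom_encode("Maria Garcia", SECRET))
```

Because similar strings share most of their bigrams, a one-letter spelling variant still scores high while an unrelated name scores low, which is what lets partners match records without exchanging clear identifiers.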

From the first rehearsal run, insist on audit trail integrity. Every query, export, match, and outreach attempt should be timestamped, user-attributed, and immutable. Store query definitions and code in version control; store job logs in your validated DMS so you can recreate results. Publish a data catalog that lists each field used in inclusion/exclusion criteria automation and where that field lives (EHR problem list, labs, meds, encounters). This transparency turns vendor claims into evidence and gives QA something to verify besides good intentions.
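One way to make "timestamped, user-attributed, and immutable" verifiable is a hash-chained log, where each entry stores the hash of the previous one so that any later edit breaks the chain. The sketch below is an assumption about how such a log could look, with hypothetical field names; it does not replace a validated system, and an in-memory list stands in for durable storage.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list, user: str, action: str, detail: dict) -> dict:
    body = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,  # e.g. "query", "export", "match", "outreach"
        "detail": detail,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    # Recompute every hash and check each entry points at its predecessor.
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

QA can run `verify_chain` against an exported log to confirm that no entry was altered or removed from the middle of the sequence.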

Finally, draw a bright line between identification and persuasion. Clinical teams or site-affiliated outreach staff should make first contact whenever feasible; third-party recruiters can support but must follow the identical referral management workflow and scripts. Outreach language should explain why a person might be eligible, how their information was obtained, and how to opt out. This respect-first posture protects trust and reduces complaints that can derail otherwise solid pipelines.

Data operations: from messy records to precise eligibility signals at scale

Turning raw data into reliable pre-screen flags requires disciplined engineering. Begin with computable criteria. Transform protocol logic into a machine-readable specification and feed it to an eligibility criteria NLP parser that extracts candidates for diagnoses, medications, lab thresholds, procedures, and temporal windows. Validate the parse with clinicians—NLP suggests, humans decide—and formalize the result as a rules library used by analytics and sites alike. Where NLP is brittle (e.g., free-text imaging impressions), fall back to structured surrogates or curated lookups.

Next, unify patient identity so each person appears once. Use deterministic/probabilistic matching across names, dates of birth, addresses, MRNs, and phone numbers inside your EMPI patient matching stack, and tune thresholds by data source. At the boundaries between institutions, apply privacy-preserving record linkage (PPRL); this permits de-duplication across partners without exchanging clear identifiers. With identity resolved, run de-duplication and lead scoring so navigators see one prioritized record per person. Scores should weigh study fit (number of criteria met), recency (last encounter within window), stability (medication adherence, clinic attendance), and logistics (distance to site), while never disadvantaging protected classes.
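The scoring step can be sketched as a weighted sum over the four factors named above, followed by keeping the best-scoring record per resolved person. The weights, the 365-day recency window, the 100 km distance cap, and the `Lead` fields are all illustrative assumptions that a study team would need to calibrate. The record deliberately carries no protected attributes, so they cannot enter the score directly.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    person_id: str  # identity already resolved by the matching step
    criteria_met: int
    criteria_total: int
    days_since_last_encounter: int
    attendance_rate: float  # 0..1, used here as a stand-in for stability
    km_to_site: float


# Assumed weights; they sum to 1 so the score stays in the range 0..1.
WEIGHTS = {"fit": 0.5, "recency": 0.2, "stability": 0.2, "logistics": 0.1}


def score(lead: Lead, recency_window_days: int = 365, max_km: float = 100.0) -> float:
    fit = lead.criteria_met / lead.criteria_total if lead.criteria_total else 0.0
    recency = max(0.0, 1 - lead.days_since_last_encounter / recency_window_days)
    stability = min(max(lead.attendance_rate, 0.0), 1.0)
    logistics = max(0.0, 1 - lead.km_to_site / max_km)
    return (WEIGHTS["fit"] * fit + WEIGHTS["recency"] * recency
            + WEIGHTS["stability"] * stability + WEIGHTS["logistics"] * logistics)


def prioritize(leads: list) -> list:
    # Keep one record per person (the highest score), then sort descending.
    best = {}
    for lead in leads:
        s = score(lead)
        if lead.person_id not in best or s > best[lead.person_id][0]:
            best[lead.person_id] = (s, lead)
    return sorted(best.values(), key=lambda pair: pair[0], reverse=True)
```

Keeping the factor weights in one named table makes the prioritization easy to review and to audit for unintended proxies.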

Now apply the computable rules. Build inclusion/exclusion criteria automation with transparent, testable logic: “adult age” from birthdate; “HbA1c ≥ 7.5% within 90 days” from labs; “no recent MI” from encounters and problem lists; “washout present” from meds. Where judgment is required (e.g., “no clinically significant ECG abnormality”), flag rather than fail and route to clinician review. Pin every rule to its data dictionary entry and author a unit test; criteria automation should be as testable as any safety-critical code.
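A minimal sketch of two of the rules above, written so each can be unit tested in isolation. Several choices here are assumptions rather than protocol facts: adulthood is taken as 18 years, the HbA1c rule uses the most recent result inside the window, and a missing lab is routed to review instead of failing. The three-valued `Result` is one way to express "flag rather than fail".

```python
from datetime import date, timedelta
from enum import Enum


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    REVIEW = "review"  # route to clinician review instead of failing


def is_adult(birthdate: date, today: date, min_age: int = 18) -> Result:
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    age = today.year - birthdate.year - (0 if had_birthday else 1)
    return Result.PASS if age >= min_age else Result.FAIL


def hba1c_recent(labs: list, today: date, threshold: float = 7.5, window_days: int = 90) -> Result:
    # labs is a list of (draw_date, value_percent) tuples.
    cutoff = today - timedelta(days=window_days)
    recent = [(drawn, value) for drawn, value in labs if cutoff <= drawn <= today]
    if not recent:
        return Result.REVIEW  # assumption: no lab in window needs a human look
    _, latest_value = max(recent, key=lambda lab: lab[0])
    return Result.PASS if latest_value >= threshold else Result.FAIL


def combine(results) -> Result:
    # Any hard failure excludes; otherwise any open question goes to review.
    results = list(results)
    if Result.FAIL in results:
        return Result.FAIL
    if Result.REVIEW in results:
        return Result.REVIEW
    return Result.PASS
```

Each function takes `today` as a parameter rather than reading the clock, which keeps boundary cases such as the day before an 18th birthday reproducible in tests.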

Guard outreach with consent and channel sanity. Route candidates to your consent management system to check opt-in status, channel preferences, and language. Enforce opt-in recruitment compliance at the API layer so no downstream tool can bypass it. When texting, auto-screen messages and schedules against TCPA SMS texting rules (quiet hours, explicit consent, stop words). When emailing, honor unsubscribes instantly and log proof. Every action should flow into the referral management workflow queue with SLA timers so no lead languishes.
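The gate described above might look like the following sketch, which checks opt-out, opt-in, and a quiet-hours window before any text is sent, and records an opt-out when a stop word arrives. The 08:00 to 21:00 local window and the stop-word list are assumed, configurable defaults, not legal advice; counsel should confirm what applies to a given campaign. A fixed UTC offset stands in for a real time zone lookup, which would also need to handle daylight saving time.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone

# Assumed keyword list; align it with your messaging provider and counsel.
STOP_WORDS = {"stop", "unsubscribe", "cancel", "end", "quit"}


@dataclass
class ContactPreference:
    sms_opt_in: bool
    utc_offset_hours: int  # simplification: no daylight saving handling
    opted_out: bool = False


def may_send_sms(pref: ContactPreference, now_utc: datetime,
                 earliest: time = time(8, 0), latest: time = time(21, 0)) -> tuple:
    # Returns (allowed, reason) so every blocked send can be logged with a cause.
    if pref.opted_out:
        return (False, "opted_out")
    if not pref.sms_opt_in:
        return (False, "no_opt_in")
    local = now_utc.astimezone(timezone(timedelta(hours=pref.utc_offset_hours))).time()
    if not (earliest <= local < latest):
        return (False, "quiet_hours")
    return (True, "ok")


def record_inbound(pref: ContactPreference, message: str) -> ContactPreference:
    # A stop word takes effect immediately and blocks all later sends.
    if message.strip().lower() in STOP_WORDS:
        pref.opted_out = True
    return pref
```

Placing this check behind the single API that all outreach tools must call is what prevents a downstream tool from bypassing it.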

Finally, make the data work for the coordinators. Navigator consoles should show eligibility drivers (“meets HbA1c and BMI, age within range; ambiguous ECG”), list treating clinicians, and surface social context (language preference, transport barrier from intake). Provide one-click generation of IRB-approved outreach scripts in the correct language, and integrate ride codes or tele-visit scheduling where allowed. The pipeline succeeds when staff spend time talking to people, not wrestling with spreadsheets.

Referral networks: clinician trust, structured handoffs, and fair compensation

Few things convert like a clinician’s recommendation. Build clinical referral networks that let treating physicians identify candidates during routine care and hand off to research staff smoothly. Start with a compact toolkit: EMR-embedded alerts aligned to computable criteria; a one-page research summary; and a button that pushes a referral into the site’s referral management workflow. For health systems without tight EHR integration, stand up a secure web form or fax cover sheet with standardized fields (diagnosis, last visit, best time to call) and a signed authorization where applicable. Keep the data minimal and the process respectful of clinic time.

Relationships matter. Host short, CME-eligible lunch-and-learns for community physicians that cover protocol highlights, inclusion/exclusion nuances, benefits and risks communication, and what happens after a referral. Provide feedback loops—monthly summaries of dispositions (“10 referred, 6 pre-screened, 3 consented”)—so clinicians see value in participating. This reciprocity strengthens trust and surfaces eligibility misunderstandings early, reducing noise in your pre-screen funnel.

Resource the network ethically. When clinics or community partners invest time to educate and refer, compensate them via fair market value (FMV) payments for services actually rendered (education sessions, staff time to complete referral forms), never per-randomization or per-enrollment bounties. Document FMV logic, obtain IRB awareness where appropriate, and file everything in the TMF. If any party handles PHI beyond directory information, a business associate agreement (BAA) is required; if limited data sets are shared for prescreening, document scope in a data use agreement (DUA). These artifacts tame legal risk and demonstrate control.

Train and support referrers. Provide quick videos or job aids that show how to identify candidates with the EMR, how to explain research neutrally, and how to use the consent preference capture. Route all questions to a single hotline staffed by coordinators who can schedule prescreens promptly. Where travel or time is a barrier, empower nurse navigators to offer ride vouchers immediately, and—when protocol allows—arrange tele-prescreens so the first in-person visit is high-yield.

Measure and improve continuously. Track outreach-to-prescreen conversion by referral source, time to first contact, and prescreen pass rate. Segment by language, age band, and insurance to see where workflows underperform. When conversion lags for a specific specialty, host a targeted refresher; when pass rates are low from a partner clinic, examine whether criteria are misinterpreted or whether your alerts are too loose. Ground every tweak in data and report back to partners so the network learns with you.
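As a rough sketch of the per-source tracking described above, the function below groups lead records by referral source and computes the three measures named: outreach-to-prescreen conversion, time to first contact (as a median), and prescreen pass rate. The dictionary keys are hypothetical field names, and the same grouping could be keyed on language, age band, or insurance instead.

```python
from collections import defaultdict
from statistics import median


def source_metrics(leads: list) -> dict:
    # Each lead is a dict with "source", "hours_to_first_contact" (None if
    # never contacted), "prescreened" (bool), and "passed" (bool).
    grouped = defaultdict(list)
    for lead in leads:
        grouped[lead["source"]].append(lead)

    report = {}
    for source, rows in grouped.items():
        contacted = [r for r in rows if r.get("hours_to_first_contact") is not None]
        prescreened = [r for r in contacted if r.get("prescreened")]
        passed = [r for r in prescreened if r.get("passed")]
        report[source] = {
            "leads": len(rows),
            "outreach_to_prescreen": len(prescreened) / len(contacted) if contacted else None,
            "median_hours_to_first_contact": (
                median(r["hours_to_first_contact"] for r in contacted) if contacted else None
            ),
            "prescreen_pass_rate": len(passed) / len(prescreened) if prescreened else None,
        }
    return report
```

Returning `None` instead of zero when a denominator is empty keeps a source with no contacted leads from being misread as a source with zero conversion.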

Keep reputational risk front of mind. Scripts should avoid overstating benefits; materials should be IRB-approved prescreening versions; and every outreach must respect opt-out preferences captured in the consent management system. Respect for patients and clinicians is not just ethical—it is operationally efficient, because trust shortens cycles.

Metrics, inspection posture, and the implementation checklist with authoritative anchors

What gets measured improves. Build a dashboard that shows end-to-end funnel health: source volumes (EHR, clinician, community), deduped leads after EMPI patient matching, time-to-first-contact, outreach-to-prescreen conversion, prescreen pass rate, eligible-to-consent conversion, and early-visit completion. Overlay quality signals: percent of leads blocked by opt-in recruitment compliance, message sends that violated TCPA SMS texting rules (should be zero), rule-engine defects in inclusion/exclusion criteria automation, and latency in the referral management workflow. Trend lists of the criteria that fail most often (e.g., renal thresholds, concomitant meds) guide protocol clarifications and site coaching.
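The funnel portion of such a dashboard reduces to stage-to-stage ratios plus a few hard alerts. The sketch below assumes a simplified five-stage funnel and hypothetical count keys; the only alert shown is the zero-tolerance check on texting violations mentioned above.

```python
# Assumed, simplified stage order; a real funnel may have more stages.
STAGES = ["deduped_leads", "contacted", "prescreened", "prescreen_passed", "consented"]


def funnel_conversion(counts: dict) -> list:
    # Returns (upstream, downstream, rate) for each adjacent pair of stages.
    steps = []
    for upstream, downstream in zip(STAGES, STAGES[1:]):
        base = counts.get(upstream, 0)
        rate = counts.get(downstream, 0) / base if base else None
        steps.append((upstream, downstream, rate))
    return steps


def quality_alerts(signals: dict) -> list:
    # Any texting violation is an alert because the target is zero.
    alerts = []
    if signals.get("tcpa_violations", 0) > 0:
        alerts.append("tcpa_violations_nonzero")
    return alerts


# Illustrative counts only, not data from any study.
example = {"deduped_leads": 200, "contacted": 120, "prescreened": 60,
           "prescreen_passed": 30, "consented": 15}
steps = funnel_conversion(example)
```

Reporting each step against its immediate upstream stage, rather than against the top of the funnel, shows where the largest drop occurs.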

Inspection readiness depends on narrative plus evidence. Keep a binder (physical or virtual) with SOPs for HIPAA-compliant EHR mining, queue management, texting, email, and navigator scripts; validation summaries for PPRL, deterministic/probabilistic matching, and the eligibility criteria NLP parser; copies of the IRB-approved prescreening plan and outreach artifacts; signed data use agreements (DUAs) and business associate agreements (BAAs); and audit trail integrity exports that prove who did what, when. File payment schedules that demonstrate fair market value (FMV) payments for partners. This record shows a controlled process from query to call to consent.

Anchor your approach to primary bodies—one authoritative link per domain to keep citations clean while aligning teams globally: U.S. expectations for recruitment and records at the Food & Drug Administration (FDA); EU/UK guidance and ethics framing via the European Medicines Agency (EMA); harmonized GCP and data standards at the International Council for Harmonisation (ICH); global equity and ethics resources from the World Health Organization (WHO); regional clinical research expectations through Japan’s PMDA; and Australian requirements at the TGA. Cite sparingly in documents, but build these anchors into SOPs and training so multinational teams share the same reference points.

Implementation checklist (mapped to high-value controls and keywords)

Write and approve the IRB-approved prescreening plan; execute a data use agreement (DUA) and business associate agreement (BAA) as needed.
Stand up FHIR interoperability feeds; configure EMPI patient matching with deterministic/probabilistic matching and, between institutions, privacy-preserving record linkage (PPRL).
Operationalize inclusion/exclusion criteria automation from an eligibility criteria NLP parser validated by clinicians.
Enforce opt-in recruitment compliance in the consent management system; guard outreach with TCPA SMS texting rules.
Run de-duplication and lead scoring; display eligibility drivers to navigators; route via the referral management workflow with SLAs.
Compensate partners using fair market value (FMV) payments tied to time and deliverables, not enrollments.
Maintain audit trail integrity for queries, exports, outreach, and dispositions; store artifacts in the TMF.
Monitor funnel metrics (EHR hits, outreach-to-prescreen conversion, pass rates); run CAPA on recurring defects.

When privacy, interoperability, and clinical trust come together, pre-screening stops being a manual grind and becomes a steady, ethical flow of well-matched candidates. That is how compliant data work and strong referral networks translate into faster, more representative enrollment without compromising the standards that regulators—and patients—expect.