Technical Architecture

Record ingestion: raw provider data to longitudinal history

How raw provider records are deduplicated, FHIR-normalized, and merged into a patient's longitudinal clinical history.

Status: accepted · arjun@mera.health · Updated Jun 15, 2026

Raw clinical records arrive from dozens of heterogeneous sources — EHR exports, patient-uploaded PDFs, lab direct connections, and pharmacy feeds. The ingestion system transforms that noise into a single, deduplicated, FHIR-normalized longitudinal history per patient. This document describes the topology, the lifecycle, and the key implementation decisions.

System topology

The ingestion stack has four logical layers: connector adapters at the edge, a normalization service in the middle, a deduplication engine, and the longitudinal store that downstream features read from.

Figure: Ingestion topology, four layers from raw connector to longitudinal store

Data flow: from connector event to stored record

Each connector emits an event onto a Cloudflare Queue when a new document or data chunk arrives. The normalizer worker consumes that event, validates it against the FHIR R4 schema, and enriches it with patient context before handing it off to the dedup engine.
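A minimal sketch of the normalizer's enrichment step, assuming a hypothetical `ConnectorEvent` shape (`eventId`, `connectorId`, `patientId`, `payload`) and an injected ID generator. Only `resourceType` is checked; it stands in for FHIR R4 schema validation rather than implementing it. Keeping the source's `meta.lastUpdated` when present is also an assumption.

```typescript
// Hypothetical connector event; field names are assumptions for this sketch.
interface ConnectorEvent {
  eventId: string;
  connectorId: string;              // e.g. "ehr:epic-mychart"
  patientId: string;                // Mera patient id already resolved for this source
  payload: Record<string, unknown>; // raw FHIR-ish resource from the source
}

// Same shape as the FhirRecord contract described in this document.
interface NormalizedRecord {
  resourceType: string;
  id: string;
  subject: { reference: string };
  meta: { lastUpdated: string; source: string; versionId: string };
  [key: string]: unknown;
}

// Enrich a raw payload with patient context and attribution metadata.
// Only resourceType is checked; full FHIR R4 validation is out of scope here.
function enrich(
  event: ConnectorEvent,
  newId: () => string,                 // e.g. crypto.randomUUID inside a Worker
  now: () => Date = () => new Date(),
): NormalizedRecord {
  const resourceType = event.payload["resourceType"];
  if (typeof resourceType !== "string" || resourceType === "") {
    throw new Error(`event ${event.eventId}: missing resourceType`);
  }
  // Assumption: keep the source's lastUpdated so dedup compares source time.
  const srcMeta = event.payload["meta"] as { lastUpdated?: unknown } | null | undefined;
  const srcLastUpdated = srcMeta?.lastUpdated;
  const lastUpdated =
    typeof srcLastUpdated === "string" ? srcLastUpdated : now().toISOString();

  return {
    ...event.payload,                  // connector-specific FHIR fields pass through
    resourceType,
    id: newId(),                       // normalizer-assigned id replaces any source id
    subject: { reference: `Patient/${event.patientId}` },
    meta: { lastUpdated, source: event.connectorId, versionId: "1" }, // assumption: first version
  };
}
```

Injecting `newId` and `now` keeps the step deterministic under test; in a Worker they would be the platform's UUID generator and clock.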

Figure: Event path from source system to the FHIR store

Record lifecycle

Every record moves through five discrete stages. The diagram below shows the steady-state path; a failure at any stage routes the event to the dead-letter queue (DLQ) instead of propagating a partial write.

Figure: Record lifecycle stages
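The routing rule (any stage failure goes to the DLQ, nothing partial is written) can be sketched as a runner that walks the stages in order. The `Stage` and `DlqEntry` shapes are hypothetical, the five real stage names are not reproduced here, and the sketch assumes only the final stage writes to the store.

```typescript
// Hypothetical stage: transforms the working value or throws on failure.
interface Stage {
  name: string;
  run: (input: unknown) => Promise<unknown>;
}

// Hypothetical DLQ entry shape.
interface DlqEntry {
  messageId: string;
  failedStage: string;
  error: string;
  rawPayload: unknown;
}

type Outcome =
  | { status: "stored"; value: unknown }
  | { status: "dead-lettered"; entry: DlqEntry };

// Runs stages in order. Assumes only the last stage writes, so an earlier
// failure leaves nothing partial behind; the original payload goes to the DLQ.
async function runLifecycle(
  messageId: string,
  rawPayload: unknown,
  stages: Stage[],
  sendToDlq: (entry: DlqEntry) => Promise<void>,
): Promise<Outcome> {
  let value: unknown = rawPayload;
  for (const stage of stages) {
    try {
      value = await stage.run(value);
    } catch (err) {
      const entry: DlqEntry = {
        messageId,
        failedStage: stage.name,
        error: err instanceof Error ? err.message : String(err),
        rawPayload, // raw input, not the intermediate result
      };
      await sendToDlq(entry);
      return { status: "dead-lettered", entry }; // no automatic retry
    }
  }
  return { status: "stored", value };
}
```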

Partial writes are not retried automatically

If the normalizer succeeds but the dedup engine times out, the event lands in the dead-letter queue. Ops must reprocess DLQ entries manually after diagnosing the root cause. Do not bypass the DLQ by re-ingesting the original source event — you will create duplicates.

Deduplication strategy

The dedup engine uses a Durable Object per patient to serialize all matching decisions for that patient. It computes a composite fingerprint from (resourceType, code.system, code.code, effectiveDateTime ± 24h) for clinical observations, and (resourceType, identifier.system, identifier.value) for documents and encounters.
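A sketch of the two fingerprint shapes follows. A ±24h tolerance cannot be expressed as an exact hash, so this version splits the observation fingerprint into an exact key (`resourceType`, code system, code) plus a time-distance check. That split, the `|` separator, and the helper names are assumptions. Because `Observation.code` is a CodeableConcept in FHIR R4, the sketch reads `code.system`/`code.code` as the first entry of `code.coding`, also an assumption.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Minimal shapes for this sketch; real FHIR resources carry many more fields.
interface ObservationLike {
  resourceType: string;
  code: { coding: { system: string; code: string }[] };
  effectiveDateTime: string; // ISO 8601
}
interface IdentifiedLike {
  resourceType: string;
  identifier: { system: string; value: string }[];
}

// Exact part of the observation fingerprint (assumes the first coding is canonical).
function observationKey(o: ObservationLike): string {
  const c = o.code.coding[0];
  return [o.resourceType, c.system, c.code].join("|");
}

// Fingerprint for documents and encounters (assumes the first identifier is canonical).
function identifierKey(r: IdentifiedLike): string {
  const id = r.identifier[0];
  return [r.resourceType, id.system, id.value].join("|");
}

// Two observations match when the exact key is equal and the effective times
// fall within the window (inclusive). windowMs = 0 means an exact time match.
function observationsMatch(a: ObservationLike, b: ObservationLike, windowMs = DAY_MS): boolean {
  if (observationKey(a) !== observationKey(b)) return false;
  const delta = Math.abs(Date.parse(a.effectiveDateTime) - Date.parse(b.effectiveDateTime));
  return delta <= windowMs;
}
```

Running these comparisons inside the per-patient Durable Object means two concurrent events for the same patient cannot both conclude "new".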

A new record is considered a duplicate if its fingerprint matches an existing stored record within the same patient scope. When a match is found, the engine compares lastUpdated timestamps and retains the more recent version, updating the stored record in place rather than appending a new row.
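The keep-newer rule can be sketched as a pure function that returns one of the three labels `dedup_log` records. Mapping "incoming is newer" to `updated` and "incoming is older or equal" to `duplicate` is an assumption, as is bumping `versionId` on an in-place update. Timestamps are compared with `Date.parse` so that values with different UTC offsets still order correctly.

```typescript
type Decision = "new" | "updated" | "duplicate";

interface StoredMeta {
  lastUpdated: string; // ISO 8601
  versionId: string;   // integer serialized as a string
}
interface Rec {
  id: string;
  meta: StoredMeta;
  [key: string]: unknown;
}

// Decide what happens to an incoming record given the fingerprint match (if any).
function mergeDecision(existing: Rec | undefined, incoming: Rec): { decision: Decision; record: Rec } {
  if (existing === undefined) {
    return { decision: "new", record: incoming };
  }
  const incomingTs = Date.parse(incoming.meta.lastUpdated);
  const existingTs = Date.parse(existing.meta.lastUpdated);
  if (incomingTs > existingTs) {
    // Update in place: keep the stored id, take the newer content, bump the version.
    return {
      decision: "updated",
      record: {
        ...incoming,
        id: existing.id,
        meta: { ...incoming.meta, versionId: String(Number(existing.meta.versionId) + 1) },
      },
    };
  }
  // Incoming is not newer: the stored record wins unchanged.
  return { decision: "duplicate", record: existing };
}
```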

Tip

The 24-hour window on effectiveDateTime absorbs common EHR clock skew. For lab connectors that report results with millisecond precision, set the window to zero using the per-connector dedupWindow config key.
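One way to resolve a per-connector `dedupWindow` is a lookup that falls back to the 24-hour default. The connector id, the millisecond unit, and the map format below are illustrative assumptions; the actual config format is not specified here.

```typescript
const DEFAULT_DEDUP_WINDOW_MS = 24 * 60 * 60 * 1000; // the 24h default

// Hypothetical dedupWindow overrides keyed by connector id, in milliseconds.
const dedupWindow: Record<string, number> = {
  "lab:example-direct": 0, // millisecond-precision lab feed: timestamps must match exactly
};

// Resolve the window for a connector, falling back to the 24h default.
function dedupWindowFor(
  connectorId: string,
  overrides: Record<string, number> = dedupWindow,
): number {
  return overrides[connectorId] ?? DEFAULT_DEDUP_WINDOW_MS;
}
```

Using `??` rather than `||` matters here: `||` would treat a zero window as missing and silently fall back to 24 hours.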

FHIR normalization contract

Every record leaving the normalizer must conform to the following shape. Connectors that cannot produce a conformant record send the raw payload to the dead-letter queue with a structured error attached.

types/fhir-record.ts

```typescript
export interface FhirRecord {
  resourceType: string;           // e.g. "Observation", "MedicationRequest"
  id: string;                     // UUID v4 assigned by the normalizer
  subject: { reference: string }; // "Patient/<mera-patient-id>"
  meta: {
    lastUpdated: string;          // ISO 8601, UTC
    source: string;               // connector id, e.g. "ehr:epic-mychart"
    versionId: string;            // monotonically increasing integer, serialized as a string
  };
  // FHIR R4 resource fields follow; their shape is connector-specific
  [key: string]: unknown;
}
```
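The interface only exists at compile time, so deciding whether a record "conforms" at runtime needs a separate check. A minimal sketch that returns a list of structured errors suitable for attaching to a DLQ entry; the `ConformanceError` shape and the exact checks are assumptions drawn from the field comments above.

```typescript
// Hypothetical structured error attached to a DLQ entry.
interface ConformanceError {
  field: string;
  problem: string;
}

const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

const nonEmpty = (v: unknown): v is string => typeof v === "string" && v.length > 0;

// Runtime check of the FhirRecord contract; an empty result means conformant.
function checkConformance(r: Record<string, unknown>): ConformanceError[] {
  const errors: ConformanceError[] = [];
  const fail = (field: string, problem: string) => errors.push({ field, problem });

  if (!nonEmpty(r["resourceType"])) fail("resourceType", "missing or empty");

  const id = r["id"];
  if (!nonEmpty(id) || !UUID_V4.test(id)) fail("id", "not a UUID v4");

  const ref = (r["subject"] as { reference?: unknown } | null | undefined)?.reference;
  if (!nonEmpty(ref) || !ref.startsWith("Patient/")) fail("subject.reference", "not a Patient reference");

  const meta = r["meta"] as
    | { lastUpdated?: unknown; source?: unknown; versionId?: unknown }
    | null
    | undefined;
  const lastUpdated = meta?.lastUpdated;
  if (!nonEmpty(lastUpdated) || Number.isNaN(Date.parse(lastUpdated))) {
    fail("meta.lastUpdated", "not a parseable timestamp");
  }
  if (!nonEmpty(meta?.source)) fail("meta.source", "missing connector id");
  const versionId = meta?.versionId;
  if (!nonEmpty(versionId) || !/^\d+$/.test(versionId)) fail("meta.versionId", "not an integer string");

  return errors;
}
```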

The meta.source field is the primary attribution key — it tells downstream consumers which connector produced the record and enables per-source replay without touching records from other connectors.
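Per-source replay can be scoped by filtering on `meta.source`. A minimal sketch over an in-memory list; the real store query is not described here, and `selectForReplay` and its optional `since` cutoff are hypothetical.

```typescript
interface SourcedRecord {
  id: string;
  meta: { source: string; lastUpdated: string };
}

// Select records attributed to one connector, optionally only those updated
// at or after a cutoff. Records with unparseable timestamps are excluded.
function selectForReplay<T extends SourcedRecord>(
  records: T[],
  connectorId: string,
  since?: string,
): T[] {
  const cutoff = since === undefined ? -Infinity : Date.parse(since);
  return records.filter(
    (r) => r.meta.source === connectorId && Date.parse(r.meta.lastUpdated) >= cutoff,
  );
}
```

Records from other connectors never enter the result, which is the property the attribution key is meant to guarantee.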

Provenance and audit

Every write to the FHIR store appends a row to record_provenance with (record_id, connector_id, event_id, ingested_at).
The dedup engine logs its decision (new | updated | duplicate) alongside the winning fingerprint to dedup_log.
DLQ entries include the full raw payload, the normalizer error, and the original queue message ID for correlation.
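The three audit artifacts can be modeled as row types. The `record_provenance` columns come from the list above; the `dedup_log` and DLQ field names and both helper functions are assumptions. The sketch also assumes a `duplicate` decision performs no store write, so it gets a `dedup_log` row but no `record_provenance` row.

```typescript
type DedupDecision = "new" | "updated" | "duplicate";

// Columns named in the list above.
interface ProvenanceRow {
  record_id: string;
  connector_id: string;
  event_id: string;
  ingested_at: string; // ISO 8601 UTC
}

// Hypothetical column names for dedup_log.
interface DedupLogRow {
  record_id: string;
  decision: DedupDecision;
  fingerprint: string;
  logged_at: string;
}

// Hypothetical field names for a DLQ entry.
interface DlqEntry {
  queue_message_id: string; // original queue message ID, for correlation
  raw_payload: unknown;
  normalizer_error: string;
}

// Rows for one dedup decision; both share a single timestamp.
function auditRows(
  recordId: string,
  connectorId: string,
  eventId: string,
  decision: DedupDecision,
  fingerprint: string,
  now: Date = new Date(),
): { provenance?: ProvenanceRow; dedupLog: DedupLogRow } {
  const ts = now.toISOString();
  const dedupLog: DedupLogRow = { record_id: recordId, decision, fingerprint, logged_at: ts };
  if (decision === "duplicate") return { dedupLog }; // assumption: no store write
  return {
    provenance: { record_id: recordId, connector_id: connectorId, event_id: eventId, ingested_at: ts },
    dedupLog,
  };
}

function toDlqEntry(queueMessageId: string, rawPayload: unknown, err: unknown): DlqEntry {
  return {
    queue_message_id: queueMessageId,
    raw_payload: rawPayload,
    normalizer_error: err instanceof Error ? err.message : String(err),
  };
}
```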

See the longitudinal record PRD for the product goals this architecture serves, and the 2026 records roadmap for the delivery timeline.