
HIPAA-Safe CSV Import: What Schema-Only Mode Actually Does

Schema-only mode sends only column headers and up to three short sample rows, never the full file. Here is precisely what leaves your system, why sending so little usually means you do not need a DPA, and how the compliance posture changes when you switch to full-data mode.

The AdaptivMapr Team · Healthcare Integrations · June 9, 2026 · 7 min read

The fastest way to fail a healthcare security review is to send protected health information somewhere it did not need to go. The cleanest way to pass one is to not send it at all. AdaptivMapr’s default mode — schema-only — is built around exactly that idea: to propose a column mapping, the engine needs to understand the shape of your data, not its contents.

This post is the precise version of that claim. No marketing rounding — here is exactly what leaves your system, what does not, and what changes when you opt into full-data mode.

What actually leaves your system in schema-only mode

Two things, and nothing else:

The column headers. Every header in the file — because that is what the cascade matches against the template.
Up to three sample rows, each value clamped to 80 characters. A few short samples help the engine disambiguate a column whose header is uninformative (an id that is clearly a UUID, a column of two-letter country codes). Anything past the third row, or past 80 characters in a cell, is dropped before the request leaves the edge.

That clamp is not advisory. It is enforced at the HTTP boundary in every endpoint that accepts sample rows, through a single shared chokepoint in the parser, so schema-only and full-data code paths cannot diverge on what they are allowed to send. The full upload (every other row, every full-length value) never leaves your infrastructure in this mode. The engine itself keeps no persistent state: there is no application database, and what little it does hold lives in process memory with a 24-hour time to live.
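The clamp described above can be sketched in a few lines. This is a minimal illustration of the rule as stated in this post (at most three rows, at most 80 characters per cell), not AdaptivMapr's actual parser code; the function names and the payload keys `headers` and `sample_rows` are hypothetical.

```python
from typing import Any

MAX_SAMPLE_ROWS = 3   # from the post: at most three sample rows
MAX_CELL_CHARS = 80   # from the post: each value clamped to 80 characters


def clamp_samples(rows: list[list[Any]]) -> list[list[str]]:
    """Keep only the first three rows and truncate every cell to 80 characters."""
    clamped = []
    for row in rows[:MAX_SAMPLE_ROWS]:
        # Missing values become empty strings; everything else is stringified first.
        clamped.append(
            ["" if cell is None else str(cell)[:MAX_CELL_CHARS] for cell in row]
        )
    return clamped


def build_schema_only_payload(headers: list[str], rows: list[list[Any]]) -> dict:
    """Hypothetical request body: every header, plus the clamped samples."""
    return {"headers": list(headers), "sample_rows": clamp_samples(rows)}
```

Routing every outgoing payload through one function like `clamp_samples` is what makes the limit a chokepoint rather than a convention: a caller cannot send a fourth row or an 81st character by accident.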

Why three rows of ≤80 characters changes the compliance math

HIPAA’s minimum-necessary principle asks you to disclose the least PHI required for the task. The task here is mapping a column called Geb_Datum to a birth-date field. That task is fully served by the header and a couple of clamped samples; it does not require the other 50,000 rows. By construction, schema-only mode discloses close to the minimum a mapping task can operate on.

Because the exposure is so constrained, schema-only mode works as a data-minimization mode: only headers and a few clamped samples ever leave your system. You can wire it into CI and run it against every partner file you receive, and in most cases you will not need a DPA to do so, because there is simply not enough leaving your system to warrant one (we provide one on request when the data calls for it). Every map draws a small flat fee from your prepaid token wallet, and there is no free tier, but schema-only never touches the metered AI layer, so it stays the cheapest way to work. It is the mode we expect most integration work to live in.

A practical caution that belongs in every engineering README: three sample rows of free text can still contain real identifiers if your source data does. Schema-only mode minimises exposure by volume and length; it does not redact. If your sample rows would contain PHI, send synthetic or de-identified samples, or send headers only.
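One way to act on that caution is to replace real sample values with shape-preserving placeholders before they are sent, or to send no samples at all. The sketch below is an assumption of ours, not a feature of the product: `mask_shape` and `synthetic_samples` are hypothetical helpers, and masking of this kind is not by itself a HIPAA de-identification method, since lengths and formats still survive.

```python
def mask_shape(value: str) -> str:
    """Replace digits with '9' and letters with 'x'/'X'; keep punctuation and spacing."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        else:
            out.append(ch)
    return "".join(out)


def synthetic_samples(rows: list[list[str]], include_samples: bool = True) -> list[list[str]]:
    """Return masked copies of up to three rows, or no rows for a headers-only request."""
    if not include_samples:
        return []
    return [[mask_shape(str(cell)) for cell in row] for row in rows[:3]]


print(mask_shape("1984-03-12"))    # 9999-99-99
print(mask_shape("Müller, Anna"))  # Xxxxxx, Xxxx
```

The trade-off is that a masked sample keeps the format of a value (a date still looks like a date) but loses its content, so a column of country codes becomes a column of `XX`. Whether that is enough for the mapping depends on how informative the header already is.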

When you need the rows: full-data mode

Some jobs genuinely require the data, not just its shape — validating every value against a code system, normalising a whole column, or committing a transformed resource. That is full-data mode, and it is gated behind an active PHI entitlement rather than being the default.

The compliance posture is deliberately different here:

The cascade still runs in-process, exactly as it does for schema-only. The earlier layers — statistics, heuristic, fuzzy, and semantic — never call out to a third party.
Only the layer-5 LLM step routes externally, and when it does it goes through a PHI-aware, OpenAI-compatible gateway. The request carries X-PHI: true and an X-Region header, so the gateway forces a PHI-eligible, in-region model rather than whatever default a generic LLM endpoint would pick.
A BAA is available for the full-data path. That is where the Business Associate Agreement and the jurisdiction guarantees are enforced.
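To make the routing concrete, here is a hedged sketch of how a client might assemble the headers for the layer-5 call. Only the header names `X-PHI` and `X-Region` come from this post; the entitlement check, the function name, the exception type, and the example region string are illustrative assumptions.

```python
class PhiEntitlementError(Exception):
    """Raised when full-data mode is requested without an active PHI entitlement."""


def build_llm_request_headers(region: str, phi_entitled: bool) -> dict[str, str]:
    """Build headers for the external LLM step in full-data mode (hypothetical helper)."""
    if not phi_entitled:
        # Full-data mode is gated, so refuse before any request is constructed.
        raise PhiEntitlementError("full-data mode requires an active PHI entitlement")
    if not region:
        raise ValueError("a region is required so the gateway can pin an in-region model")
    return {
        "Content-Type": "application/json",
        "X-PHI": "true",      # asks the gateway for a PHI-eligible model
        "X-Region": region,   # asks the gateway to keep the request in-region
    }


# The region value is a placeholder; real values depend on the gateway.
headers = build_llm_request_headers(region="eu-central", phi_entitled=True)
```

Failing closed on the entitlement check, before any headers exist, keeps the gate in the client code path as well as on the server.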

We say HIPAA-aware, and we say a BAA is available. We do not say “HIPAA certified” — there is no such certification to hold, and any vendor claiming it is telling you something untrue. (Our SOC 2 work is in progress; we will say so plainly when it lands, and not before.)

How to choose, in one paragraph

If you are mapping headers — figuring out which partner column is the birth date, whether a file matches a template, what the diff against your canonical schema looks like — stay in schema-only. It sends almost nothing and usually needs no paperwork. Reach for full-data mode only when the operation genuinely needs the values, and accept the entitlement, the in-region routing, and the BAA that come with it. Most teams spend most of their time in schema-only mode, which is the point.

See the headers-only request shape and the full-data opt-in in the docs, watch a real header resolve in the patient demographics walkthrough, or read how the cascade keeps the metered LLM layer from firing on most columns in the LOINC resolution post.

