
LOINC Codes in CSV: Resolving Free-Text Lab Values

Lab CSVs almost never arrive with LOINC codes — they arrive as free text like "Hgb" or "Glukose nüchtern". Here is how the cascade and the loinc validator turn that free text into a coded lab result you can trust.

The AdaptivMapr Team | Healthcare Integrations | 8 min read

LOINC (Logical Observation Identifiers Names and Codes) is a widely adopted international standard for identifying laboratory and clinical observations. A LOINC code such as 718-7 means “Hemoglobin [Mass/volume] in Blood” to any system that speaks it, regardless of what the ordering lab happened to call it on the requisition.

The problem is that lab CSVs almost never arrive with LOINC codes in them. They arrive with free text.

Why lab CSVs arrive as free text

A LOINC code identifies what the lab measured, but the CSV you receive is usually an export from a system that stores each test under a local mnemonic. So you get columns and values like:

test_name,result,unit
Hgb,13.8,g/dL
Glukose nüchtern,5.4,mmol/L
WBC,6.2,10*3/uL
Kreatinin,88,umol/L

Four tests, two languages, three different shorthand conventions, zero codes. Hgb, HGB, Haemoglobin, and Hämoglobin are the same analyte to a human and four different strings to a parser. Mapping this by exact string match is hopeless; mapping it with a giant regex is a maintenance sinkhole.

The cascade against free-text test names

AdaptivMapr resolves these the same way it resolves any other header problem: a five-layer cascade, cheapest first, that stops the moment a value resolves. For the column-level mapping (which column is the test name, which is the result, which is the unit), the layers fall in order:

  1. Heuristic (hints). The lab_result_catalog template carries multilingual hints (DE / FR / IT / EN / ES) for each field. test_name maps cleanly because test, analyt, and untersuchung are all registered hints — after normalisation strips case, accents, and punctuation. Adding hints is the single highest-leverage way to catch new lab vocabulary, and it is deterministic and free.
  2. Fuzzy. Catches typos and token-order drift — Glukose nüchtern versus a hinted nüchtern glukose — using a token-set ratio over the normalised strings. Still pure compute, still free.
  3. Semantic. When the wording is a genuine paraphrase rather than a typo, cached embeddings compare the header against each field’s label and hints. This is the layer that earns its keep on the long tail of phrasings nobody thought to add a hint for.
  4. LLM. Only fires on the genuinely ambiguous remainder, and is constrained to the template’s allowed column set so it cannot invent a field. Because the three free layers above resolve the overwhelming majority of columns, the metered layer is rarely consulted — that short-circuit is the biggest cost lever in the system.

The full mechanism, layer by layer, is documented in the docs.
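
To make the first two layers concrete, here is a minimal sketch in Python rather than AdaptivMapr's actual implementation. It assumes a hint matches when it equals the normalised header or appears as a whole token in it, and it uses a token-sort comparison from the standard library's difflib as a stand-in for the token-set ratio. The hint table, the function names, and the 0.85 threshold are all hypothetical; only the test_name hints come from the text above.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical hint table. Only the test_name hints are quoted in this post;
# the result and unit hints are invented for illustration.
HINTS = {
    "test_name": ["test", "analyt", "untersuchung"],
    "result": ["result", "ergebnis"],
    "unit": ["unit", "einheit"],
}


def normalise(text):
    """Strip case, accents, and punctuation, then collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents)
    return " ".join(cleaned.split())


def token_sort_ratio(a, b):
    """Order-insensitive similarity in [0, 1] (stand-in for a token-set ratio)."""
    sorted_a = " ".join(sorted(normalise(a).split()))
    sorted_b = " ".join(sorted(normalise(b).split()))
    return SequenceMatcher(None, sorted_a, sorted_b).ratio()


def resolve_header(header, fuzzy_threshold=0.85):
    """Return (field, layer), or (None, None) if the free layers cannot decide."""
    norm = normalise(header)
    tokens = set(norm.split())

    # Layer 1: heuristic. Assumed rule: a hint matches the whole header
    # or one whole token of it.
    for field, hints in HINTS.items():
        for hint in hints:
            hint_norm = normalise(hint)
            if hint_norm == norm or hint_norm in tokens:
                return field, "heuristic"

    # Layer 2: fuzzy. Keep the best-scoring hint across all fields.
    best_field, best_score = None, 0.0
    for field, hints in HINTS.items():
        for hint in hints:
            score = token_sort_ratio(header, hint)
            if score > best_score:
                best_field, best_score = field, score
    if best_score >= fuzzy_threshold:
        return best_field, "fuzzy"

    # A real cascade would fall through to the semantic and LLM layers here.
    return None, None
```

Under these assumptions, a header such as test_name stops at the heuristic layer, a misspelling such as Untersuchng is caught by the fuzzy layer, and anything else is left for the more expensive layers.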

Where the loinc validator fits

Resolving which column holds the LOINC code is mapping. Confirming that the value in that column is a well-formed LOINC code is validation — a separate job, and the one the loinc validator does.

LOINC codes have a specific, checkable structure: a numeric body of up to five digits, a hyphen, and a single check digit, written as NNNNN-N. Examples are 718-7 for hemoglobin and 2345-7 for glucose; shorter bodies are not zero-padded. The loinc validator enforces that format on commit, so a column that is supposed to carry codes but actually carries a stray local mnemonic, a transposed digit, or an empty cell is rejected rather than written through as if it were valid. It checks shape; pairing a free-text analyte to its canonical code is a terminology-resolution step on top of that, which is exactly why the cascade and the validator are distinct stages.
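
A format check of this kind can be sketched in a few lines of Python. This is an illustration, not the loinc validator's source: the function names are hypothetical, the five-digit limit follows the NNNNN-N shape quoted here, and the check-digit step assumes the Mod 10 (Luhn-style) scheme described in the LOINC documentation. Whether AdaptivMapr's validator also verifies the check digit is not stated in this post.

```python
import re

# Shape quoted in the text: up to five digits, a hyphen, one check digit.
# Raise the upper bound if your LOINC release contains longer codes.
LOINC_SHAPE = re.compile(r"([0-9]{1,5})-([0-9])")


def mod10_check_digit(body):
    """Luhn-style Mod 10 digit: double every other digit, starting from the right."""
    total = 0
    for position, char in enumerate(reversed(body)):
        digit = int(char)
        if position % 2 == 0:  # 1st, 3rd, 5th digit counted from the right
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def is_valid_loinc(value):
    """True if value has the NNNNN-N shape and a matching check digit."""
    match = LOINC_SHAPE.fullmatch(value.strip())
    if match is None:
        return False  # empty cell, local mnemonic, or wrong shape
    body, check = match.groups()
    return mod10_check_digit(body) == int(check)
```

With this sketch, 718-7 and 2345-7 pass, while a stray mnemonic such as Hgb, an empty cell, or the transposition 781-7 are rejected.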

Other clinical validators sit alongside it for the rest of a lab row — units, reference ranges, and the numeric result itself — and you can read how each behaves in the validator reference.

A worked row

For the hemoglobin row above, the pieces land like this:

{
  "test_name": "Hgb",
  "loinc": "718-7",
  "result": 13.8,
  "unit": "g/dL"
}

The test_name column resolved at the heuristic layer; the loinc value, once supplied, passed the NNNNN-N format check; the numeric result and its unit were validated against their own field rules. Because lab_result_catalog is a higher-risk template, the mapping response flags requires_hitl: true so your pipeline can route the proposal to a reviewer before committing — terminology mapping is exactly the kind of step that warrants a human in the loop.
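
On the consuming side, routing on that flag can be as small as the sketch below. The requires_hitl field is the one named above; its position at the top level of the response, the route_mapping function, and the "review" and "commit" labels are assumptions made for illustration.

```python
def route_mapping(response):
    """Return "review" when a human should confirm the mapping, else "commit"."""
    # Assumption: requires_hitl sits at the top level of the mapping response.
    # A missing flag is treated as "needs review" so the pipeline fails safe.
    if response.get("requires_hitl", True):
        return "review"
    return "commit"
```

Defaulting to review when the flag is absent is a deliberate design choice in this sketch: for clinical data, an unexpected response shape should slow the pipeline down rather than let a mapping through unreviewed.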

The takeaway

Lab data is messy because it is human-entered, multilingual, and full of local shorthand. The job is not to pretend that away with a brittle lookup table, but to layer cheap deterministic matching under progressively smarter (and more expensive) fallbacks, and to validate the result against the LOINC format before anything is committed. Start from the lab_result_catalog template, read the loinc validator reference, and consult loinc.org as the authoritative source for the codes themselves.
