Semantic Normalization In EHRs: Finding the right needle in the haystack

by Jim Shalaby

EHR data has grown tremendously over the past nine years, incentivized by Meaningful Use under the HITECH Act of 2009. With that growth comes the challenge of retrieving accurate and consistent data, whether for quality reporting, Clinical Decision Support (CDS) or general analytics. Why is getting accurate and consistent data such a problem? Some would say it's due to a lack of adoption of terminology standards or inconsistent use of clinical data models such as C-CDA. Although these are certainly contributors, they are only part of the larger problem.

Semantic normalization is the process of representing information in a consistent and transparent way, so that querying the EHR in different ways returns consistent answers. It is the very large missing piece. Current interoperability standards provide constructs such as CCDs or FHIR resources, which standardize how clinical data is represented and exchanged between systems by defining “bindings” between terminology standards and clinical models. These are great stepping stones, but they aren’t flexible and often miss the true “meaning” of the data being exchanged. This can be problematic when you’re using that same data for analytics and clinical decision support.

As an example, “find all patients with gentamicin allergies or intolerances” may seem like a simple query, but that information may be represented in many places in a chart beyond the allergies and intolerances list. Since current interoperability standards define a model for allergies and bind that model to RxNorm ingredients, we could easily query the EHR for allergies with the RxNorm code for the ingredient “gentamicin” (or a value set of all RxNorm codes for gentamicin). Here’s where we’re faced with our first “needle in the haystack” challenge: What if an “intolerance” or “adverse reaction” is represented in the patient’s problem list as “ototoxicity due to aminoglycosides exposure”? Since gentamicin is a member of the aminoglycoside family, the reaction is as likely to appear in the problem list as in the allergies or intolerances list. How do we know where to look in our clinical model to find the same meaning? Even if an EHR adhered to all current standards for this example, we would still have to know where to look for the right answer based on what we meant in our query. However, were we to ask the same question using a semantically normalized representation of this clinical fact, we would greatly improve consistency and accuracy. Here’s the same query in a semantically normalized representation: “find all patients where substance involved is gentamicin as a causative agent for a condition.” A condition in a semantic model can have direct relationships to broad areas such as allergies, problems, observations and findings. The key here is having the right semantic model implemented to link all of these concepts in a way that preserves meaning.
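To make the difference concrete, here is a minimal Python sketch of the two queries. Everything in it is an assumption for illustration: the `ChartEntry` record, the `PARENT` "is-a" table, the section names and the patient data are hypothetical, the substances are plain strings rather than real RxNorm codes, and none of it reflects an actual CIMI, IHMI or FHIR model. It only shows why a query on “causative agent” can find what a query on the allergy list misses.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical "is-a" hierarchy. A real system would get this from a
# terminology service, not a hand-written dict.
PARENT = {"gentamicin": "aminoglycoside", "tobramycin": "aminoglycoside"}


@dataclass(frozen=True)
class ChartEntry:
    """One coded fact in a chart (illustrative structure only)."""
    patient_id: str
    section: str                    # e.g. "allergy" or "problem"
    text: str                       # what the clinician sees
    causative_agent: Optional[str]  # normalized substance, if any


def self_and_ancestors(concept):
    """Return the concept plus every broader concept above it."""
    chain = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        chain.add(concept)
    return chain


def allergy_list_query(chart, substance):
    """Model-bound query: only looks in the allergy section."""
    return {e.patient_id for e in chart
            if e.section == "allergy" and e.causative_agent == substance}


def causative_agent_query(chart, substance):
    """Normalized query: any condition caused by the substance or its class."""
    agents = self_and_ancestors(substance)
    return {e.patient_id for e in chart if e.causative_agent in agents}


chart = [
    ChartEntry("p1", "allergy", "Gentamicin allergy", "gentamicin"),
    ChartEntry("p2", "problem",
               "Ototoxicity due to aminoglycosides exposure", "aminoglycoside"),
    ChartEntry("p3", "allergy", "Penicillin allergy", "penicillin"),
    ChartEntry("p4", "problem", "Hypertension", None),
]

print(sorted(allergy_list_query(chart, "gentamicin")))     # ['p1']
print(sorted(causative_agent_query(chart, "gentamicin")))  # ['p1', 'p2']
```

The allergy-list query misses patient p2 because the fact lives in the problem list and is coded at the drug-class level. The normalized query finds both patients because it asks about the relationship (substance as causative agent) rather than about where the fact was filed.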

There are several national efforts that are trying to meet this challenge. The HL7 Clinical Information Modeling Initiative (CIMI) is focused on reusable, modular clinical information models that can provide isosemantic representations of discrete clinical facts such as blood pressure, weight and laboratory observations. CIMI works collaboratively with other initiatives such as LEGO, CLIM and DCM.

Another effort that we have been closely involved with is the AMA’s Integrated Health Model Initiative (IHMI). IHMI approaches this challenge by enabling consistent representation of clinical function, state and goal with respect to disease management. The notions of function, state and goal are pillars that broadly support value-based care. Ideally, clinical interventions should result in measurable clinical states that can be used to assess improvement in function towards patient-specific goals. Semantic normalization is a key enabler of doing just that. IHMI collaborates with CIMI and FHIR to improve CDS and analytics that support this aspect of disease management and value-based care delivery.

Although it may sometimes seem that there is a lot of duplicated effort across these initiatives, they are in fact addressing different aspects of the semantic normalization challenge. The HL7 CIMI efforts are focused on providing standards that support isosemantic representations of clinical data to better enable analytics and CDS across a breadth of use cases. FHIR provides more consistent, richer interoperable messaging standards that can deliver better semantic “payloads” such as CIMI and IHMI. IHMI is focused specifically on representing function, state and goals in a semantically normalized way that can enable CDS and measurement of progress towards patient-specific goals.

With all of these efforts in flight, I’m confident we’re on the way to consistently finding the right needle in the haystack. In the meantime, there’s still a lot of ambiguous data out there to contend with. I’d love to hear your thoughts/comments on how you’re managing semantic normalization today, and what you think of these efforts. Take care for now!