Dgraph in Healthcare Interfaces (HL7) - an Edge/List ordering problem

This whole conversation depends on:

What I want to do

Use Dgraph as a db used alongside a healthcare interface engine for HL7 message storage, manipulation, and retrieval.

What I did

Tried to map HL7 2.x to RDF

The HL7 spec comes in a few different versions. The FHIR spec does have some discussion around RDF, but it is not the same RDF that Dgraph uses, only similar in nature.

More specifically, though, the 2.x versions of HL7 are not going away anytime soon, and they are what I deal with on a day-to-day basis when building interfaces in a healthcare environment.

For anyone interested in details, here is a somewhat easy to read guide for the HL7 2.5.1 version

Here is an example HL7 message:

MSH|^~\&|EPIC|EPIC|||2006010112009|4206|ORU^R01|1200|D|2.3||
PID|1||||TEST^PATIENT^^^ ^||19821126|M|||3247 MAIN^^MADISON^WI^53711^USA^^^|||||||82239|736765432||||
ORC|RE|163076^EPC|656^EPC||P||^^^2006001011040^^NORMALS^^||200601011751|4206^TEST^PROVIDER^ORDERING^^^||4206^TEST^PROVIDER^ORDERING^^^|1^^^1^^^^^|(608)888-8888||
OBR|1|163076^EPC|656^EPC|IMG001^CHEST XRAY||20060101||||||||||4206^TEST^PROVIDER^ORDERING^^^|(608)888-8888||1002^XR02^JY^^^^|||||XR|P||^^^200601011040^^NORMALS^^||||^Sample reason for exam text.|4207^TEST^RADIOLOGIST^^^^|4208^TEST^RESIDENT^^^^|4209^TEST^RADTECH^^^^||2006001011040|||||||||
OBX|1|ST|&GDT|1|Sample narrative text line 1.|
OBX|2|ST|&GDT|1|Sample narrative text line 2.|
OBX|3|ST|&IMP|2|Sample impression text.|

I want to decode this to RDF and then be able to encode it back into the same HL7 message as needed.

Each line of the message is called a segment.
A segment starts with three capital letters to define its segment type.
Segments can be repeated, which generally leads to incrementing the first field of the segment.
The fields in the segment are then separated by pipes |.
Subfields are separated by carets ^, and it is possible for fields and subfields to repeat using the tilde ~. All of these delimiters can be replaced with other characters, as defined by characters 4, 5, and 6 of the MSH segment.
Multiple messages can be combined into a single message and each new message begins with a new MSH segment.
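
The delimiter rules above can be sketched in a few lines of Python. This is only a rough illustration, not a full HL7 parser: the function names `parse_message` and `encode_message` are made up for this post, the PID segment is trimmed, escape sequences and the & sub-subfield separator are ignored, and segments are joined with newlines for readability.

```python
# Trimmed version of the example message (hypothetical helper names throughout).
SAMPLE = "\n".join([
    r"MSH|^~\&|EPIC|EPIC|||2006010112009|4206|ORU^R01|1200|D|2.3||",
    r"PID|1||||TEST^PATIENT^^^ ^||19821126|M|",
    r"OBX|1|ST|&GDT|1|Sample narrative text line 1.|",
])


def parse_message(raw):
    """Return (separators, segments); each segment is (name, fields).

    A field is a list of repetitions, and a repetition is a list of subfields.
    """
    lines = [ln for ln in raw.splitlines() if ln]
    # Characters 4, 5 and 6 of MSH declare the field, subfield and repetition separators.
    field_sep, comp_sep, rep_sep = lines[0][3], lines[0][4], lines[0][5]
    segments = []
    for line in lines:
        name, *raw_fields = line.split(field_sep)
        fields = []
        for i, value in enumerate(raw_fields):
            if name == "MSH" and i == 0:
                # The encoding characters themselves must not be split.
                fields.append([[value]])
            else:
                fields.append([rep.split(comp_sep) for rep in value.split(rep_sep)])
        segments.append((name, fields))
    return (field_sep, comp_sep, rep_sep), segments


def encode_message(seps, segments):
    """Rebuild the message text from the structure produced by parse_message."""
    field_sep, comp_sep, rep_sep = seps
    out = []
    for name, fields in segments:
        encoded = [rep_sep.join(comp_sep.join(rep) for rep in field) for field in fields]
        out.append(field_sep.join([name] + encoded))
    return "\n".join(out)


seps, segments = parse_message(SAMPLE)
print(encode_message(seps, segments) == SAMPLE)
```

Note that the round trip only works because Python lists keep every segment, field, and subfield in its original position, which is exactly the property I need from the graph.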

Fields and subfields are addressed by the segment name (plus, for a repeated segment, which repetition is meant), the field position, and the subfield position. For instance:

  • The patient’s last name is PID.5.1 = TEST
  • The patient’s first name is PID.5.2 = PATIENT
  • An order number is in field ORC.2.1 = 163076
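
To make the addressing concrete, here is a small Python sketch that resolves paths like the ones above. The helper `get_value` and its `occurrence` parameter are hypothetical names for this post; the sketch assumes the default delimiters, only reads the first repetition of a field, and does not handle the MSH segment, whose field numbering is offset because MSH.1 is the separator itself.

```python
# Segments trimmed from the example message; get_value is a hypothetical helper.
MESSAGE = "\n".join([
    "PID|1||||TEST^PATIENT^^^ ^||19821126|M|",
    "ORC|RE|163076^EPC|656^EPC||P|",
    "OBX|1|ST|&GDT|1|Sample narrative text line 1.|",
    "OBX|2|ST|&GDT|1|Sample narrative text line 2.|",
])


def get_value(raw, path, occurrence=1, field_sep="|", comp_sep="^", rep_sep="~"):
    """Resolve 'SEG.field' or 'SEG.field.subfield' for the Nth occurrence of SEG."""
    parts = path.split(".")
    seg_name, field_no = parts[0], int(parts[1])
    comp_no = int(parts[2]) if len(parts) > 2 else None
    seen = 0
    for line in raw.splitlines():
        fields = line.split(field_sep)
        if fields[0] != seg_name:
            continue
        seen += 1
        if seen != occurrence:
            continue
        if field_no >= len(fields):
            return ""  # field not present in this segment
        # fields[0] is the segment name, so fields[n] is field n.
        first_rep = fields[field_no].split(rep_sep)[0]
        if comp_no is None:
            return first_rep
        comps = first_rep.split(comp_sep)
        return comps[comp_no - 1] if comp_no <= len(comps) else ""
    return None  # no such segment occurrence


print(get_value(MESSAGE, "PID.5.1"))                # TEST
print(get_value(MESSAGE, "PID.5.2"))                # PATIENT
print(get_value(MESSAGE, "ORC.2.1"))                # 163076
print(get_value(MESSAGE, "OBX.5", occurrence=2))    # Sample narrative text line 2.
```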

The tricky part is that segment order is important, so any kind of RDF edge to a list must retain the original order. On top of that, the position of the repeating segments is also important. For instance, this message could contain additional segments, in order, after the last one shown: OBR, OBX, OBX, OBR, OBX, etc. These segments can be grouped together by their parents: the ORC is the parent of all the OBR segments (ORC.2.1 == OBR.2.1), and the OBR is the parent of the OBX segments. There are tons of other segments and message types; this is just a somewhat simple example. Here we start to see the graph-like nature of these messages. On top of that, multiple messages are related to the same patient, the same visit number, and the same order, test, result, registration, medication, etc.
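
As a rough sketch of that parent/child grouping, the Python below walks the segments in order and builds an ORC → OBR → OBX tree. The function name `group_orders` is made up for this post, the segments are trimmed, and it assumes that an OBX belongs to the most recent OBR before it, which is how I read these messages but is not something the code can verify.

```python
# Trimmed segments: one order with two OBR groups (hypothetical helper name).
MESSAGE = "\n".join([
    "ORC|RE|163076^EPC|656^EPC|",
    "OBR|1|163076^EPC|656^EPC|IMG001^CHEST XRAY|",
    "OBX|1|ST|&GDT|1|Sample narrative text line 1.|",
    "OBX|2|ST|&GDT|1|Sample narrative text line 2.|",
    "OBX|3|ST|&IMP|2|Sample impression text.|",
    "OBR|2|163076^EPC|656^EPC|IMG001^CHEST XRAY|",
    "OBX|1|ST|&GDT|1|Second group narrative.|",
])


def group_orders(raw, field_sep="|", comp_sep="^"):
    """Group segments as ORC -> OBR -> OBX, keyed by ORC.2.1 / OBR.2.1."""
    orders = {}         # dicts keep insertion order, so ORC order is preserved
    current_obr = None  # assumption: an OBX belongs to the closest preceding OBR
    for line in raw.splitlines():
        fields = line.split(field_sep)
        name = fields[0]
        if name == "ORC":
            key = fields[2].split(comp_sep)[0]
            orders[key] = {"segment": line, "obr": []}
            current_obr = None
        elif name == "OBR":
            key = fields[2].split(comp_sep)[0]
            current_obr = {"segment": line, "obx": []}
            parent = orders.setdefault(key, {"segment": None, "obr": []})
            parent["obr"].append(current_obr)
        elif name == "OBX" and current_obr is not None:
            current_obr["obx"].append(line)
    return orders


tree = group_orders(MESSAGE)
for key, order in tree.items():
    print(key, [len(obr["obx"]) for obr in order["obr"]])
```

The grouping only comes out right because the segments are visited in message order, so whatever stores them has to give that order back.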

So trying to map this to Dgraph RDF leads me to a few different paradigms, but none of them work very well, because they all run into the same difficulty: how to store ordered data?

Additionally, these messages should allow somewhat easy transformation in their graph form, so that they can then be encoded into a new, transformed HL7 message. Usually these transformations involve moving data from one field to another in the same segment, but they can be more complex, such as moving segments entirely or injecting new segments at specific places in the message. This makes any kind of ordering even harder: if the ordering is based on a chain, the chain has to be broken and rebuilt, and if the ordering is based on a scalar value for each segment, it becomes almost impossible, because any new segment injection leads to renumbering the entire message block(s).
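
To illustrate the trade-off, here is a toy Python comparison of the two ordering paradigms when a PV1 segment is injected after PID. Nothing here talks to Dgraph: the node layout, the `pos` and `next` keys, and all function names are hypothetical stand-ins for predicates I might use.

```python
NAMES = ["MSH", "PID", "ORC", "OBR", "OBX"]


def insert_scalar(nodes, position, text):
    """Scalar ordering: shift every node at or after `position`, then add the new one.

    Returns how many existing nodes had to be rewritten.
    """
    touched = 0
    for node in nodes:
        if node["pos"] >= position:
            node["pos"] += 1
            touched += 1
    nodes.append({"text": text, "pos": position})
    return touched


def build_chain(texts):
    """Chain ordering: each node points at the id of the next node."""
    nodes = {}
    for i, text in enumerate(texts):
        nxt = f"s{i + 1}" if i + 1 < len(texts) else None
        nodes[f"s{i}"] = {"text": text, "next": nxt}
    return nodes


def insert_chain(nodes, after_id, new_id, text):
    """Break the chain after `after_id` and relink it through the new node."""
    nodes[new_id] = {"text": text, "next": nodes[after_id]["next"]}
    nodes[after_id]["next"] = new_id
    return 1  # only the predecessor's link is rewritten


def read_chain(nodes, head_id):
    """Recover the order by walking the chain one hop at a time."""
    order, cur = [], head_id
    while cur is not None:
        order.append(nodes[cur]["text"])
        cur = nodes[cur]["next"]
    return order


scalar = [{"text": t, "pos": i} for i, t in enumerate(NAMES, start=1)]
print(insert_scalar(scalar, 3, "PV1"))            # 3 existing nodes renumbered
chain = build_chain(NAMES)
print(insert_chain(chain, "s1", "new", "PV1"))    # 1 existing node relinked
print(read_chain(chain, "s0"))
```

In this toy version the scalar approach rewrites every node after the insertion point, while the chain rewrites one link but can only be read back by following it hop by hop from the head.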

At this point, I have come to the realization that Dgraph, as fit as it might be for a schema-less structure that could ingest any type of structured data, fails when ingesting loosely structured data where the order of edges is important.


My conclusion at this time is that I will be unable to use Dgraph for my healthcare interface message storage project until Dgraph supports actual ordering of edges, meaning arrays that can be manipulated by inserting at an index, instead of sets that cannot contain duplicates and cannot be ordered.

Do I even have to mention how big data storage is in healthcare systems, and how much money this could lead to for Dgraph? Fixing some things to make Dgraph work for healthcare data will lead to faster adoption by healthcare organizations.

Hopefully this can act as another use case that shows Dgraph the importance of working with arrays instead of sets for edges and any list.

