ELT Challenge with DateTime Joins

What I want to do

Extract and Load data into Dgraph, and then Transform that data by adding the links where applicable.

What I did

Presented this challenge to help DQL mature.

SQL Equivalence

(I actually had to run a query like this for a report from a SQL Server DBMS)

Schema

#GraphQL
type Event {
  id: ID!
  xid: Int! @id
  started: DateTime!
  activities: [Activity]
}
type Activity {
  id: ID!
  xid: Int! @id
  logged: DateTime!
  duringEvent: Event @hasInverse(field: "activities")
}

Some Data

#GraphQL
mutation {
  addEvent(input:[
    { xid: 1, started: "2020-01-01T00:00:00Z" }
    { xid: 2, started: "2020-01-01T04:55:15Z" }
    { xid: 3, started: "2020-05-01T16:23:45Z" }
  ]) { numUids }
  addActivity(input: [
    { xid: 1, logged: "2020-01-01T00:00:00Z" }
    { xid: 2, logged: "2020-01-01T01:20:15Z" }
    { xid: 3, logged: "2020-01-01T16:23:45Z" }
    { xid: 4, logged: "2020-02-01T00:00:00Z" }
    { xid: 5, logged: "2020-05-10T13:00:00Z" }
    { xid: 6, logged: "2020-02-01T18:30:00Z" }
  ]) { numUids }
}

Challenge

Using only DQL/GraphQL, and without any scripting on a client or in a lambda, transform the data so that a query like the one sketched below can show this output:

{
  "queryEvent": [
    { "xid": 1, "activities": [{ "xid": 1 }, { "xid": 2 }] },
    { "xid": 2, "activities": [{ "xid": 3 }, { "xid": 4 }, { "xid": 6 }] },
    { "xid": 3, "activities": [{ "xid": 5 }] }
  ]
}
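
Once each Activity is linked to its Event, that output should come from something along the lines of Dgraph's auto-generated queryEvent query; the order arguments are only there to keep the result readable:

#GraphQL
query {
  queryEvent(order: { asc: xid }) {
    xid
    activities(order: { asc: xid }) {
      xid
    }
  }
}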

No cheating by creating mutations based upon the xid fields.

Assumptions that need to be followed:

  1. Events start at their given time and continue until the next Event starts.
  2. The xids will not always be in the same order as the timestamps. Whatever solution you come up with should still work when the xids are reassigned and the data is initially loaded in any random order.
  3. Activities should be linked to the Event during which they occurred, strictly by comparing the Event.started and Activity.logged values.
  4. The solution should also work for millions of records, if that is what the initially loaded data included (so a hand-written mutation per Event, like the sketch after this list, does not qualify).
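
To make that concrete, here is roughly what a single hand-written transform would look like in DQL. This is only a sketch: it assumes the default Type.field predicate names that Dgraph's GraphQL layer generates (Event.started, Activity.logged, Event.activities, Activity.duringEvent), it assumes filtering the un-indexed DateTime predicates is acceptable, it copies the window boundaries straight from the sample data, and it writes both directions of the edge explicitly because @hasInverse is applied by the GraphQL layer rather than by raw DQL.

#DQL
upsert {
  query {
    # The Event that owns this window, selected by its start time.
    e as var(func: type(Event)) @filter(eq(Event.started, "2020-01-01T00:00:00Z"))
    # Activities logged inside the window [this Event's started, next Event's started).
    a as var(func: type(Activity)) @filter(ge(Activity.logged, "2020-01-01T00:00:00Z") AND lt(Activity.logged, "2020-01-01T04:55:15Z"))
  }
  mutation {
    set {
      uid(e) <Event.activities> uid(a) .
      uid(a) <Activity.duringEvent> uid(e) .
    }
  }
}

Repeating that by hand for every Event window is exactly what the constraints forbid; the question is whether DQL alone can derive the windows and apply the links in bulk.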

EDIT: Added assumptive constraints and corrected copy/paste typos.

I don’t understand what you’re querying here… why are there two xid’s?

J

It was a very simplified version of the actual situation, which might mean I unintentionally created an XY Problem.

So for the report I had, I was pulling a patient's orders and nurse documentation. The only thing that ties all of this together is the patient and the dates. So for instance, an order was placed, and then you want to get all of the documentation between that order and the next order. In a graph database you would create an edge from the documentation to the order, but that is not how it is done in the SQL DBMS DR. So if you were to extract the data, load it into Dgraph, and then try to run mutations to transform it, you would have to solve the problem above.
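
In graph terms it would look something like the toy schema above. The type and field names here are hypothetical, just to show the edge the transform has to create:

#GraphQL
# Hypothetical names, for illustration only.
type Patient {
  id: ID!
  orders: [Order]
}
type Order {
  id: ID!
  placed: DateTime!
  patient: Patient @hasInverse(field: "orders")
  documentation: [Documentation] # the edge the transform has to create
}
type Documentation {
  id: ID!
  charted: DateTime!
  forOrder: Order @hasInverse(field: "documentation")
}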

Does that make sense at all?

The xids' only purpose here was to point back to the data's original source: if I went and looked up the PKs for the orders and the documentation in the original DBMS, I could verify the accuracy of the data.