Client-side offline Dgraph GraphQL sync

Continuing the discussion from Query server timestamps in GraphQL?:

What?!? Please tell me you are going to share this!?!

https://dgraph.io/docs/deploy/fast-data-loading/live-loader/

I used it to import 600+ MySQL databases into one Dgraph. My implementation is not a live stream right now, but by using timestamps from MySQL (and eventually timestamps on Dgraph) I hope it can become one soon, to support users transitioning between our app versions. The linked docs describe how to import from a .rdf file, so the process would be client data → .rdf file → live load into Dgraph.
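
For the client data → .rdf step, here is a minimal sketch in Go of what the export could look like, assuming a hypothetical Record type for the client-side rows. The triples use blank nodes (_:id), which lets the live loader assign fresh uids on import:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Record is a hypothetical stand-in for one client-side row.
type Record struct {
	LocalID   string
	Name      string
	UpdatedAt time.Time
}

// writeRDF dumps records as RDF N-Quad triples, the format the live
// loader consumes. Blank nodes (_:id) let the loader assign fresh uids.
func writeRDF(path string, recs []Record) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	for _, r := range recs {
		fmt.Fprintf(f, "_:%s <name> %q .\n", r.LocalID, r.Name)
		fmt.Fprintf(f, "_:%s <updatedAt> %q .\n", r.LocalID, r.UpdatedAt.Format(time.RFC3339))
	}
	return nil
}

func main() {
	recs := []Record{{LocalID: "u1", Name: "Alice", UpdatedAt: time.Now()}}
	if err := writeRDF("client.rdf", recs); err != nil {
		panic(err)
	}
}
```

The resulting file would then be pushed with something like `dgraph live -f client.rdf --alpha <server>:9080 --zero <server>:5080`; check the linked docs for the exact flags in your Dgraph version.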

Wondering though, if you are running two identical Dgraph databases, whether there is a better way to sync. After a sync, are these exact duplicates, or does the client only contain part of the server’s data? If they are exact duplicates after a sync, then it may be possible to do some kind of advanced configuration, running the client as an alpha of the server, so that when they come into contact the server alpha(s) get updated from the client alpha. Hmm… I wonder if maybe there is a better way using gRPC directly, instead of going through GraphQL or the live loader.

If you are running Dgraph client side, then you must also be running a client-side zero that is handing out uids. This is probably the harder part: how would the server zero know that the client zero has been using a chunk of uids, so that it knows not to reuse those?
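
One hedged idea, sketched in Go below: while the client is online, have it lease a block of uids from the server-side zero through Zero’s HTTP /assign endpoint (default port 6080), and let the client-side zero hand out ids only from that leased block. The endpoint itself is part of Zero’s documented HTTP API, but everything else here (function names, the server address, how the block would be plugged into a client-side zero) is an assumption:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// leaseUids asks the server-side zero to reserve n uids for this client.
// Zero answers with a JSON range like {"startId":"...","endId":"..."};
// ids in that range will not be handed out to anyone else.
func leaseUids(zeroAddr string, n int) (string, error) {
	url := fmt.Sprintf("http://%s/assign?what=uids&num=%d", zeroAddr, n)
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// Hypothetical server address; lease a generous block while online.
	block, err := leaseUids("server.example.com:6080", 100000)
	if err != nil {
		panic(err)
	}
	fmt.Println("leased uid block:", block)
}
```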

Kinda reminds me of when I was doing client-side MySQL slaves that would push up changes when they came into contact with the master. For the id problem there, I just had a single slave and a single master: the slave used the even AUTO_INCREMENT values and the master used the odds (MySQL’s auto_increment_increment and auto_increment_offset settings support exactly this). Maybe the team has already dealt with syncing offline/online Dgraph clusters.

Will have to check with my bosses, but as it’s not very complicated I don’t see a reason why not at some point in time. But in general, here are the steps:

  1. store synchronization timestamps on every object
  2. client: “hey server, give me all data after timestamp XX” → add data locally (a generated golang GQL client and reflection help a lot here)
  3. conflicts: for now we don’t really resolve conflicts. The data with the newest “updatedAt” timestamp wins. But it should be ez-pz to notify the user in that case and let them decide.
  4. client: “hey server, here is my data since last sync” → client sends batch mutations (see the sketch after this list)
  5. breaking database changes: sync will inform the client that its version is too old; the client then loads migration scripts from the server and updates local data. (not implemented yet)
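
A minimal Go sketch of steps 2–4, under stated assumptions: a hypothetical Foo type with an updatedAt field, Dgraph’s generated queryFoo/addFoo operations (upsert: true needs the id field declared with @id, and the updatedAt filter needs @search in the schema), and a plain HTTP /graphql endpoint. Names and the conflict rule are illustrative, not the actual implementation:

```go
package offlinesync

import (
	"bytes"
	"encoding/json"
	"net/http"
	"time"
)

// gqlRequest is the standard GraphQL-over-HTTP request body.
type gqlRequest struct {
	Query     string                 `json:"query"`
	Variables map[string]interface{} `json:"variables"`
}

func post(endpoint string, req gqlRequest) (*http.Response, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	return http.Post(endpoint, "application/json", bytes.NewReader(body))
}

// Pull (step 2): “hey server, give me all data after timestamp XX”.
func pull(endpoint string, lastSync time.Time) (*http.Response, error) {
	return post(endpoint, gqlRequest{
		Query: `query ($since: DateTime!) {
			queryFoo(filter: { updatedAt: { gt: $since } }) { id name updatedAt }
		}`,
		Variables: map[string]interface{}{"since": lastSync.Format(time.RFC3339)},
	})
}

// Push (step 4): “hey server, here is my data since last sync”, one
// batch mutation covering all locally changed objects.
func push(endpoint string, changed []map[string]interface{}) (*http.Response, error) {
	return post(endpoint, gqlRequest{
		Query: `mutation ($input: [AddFooInput!]!) {
			addFoo(input: $input, upsert: true) { numUids }
		}`,
		Variables: map[string]interface{}{"input": changed},
	})
}

// Step 3, last-write-wins: when the same object changed on both sides,
// keep the copy with the newer updatedAt (and optionally ask the user).
func keepLocal(localUpdated, remoteUpdated time.Time) bool {
	return localUpdated.After(remoteUpdated)
}
```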

Thanks for the live-loader link!

Clients should only sync their own data (from their account). We definitely don’t want customer A to have access to customer B’s data.

This is why we use our own UUIDs on every type. → addFoo(id: <uuid>)
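
A small Go sketch of that, assuming a hypothetical Foo type and the github.com/google/uuid package; the owner field is likewise an assumption, included because it is what would let the server scope a sync to a single account:

```go
package offlinesync

import "github.com/google/uuid"

// addFooMutation assumes id is declared with @id in the schema, so the
// client-chosen UUID is the object’s identity on both client and server.
const addFooMutation = `mutation ($input: [AddFooInput!]!) {
	addFoo(input: $input) { numUids }
}`

// newFooInput creates an object locally with its identity already fixed,
// so no server round trip is needed at creation time.
func newFooInput(owner, name string) map[string]interface{} {
	return map[string]interface{}{
		"id":    uuid.NewString(), // our own UUID, not a Dgraph-assigned uid
		"owner": owner,            // scopes the object to one account
		"name":  name,
	}
}
```

Because the client picks the id, an object created offline keeps the same identity once it reaches the server, and re-sending it later updates it (with upsert semantics) rather than creating a duplicate.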

Yeah, so a config trying to sync two alphas would not work for your use case then. This is pretty close to what we want to do eventually as well. The one difference is that we work with an individual’s own data, group data, and public data. A user can update some group data and public data based upon rules we define, so there will most likely be a lot of conflicts on syncs, because many users are updating the same things offline.