Call for Collaboration: Designing a Dgraph offline-first library

Hi @amaster507,

thank you once again for taking the lead here and pushing forward to get stuff done!
I’ll just add some requirements, challenges and ideas which you could add to your post if you like.

First, I’d like to clarify which use cases we are talking about:

  • a browser-only, pure JavaScript library
  • a database-lite service (e.g. implemented in Go)

I don’t think that the former case is really that important. If you really want offline support, you can always pack your app into Electron and ship and start any needed services on demand (including any database services). If it is only about short internet outages, I think @MichelDiz is correct in pointing to Apollo state management. But this is not what we need here. We’re talking about possibly weeks of downtime.

Also, as you said, WASM might still be an option. Therefore, I’ll concentrate on the latter case in this post.

I’d like to propose the obvious name “Dgraph GraphQLite” (or DGQLite for short) instead of “thin client”. By doing this, I also want to emphasize that building a completely new client might be too much work; we could instead build a lite version of dgraph, which already covers most of the requirements we have. Such a service could also be started by any app.

In general, I see DGQLite as very similar to dgraph itself, except for:

  • clustering etc.
  • high-performance throughput (in favor of keeping RAM usage low)
  • extra complexity (keep it simple to minimize cross-platform efforts)
  • anything else that’s not needed offline-first (the target, again, is keeping RAM usage low)

Requirements

  • Full cross-platform support
  • Initialize DGQLite by posting a GQL schema, like with dgraph (see the sketch after this list)
  • A GQL endpoint serving the generated API (like with dgraph)
  • Synchronization with dgraph alpha is a bonus (it can already be implemented with custom business logic; I will go into detail on this later)
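
For illustration, initialization could look exactly like it does with dgraph today: post a schema, get a generated API back. The Task type below is purely hypothetical, and I’m assuming DGQLite would mirror dgraph alpha’s /admin/schema endpoint:

```graphql
# Hypothetical example: a minimal schema posted to DGQLite, assuming it
# mirrors dgraph alpha's /admin/schema endpoint. DGQLite would generate
# queryTask, getTask, addTask, updateTask and deleteTask from it, just
# like dgraph does today.
type Task {
  id: ID!
  title: String! @search(by: [term])
  done: Boolean! @search
}
```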

Client
I’d propose using the same Apollo client everyone is using today and outsourcing the “heavy lifting” to DGQLite.

  • Apollo talks to DGQLite on localhost
  • DGQLite knows whether the remote endpoint (e.g. Slash GraphQL) is reachable and forwards the request in that case
  • If the remote server is not reachable, DGQLite handles the request itself and writes data to the local badger DB

This has the advantage that, from a client perspective, you can use exactly the same queries you are using today.
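
For example, with the hypothetical Task type from above, Apollo would send the exact same operation to DGQLite on localhost regardless of whether it ends up being served locally or forwarded:

```graphql
# Sent by Apollo to DGQLite on localhost; DGQLite either forwards it to
# the remote endpoint or resolves it from the local badger DB.
query OpenTasks {
  queryTask(filter: { done: false }) {
    id
    title
  }
}
```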

Challenges
With a lite version of dgraph, most of the challenges you mentioned (@amaster507) are already solved:

  • It would naturally support the same queries and mutations as dgraph alpha
  • I’m not sure about N+1, but I guess this is currently not an issue with dgraph?
  • Schema updates: DGQLite could just introspect the schema of the main server when going online and perform the same migration (i.e. post the schema to itself) that dgraph does currently; see the admin query below
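
dgraph already exposes the current schema via its /admin endpoint, so the introspection step could boil down to running the query below against the main server and posting the returned schema string to DGQLite itself (assuming DGQLite keeps dgraph’s admin API):

```graphql
# Run against the main server's /admin endpoint; the returned schema
# string is then posted back to DGQLite's own admin endpoint, which
# triggers the same migration dgraph performs today.
query {
  getGQLSchema {
    schema
  }
}
```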

Synchronization

It would be nice to leverage dgraph’s cluster and replication mechanisms here. A similar problem needs to be solved there, no?

The scope of the synchronization has to be limited, of course, and I see @auth rules as a perfect fit here.

That being said, I’d propose that we shouldn’t pay too much attention to synchronization for an initial DGQLite MVP. We can add this functionality later on.

Anyway, here’s how we currently do synchronization using two dgraph instances, and it works great:

  • do the synchronization on the application layer
  • synchronize every type independently, starting with leaf nodes and working our way up the dependency tree
  • a custom id field on every type that must be set when using the addFoo(...) mutation (all types implement interface ID { id: String! @id })
  • createdAt and updatedAt fields on every type
  • a list of deleted IDs with a deletedAt timestamp
  • syncedAt timestamps for every user for every type (see the schema sketch after this list)
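
To make this concrete, here is a sketch of how the sync-related parts of such a schema could look. Foo stands for any domain type; DeletedObject and SyncState are hypothetical bookkeeping types, not something dgraph generates:

```graphql
interface ID {
  id: String! @id
}

# Every domain type implements ID and carries the sync timestamps.
type Foo implements ID {
  createdAt: DateTime! @search
  updatedAt: DateTime! @search
  # ...domain fields
}

# Hypothetical bookkeeping: remembers deleted IDs for the sync protocol.
type DeletedObject {
  deletedID: String! @id
  deletedAt: DateTime! @search
}

# Hypothetical bookkeeping: per-user, per-type high-water mark of the last sync.
type SyncState {
  user: String! @search(by: [hash])
  typeName: String! @search(by: [hash])
  syncedAt: DateTime!
}
```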

S := DGQLite (in our case dgraph instance 1)
D := Slash GraphQL (in our case dgraph instance 2)

  1. S: “Hey D, do you have any new, changed or deleted Foos since syncedAt?”
    S fetches all Foos with queryFoo(filter: { updatedAt: { ge: $syncedAt } }) and adds/updates/deletes them locally
  2. S: “Hey D, here are all new, changed and deleted Foos since syncedAt!”
    S builds addFoo(input: [ ... ]), updateFoo(...) and deleteID(...) mutations and sends them to D (both steps are sketched below)
  3. update syncedAt timestamps on both sides
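
With the hypothetical Foo sketch from above, both steps map onto the plain generated operations (the timestamp literals stand for the stored syncedAt value):

```graphql
# Step 1: pull everything D added or changed since the last sync,
# plus the IDs it deleted.
query PullChanges {
  queryFoo(filter: { updatedAt: { ge: "2021-01-01T00:00:00Z" } }) {
    id
    createdAt
    updatedAt
    # ...domain fields
  }
  queryDeletedObject(filter: { deletedAt: { ge: "2021-01-01T00:00:00Z" } }) {
    deletedID
    deletedAt
  }
}

# Step 2: push local changes to D; the objects come straight out of
# the local queryFoo result.
mutation PushChanges {
  addFoo(input: [
    { id: "foo-42", createdAt: "2021-01-02T10:00:00Z", updatedAt: "2021-01-02T10:00:00Z" }
  ]) {
    foo {
      id
    }
  }
}
```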

When conflicts arise (an object with the same ID was added/updated/deleted on both sides), the object with the newest updatedAt or deletedAt timestamp currently wins.

The big advantage here is that, when using dgraph, the input and query types of the generated GQL API match. Therefore, the objects obtained by queryFoo can be used directly in the corresponding addFoo and updateFoo mutations.

This makes writing synchronization logic very easy. And because we already use @auth rules on all types, we don’t need to worry about syncing data that we don’t own, as dgraph takes care of this.
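
For reference, extending the Foo sketch from above, such an @auth rule could look like the following (the owner field and the $USER JWT claim are illustrative; add/update/delete rules would be defined the same way):

```graphql
# Illustrative: restrict Foo to its owner via a $USER claim from the JWT,
# so neither side ever sees (or syncs) objects it doesn't own.
type Foo implements ID @auth(
  query: { rule: """
    query ($USER: String!) {
      queryFoo(filter: { owner: { eq: $USER } }) {
        __typename
      }
    }"""
  }
) {
  owner: String! @search(by: [hash])
  createdAt: DateTime! @search
  updatedAt: DateTime! @search
}
```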


tl;dr:

Don’t implement a completely new client. Use a stripped-down dgraph instance instead, get most of the features for free, and save development time.

Further Steps

Maybe we can also get a bit of involvement from the @graphql team here?

Possible next steps:

  1. get an understanding of which features can be stripped from dgraph
  2. write “official” requirements for DGQLite MVP
  3. define milestones for DGQLite MVP
  4. ???
  5. let’s get this done!