Moved from GitHub dgraph/4442
Posted by marvin-hansen:
Experience Report
Currently, I am building a rapidly growing GraphQL API with Neo4j and some related tooling that lets me just write a schema and have the tooling generate the GraphQL API as well as the DB CRUD operations. While the tooling is terribly ugly, it gets the job done 90% of the time. For the remaining 10%, I had to resort to custom resolvers. This worked fine until I hit a legacy SOAP service integration, which became very complex, so I ended up querying the DB and then using some templating to generate XML queries. Nothing terrible, but nothing fun either.
However, while looking at about half a dozen custom resolvers, I noticed a very distinct pattern:
- GraphQL entrance point (say, AllPartnerProductsSortedByRegionSale)
- Query the internal DB to get the keys required to construct an external query
- Build and execute the external query
- Apply filtering or sorting
- Return a GraphQL response
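The five steps above can be sketched as plain Go functions. Everything here is illustrative: the names (PartnerProduct, AllPartnerProductsSortedByRegionSale) come from the example above, and the internal DB and external service are stubbed with in-memory maps where real code would call a DB driver and a SOAP/REST client.

```go
package main

import (
	"fmt"
	"sort"
)

// PartnerProduct is a hypothetical result type for the example resolver.
type PartnerProduct struct {
	Key   string
	Sales int
}

// Step 2: query the internal DB for the keys needed by the external call.
// Stubbed with an in-memory map standing in for the graph database.
func internalKeysForRegion(region string) []string {
	internal := map[string]string{"p1": "EU", "p2": "EU", "p3": "US"}
	var keys []string
	for k, r := range internal {
		if r == region {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // deterministic order for the example
	return keys
}

// Step 3: build and execute the external query. Stubbed; a real
// implementation would template an XML/JSON request and call the service.
func externalSales(keys []string) []PartnerProduct {
	sales := map[string]int{"p1": 40, "p2": 90, "p3": 10}
	var out []PartnerProduct
	for _, k := range keys {
		out = append(out, PartnerProduct{Key: k, Sales: sales[k]})
	}
	return out
}

// Steps 1, 4, and 5: the GraphQL entrance point, the sorting,
// and the response that goes back to the client.
func AllPartnerProductsSortedByRegionSale(region string) []PartnerProduct {
	keys := internalKeysForRegion(region)
	products := externalSales(keys)
	sort.Slice(products, func(i, j int) bool {
		return products[i].Sales > products[j].Sales
	})
	return products
}

func main() {
	fmt.Println(AllPartnerProductsSortedByRegionSale("EU"))
}
```

The point is how mechanical the pattern is: every one of the half-dozen resolvers is this same pipeline with different stubs filled in.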
In many ways, the pattern of an internal GraphQL API call that requires information from an external resource is very common in practice; it is usually handled somewhere between the DB and the frontend and, more often than not, as microservices.
What you wanted to do
I actually wanted to abstract away the boundaries between data & compute, and I wanted to abstract away the boundaries between internal & external resources.
What you actually did
I actually did most of that by re-formulating business processes as GraphQL endpoints and then implementing the underlying workflows through custom GraphQL resolvers that queried the internal DB as well as external resources and legacy services.
Why that wasn’t great, with examples
There are a couple of practical problems when doing just that with neo4j / GrandStack:
- Tooling relies on Node.js / JS => debugging disaster
- DB querying can only be done through the public GraphQL API => HTTP overhead
- Daisy-chaining custom resolvers, well, you don't do that => very ugly
Beyond that, there is a more fundamental issue:
Interweaving micro-services that integrate external resources with an internal data graph is technically feasible, but it isn't adequately solved.
Neo4j comes with APOC procedures to call REST services, but only from within a Cypher query. Technically, one can combine an APOC REST call with GrandStack and a custom GraphQL resolver to expose REST as GraphQL while interweaving it with local data, but that would easily win a nomination for the ugliest hack possible, to say nothing of the fragility of patching together so much unrelated machinery.
The next best option is ArangoDB with Foxx, which does exactly this: it embeds micro-services within the DB. But again, it uses JS (!) for building those microservices, and, more importantly, exposing those mixed functionalities as one unified graph hasn't been solved adequately.
Also, I do not believe that embedding microservices inside a DB is a particularly terrific idea in terms of isolation and scalability, when all it really takes is the ability to call a micro-service from within the DB.
What would be truly great?
A simple addition to DGraph consisting of just three simple bits:
- A stored-procedure-like DB function that can call a micro-service. A simple Go plugin, for example, would be sufficient.
- A simple @external("ServiceName") directive that can be placed within the DB schema.
- An update to the query executor that links everything together by dispatching from the directive to the DB function that calls the connected micro-service.
Specifically, when a query hits an entity carrying the @external("ServiceName") directive, the query executor calls the corresponding DB function "ServiceName", which calls the micro-service and returns the value, so that the query can be completed either with a result or, in case something went wrong, with the external part left empty.
If you think about it, a lot of microservices really only fetch data that then gets combined with internal data, so doing exactly this during a GraphQL query already massively simplifies system integration and maintenance and, quite frankly, reduces complexity.
Taking the converged data & compute graph one step further: say a query returns a set of users and their recent purchases. That result can then be fed into another GraphQL endpoint backed by a machine-learning web-service endpoint that applies, say, PageRank, and returns the top-3 cat-related-shopping-addicted users.
From a system engineering perspective, storing micro-service call signatures inside the DB makes a lot of sense, and leveraging this information to bind these calls to an @external directive to access a micro-service during query-time makes even more sense and, practically speaking, allows a very rapid build-up of a converged knowledge & compute graph that almost recursively reads its data, computes it, and stores results back to it.
Even interactions with large, distributed ML systems, say Apache Spark, become much less of an integration issue once they are accessible during query time.
To the best of my knowledge, there is no graph database with native, first-class-citizen support for calling external resources and accessing them during query time.
For people who build systems rapidly and rely on external REST services to do so, this would be a complete game-changer in terms of streamlined development productivity.
Any external references to support your case
A recent discussion of the idea of converging data & compute graphs:
- ArangoDB Foxx
- Neo4j / GrandStack: https://grandstack.io/
- Neo4j REST/APOC