Best Practices for Dgraph Microservices?

gja · September 29, 2020, 1:25pm

Hey All,

Are there any best practices for using Dgraph in a microservices architecture? Usually in a microservices architecture, you would not have two different services talking to the same datastore (Shared Databases in Microservices Are a Problem - DZone Microservices / Database per service). This is because you’d typically want to be able to launch, scale and monitor each service separately; databases inherently become single points of failure, and queries can become noisy neighbours to other queries.

What’s the best practice for Dgraph if multiple microservices want to talk to it? I assume you’d want to have a set of dgraph instances right, rather than having a shared DB across things?

Tejas

mahlatsem · January 3, 2022, 12:04am

Hi,

For the next curious ones to reach this page.
I see a Dgraph team member, @MichelDiz, responded to a similar question at http://discuss.dgraph.io/t/dgraph-microservices-time-series-scalability-and-ci-cd/8993/2, saying:

This will depend on your architecture. However, with Dgraph this is not necessary. It was made to scale horizontally and vertically.

The idea of being able to easily connect all related data, such as a 360 view of a customer across the whole organisation, as “promised” by graph databases is great. And while scalability is definitely one of the main issues with a monolithic database, I believe the other concerns/forces mentioned in the articles @gja referenced above are of greater concern.

The trickiest concern for me is being able to develop, deploy and scale independently. I’ve seen teams stuck where this is not possible because of a shared relational database: services from different teams end up directly referencing, and even introducing integrity constraints against, data owned by other services. It then becomes impossible for the owning service to make any changes without breaking everyone else who is not ready to support the required changes because of competing priorities.

My second concern is access control, not just for security but more for compliance: if one of the services stores in-scope data in the shared database, then some serious thought and work may be required to keep all other services and people out of scope.

Would be great to read how those who have already taken the Dgraph journey are solving for these concerns.

amaster507 · January 3, 2022, 2:34am

If you are building microservices, you are probably beyond a startup/PoC and are working with a team. So why not use the enterprise version of Dgraph and its namespacing, so that you still have a single data store, but namespaced, giving each microservice its own GraphQL endpoint and its own separate schema?

The trick would be a unified view of data, but that would still be possible with a federated joined namespace that acts as the unifier between all of the other namespaces.

Thoughts??

gja · January 3, 2022, 4:57am

Weighing in here since this is a topic I care a lot about. I think since this post has been made, Dgraph has introduced Multi tenancy, which I think is a step in the right direction. Also, Dgraph has introduced ~~read replicas~~ Learner Nodes, which one can use for slower best effort queries.

However, this is still a single point of failure from a Shared Nothing point of view.

In the RDBMS world, the solution is usually to ring-fence your databases into “absolutely cannot fail” and “I hope it doesn’t fail, but don’t mind if it does”, and then plan your resources accordingly. Analytics in particular gets very expensive very fast, and the risk of writing a query that starts making your entire system slow is not something I’d usually like to gamble on.

OTOH, the shared nothing architecture does make it more difficult to have that 360° view of your data. There is an entire ecosystem around moving data from critical DBs into non-critical warehouses (Fivetran / GoldenGate / etc.), though none of these support Dgraph yet.

So my current answer would be to do the same thing you’d do in a traditional microservices arch. One Dgraph cluster per critical service (maybe use multi-tenancy for non critical services to save cash, and for non-prod environments), and maybe move data into a warehouse with code (or via Dgraph’s new Change Data Capture / Kafka feature).
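A rough sketch of that split, purely for illustration: critical services each get a dedicated cluster, while non-critical ones share one cluster and are separated only by namespace. The service names, addresses and namespace numbers below are all made up, and treating `0` as the default namespace on a dedicated cluster is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    cluster: str    # hypothetical cluster address
    namespace: int  # tenant namespace inside that cluster


# Hypothetical inventory: one dedicated cluster per critical service.
CRITICAL = {
    "payments": "dgraph-payments:9080",
    "orders": "dgraph-orders:9080",
}

# Non-critical services share a single cluster, split by namespace.
SHARED_CLUSTER = "dgraph-shared:9080"
SHARED_NAMESPACES = {"lunch-app": 1, "internal-wiki": 2}


def target_for(service: str) -> Target:
    """Return where a service should connect under this layout."""
    if service in CRITICAL:
        # Assumed: a dedicated cluster just uses its default namespace.
        return Target(CRITICAL[service], 0)
    return Target(SHARED_CLUSTER, SHARED_NAMESPACES[service])
```

With this layout a runaway query from `lunch-app` can only starve the other tenants of the shared cluster, never `payments` or `orders`.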

amaster507 · January 3, 2022, 5:21am

It is good to see you again here @gja! Thanks for the updated feedback.

How concerning is it to do the “Shared Nothing” when talking about Dgraph’s High Availability? Does this affect the concept at all, in that Dgraph itself becomes a limited failure risk?

What would you think about doing it with a poor man’s Dgraph namespacing? This would still give the 360 view of all data at once, and using DQL, you can update just the relevant portions of your schema with your specified prefix. Maybe this would need a custom layer set up to restrict schema updates to your own types/predicates, based on what prefix each microservice was granted.

To explain this better, let’s say the User microservice was given the prefix user_ and the Jobs microservice was given the jobs_ prefix. The controlling layer would let any developer in the User service update the schema for anything prefixed with user_, and likewise for any developer in the Jobs service. Then they could still link between the two as well, and keep the graph going traditionally.
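A minimal sketch of what such a controlling layer could check, assuming it sees the DQL schema text before it is applied. This is not an existing Dgraph feature: the grant table, service names and `rejected_predicates` helper are hypothetical, and the sketch only looks at predicate lines of the form `name: type .`, ignoring `type { ... }` blocks.

```python
import re

# Hypothetical grants: which predicate prefix each service may alter.
PREFIX_GRANTS = {"user-service": "user_", "jobs-service": "jobs_"}

# Captures the predicate name at the start of a DQL schema line such as
# "user_name: string @index(exact) ." (optionally wrapped in <...>).
PREDICATE_LINE = re.compile(r"^\s*<?([A-Za-z_][\w.\-]*)>?\s*:")


def rejected_predicates(service: str, schema: str) -> list[str]:
    """List predicates in `schema` that `service` is not allowed to change."""
    prefix = PREFIX_GRANTS[service]
    rejected = []
    for line in schema.splitlines():
        match = PREDICATE_LINE.match(line)
        if match and not match.group(1).startswith(prefix):
            rejected.append(match.group(1))
    return rejected


proposed = "\n".join([
    "user_name: string @index(exact) .",
    "jobs_title: string .",
    "user_applied_to: [uid] .",  # an edge that may point at Jobs nodes
])
print(rejected_predicates("user-service", proposed))
```

The layer would forward the schema update only when the returned list is empty. Edges such as `user_applied_to` can still point at nodes owned by another service, which is what keeps the graph connected.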

This is all outside of the GraphQL generated API though, because that would be a single controlled schema, and it becomes harder to manage a single schema spread across multiple services unless there was some automated process that stitches and validates the schema from different repos.
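One way such an automated stitch-and-validate step might look, as a sketch only: each repo contributes a GraphQL schema fragment, and the build fails if two services define the same type. The `stitch_schemas` helper and the fragment contents are hypothetical, and the type detection is a simple regex rather than a real GraphQL parser.

```python
import re

# Finds the name in lines that start a type definition, e.g. "type User {".
TYPE_NAME = re.compile(r"^\s*type\s+(\w+)", re.MULTILINE)


def stitch_schemas(fragments: dict[str, str]) -> str:
    """Concatenate per-service fragments, rejecting duplicate type names."""
    owners: dict[str, str] = {}
    for service, sdl in fragments.items():
        for type_name in TYPE_NAME.findall(sdl):
            if type_name in owners:
                raise ValueError(
                    f"type {type_name} is defined by both "
                    f"{owners[type_name]} and {service}"
                )
            owners[type_name] = service
    return "\n\n".join(sdl.strip() for sdl in fragments.values())


combined = stitch_schemas({
    "user-service": "type User {\n  id: ID!\n  name: String\n}",
    "jobs-service": "type Job {\n  id: ID!\n  title: String\n}",
})
print(combined)
```

A real pipeline would also need to validate cross-service references and field types, which this sketch does not attempt.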

gja · January 3, 2022, 3:39pm

@amaster507 I don’t think Dgraph’s approach to HA and multi tenancy (either poor man’s MT or enterprise MT) is sufficient here. The goal to achieve is to avoid cascading failures.

We have to accept that all software is written by humans, and humans have a reputation for being idiots. Imagine I have two teams (T1 and T2) operating on the same Dgraph cluster. In Dgraph multi tenancy, each team will get their own prefix for 2 predicates (1-a, 1-b and 2-a, 2-b). Now Dgraph does not guarantee that groups are aligned along the multi tenancy boundary, so group 1 in Dgraph may get 1-a and 2-b, while group 2 gets 1-b and 2-a.
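The placement in that example can be written down as a toy model to see which tenants end up sharing hardware. This is only the example above turned into data, not Dgraph’s actual placement logic, and `tenants_per_group` is a hypothetical helper.

```python
from collections import defaultdict

# Predicate -> group, copied from the example above.
placement = {"1-a": 1, "2-b": 1, "1-b": 2, "2-a": 2}


def tenants_per_group(placement: dict[str, int]) -> dict[int, set[str]]:
    """Group the tenant prefixes by the Dgraph group serving their predicates."""
    groups: dict[int, set[str]] = defaultdict(set)
    for predicate, group in placement.items():
        tenant = predicate.split("-", 1)[0]  # "1-a" belongs to tenant "1"
        groups[group].add(tenant)
    return dict(groups)


print(tenants_per_group(placement))
```

Both groups end up serving both tenants, so neither team has a group to itself.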

Now team T1 writes code that goes through 4 levels of code review, as they are on a super critical path in the system. Team T2, on the other hand, doesn’t have the budget or the patience, mostly because they are an internal team in charge of building an app to order lunch for everyone. But someone on T2 writes a bad query. Like a really bad query. And accidentally calls it in a for loop that fires this query every 200ms (thanks to forgetting to await on a JavaScript promise in a loop).

Now the result: Team T1, who is just seeing normal load, starts seeing more and more latency on their API response time, even though there was no difference in load. Eventually, Dgraph starts giving more and more CPU over to the bad queries that team T2 writes, until T1 just stops working altogether.
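The mechanics of that bug can be sketched in a few lines. The story is about a JavaScript promise; this is a Python `asyncio` analogue in which `asyncio.create_task` without an `await` in the loop plays the role of the un-awaited promise. `slow_query` is a stand-in that only sleeps, not a real Dgraph call.

```python
import asyncio


async def slow_query(stats: dict) -> None:
    """Stand-in for an expensive database query; tracks concurrency."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # pretend the database is working
    stats["in_flight"] -= 1


async def awaited_loop(n: int) -> int:
    """Intended behaviour: each query finishes before the next starts."""
    stats = {"in_flight": 0, "peak": 0}
    for _ in range(n):
        await slow_query(stats)
    return stats["peak"]


async def fire_and_forget_loop(n: int) -> int:
    """The bug: queries are started in the loop without being awaited."""
    stats = {"in_flight": 0, "peak": 0}
    tasks = [asyncio.create_task(slow_query(stats)) for _ in range(n)]
    await asyncio.gather(*tasks)  # only here so the demo exits cleanly
    return stats["peak"]


print(asyncio.run(awaited_loop(5)))          # at most one query in flight
print(asyncio.run(fire_and_forget_loop(5)))  # all queries in flight at once
```

In the awaited version the database sees one query at a time; in the buggy version the number of concurrent queries grows with the loop, which is the load that ends up starving T1.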

Sure, you could add a query-limit, which limits all queries to 500ms or whatever, but that’s an arbitrary limit, and it affects all namespaces, so now you need buy-in from multiple different teams, not just T1 and T2.

This problem is by no means unique to Dgraph, as all data stores suffer from the same issue. Hardware isolation (i.e., separate hardware for each database) is the only solution I know of to solve this.

ETA/TL;DR: The point I’m trying to make is that most HA solutions work on solving hardware or network failure. Resource exhaustion / starvation due to human error is a much more common problem, at least in my experience.

amaster507 · January 3, 2022, 3:46pm

So does a graph DB have any place then in a microservice?

I guess a service might need a graph for just their service, but a graph that connects all services together in a 360 view is most likely impossible without that being its own service on top of all other services, which in theory breaks the microservice architecture, right?

The only way to have a unified view would be with some kind of automated schema stitching and resolvers pointing to the individual microservices.

gja · January 3, 2022, 3:55pm

This is what I was hoping to have a discussion on in this thread way back when I posted this.

In the RDBMS world, you’ll take data from 2-3 RDBMSes and other dbs (like mongo), and then shove all that data into a query optimised database like redshift, and run your heavy queries there.

What’s the equivalent in the graph world?

I think a lot of teams are already taking data from RDBMS master databases and putting that data into a graph (that’s what Neo4j gained popularity as… a secondary data store).
There are a few teams out there with multiple Dgraph instances. What are people using for analytics and a 360 view after that? Shipping data via Kafka into another Dgraph? Into Redshift? This is the part I was hoping to hear more stories about.

Juri · January 3, 2022, 10:32pm

‘A one graph database for everything’: that is exactly what Dgraph is, and that’s fantastic. If you really fear that Dgraph will crash, then use an HA setup or a redis/CF-workers cache layer for frequent queries.

amaster507 · January 3, 2022, 10:48pm

Hehe, you don’t know who you are talking to, lol. @gja was part of the Dgraph team. He is very well aware of what Dgraph is. If you read above, you will see the reasoning for separating not just databases but servers for microservices, with its advantages and disadvantages.
