Hi. I have been reading about Dgraph for the last few days and have a lot of questions about it, so I am asking them all in this thread. Sorry for making it so long.
- In the microservices world, the recommendation is to have one database per service to enable scaling. Should we do the same with Dgraph? My concern is that this would create isolated graphs with no connections between the different instances.
- Suppose we do have isolated Dgraph instances. Is it possible to refer to nodes stored in another Dgraph instance? I realize this may violate microservices principles, but I am wondering how to get a unified view of the data graph across all services while still giving multiple teams the flexibility to manage their own database pipelines.
- Apollo Federation (https://www.apollographql.com/docs/apollo-server/federation/introduction/) takes a GraphQL query and distributes it to multiple GraphQL endpoints (microservices), depending on where each part of the data is served, which gives you federated GraphQL. Is something similar possible in Dgraph? That is, I would have multiple Dgraph instances, each with its own graph; I send a single GraphQL± query, and Dgraph runs the individual queries against the relevant instances, aggregates the results, and returns one response. This would make a strong case for using a graph database with microservices.
- Articles like this one (https://tdwi.org/articles/2017/03/14/good-bad-and-hype-about-graph-databases-for-mdm.aspx) suggest that graph databases are not as performant as relational databases for bulk queries and similar workloads, and that they have other drawbacks as well. What are the drawbacks of Dgraph, and what should I evaluate before using it in production?
- My startup works with several kinds of data, and time series is one of them. From this post, Time Series in DGraph - #2 by MichelDiz, I understand that Dgraph is not optimized for time series. How would you suggest handling time series data when using Dgraph? Should I use an external time series store, or is there another approach you would recommend?
- Pricing for the microservices pattern: suppose we use one Dgraph instance per service (I am not sure about this yet). That would lead to a lot of Dgraph instances, and as I understand it, Enterprise Edition (EE) pricing is per instance, so costs could climb quickly even if all the services run on the same node in the Kubernetes cluster. Do you have any thoughts on this?
- Another question, which I mentioned in a separate thread, is about using Dgraph when data localization constraints apply. Given laws in China, Russia, and elsewhere, suppose the data graph had to be stored and processed in separate clusters in different regions, while queries and mutations could still be issued from one place (abstracting that complexity away from clients). Does Dgraph have a reference architecture for this? With tools like Vitess (https://vitess.io/) you can geo-shard MySQL. How can I do the same with Dgraph?
- I also have a question about CI/CD pipelines. With databases, you need to handle migrations/schema changes, rollbacks, roll-forwards, and so on. For relational databases, for instance, if you use something like Prisma, you have Prisma Migrate (https://www.prisma.io/docs/reference/tools-and-interfaces/prisma-migrate), which you can use to version-control schema changes, roll them back or forward, and apply them in your pipelines. How can we do this with Dgraph?
- If there is one thing I am always against, it is vendor lock-in. Although Dgraph supports spec-compliant GraphQL, it is still the only database that supports GraphQL±, and its underlying storage engine is Badger. How do you view this? How can someone using Dgraph embrace the ecosystem without being locked into it, and keep the option to move to something else later if needed?
- In the GraphQL world, there are projects like dataloader (GitHub - graphql/dataloader) that batch and cache requests when multiple queries are fired per call, reducing round trips to the database. Does Dgraph have something built in to solve this when running multiple graph queries, or should I still use dataloader?
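To make the dataloader question concrete, this is roughly the batching behavior I mean, as a minimal Python/asyncio sketch. It is not dataloader's actual API and not anything Dgraph provides: `MiniLoader` and `FakeUserStore` are hypothetical names, and `fetch_many` stands in for whatever single query would fetch many nodes at once. Loads requested before the pending flush runs are collected and sent to the backend in one call, and repeated keys are served from a per-loader cache.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Hashable, List, Sequence, Set

BatchFn = Callable[[Sequence[Hashable]], Awaitable[List[object]]]


class MiniLoader:
    """Hypothetical dataloader-style helper: collects load() calls and
    resolves them with a single call to batch_fn (one round trip)."""

    def __init__(self, batch_fn: BatchFn) -> None:
        self._batch_fn = batch_fn
        self._cache: Dict[Hashable, asyncio.Future] = {}
        self._queue: List[Hashable] = []
        self._tasks: Set[asyncio.Task] = set()  # keep strong refs to flush tasks

    def load(self, key: Hashable) -> asyncio.Future:
        # A repeated key reuses the same future, so duplicates cost nothing.
        if key in self._cache:
            return self._cache[key]
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self._cache[key] = fut
        self._queue.append(key)
        if len(self._queue) == 1:
            # First queued key: schedule a flush that runs after the
            # callbacks already waiting on the event loop.
            task = loop.create_task(self._flush())
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
        return fut

    async def _flush(self) -> None:
        keys, self._queue = self._queue, []
        futures = [self._cache[k] for k in keys]
        try:
            values = await self._batch_fn(keys)
            if len(values) != len(keys):
                raise ValueError("batch function must return one value per key")
        except Exception as exc:
            for key, fut in zip(keys, futures):
                self._cache.pop(key, None)  # do not cache failures
                fut.set_exception(exc)
            return
        for fut, value in zip(futures, values):
            fut.set_result(value)


class FakeUserStore:
    """Hypothetical backend that records each batch it receives."""

    def __init__(self) -> None:
        self.batches: List[List[int]] = []

    async def fetch_many(self, ids: Sequence[int]) -> List[dict]:
        self.batches.append(list(ids))
        await asyncio.sleep(0)  # stand-in for network I/O
        return [{"id": i, "name": f"user-{i}"} for i in ids]


async def main():
    store = FakeUserStore()
    loader = MiniLoader(store.fetch_many)

    async def resolve_author(author_id: int) -> dict:
        # Each resolver asks for one user, as a GraphQL resolver would.
        return await loader.load(author_id)

    users = await asyncio.gather(*(resolve_author(i) for i in [1, 2, 1, 3]))
    return store, users


if __name__ == "__main__":
    store, users = asyncio.run(main())
    print(store.batches)
    print([u["name"] for u in users])
```

Here four resolver calls for ids 1, 2, 1, 3 collapse into a single backend call with `[1, 2, 3]`. My question is whether Dgraph does something equivalent internally, or whether this layer still belongs in the application.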
I did search for all of this, but I could not find answers to most of these questions in the docs.
Thanks in advance.