I’ve been developing for over a decade. I’ve architected quite a few RDBMSs and last year started diving heavily into Hasura. I have a new client project and I feel like Dgraph would be an excellent fit for what is looking to be a highly relational model. I feel pretty comfortable with GraphQL and the concept of graph databases, and I see their value. Where I’m struggling is getting clear answers on the why behind certain modeling approaches. I’d love some help getting answers to a few high-level questions:
Dgraph/GraphQL specific questions:
When using the GraphQL schema option, does Dgraph automatically index for aggregates?
Does Dgraph create indexes automatically for anything besides predicates tagged with @search?
Impact of mutations on queries. In an RDBMS, too many mutations against a large table can drastically impact query time. Is this a concern in Dgraph?
Performance impacts of deeply nested where clauses?
Say I have a User type that has private fields, and a UserProfile that has publicly queryable fields. There are multiple types of Users with some overlap and slight differences. How would you model this when thinking about adding @auth to ensure only the UserProfile information is exposed?
General Graph database questions:
What are the benefits of using Unions?
When to denormalize? E.g., likes on a YouTube video.
2b) Denormalizing photos. Assuming a very high traffic site, would there be a benefit to denormalizing the image names into an array on their associated item? Or is that overkill, and should I just leave the photos separate and link them?
How fine-grained should you break your schema down? Do you benefit more from a lot of smaller nodes composed into a larger one? Or do you lose more by having more relationships to traverse?
E.g., let’s say I’m building a real-estate app and I want users to be able to search by number of bathrooms. What would be the pros and cons of making a BathroomCount type that is referenced by homes vs just having an Integer on the Home type itself?
E.g. 2: the real-estate app would search heavily by address, and parts of addresses. Do you break down each address into its own pieces?
Pros and cons of more predicates vs more relationships. E.g.:
User has Messages.
Messages can be read.
You could model this as Message has isRead.
or by changing User to be
User has unread: [Messages]
User has read: [Messages]
Comparisons to traditional RDBMS options appreciated, but not necessary.
Thanks in advance to anyone that weighs in.
Good morning @jrobber , and welcome to the community!
I want to clarify a few things: by “index”, do you mean it the way RDBMSes do it? If so, that concept doesn’t really apply in the same way. The general idea is still valid (i.e., indices in information retrieval), and there are still indices in Dgraph, but they don’t work the way they do in an RDBMS.
Dgraph shards data by predicate. This in turn is somewhat equivalent to turning on indices for every column in a table in an RDBMS. An index in Dgraph is closer in concept to a lookup table for direct lookup (though it isn’t exactly one).
Now to answer your questions:
Dgraph/GraphQL specific questions - I will answer them briefly as I have a meeting:
No. An aggregateFoo { count } query, for example, is not indexed, but rather computed at runtime.
The @search directive adds the specific kind of index you need (for strings: hash, exact, term, fulltext, regexp).
Not really, no.
Nope. In fact that is what Dgraph is extremely good at.
You are correct. @auth is the way to go.
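To make the @search and @auth answers more concrete, here is a minimal schema sketch, not a definitive model. The type and field names (User, UserProfile, username, email, displayName) and the USER JWT claim are assumptions for illustration and do not come from the original question. A real schema would also need a Dgraph.Authorization setting that matches your JWT setup.

```graphql
# Hypothetical schema: all names are illustrative only.

# Private data: a User node is only returned to the user it belongs to.
# $USER is assumed to be a claim carried in the caller's JWT.
type User @auth(
  query: { rule: """
    query ($USER: String!) {
      queryUser(filter: { username: { eq: $USER } }) { username }
    }"""
  }
) {
  username: String! @id    # @id fields can be filtered with eq
  email: String!           # no @search, so no filter is generated for this field
  profile: UserProfile @hasInverse(field: user)
}

# Public data: no @auth rule, so any caller may query it.
type UserProfile {
  id: ID!
  displayName: String! @search(by: [term])   # term index: allofterms / anyofterms
  user: User
}
```

With a schema like this, the aggregate mentioned in the first answer would be requested as follows, and the count is computed when the query runs:

```graphql
query {
  aggregateUserProfile {
    count
  }
}
```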
General Graph database questions:
You mean GraphQL unions? TODO
By “denormalize” do you mean it in a SQL way? In graph databases you don’t have to worry about traditional denormalization.
You can leave the photos separate and link them.
It really depends on your purpose: representation, computation or storage. What do you want to optimize for? Dgraph is a really flexible database. So to take your bathroom count question, you can easily transform one schema (just having an int) to another (having a BathroomCount type).
No, you would have to break down the address yourself. Dgraph doesn’t have built-in address parsing.
I think you misunderstand. Predicates ARE relationships. Here is a blog post I wrote, explaining the way we store things in graphs.
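As a rough illustration of the bathroom-count and address answers, the two modeling options could be sketched like this. Every type and field name here (Home, Address, BathroomCount, ListedHome and their fields) is hypothetical, and the address is assumed to be split into parts by the application before it is written.

```graphql
# Variant A: bathroom count as a scalar on Home.
type Home {
  id: ID!
  bathrooms: Int @search     # int index: eq, lt, le, gt, ge filters
  address: Address
}

# Address parts, split by the application before writing.
type Address {
  id: ID!
  street: String @search(by: [term])
  city: String @search(by: [hash])
  state: String @search(by: [hash])
  postalCode: String @search(by: [hash])
}

# Variant B: bathroom count as its own node, shared by many homes.
type BathroomCount {
  id: ID!
  value: Int! @search
  homes: [ListedHome] @hasInverse(field: bathroomCount)
}

type ListedHome {
  id: ID!
  bathroomCount: BathroomCount
}
```

With Variant A, a search by number of bathrooms is a single filter on the root query:

```graphql
query {
  queryHome(filter: { bathrooms: { ge: 2 } }) {
    id
    bathrooms
    address {
      street
      city
    }
  }
}
```

Variant B adds one extra hop (BathroomCount to its homes) for the same question, which is the kind of trade-off the answer above says depends on what you want to optimize for.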
What are the benefits of using Unions? Unions are useful for expressing the heterogeneity inherent in any domain. See this blog for an example, where a union is used to express a process API: it links an order with the diverse activities that interact with it, such as shipment and invoicing. The “on” keyword (inline fragments) that unions support helps to express the order and its associated aspects compactly, while Dgraph manages storage and serves the queries.
IMO, the biggest benefit is the speed of iteration when developing process-oriented solutions like microservices or orchestration services.
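A small sketch of the order example described above, assuming a Dgraph version with GraphQL union support. The type and field names (Order, Shipment, Invoice, Activity, trackingNumber, amount) are made up for illustration and are not taken from the linked blog.

```graphql
# Hypothetical types: an order interacts with different kinds of activities.
type Shipment {
  id: ID!
  trackingNumber: String
}

type Invoice {
  id: ID!
  amount: Float
}

# The union lets one edge point at either kind of node.
union Activity = Shipment | Invoice

type Order {
  id: ID!
  activities: [Activity]
}
```

A query then uses inline fragments (the “on” keyword) to pick the fields for each member type:

```graphql
query {
  queryOrder {
    id
    activities {
      __typename
      ... on Shipment { trackingNumber }
      ... on Invoice { amount }
    }
  }
}
```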
The consumer (of data) is the king. There could be scenarios where you want to express data in such a way that the query performs much better. You might accept more work at write time, converting incoming data into such a format, rather than making each query perform lengthy traversals. This is very similar to CQRS, but the context is within the graph itself.
The second pattern, with explicit ‘read’ and ‘unread’ edges, may well perform better during reads than the first pattern, and this fact can motivate your final choice for a data model.
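The two message patterns from the question could be sketched as below. The names are hypothetical, and the V2 suffix exists only so both variants can appear in one schema.

```graphql
# Pattern 1: read state is a predicate on the message.
type Message {
  id: ID!
  text: String!
  isRead: Boolean! @search   # enables filter: { isRead: false }
}

type User {
  id: ID!
  messages: [Message]
}

# Pattern 2: read state is encoded by which edge links to the message.
type MessageV2 {
  id: ID!
  text: String!
}

type UserV2 {
  id: ID!
  unread: [MessageV2]
  read: [MessageV2]
}
```

Fetching unread messages then looks like this in each pattern:

```graphql
query {
  # Pattern 1: follow the edge, then filter on the predicate.
  queryUser {
    messages(filter: { isRead: false }) { id text }
  }
  # Pattern 2: follow the dedicated edge, no filter needed.
  queryUserV2 {
    unread { id text }
  }
}
```

The cost moves to the write side: marking a message as read in Pattern 1 updates one scalar, while in Pattern 2 it means removing the message from unread and adding it to read.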
I was in the same boat. I came from MySQL. The terminology is completely different and what we know about indexing and optimizing schemas we almost need to forget completely.
These answers above are pretty good. Let me know if there are any other specific questions you have that I may be able to help with as I have time. I learned a lot the hard way so others don’t have to. I might add some more specific thoughts when I am back on my desktop and can type better.