How to influence the placement of predicates in Dgraph?

From Dgraph Database Semantics, I read that Dgraph places predicates across Alphas. I am trying to see if I can store Kubernetes YAMLs from a cluster in Dgraph and run queries over them. I would like the YAMLs from a single Kubernetes cluster to reside in one Alpha group, while YAMLs from different Kubernetes clusters can live in different Alphas. Empirically, I know that the total size of all YAMLs in a single cluster is a few megabytes. I intend to search within a single cluster more often than across clusters, so keeping all YAMLs of a cluster in a single Alpha group would reduce the number of RPCs when querying. You can see the types of YAML/JSON objects I am trying to store here: GitHub - appscodelabs/tasty-kube: Kubernetes test scripts

I am new to Dgraph. Am I understanding this correctly? Is there a way to achieve this?

Yes, this is possible, but not exactly as you may think, and there is a misconception here about how Dgraph works. Predicates are balanced between groups based on disk usage. For now, there’s no way to declare where a specific set of predicates will live. But you can increase the rebalancing interval to effectively infinity, which bypasses the automatic balancing, and then use /moveTablet?tablet=name&group=2 to move a predicate to the group you want. That’s a way to force it.

Read “Add a Bulk Move Tablet and/or A deterministic scheme for Tablets (Also Geo-Sharding Support)” if you want to know more about it.
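To make the move step concrete, here is a minimal Python sketch that only builds the /moveTablet request URL mentioned above. The helper name `move_tablet_url` and the address `localhost:6080` are my assumptions (6080 is Zero's default HTTP port as far as I know), so adjust them for your deployment. The rebalancing interval itself is set when starting Zero (`--rebalance_interval` in the releases I am aware of; check `dgraph zero --help` for your version).

```python
from urllib.parse import urlencode


def move_tablet_url(zero_http_addr: str, tablet: str, group: int) -> str:
    """Build the Zero HTTP URL that asks Dgraph to move one tablet (predicate).

    zero_http_addr is an assumed host:port for Zero's HTTP endpoint.
    """
    query = urlencode({"tablet": tablet, "group": group})
    return f"http://{zero_http_addr}/moveTablet?{query}"


if __name__ == "__main__":
    # Print the URL; send it with curl or any HTTP client once you have
    # confirmed the address and predicate name for your cluster.
    print(move_tablet_url("localhost:6080", "name", 2))
```

You would call this once per predicate you want to pin, since the move is per tablet and not per node.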

But the main problem lies in the predicate context. All YAMLs will use the same predicates, and since Dgraph shards by predicate and not by value, this won’t work for you unless you have a separate predicate for each YAML. That means if you have thousands of YAMLs, you are going to have thousands of predicates in your schema, and you would need a Type and a query for each predicate. This would be hard work to maintain.
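To show how quickly the predicate count grows with that approach, here is a rough Python sketch that flattens a nested YAML/JSON object into prefixed predicate names. The `flatten` helper, the dotted naming scheme, and the `cluster_a` prefix are all hypothetical; this is an illustration of the blow-up, not a schema I would recommend.

```python
def flatten(obj: dict, prefix: str) -> dict:
    """Flatten nested dicts into dotted predicate names under a prefix.

    Every distinct prefix (one per cluster or per YAML) yields a fresh
    set of predicate names, and each of them becomes its own tablet.
    """
    out = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name))
        else:
            out[name] = value
    return out


if __name__ == "__main__":
    pod = {"kind": "Pod", "metadata": {"name": "web"}}
    for predicate, value in flatten(pod, "cluster_a").items():
        print(predicate, "=", value)
```

With two fields this is harmless, but the number of predicates is roughly (number of prefixes) times (number of distinct field paths), and every one of them needs schema and query support.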

Anyway, why is it important for you to keep a set of predicates in specific groups?

BTW, this isn’t really possible at the Alpha level. A group can be any Alpha (if you have replicas, a group is a set of Alphas). You can’t control where the data goes at the Alpha level. Constraining nodes (YAMLs) to Alphas is not possible: there’s no way for Dgraph to inspect a specific node (checking its values) and route the data to a specific Alpha. It could work for groups (in a different way), not Alphas.

Let’s say that group 3 holds the tablets for the data you are inserting. Dgraph will send it to the Alphas that make up that group (that may be several Alphas or a single one). That works if you use the infinite-balancing bypass and move the tablets to the group you wish. Again, this is not at the value level, just predicates.

Cheers.


I understand. It might be worth supporting a feature like “SubGraph” where users can define their preference for predicate placement. My YAML/JSON objects can be quite deep (N=10), and it seems to me that in the pathological case this might result in 10 RPC calls when the actual data volume is pretty small. This is not an issue per se. I was just wondering.


If all the predicates you want are in the same group, Dgraph won’t make any unnecessary extra calls and will stick to the first Alpha that received your request. It could make extra network calls if you have several groups (let’s say 10 groups) with the shards evenly balanced across them.
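As a rough way to reason about this, here is a small Python sketch that counts how many distinct groups a query's predicates live in. The `groups_touched` helper and the predicate-to-group mapping are made up for illustration; in a real cluster you would read the tablet placement from Zero, and the actual number of network calls depends on Dgraph internals that this sketch does not model.

```python
def groups_touched(predicates: list, tablet_to_group: dict) -> set:
    """Return the set of group IDs that serve the given predicates."""
    return {tablet_to_group[p] for p in predicates}


if __name__ == "__main__":
    # Hypothetical placement: two predicates in group 1, one in group 2.
    placement = {"kind": 1, "metadata.name": 1, "spec.replicas": 2}

    same_group = groups_touched(["kind", "metadata.name"], placement)
    cross_group = groups_touched(["kind", "spec.replicas"], placement)

    print(len(same_group), "group(s) for the first query")
    print(len(cross_group), "group(s) for the second query")
```

If the result is a single group, the query can be answered without leaving that group, which is the case described above.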

I think N=10 is pretty small for Dgraph, though. You shouldn’t be worried about calls. I would only start to worry about fine-tuning Dgraph at around N=1k per request.