I'm wondering what the best practice is for typing the nodes in my Dgraph database. Nodes can either be ‘topics’ or ‘phrases’ (there might be more types in the future). There will be over 100 million nodes in production.
What would be the better approach: using a single indexed predicate that takes a string value holding the node type, or using a separate predicate for each type?
That’s a good question. We have been recommending the first method, i.e. defining a string-valued type edge. The advantage is that you can use the same predicate in all your queries when checking for a type, instead of checking for a different predicate each time.
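To make the two options concrete, here is a minimal Python sketch over toy in-memory triples. It is not Dgraph code; the predicate names `type`, `is_topic` and `is_phrase` and the uids are hypothetical, chosen only for illustration.

```python
# Toy nodes: uid -> node type. The uids and type names are made up.
nodes = {"0x1": "topic", "0x2": "phrase", "0x3": "topic"}

# Option 1: one string-valued "type" predicate on every node.
option1 = [(uid, "type", kind) for uid, kind in nodes.items()]

# Option 2: a separate marker predicate per type (hypothetical names).
option2 = [(uid, f"is_{kind}", True) for uid, kind in nodes.items()]


def of_type_option1(triples, kind):
    """Same predicate every time; only the compared value changes."""
    return sorted(s for s, p, o in triples if p == "type" and o == kind)


def of_type_option2(triples, kind):
    """The predicate name itself changes with the type being checked."""
    return sorted(s for s, p, _ in triples if p == f"is_{kind}")


print(of_type_option1(option1, "topic"))
print(of_type_option2(option2, "topic"))
```

Both lookups return the same nodes; the difference is only whether the type lives in the value (option 1) or in the predicate name (option 2).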
Both should be equally performant, as they would generate equal-sized index posting lists.
Wouldn't the second method be better for distributing edges equally over multiple machines in a Dgraph cluster? With the first approach you will always have a type predicate on every node, which will probably also increase the index size and slow down lookups on that “large” type predicate index.
Agreed. The second one is better. The first approach causes issues for us: if you index on the type edge, it generates huge posting lists internally (topic → all nodes of type topic, phrase → all nodes of type phrase), which slows everything down.
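As a rough illustration of the posting-list argument, the sketch below builds both layouts for 1,000 toy nodes. It is a simplified model, not Dgraph's actual storage format, and the `is_<type>` predicate names are hypothetical. It assumes that everything stored under one predicate stays together, so a single `type` predicate concentrates every node in one place, while per-type predicates can be placed independently.

```python
from collections import defaultdict


def build_type_index(node_types):
    """Approach 1: one indexed 'type' predicate, one posting list per type value."""
    index = defaultdict(list)
    for uid, kind in node_types.items():
        index[kind].append(uid)
    return dict(index)


def build_per_type_predicates(node_types):
    """Approach 2: one predicate per type; each holds only its own nodes."""
    preds = defaultdict(list)
    for uid, kind in node_types.items():
        preds[f"is_{kind}"].append(uid)
    return dict(preds)


# 1,000 toy nodes; every fourth uid is a phrase, the rest are topics.
node_types = {uid: ("phrase" if uid % 4 == 0 else "topic") for uid in range(1, 1001)}

type_index = build_type_index(node_types)
per_type = build_per_type_predicates(node_types)

# Entries that end up under the single "type" predicate in approach 1.
under_type_predicate = sum(len(uids) for uids in type_index.values())
# Largest number of entries under any one predicate in approach 2.
largest_single_predicate = max(len(uids) for uids in per_type.values())

print(under_type_predicate, largest_single_predicate)
```

In this model the individual lists are the same size in both layouts (as noted earlier in the thread), but approach 1 puts all of them under one predicate, while approach 2 spreads them across several.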