Performance question: "x is_type boolean" or "x <type> value"

lazharichir · October 8, 2017, 11:39am

Just wondering what the best practice would be for typing the nodes in my Dgraph database. Nodes can be either ‘topics’ or ‘phrases’ (there might be more types in the future). There will be over 100 million nodes in production.

Which would be the better approach? Using an indexed predicate that takes a string value holding the node type:

_:0 <xid> "abc" .
_:0 <type> "topic" .
_:1 <xid> "xyz" .
_:1 <type> "phrase" .

Or, using an indexed boolean predicate labeled for each type:

_:0 <xid> "abc" .
_:0 <is_topic> "true" .
_:1 <xid> "xyz" .
_:1 <is_term> "true" .

For those who have an idea: which is the most suitable, performant, and scalable solution?

pawan · October 8, 2017, 10:44pm

Hey @lazharichir

That’s a good question. We have been recommending the first method, i.e. defining a string-valued type edge. The advantage is that you can use the same predicate in every query that checks the type, instead of checking a different predicate each time.
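To illustrate the point about reusing one predicate, here is a minimal Python sketch that only builds query strings. The helper names and the query block name `nodes` are hypothetical, and the query syntax is an approximation that may differ between Dgraph versions.

```python
# Hypothetical helpers that build Dgraph-style query strings for the two schemes.

def query_by_type_value(node_type: str) -> str:
    # Scheme 1: the predicate is always `type`; only the compared value changes.
    return '{ nodes(func: eq(type, "%s")) { xid } }' % node_type


def query_by_type_flag(node_type: str) -> str:
    # Scheme 2: the predicate name itself depends on the node type.
    return "{ nodes(func: eq(is_%s, true)) { xid } }" % node_type


print(query_by_type_value("topic"))
print(query_by_type_flag("topic"))
```

With the first scheme the node type could be passed as an ordinary value, whereas with the second the application has to assemble a predicate name for each type.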

Both should be equally performant, as they would generate equal-sized index posting lists.
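The equal-size claim can be checked on a toy model. The sketch below is not Dgraph's actual index implementation; it assumes an index is simply a map from a (predicate, value) key to the list of node ids carrying that value, and the node counts are made up.

```python
from collections import defaultdict


def build_index(triples):
    """Map each (predicate, value) index key to the list of node ids holding it."""
    index = defaultdict(list)
    for node, predicate, value in triples:
        index[(predicate, value)].append(node)
    return dict(index)


# Toy data (made up): nodes 0-5 are topics, nodes 6-9 are phrases.
topics, phrases = range(0, 6), range(6, 10)

string_scheme = [(n, "type", "topic") for n in topics] + \
                [(n, "type", "phrase") for n in phrases]
bool_scheme = [(n, "is_topic", True) for n in topics] + \
              [(n, "is_term", True) for n in phrases]

# Size of every posting list under each scheme.
string_sizes = sorted(len(ids) for ids in build_index(string_scheme).values())
bool_sizes = sorted(len(ids) for ids in build_index(bool_scheme).values())
print(string_sizes, bool_sizes)  # [4, 6] [4, 6]
```

In this model both schemes end up with one posting list per node type, each containing every node of that type, so the list sizes match.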

pmualaba · October 14, 2017, 7:27pm

Wouldn’t the second method be better for distributing edges equally over multiple machines in a Dgraph cluster? With the first approach you will always have a type predicate for every node, which will probably also increase the index size and slow down lookups on that “large” type predicate index.

peter · October 15, 2017, 10:30pm

That’s a good point. The second approach replaces one large predicate with many smaller predicates, which scales better when using a larger cluster.
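A rough sketch of the distribution argument, assuming (as this reply implies) that all edges of one predicate are stored together on a single server group. The greedy placement policy, the function name, and the edge counts are hypothetical and only serve the illustration.

```python
def assign_predicates(predicate_sizes, num_groups):
    """Greedily place each predicate, largest first, on the least-loaded group.

    Returns the number of edges that each group ends up storing.
    """
    loads = [0] * num_groups
    by_size = sorted(predicate_sizes.items(), key=lambda kv: -kv[1])
    for _name, size in by_size:
        target = loads.index(min(loads))  # least-loaded group so far
        loads[target] += size
    return loads


# Made-up edge counts: 100 typed nodes, 60 topics and 40 phrases.
one_predicate = {"type": 100}
per_type = {"is_topic": 60, "is_term": 40}

print(assign_predicates(one_predicate, 2))  # [100, 0]
print(assign_predicates(per_type, 2))       # [60, 40]
```

Under this assumption a single `type` predicate cannot be split, so one group carries all of it, while per-type predicates can be spread across groups.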

mrjn · October 17, 2017, 12:47am

Agreed. The second one is better. The first approach causes issues for us: if you index on the type edge, it generates huge posting lists internally (topic → all nodes of type topic, phrase → all nodes of type phrase), which slows everything down.

