Filtering is slow on large amount of data

diggy · June 15, 2020, 5:51am

antblood commented:

If a predicate is indexed, it is stored in the following way:

<predicate, predicate_value> => [uid1, uid2, uid3 .....]

Hence, in this case, the predicate sex is stored as:

<sex, "f"> => [0x1, 0x3, 0x5, ....]
<sex, "m"> => [0x2, 0x4, 0x6 ....]

and predicate entity_key as:

<entity_key, "entity1"> => [0x1]
<entity_key, "entity2"> => [0x2]
<entity_key, "entity3"> => [0x3]
...

When we try to find out whether the person with entity_key "entity800000" has the sex predicate set to "f", we first get its uid from <entity_key, "entity800000"> => [0xC3500], then traverse the list <sex, "f"> => [0x1, 0x3, 0x5, ....] to check whether the same uid is present in it. Traversing such a long list makes this operation very slow.
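To make the cost concrete, here is a minimal Python sketch of the lookup described above. It is only a toy model: the dictionaries, the function name `has_value_via_index`, the total of one million nodes, and the odd/even split of uids are assumptions for illustration, and it does not reflect how Dgraph actually stores or intersects its lists.

```python
# Toy model of an indexed predicate: (predicate, value) -> sorted list of uids.
# Hypothetical layout for illustration only; not Dgraph's storage code.
N = 1_000_000  # assumed number of nodes

index = {
    ("sex", "f"): list(range(1, N + 1, 2)),  # 0x1, 0x3, 0x5, ...
    ("sex", "m"): list(range(2, N + 1, 2)),  # 0x2, 0x4, 0x6, ...
}
entity_index = {("entity_key", "entity800000"): [0xC3500]}


def has_value_via_index(entity_key, predicate, value):
    """Resolve the uid, then scan the value's uid list looking for it."""
    (uid,) = entity_index[("entity_key", entity_key)]
    steps = 0
    for candidate in index[(predicate, value)]:
        steps += 1
        if candidate == uid:
            return True, steps
        if candidate > uid:  # the list is sorted, so the uid cannot appear later
            break
    return False, steps


found, steps = has_value_via_index("entity800000", "sex", "f")
print(found, steps)  # the scan visits hundreds of thousands of uids
```

The number of uids visited grows with the size of the <sex, "f"> list, which is why a low-cardinality index makes this particular check expensive in this model.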

When a predicate is not indexed, it is stored in the following way:

<predicate, uid> => [value1, value2 ....]

Hence, in this case:

<sex, "0x1"> => ["f"]
<sex, "0x2"> => ["m"]
<sex, "0x3"> => ["f"]
...

In this case, checking the value of the sex predicate for a node is very fast, as we only need to read a single value.
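The non-indexed layout can be sketched the same way. Again this is a toy model under stated assumptions: the `forward` dictionary and `has_value_direct` are hypothetical names, uids are written as integers rather than strings, and the value "m" for 0xC3500 is assumed purely for the example.

```python
# Toy model of a non-indexed predicate: (predicate, uid) -> list of values.
# Hypothetical layout for illustration only; not Dgraph's storage code.
forward = {
    ("sex", 0x1): ["f"],
    ("sex", 0x2): ["m"],
    ("sex", 0x3): ["f"],
    ("sex", 0xC3500): ["m"],  # assumed value for entity800000
}


def has_value_direct(uid, predicate, value):
    """Fetch the node's own values and compare; no long list is scanned."""
    values = forward.get((predicate, uid), [])
    return value in values, len(values)


matches, values_read = has_value_direct(0xC3500, "sex", "f")
print(matches, values_read)
```

Here the work per check depends only on how many values the node itself holds for the predicate, not on how many nodes share a value.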

Hence, it's better not to index predicates that can take only a few distinct values. In this case, the sex predicate has only two values, "f" and "m".

Query time with the sex predicate indexed: 300 ms
Query time with the sex predicate not indexed: 3 ms
