Filtering is slow on large amounts of data

antblood commented:

If a predicate is indexed, it is stored as follows:

<predicate, predicate_value> => [uid1, uid2, uid3 .....]

Hence, in this case the predicate sex is stored as:

<sex, "f"> => [0x1, 0x3, 0x5, ....]
<sex, "m"> => [0x2, 0x4, 0x6 ....]

and the predicate entity_key as:

<entity_key, "entity1"> => [0x1]
<entity_key, "entity2"> => [0x2]
<entity_key, "entity3"> => [0x3]
...
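The indexed layout above can be pictured as a map from (predicate, value) to a sorted list of uids. The sketch below is only an illustration under that assumption; `build_index` and the sample triples are hypothetical and this is not Dgraph's actual storage code.

```python
from collections import defaultdict

def build_index(triples):
    """Hypothetical helper: map (predicate, value) -> sorted list of uids."""
    index = defaultdict(list)
    for uid, predicate, value in triples:
        index[(predicate, value)].append(uid)
    for uids in index.values():
        uids.sort()  # keep each uid list ordered
    return dict(index)

# Sample (uid, predicate, value) triples mirroring the example above.
triples = [
    (0x1, "sex", "f"), (0x2, "sex", "m"), (0x3, "sex", "f"),
    (0x1, "entity_key", "entity1"), (0x2, "entity_key", "entity2"),
    (0x3, "entity_key", "entity3"),
]
index = build_index(triples)
print(index[("sex", "f")])                 # [1, 3]
print(index[("entity_key", "entity2")])    # [2]
```

Python prints the uids in decimal, so 0x3 shows up as 3.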

To check whether the person with entity_key “entity800000” has the sex predicate set to “f”, we first get their uid from <entity_key, "entity800000"> => [0xC3500] and then traverse the list <sex, "f"> => [0x1, 0x3, 0x5, ....] to check whether that uid is present. Traversing such a long list makes this operation very slow.
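The two-step lookup just described can be modelled roughly as follows, counting how many list entries are visited. This assumes, hypothetically, 800000 nodes where odd uids have sex “f” and even uids have “m” (extending the pattern in the example lists). `has_value_indexed` is an illustrative helper that models the traversal as described above; it is not Dgraph's implementation.

```python
def has_value_indexed(index, key_pred, key_value, pred, value):
    """Hypothetical helper: resolve the uid, then walk the (pred, value) uid list."""
    uid = index[(key_pred, key_value)][0]          # step 1: look up the uid
    steps = 0
    for candidate in index.get((pred, value), []):  # step 2: traverse the list
        steps += 1
        if candidate == uid:
            return True, steps
    return False, steps

N = 800_000  # assumed number of nodes
index = {
    ("sex", "f"): list(range(1, N + 1, 2)),  # assumed: odd uids are "f"
    ("sex", "m"): list(range(2, N + 1, 2)),  # assumed: even uids are "m"
    ("entity_key", "entity800000"): [0xC3500],
}
print(0xC3500)  # 800000
print(has_value_indexed(index, "entity_key", "entity800000", "sex", "f"))  # (False, 400000)
```

Under these assumptions, answering a yes/no question about one node visits all 400000 entries of the “f” list.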

When a predicate is not indexed, it is stored as follows:

<predicate, uid> => [value1, value2 ....]

Hence, in this case:

<sex, "0x1"> => ["f"]
<sex, "0x2"> => ["m"]
<sex, "0x3"> => ["f"]
...
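The non-indexed layout can be pictured the other way round, as a map from (predicate, uid) to that node's values. As before, `build_data` and the sample triples are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def build_data(triples):
    """Hypothetical helper: map (predicate, uid) -> list of values for that node."""
    data = defaultdict(list)
    for uid, predicate, value in triples:
        data[(predicate, uid)].append(value)
    return dict(data)

# Sample (uid, predicate, value) triples mirroring the example above.
triples = [(0x1, "sex", "f"), (0x2, "sex", "m"), (0x3, "sex", "f")]
data = build_data(triples)
print(data[("sex", 0x2)])  # ['m']
```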

In this case, checking the value of the sex predicate for a node is very fast, as we only need to read a single value.
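With the non-indexed layout, the same check is a single keyed lookup. The sketch reuses the hypothetical 800000-node data set (odd uids “f”, even uids “m”); `has_value_direct` is an illustrative helper, not a Dgraph API.

```python
def has_value_direct(data, uid, pred, value):
    """Hypothetical helper: fetch the node's values directly by (pred, uid)."""
    values = data.get((pred, uid), [])
    return value in values, len(values)  # (found, number of values inspected)

N = 800_000  # assumed number of nodes
data = {("sex", uid): ["f" if uid % 2 else "m"] for uid in range(1, N + 1)}
print(has_value_direct(data, 0xC3500, "sex", "f"))  # (False, 1)
```

Only one stored value is inspected, regardless of how many nodes share that value.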

Hence, it’s better not to index predicates that can take only a few distinct values, such as the sex predicate here, which has only two values: “f” and “m”.

Query time with the sex predicate indexed: 300 ms
Query time with the sex predicate not indexed: 3 ms