Moved from GitHub dgraph/2713
Posted by makitka2007:
dgraph 1.0.9
because every @filter condition requires an index lookup, filtering is slow on large amounts of data.
test data: 10m entities, each with a “sex” predicate set randomly to “m” or “f”.
since an index is required for filtering, one is created for both the entity_key and sex predicates:
entity_key: string @index(exact) .
sex: string @index(exact) .
10m entities loaded:
_:node$x <entity_key> "entity$x" .
_:node$x <sex> "$sex" .
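For anyone who wants to reproduce the data set, here is a minimal sketch of a generator for the two triples per entity shown above. The function name `generate_rdf` and the fixed seed are my own assumptions; the original load script is not shown in this post.

```python
import random


def generate_rdf(n, seed=0):
    """Yield RDF lines for n entities, two triples per entity.

    Mirrors the template above: $x is the entity number and
    $sex is picked at random from "m" and "f".
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    for x in range(n):
        sex = rng.choice(("m", "f"))
        yield f'_:node{x} <entity_key> "entity{x}" .'
        yield f'_:node{x} <sex> "{sex}" .'


if __name__ == "__main__":
    # Small n for a quick look; the test in this post used 10 million.
    for line in generate_rdf(3):
        print(line)
```

Writing the lines to a file instead of printing them gives something that can be fed to a loader.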
query to get 1 entity takes 1ms:
{
  get_entity(func: eq(entity_key, "entity600000")) {
    uid
  }
}
adding a filter on the “sex” predicate slows it down to 7 seconds:
{
  get_entity(func: eq(entity_key, "entity900000")) @filter(eq(sex, "f")) {
    uid
  }
}
because the filter loads all ~5m entities having sex="f" into memory.
filters need to be improved so that they don't use the index when no index exists, or when a special directive says not to.
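To illustrate the difference between the two strategies, here is a toy model in plain Python. It is not Dgraph's actual implementation; the function names, the dict-based "index" and the small data set are all assumptions made up for this sketch. It only shows why intersecting with the full index result touches about half of all entities, while checking the value of the already selected uid touches one.

```python
def filter_via_index(candidates, index, value):
    """Index strategy: fetch every uid holding the value, then intersect."""
    posting_list = index.get(value, set())  # about half of all uids here
    return candidates & posting_list, len(posting_list)


def filter_via_value(candidates, values, value):
    """Direct strategy: read the stored value of each candidate uid only."""
    matched = {uid for uid in candidates if values.get(uid) == value}
    return matched, len(candidates)


# Tiny stand-in for the 10m-entity data set: odd uids are "f", even are "m".
values = {uid: ("f" if uid % 2 else "m") for uid in range(1, 100001)}

# Hypothetical exact-match index: value -> set of uids.
index = {}
for uid, v in values.items():
    index.setdefault(v, set()).add(uid)

# The root function eq(entity_key, ...) has already narrowed things to one uid.
candidates = {90001}

by_index, touched_index = filter_via_index(candidates, index, "f")
by_value, touched_value = filter_via_value(candidates, values, "f")
```

Both calls return the same single uid, but `touched_index` grows with the size of the data set while `touched_value` stays equal to the number of candidates.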
if I filter on an edge facet instead, it is fast as expected (1 ms):
{
  get_entity(func: eq(entity_key, "entity800000")) @cascade {
    uid
    attrs @facets(eq(sex, "f"))
  }
}
so predicate filters should use the same logic as edge facet filters (when no index is created, or when a special directive says not to use the index on this predicate).