Slow filtering of string field in complex query

Dgraph metadata

Dgraph version : v21.03.0
Dgraph codename : rocket
Dgraph SHA-256 : b4e4c77011e2938e9da197395dbce91d0c6ebb83d383b190f5b70201836a773f
Commit SHA-1 : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true

What I want to do

I’m building a dictionary service based on Dgraph. The dictionary entries, and the queries that generate them, can be quite complex: an entry can contain many meanings, submeanings, translations and other data.

I look up and generate dictionary entries, and some of my filters use a “language” field like this:

      translations : translation @filter(eq(language, "hun") AND uid_in(dentry, 0x40ee58)) {
            ...
      }

“language” is always a three-letter abbreviation of the language.

According to Jaeger, these filters can take 200-400 ms in some queries, even though the language field is hash indexed, so in theory the lookup should be equally fast for any value.
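For reference, a hash index on a string predicate is declared in the Dgraph schema roughly as shown here. This is only a sketch of what the relevant predicates might look like: the actual schema is in the attached archive, and the types given for `translation` and `dentry` are assumptions inferred from how the filter uses them.

```
# Assumed schema fragment, not copied from the archive.
language: string @index(hash) .
translation: [uid] .
dentry: uid .
```

A hash index serves eq() lookups, so eq(language, "hun") can use it.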

What I did

I created an example query, ran it and collected traces using Jaeger. The schema and a partial screenshot of the Jaeger UI are also included in the archive:

slow_query.zip (2.9 MB)

Am I doing something wrong here, or what could be the reason for these slow “language” field lookups?

Thanks!

Perhaps you could try splitting the language predicate into translation.language and SeeAlso.language. There might be fewer nodes for the lookup to process.

Thanks for the tip! I now have 1,492,535 language predicates. Does that seem like too many for this query type?

Another thought: is the order of the conditions in @filter() significant? I tried swapping them but got the same speed, so it probably doesn’t matter.

And yet another idea: would it help if, instead of a language string, I had a node of type Language, so that I could use uid_in with the specific Language node I’m looking for?

@filter(uid_in(language, <ID of "hun" language>) AND uid_in(dentry, 0x40ee58))

Would uid_in be faster?

@anand could you please help me with this: would uid_in() be faster than eq() with a string value in my case?

All uid functions are theoretically faster than any other, because UIDs are address pointers.

So you add the language value at the word level? I would prefer a node structure in which each node is a language and the words are connected to it. That way you can use UID functions with it.

e.g.:

{
  ABC as var(func: eq(language, "ABC"))
  DEF as var(func: eq(language, "DEF"))

  q(func: ...) {
    ...
    translations : translation
      @filter(uid_in(dentry, uid(ABC))
        AND uid_in(dentry, uid(DEF))) {
      ...
    }
  }
}
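
To make the node-per-language idea concrete, here is a minimal sketch of the original filter once `language` is an edge to a Language node rather than a string. Several things here are assumptions for illustration and do not come from the schema in the archive: the predicate name `code` (a hash-indexed string holding the three-letter abbreviation on each Language node), the root function `type(Entry)`, and the premise that `language` has been migrated to a `uid` predicate.

```
{
  # Resolve the Language node once.
  # Assumes a hypothetical predicate: code: string @index(hash) .
  hun as var(func: eq(code, "hun"))

  # type(Entry) is a placeholder root; use whatever root the real query has.
  q(func: type(Entry)) {
    translations : translation @filter(uid_in(language, uid(hun)) AND uid_in(dentry, 0x40ee58)) {
      uid
    }
  }
}
```

Whether this is actually faster than eq() on the string would need to be checked against the same Jaeger traces.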

@MichelDiz thanks for the tip, I will try this out!