Slow filtering of string field in complex query

beepsoft · August 4, 2021, 8:30pm

Dgraph metadata

Dgraph version : v21.03.0
Dgraph codename : rocket
Dgraph SHA-256 : b4e4c77011e2938e9da197395dbce91d0c6ebb83d383b190f5b70201836a773f
Commit SHA-1 : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true

What I want to do

I’m building a dictionary service based on Dgraph. The dictionary entries, and the queries that generate them, can be quite complex, with many meanings, submeanings, translations and other data that can be part of an entry.

I look up/generate dictionary entries, and some of my filters use a “language” field like this:

      translations : translation @filter(eq(language, "hun") AND uid_in(dentry, 0x40ee58)) {
            ...
      }

“language” is always a three-letter language code.

According to Jaeger, these filters take 200-400 ms in some queries, even though the language field is hash-indexed and so, in theory, the filter should be equally fast for any language value.
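One way to check how selective the language filter actually is: a hash index maps each distinct value to the list of nodes holding it, so with only a handful of distinct three-letter codes, a single code such as "hun" may match a large share of all nodes. Whether that explains the 200-400 ms is an assumption, not something the traces confirm. The sketch below uses only the `language` predicate from the filter above and compares the two counts:

```dql
{
  # Nodes whose language is "hun" (answered via the hash index on language).
  hun(func: eq(language, "hun")) {
    count(uid)
  }

  # All nodes that have any language value.
  total(func: has(language)) {
    count(uid)
  }
}
```

If `hun` comes out close to `total`, the eq() condition on its own is not narrowing the candidates much.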

What I did

I created an example query, ran it and collected traces using Jaeger. The archive also includes the schema and a partial screenshot of the Jaeger UI:

slow_query.zip (2.9 MB)

Am I doing something wrong here, or what could be causing these slow “language” lookups?

Thanks!

anand · August 5, 2021, 4:17pm

Perhaps you could try splitting the language predicate into translation.language and SeeAlso.language. Then there might be fewer nodes for each lookup to process.
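A minimal sketch of that split, assuming the translation nodes carry the type `Translation` (a hypothetical name; the real schema is only in the attached archive). The schema first needs the new indexed predicate, `translation.language: string @index(hash) .`, after which an upsert block can copy the existing values across:

```dql
upsert {
  query {
    # Hypothetical type name; replace it with the one from the real schema.
    tr as var(func: type(Translation)) @filter(has(language)) {
      lang as language
    }
  }

  mutation {
    set {
      uid(tr) <translation.language> val(lang) .
    }
  }
}
```

The filter would then read `eq(translation.language, "hun")`, and `SeeAlso.language` would be handled the same way. On a dataset of this size a single upsert may be heavy, so running it in batches could be necessary; this has not been tested here.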

beepsoft · August 5, 2021, 6:01pm

Thanks for the tip! I currently have 1,492,535 language values. Does that seem like too many for this kind of query?

Another thought: is the order of the conditions in @filter() significant? I tried swapping them but got the same speed, so it probably doesn’t matter.

beepsoft · August 5, 2021, 6:07pm

And yet another idea: would it help if, instead of a language string, I had a node of type Language, so that I could use uid_in with the specific Language node I’m looking for?

      @filter(uid_in(language, <ID of "hun" language>) AND uid_in(dentry, 0x40ee58))

Would uid_in be faster?
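For reference, a schema for that idea could look like the sketch below. All names in it (`Language`, `Language.code`, `translation.lang`) are hypothetical. The point is that each translation points at one of a few Language nodes through a uid edge instead of storing the code as a string:

```dql
# Hypothetical schema: one node per language, found by its code.
Language.code: string @index(hash) .

# Edge from a translation to its Language node (replaces the string field).
translation.lang: uid .

type Language {
  Language.code
}
```

A new predicate name is used rather than retyping `language` in place, since changing the type of a predicate that already holds string data is not something to assume will work.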

beepsoft · August 23, 2021, 5:14pm

@anand could you please help me with this: would uid_in() be faster than eq() with a string value in my case?

MichelDiz · August 24, 2021, 2:51pm

In theory, the uid functions are faster than any other function, because UIDs are address pointers.

So you store the language value at the word level? I would prefer a node structure in which each language is a node and the words are connected to it. That way you can use the UID functions with it.

e.g.:

{
  ABC as var(func: eq(language, "ABC"))
  DEF as var(func: eq(language, "DEF"))

  q(func: ...) {
    ...
    translations : translation
      @filter(uid_in(dentry, uid(ABC)) AND uid_in(dentry, uid(DEF))) {
        ...
      }
  }
}
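A more concrete version of the same pattern, assuming the Language-node model described above with hypothetical predicates `Language.code` (a hash-indexed string on each Language node) and `translation.lang` (a uid edge from a translation to its Language node). The root function `eq(headword, "alma")` is also hypothetical, since the original query's root is not shown, and the sketch assumes `translation` hangs directly off the root nodes. The Language node is resolved once in a var block and then matched with uid_in, which takes a uid variable here just as in the example above:

```dql
{
  # Resolve the "hun" Language node once (hypothetical predicate).
  HUN as var(func: eq(Language.code, "hun"))

  # Hypothetical root; substitute the real dictionary-entry lookup.
  q(func: eq(headword, "alma")) {
    translations : translation
      @filter(uid_in(translation.lang, uid(HUN)) AND uid_in(dentry, 0x40ee58)) {
        uid
      }
  }
}
```

Whether this is faster than eq() on a string for this particular dataset is not established in the thread; comparing both versions in Jaeger would settle it.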

beepsoft · August 24, 2021, 3:24pm

@MichelDiz Thanks for the tip, I will try this out!

Related topics (topic · category and tags · replies · views · last activity)

Query to slow, how to optimize query · Dgraph · 5 · 471 · April 25, 2021
Query is very slow while adding le function for float predicate in filter · Dgraph, area:performance · 6 · 1184 · November 15, 2022
V1.0.12 slower for some queries · Users · 4 · 455 · April 6, 2019
Filtering is slow on large amount of data · Dgraph, dgraph, status:accepted, priority:p1, popular, area:performance · 5 · 1152 · June 15, 2020
Slow query when apply @filter or order to predicates · Dgraph, kind:question, kind:enhancement, kind:bug, area:performance, ticket:created · 5 · 1182 · May 6, 2021
