Uid_not_in query filter to exclude some ids

mrjn · March 22, 2021, 10:20pm

The universe of UIDs that these filters operate on is dictated by the parent’s UID result set. Call it U. So, within U, we’ll identify and remove the matches (NOT uid_in). The query processor won’t go looking over the entire DB, just within U.

My memory is a bit faint on whether we execute the ANDs in sequence or concurrently. Assuming it is sequential, I think you can run uid_in first, just because it’s a cheaper operation which doesn’t do any retrieval. And then put eq(city, Berlin) after.

koculu · March 22, 2021, 10:34pm

@mrjn
If the city “Berlin” is indexed, then the list returned by that filter condition will be much smaller than the one returned by NOT uid_in(<0x7>).

So I would expect Dgraph to do less work by executing the index lookup first.
Isn’t that possible?

Also, if we set uids aside entirely, consider a similar scenario with two eq(predicate, value) filter conditions joined by AND:

@filter(eq(city, "Berlin") AND eq(age,41))

There are 3 possible approaches to reducing the dataset:

1. Sequentially: city=‘Berlin’ first, then age=41 on the reduced dataset
2. Sequentially: age=41 first, then city=‘Berlin’ on the reduced dataset
3. In parallel: execute city=‘Berlin’ and age=41 at the same time and intersect the results at the end

Depending on the data, the winner might change.
I am curious how Dgraph deals with this.
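
To make the three approaches concrete, here is a rough Python sketch. It is a toy model only: the `nodes` data, the `eq` helper, and the "number of candidates checked" cost measure are all made up for illustration and say nothing about how Dgraph actually executes or costs index lookups.

```python
# Hypothetical toy data: uid -> predicate values (not real Dgraph storage).
nodes = {
    1: {"city": "Berlin", "age": 41},
    2: {"city": "Berlin", "age": 30},
    3: {"city": "Paris", "age": 41},
    4: {"city": "Berlin", "age": 41},
    5: {"city": "Rome", "age": 41},
}

def eq(pred, value, candidates):
    """Keep candidates whose pred equals value; also return how many were checked."""
    cands = list(candidates)
    matches = [u for u in cands if nodes[u].get(pred) == value]
    return matches, len(cands)

def sequential(first, second, parent):
    """Run first on the parent list, then second on the reduced list."""
    a, n1 = eq(*first, parent)
    b, n2 = eq(*second, a)
    return b, n1 + n2

def parallel(first, second, parent):
    """Run both conditions on the full parent list, then intersect."""
    a, n1 = eq(*first, parent)
    b, n2 = eq(*second, parent)
    b_set = set(b)
    return [u for u in a if u in b_set], n1 + n2

parent = sorted(nodes)
city = ("city", "Berlin")
age = ("age", 41)

city_first = sequential(city, age, parent)  # ([1, 4], 8)
age_first = sequential(age, city, parent)   # ([1, 4], 9)
both = parallel(city, age, parent)          # ([1, 4], 10)
```

All three return the same uids. In this toy data the more selective condition (city) is the cheaper one to run first, which is the "depends on the data" point.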

mrjn · March 23, 2021, 2:40am

We pass the parent UID list over to the filters, so they don’t need to search the world. In effect, what the NOT filter does is look at the list of passed UIDs and remove 0x7 from it. No lookup required at all.
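
As a minimal sketch of that idea (an illustration in Python, not Dgraph's actual code; the `not_uid` helper and the sample uids are hypothetical), excluding uids is just a pass over the list the parent handed down:

```python
def not_uid(parent_uids, excluded):
    """Return parent_uids without the excluded uids, preserving order.

    Only the list passed in by the parent is touched; nothing is
    fetched from storage.
    """
    excluded = set(excluded)
    return [u for u in parent_uids if u not in excluded]

# Hypothetical parent result set U.
parent = [0x3, 0x7, 0x9, 0x2A]
result = not_uid(parent, [0x7])
```

The work is proportional to the size of the parent list, which is why a smaller parent list makes this cheaper.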

In general, for these scenarios, the smaller the parent UID list, the better. So, the idea is to pass as small a UID list to the filters as possible. In an AND filter, it is better to run the conditions in sequence to reduce that size than to run them in parallel and deal with bigger sets.

koculu · March 23, 2021, 2:46am

Ok great. At least something is starting to take shape in my mind.
The “No lookup required at all for NOT uid filters” part is especially satisfying.
Thank you.
