Uid_not_in query filter to exclude some ids

The universe of UIDs these filters operate on is dictated by the parent’s UID result set; call it U. Within U, we identify the matching UIDs and remove them (NOT uid_in). The query processor won’t scan the entire DB, only U.

My memory is a bit hazy on whether we execute the ANDs in sequence or concurrently. Assuming they run sequentially, I think you can run uid_in first, simply because it’s a cheaper operation that doesn’t do any retrieval, and then put eq(city, Berlin) after it.

@mrjn
If the city “Berlin” is indexed, the list returned by this filter condition will be much smaller than the one returned by NOT uid_in(<0x7>).

Hence, I expect Dgraph to do less work by executing the index lookup first.
Isn’t that possible?
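To make the argument concrete, here is a toy Go sketch, assuming UID lists are kept as sorted `[]uint64` slices and that an index on `city` yields a sorted posting list of matching UIDs. Evaluating eq(city, "Berlin") within the parent set U then amounts to intersecting U with that posting list. The names `intersectSorted`, `parent`, and `berlin` are hypothetical; this is not Dgraph’s actual code.

```go
package main

import "fmt"

// intersectSorted returns the UIDs present in both sorted slices a and b.
// It walks both lists once, so its cost is O(len(a) + len(b)).
func intersectSorted(a, b []uint64) []uint64 {
	out := make([]uint64, 0)
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	// Hypothetical parent result set U (sorted UIDs).
	parent := []uint64{0x1, 0x2, 0x3, 0x5, 0x7, 0x9}
	// Hypothetical index posting list for city = "Berlin" (sorted UIDs).
	berlin := []uint64{0x2, 0x7, 0x20, 0x31}
	// eq(city, "Berlin") evaluated within U keeps only UIDs found in both lists.
	fmt.Println(intersectSorted(parent, berlin)) // [2 7]
}
```

The merge walk touches each list once, so a short posting list keeps both this step and every filter after it cheap.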

Also, setting UIDs aside entirely, consider a similar scenario with two eq(predicate, value) filter conditions joined by AND:

@filter(eq(city, "Berlin") AND eq(age,41))

There are three possible approaches to reducing the dataset:

  1. Sequentially: city = ‘Berlin’ first, then age = 41 on the reduced dataset
  2. Sequentially: age = 41 first, then city = ‘Berlin’ on the reduced dataset
  3. In parallel: evaluate city = ‘Berlin’ and age = 41 concurrently, then intersect the two results

Depending on the data, the winner may differ.
I am curious how Dgraph deals with this.

We pass the parent UID list to the filters, so they don’t need to search the world. In effect, the NOT filter looks at the list of passed UIDs and removes 0x7 from it. No lookup is required at all.

In general, for these scenarios, the smaller the parent UID list, the better. So the idea is to pass as small a UID list to the filters as possible. In an AND filter, it’s better to run the conditions in sequence, so each one shrinks the list the next one receives, than to run them in parallel and deal with bigger sets.

OK, great. At least something is starting to take shape in my mind.
The “no lookup required at all for NOT uid filters” part is especially satisfying.
Thank you.