Very slow when queried attributes do not exist

Moved from GitHub dgraph/4812

Posted by JimWen:

What version of Dgraph are you using?

  • Dgraph version : v1.2.1
  • Dgraph SHA-256 : 3f18ff84570b2944f4d75f6f508d55d902715c7ca2310799cc2991064eb046f8
  • Commit SHA-1 : ddcda92
  • Commit timestamp : 2020-02-06 15:31:05 -0800
  • Branch : HEAD
  • Go version : go1.13.5

Have you tried reproducing the issue with the latest release?

Not yet with a 2.x version, but I found no mention of a fix in the release notes.

What is the hardware spec (RAM, OS)?

128G mem & 1.8T SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

The data is user behavior in an app, such as click/chat/signin/signup etc., so the schema looks like this:

type User{
gid_user
name
}

type Action{
type
t
net
from
to
with
}

gid_user: int @index(int) .
name: string @index(hash) .

type:string @index(hash) .
t:dateTime @index(hour) .
net:string @index(hash) .
from:uid @reverse .
to:uid @reverse .
with:[uid] @count @reverse .

and the graph is like
user1 - from- action1-to-user2
user1 - from- action2-to-user3
user1 - from- action3-to-user4

Now we have 5 million user nodes and 2 billion action nodes. I want to query a user’s action list as follows; every action node has type and t:

{
	res(func: eq(gid_user, 52953149)) {
		src
		~from {
			type
			t
		}
	}
}

Then I want to query net as follows; not every action node has net:

{
	res(func: eq(gid_user, 52953149)) {
		src
		~from (orderdesc:t) {
			net
		}
	}
}

Expected behaviour and actual result.

The problem is that when I query the whole action list of a user using only attributes that exist on every node, it takes about 5-20ms, but when I query with attributes that may not exist, it takes >200ms, which is more than 100 times slower.

Why does this happen? A user has at most 2000 actions in total, so it shouldn’t be this slow. My guess is that when a filter or order is used, Dgraph scans the whole index KV to find the matching entries, and there is no pre-filter such as a bloom filter?
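
The guess above can be illustrated with a toy model. This is only a sketch of the hypothesis, not Dgraph's actual implementation: the function names, the in-memory dictionaries, and the sample data are all made up for illustration. It compares ordering a small set of candidate uids by walking an entire index against ordering them by reading only their own values.

```python
# Toy model of the hypothesis only; NOT how Dgraph is actually implemented.
# All names and data here are hypothetical.

def order_via_index_scan(index, candidates):
    """Walk every index entry in descending key order, keeping candidate uids.

    Cost grows with the size of the whole index, not the candidate set.
    """
    ordered, touched = [], 0
    for key in sorted(index, reverse=True):
        for uid in index[key]:
            touched += 1
            if uid in candidates:
                ordered.append(uid)
    return ordered, touched


def order_via_value_lookup(values, candidates):
    """Read the value of each candidate uid only, then sort the candidates.

    Cost grows with the size of the candidate set.
    """
    ordered = sorted(candidates, key=lambda uid: values[uid], reverse=True)
    return ordered, len(candidates)


# 10,000 action nodes, each with a distinct value for t (toy numbers).
values = {uid: (uid * 37) % 10007 for uid in range(1, 10001)}

# Index on t: value -> list of uids holding that value.
index = {}
for uid, t in values.items():
    index.setdefault(t, []).append(uid)

# One user's actions: a small candidate set compared with the index.
candidates = set(range(1, 21))

scan_order, scan_touched = order_via_index_scan(index, candidates)
lookup_order, lookup_touched = order_via_value_lookup(values, candidates)
print(scan_touched, lookup_touched)
```

Both functions return the same ordering, but the first one touches every entry in the index while the second touches only the candidate uids. If the real behavior resembles the first strategy, the cost would depend on the 2 billion action nodes rather than on the at most 2000 actions of one user.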

harshil-goel commented :

We are working on "Filtering is slow on large amount of data" (Issue #2713 in dgraph-io/dgraph on GitHub), and this should be the same issue. Please track the progress in that thread.