Very slow with filter or order querying

Moved from GitHub dgraph/4810

Posted by JimWen:

What version of Dgraph are you using?

  • Dgraph version : v1.2.1
  • Dgraph SHA-256 : 3f18ff84570b2944f4d75f6f508d55d902715c7ca2310799cc2991064eb046f8
  • Commit SHA-1 : ddcda92
  • Commit timestamp : 2020-02-06 15:31:05 -0800
  • Branch : HEAD
  • Go version : go1.13.5

Have you tried reproducing the issue with the latest release?

Not yet with a 2.x version, but I found no mention of a fix in the release notes.

What is the hardware spec (RAM, OS)?

128G mem & 1.8T SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

The data is user behavior in an app, such as click/chat/signin/signup. The schema looks like this:

type User {
	gid_user
	name
}

type Action{
type
t
net
from
to
with
}

gid_user: int @index(int) .
name: string @index(hash) .

type:string @index(hash) .
t:dateTime @index(hour) .
net:string @index(hash) .
from:uid @reverse .
to:uid @reverse .
with:[uid] @count @reverse .

The graph looks like this:
user1 - from - action1 - to - user2
user1 - from - action2 - to - user3
user1 - from - action3 - to - user4

We now have 5 million user nodes and 2 billion action nodes. I want to query a user's action list like this:

{
	res(func: eq(gid_user, 52953149)) {
		src
		~from {
			type
			t
			to {
				expand(_all_)
			}
			with {
				expand(_all_)
			}
		}
	}
}

Then I want to order the results by time, like this:

{
	res(func: eq(gid_user, 52953149)) {
		src
		~from (orderdesc:t) {
			type
			t
			to {
				expand(_all_)
			}
			with {
				expand(_all_)
			}
		}
	}
}

Then I want to query a single type of action, like this:

{
	res(func: eq(gid_user, 52953149)) {
		src
		~from @filter(eq(type, "signin")) {
			type
			t
			to {
				expand(_all_)
			}
			with {
				expand(_all_)
			}
		}
	}
}

Then I want to query a user's actions starting from a given time, like this:

{
	res(func: eq(gid_user, 52953149)) {
		src
		~from @filter(ge(t, "2020-02-02T21:24:24+08:00"))  {
			type
			t
			to {
				expand(_all_)
			}
			with {
				expand(_all_)
			}
		}
	}
}

Expected behaviour and actual result.

The problem is that when I query a user's whole action list, it takes about 5-20 ms, but when I add an order or a filter on one indexed predicate, it takes more than 500 ms, which is more than 100 times slower.

Why does this happen? A user has at most 2000 actions in total, so filtering or ordering them shouldn't be this slow. My guess is that when a filter or order is used, the query scans the whole index key-value data to find the matches. Is that right?
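
To make the guess above concrete, here is a toy Python model of the two strategies. It is only a sketch of the hypothesis and not Dgraph's actual implementation: the function names, the data sizes, and the step counting are all made up for illustration.

```python
# Toy model of the hypothesis: filtering a small candidate set by walking a
# large index versus reading the value of each candidate directly.
# This is NOT Dgraph's real code; every name and size here is hypothetical.

def filter_by_index_scan(candidates, posting_list):
    """Walk the whole index posting list and keep uids that are candidates."""
    wanted = set(candidates)
    steps = 0
    matched = []
    for uid in posting_list:
        steps += 1  # one step per index entry, regardless of candidate count
        if uid in wanted:
            matched.append(uid)
    return matched, steps


def filter_by_value_lookup(candidates, type_of, wanted_type):
    """Read the stored value of each candidate and compare it."""
    steps = 0
    matched = []
    for uid in candidates:
        steps += 1  # one step per candidate only
        if type_of[uid] == wanted_type:
            matched.append(uid)
    return matched, steps


# 200,000 toy action nodes; every 4th one has type "signin".
type_of = {uid: ("signin" if uid % 4 == 0 else "click")
           for uid in range(1, 200_001)}
# Index entry for eq(type, "signin"): sorted uids of all matching actions.
signin_index = [uid for uid in range(1, 200_001) if type_of[uid] == "signin"]
# One user's 2000 actions, as returned by the ~from traversal.
user_actions = list(range(1000, 3000))

by_index, index_steps = filter_by_index_scan(user_actions, signin_index)
by_value, value_steps = filter_by_value_lookup(user_actions, type_of, "signin")

print(index_steps, value_steps)  # 50000 2000
```

Both strategies return the same 500 actions, but the index walk costs one step per index entry while the value lookup costs one step per candidate. If the real index holds a large share of 2 billion actions and the candidate list holds at most 2000, a gap of this kind would be consistent with the slowdown described above.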

harshil-goel commented :

We are working on "Filtering is slow on large amount of data" (Issue #2713, dgraph-io/dgraph on GitHub), and it should be the same issue. Please track the progress in that thread.