Moved from GitHub dgraph/5217
Posted by fwereade:
May be related to "Something happens with indexes or reverse references spontaneously being messed up or deleted" (dgraph-io/dgraph issue #5160 on GitHub).
What version of Dgraph are you using?
Seen in 1.1.1, 1.2.1, 1.2.2 and 20.3.0; it seems worse in the latest version.
Have you tried reproducing the issue with the latest release?
Yes
What is the hardware spec (RAM, OS)?
Ubuntu 18.04.4 on an EC2 m5a.8xlarge (128 GB of RAM), using an io1 EBS volume with 3000 provisioned IOPS.
Steps to reproduce the issue (command/config used to run Dgraph).
dgraph alpha --lru_mb 6000 --zero localhost:5080 --query_edge_limit 9223372036854775807
dgraph zero -w zw --telemetry=false
1. Load about 23M edges representing the source code of a large Go project.
2. Run a messy generated query (see below) and get results indicating success (below).
3. Repeatedly load about 500 more edges, representing multiple independent copies of the source code of a small test project, and run a simple query against the new project. (The only point of contact between the two is that each project is reachable from the root node with UID 1 via ___child edges.)
4. Run the original query (for the large project) and see results indicating failure (below).
5. Restart zero and alpha, and observe that the failure still happens exactly as in (4).
Note that I’m not certain that (3) is necessary to trigger the failure, but the mutation spam seems to be sufficient to make it happen reliably within an hour or so.
Expected behaviour and actual result.
We expect that adding more data to one part of the graph would not change the results returned by a query concentrating on another part of the graph. Instead, some parts of the query that filter on the ___s_name predicate start returning no results.
Note that:
- part of the query, which uses the same input nodes and doesn't filter on ___s_name, still returns correct results;
- another part of the query, which uses different input nodes and does filter on ___s_name, still returns correct results;
- ___s_name has a "hash" index.
Query
Forgive the mess; it’s autogenerated, and I’m sure it could be made much nicer, but it currently runs well enough for our purposes in general.
{
  root as root(func: uid(1)) {
    uid
  }
  _Y(func: uid(root)) {
    _Z as ___child {
      uid
    }
    uid
  }
  # Note that there's a successful filter on ___s_name here.
  _X as _a(func: eq(___kind,"git.repo")) @filter(eq(___s_owner,"juju") and eq(___s_host,"github.com") and eq(___s_name,"juju") and uid(_Z)) {
    ___s_owner
    ___s_host
    ___s_name
    uid
  }
  # The following 9 blocks are not interesting.
  _V(func: uid(_X)) {
    _W as ___child {
      uid
    }
    uid
  }
  _f as _b(func: eq(___kind,"git.commit")) @filter(eq(___s_sha,"ad1c30d8cad8736ff19de9440a066bacee58b743") and uid(_W)) {
    ___s_sha
    uid
  }
  _d(func: uid(_f)) {
    _e as ___child {
      uid
    }
    uid
  }
  _c(func: eq(___kind,"gotypes.project")) @filter(uid(_e)) {
    uid
  }
  _U(func: uid(_f)) @recurse(depth: 6) {
    _T as ___child
    uid
  }
  _S as _g(func: eq(___commonkind,"common.dir")) @filter(eq(___s_filename,".") and uid(_T)) {
    ___s_filename
    uid
  }
  _Q(func: uid(_S)) {
    _R as ___child {
      uid
    }
    uid
  }
  _P as _h(func: eq(___kind,"gotypes.package")) @filter(uid(_R)) {
    ___s_name
    uid
  }
  _N(func: uid(_P)) {
    _O as ___child {
      uid
    }
    uid
  }
  # This is where it starts to get interesting; there are 3 very similar constructs, all based on _q.
  _q as _i(func: eq(___kind,"gotypes.named")) @filter(uid(_O)) {
    ___s_name
    uid
  }
  # First example, down to _k.
  _o(func: uid(_q)) {
    _p as ___child {
      uid
    }
    uid
  }
  _n as _j(func: eq(___kind,"gotypes.method")) @filter(uid(_p)) {
    uid
  }
  _l(func: uid(_n)) {
    _m as ___reference {
      uid
    }
    uid
  }
  _k(func: eq(___kind,"gotypes.func")) @filter(eq(___s_name,"Kill") and uid(_m)) {
    ___s_name
    uid
  }
  # Second example, down to _s.
  _w(func: uid(_q)) {
    _x as ___child {
      uid
    }
    uid
  }
  _v as _r(func: eq(___kind,"gotypes.method")) @filter(uid(_x)) {
    uid
  }
  _t(func: uid(_v)) {
    _u as ___reference {
      uid
    }
    uid
  }
  _s(func: eq(___kind,"gotypes.func")) @filter(eq(___s_name,"Wait") and uid(_u)) {
    ___s_name
    uid
  }
  # Third example, down to _z.
  _L(func: uid(_q)) {
    _M as ___child {
      uid
    }
    uid
  }
  _K as _y(func: eq(___kind,"gotypes.method")) @filter(uid(_M)) {
    uid
  }
  _I(func: uid(_K)) {
    _J as ___reference {
      uid
    }
    uid
  }
  _H as _z(func: eq(___kind,"gotypes.func")) @filter(uid(_J)) {
    uid
  }
  # Irrelevant from here on.
  _F(func: uid(_H)) {
    _G as ___link {
      uid
    }
    uid
  }
  _E as _A(func: eq(___kind,"gotypes.func_decl")) @filter(uid(_G)) {
    uid
  }
  _D(func: uid(_E)) @recurse(depth: 1001) {
    _C as ___child
    uid
  }
  _B(func: eq(___kind,"gotypes.go_stmt")) @filter(uid(_C)) {
    ___s_filename
    ___i_start_offset
    ___i_end_offset
    uid
  }
}
Success
block: bytes of JSON -> result count
root: 15 -> 1
_a: 82 -> 1
_b: 74 -> 1
_c: 20 -> 1
_d: 52 -> 1
_g: 40 -> 1
_h: 30733 -> 706
_i: 341963 -> 6971
_j: 1335866 -> 68934
_k: 6324 -> 165
_l: 3844097 -> 68934
_o: 1532028 -> 6971
_r: 1335866 -> 68934
_s: 6834 -> 178
_t: 3844097 -> 68934
_w: 1532028 -> 6971
_y: 1335866 -> 68934
_z: 440528 -> 22738
_A: 337067 -> 17396
_B: 6740 -> 99
_D: 42797376 -> 17396
_F: 1128102 -> 22738
_I: 3844097 -> 68934
_L: 1532028 -> 6971
_N: 830905 -> 706
_Q: 13646 -> 1
_U: 7847179 -> 1
_V: 67 -> 1
_Y: 2348 -> 1
Failure
block: bytes of JSON -> result count
root: 15 -> 1
_a: 82 -> 1
_b: 74 -> 1
_c: 20 -> 1
_d: 52 -> 1
_g: 40 -> 1
_h: 30733 -> 706
_i: 341963 -> 6971
_j: 1335866 -> 68934
_k: 2 -> 0 (smaller: the filter is rejecting everything)
_l: 3844097 -> 68934
_o: 1532028 -> 6971
_r: 1335866 -> 68934
_s: 2 -> 0 (smaller: the filter is rejecting everything)
_t: 3844097 -> 68934
_w: 1532028 -> 6971
_y: 1335866 -> 68934
_z: 440528 -> 22738
_A: 337067 -> 17396
_B: 6740 -> 99
_D: 42797376 -> 17396
_F: 1128102 -> 22738
_I: 3844097 -> 68934
_L: 1532028 -> 6971
_N: 830905 -> 706
_Q: 13646 -> 1
_U: 7847179 -> 1
_V: 67 -> 1
_Y: 3982 -> 1 (bigger, but expected: root has more children now)
Note that _l, _t and _I are identical in both cases; they feed into _k, _s and _z respectively. Of those, _k and _s filter on ___s_name (and start returning no results), while _z doesn't (and returns exactly the same results even when _k and _s start failing).
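As a mechanical cross-check of that reading, the two result listings can be diffed. This is a hypothetical helper sketch, not part of the original report: it transcribes only the per-block result counts (JSON sizes are left out, since the only size change, _Y, is expected), marks which blocks filter on ___s_name, and records which feeder block supplies the uid() variable consumed by _k, _s and _z, as read off the query.

```go
package main

import (
	"fmt"
	"sort"
)

// Per-block result counts transcribed from the "Success" listing.
var success = map[string]int{
	"root": 1, "_a": 1, "_b": 1, "_c": 1, "_d": 1, "_g": 1,
	"_h": 706, "_i": 6971, "_j": 68934, "_k": 165, "_l": 68934,
	"_o": 6971, "_r": 68934, "_s": 178, "_t": 68934, "_w": 6971,
	"_y": 68934, "_z": 22738, "_A": 17396, "_B": 99, "_D": 17396,
	"_F": 22738, "_I": 68934, "_L": 6971, "_N": 706, "_Q": 1,
	"_U": 1, "_V": 1, "_Y": 1,
}

// Per-block result counts transcribed from the "Failure" listing.
var failure = map[string]int{
	"root": 1, "_a": 1, "_b": 1, "_c": 1, "_d": 1, "_g": 1,
	"_h": 706, "_i": 6971, "_j": 68934, "_k": 0, "_l": 68934,
	"_o": 6971, "_r": 68934, "_s": 0, "_t": 68934, "_w": 6971,
	"_y": 68934, "_z": 22738, "_A": 17396, "_B": 99, "_D": 17396,
	"_F": 22738, "_I": 68934, "_L": 6971, "_N": 706, "_Q": 1,
	"_U": 1, "_V": 1, "_Y": 1,
}

// Blocks whose own @filter contains eq(___s_name, ...).
var filtersOnName = map[string]bool{"_a": true, "_k": true, "_s": true}

// Feeder block -> the block that filters on the variable it defines
// (_l defines _m used by _k, _t defines _u used by _s, _I defines _J used by _z).
var feeds = map[string]string{"_l": "_k", "_t": "_s", "_I": "_z"}

// changedBlocks returns, sorted, the blocks whose result count differs.
func changedBlocks(before, after map[string]int) []string {
	var out []string
	for name, n := range before {
		if after[name] != n {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	for _, name := range changedBlocks(success, failure) {
		fmt.Printf("%s: %d -> %d (filters on ___s_name: %t)\n",
			name, success[name], failure[name], filtersOnName[name])
	}
	// Print each feeder/consumer pair in a stable order.
	feeders := make([]string, 0, len(feeds))
	for f := range feeds {
		feeders = append(feeders, f)
	}
	sort.Strings(feeders)
	for _, f := range feeders {
		c := feeds[f]
		fmt.Printf("%s (%d -> %d) feeds %s (%d -> %d)\n",
			f, success[f], failure[f], c, success[c], failure[c])
	}
}
```

Run against the counts above, the only changed blocks it reports are _k (165 -> 0) and _s (178 -> 0), both of which filter on ___s_name, while all three feeders and _z are unchanged.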