Query results are flaky and inconsistent with Offset usage

Moved from GitHub dgraph/4713

Posted by koculu:

What version of Dgraph are you using?


Have you tried reproducing the issue with the latest release?

I tried with v1.2.0 and this is still reproducible.

What is the hardware spec (RAM, OS)?

i7-6850K 3.60 Ghz, 6 Cores
64 GB DDR4 3200 Mhz
Samsung Evo SSD
Windows 10

Steps to reproduce the issue (command/config used to run Dgraph).

Download zip file: https://drive.google.com/open?id=15zyfs3j9A8Kzf_gIS1KXT4CSbA9RJl-e
execute dgraph/run-dgraph.bat
This will start dgraph instances.
Execute query.txt included in the zip file.

Expected behaviour and actual result.

The query results are not consistent with offset.

There are 4 queries inside the query.txt file.
First query is flaky. Rarely it returns 1 and mostly 8 which is true.
Second query always returns 69069 (that looks okay)
Third Query returns 0 (incorrect)
Fourth Query returns 0 (incorrect)

koculu commented :

If I drop index from pm-copy-count predicate, query result becomes correct.
I have faced similar situations before. When an index created on a predicate, offset and count results and normal queries become unstable by time.

MichelDiz commented :

Please, provide the dataset in RDF. I can’t run Windows right now. it will take time to do so. My virtual box don’t wanna install Windows.

koculu commented :

The issue is related to the current state of the database and exporting and re-importing would clear the problem.

dhagrow commented :

We are currently experiencing an issue very much like this. We have a query that looks like this:

  q(func: type(node)) @cascade {
    ~user.node @filter(eq(user.name, "dave")) {
      ~process.user @filter(gt(process.started_at, "2020-03-25T00:00:00")) {

Using that query, the earliest value for process.started_at in the results is 2020-03-26T17:04:22.98387Z, and we get 13 “process” nodes.

If the filter is set to 2020-03-26T17:05:00, the earliest value for process.started_at is 2020-03-26T17:06:01.320919Z, with 12 “process” nodes, which is correct (one node is filtered out).

However, if the filter is removed, the earliest value for process.started_at is 2020-03-25T15:57:05.245431Z, with 597 “process” nodes. That date is clearly greater than the date set by the original filter.

When using the filter, regardless of how far back we set the date, the first result is 2020-03-26T17:04:22.98387Z.

This is a scenario I’ve been able to reproduce just now. We first noticed this problem when a query would return some results (but not always all the expected results), and a few seconds later would return no results at all.

The only mild indication of anything wrong on the server end is this warning on 1 (of 6) alpha:

W0326 17:51:42.073138       1 draft.go:1014] Raft.Ready took too long to process: Timer Total: 655ms. Breakdown: [{sync 655ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries:
1. MustSync: true

dhagrow commented :

This simpler query also exhibits the same problem:

  q(func: type(process)) @filter(gt(process.started_at, "2020-03-25T00:00:00")) {

MichelDiz commented :

@dhagrow your issue seems not related to offset. Please create a new issue with details of your context.

dhagrow commented :

@MichelDiz Ok, will do. Some of the behavior seemed similar to what we are seeing, so I suspected it might be the same underlying issue.

MichelDiz commented :

@dhagrow I already saw that you created it.

Even it is similar tho, you have to create a new issue. As it is not 100% the same issue. We can’t track two issues on the same ticket. Certainly, your issue would be forgotten in the middle of the conversation.

We just tag duplicated only in those issues that are 100% equal. In his case, he thinks it is related to “offset”, which is a very specific functionality. And you’re not doing the same.

But if you see that have some issue similar to yours, you need to tag them. As it helps to give more context. And further investigation about duplicated.