Indexes and reverse references are spontaneously corrupted or deleted

Moved from GitHub dgraph/5160

Posted by nodeworks:

What version of Dgraph are you using?

Dgraph version : v1.2.0
Dgraph SHA-256 : 62e8eccb534b1ff072d4a516ee64f56b7601e3aeccb0b2f9156b0a8e4a1f8a12
Commit SHA-1 : 24b4b7439
Commit timestamp : 2020-01-27 15:53:31 -0800
Branch : HEAD
Go version : go1.13.5

Have you tried reproducing the issue with the latest release?

No

What is the hardware spec (RAM, OS)?

macOS using Docker:

  • macOS version: 10.15.3
  • Docker version:
    – Engine: 19.03.8
    – Compose: 1.25.4
    – Docker Desktop: 2.2.0.5 (43884) Stable

Steps to reproduce the issue (command/config used to run Dgraph).

I am using Docker compose. The alpha instance is run with this command:

dgraph alpha --my=server:7080 --normalize_node_limit=10000000 --lru_mb=10240 --zero=zero:5080 --whitelist=172.21.0.1:172.30.1.1,127.0.0.1:127.0.2.1 --export=/exports --bindall=true --jaeger.collector=http://crimson_api_jaeger:14268 -p ./out/0/p

The zero instance is run with this command:

dgraph zero --my=zero:5080
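Once both containers are up, basic cluster state can be checked over the HTTP ports. A quick sketch, assuming the default HTTP ports (6080 for zero, 8080 for alpha) are published by the compose file:

# Zero reports cluster membership, group assignments, and UID leases here.
curl -s localhost:6080/state

# Alpha reports basic liveness here.
curl -s localhost:8080/health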

I’ve attached my postings, schema, and RDFs. I did a bulk import from a backup about a week ago, and it had been working fine until today. The bulk import command I’m using is:

docker exec -it crimson_api_zero dgraph bulk -f /exports/current/g01.rdf.gz -s /exports/current/g01.schema.gz --reduce_shards=1 --zero=localhost:5080
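For reference, the g01.rdf.gz and g01.schema.gz files above come from an earlier export. On v1.2.x an export can be triggered over alpha’s HTTP port; a sketch, assuming the default port 8080 is published and that the resulting files are then moved under /exports/current:

# Ask the running alpha to export all data as gzipped RDF plus schema.
# Files land under the directory given by --export, in a timestamped subdirectory.
curl -s localhost:8080/admin/export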

One reliable way to test is with this query (workflow_id is a type):

{
  q(func: type(workflow_id), orderdesc: workflow_id.id, first: 10000) {
    uid
    workflow_id.id
  }
}

which results in this:

{
  "data": {
    "q": [
      {
        "uid": "0x958f2b",
        "workflow_id.id": 0
      },
      {
        "uid": "0x958f33",
        "workflow_id.id": 0
      },
      {
        "uid": "0x958f36",
        "workflow_id.id": 0
      },
      {
        "uid": "0x958f3a",
        "workflow_id.id": 0
      },
      {
        "uid": "0x958f62",
        "workflow_id.id": 0
      },
      {
        "uid": "0x958f68",
        "workflow_id.id": 0
      },
      {
        "uid": "0x958f6b",
        "workflow_id.id": 0
      },
      {
        "uid": "0x958f6f",
        "workflow_id.id": 0
      },
      {
        "uid": "0x95b0cd",
        "workflow_id.id": 0
      },
      {
        "uid": "0x95b0cf",
        "workflow_id.id": 0
      }
    ]
  }
}

then run this query:

{
  q(func: type(workflow_id)) {
    uid
    workflow_id.id
  }
}

which results in the data showing up correctly:

{
  "data": {
    "q": [
      {
        "uid": "0x958c75",
        "workflow_id.id": 750
      },
      {
        "uid": "0x958c76",
        "workflow_id.id": 454
      },
      {
        "uid": "0x958c77",
        "workflow_id.id": 565
      },
      {
        "uid": "0x958c78",
        "workflow_id.id": 608
      },
      {
        "uid": "0x958c79",
        "workflow_id.id": 203
      },
      {
        "uid": "0x958c7a",
        "workflow_id.id": 330
      },
      {
        "uid": "0x958c7b",
        "workflow_id.id": 494
      },
      {
        "uid": "0x958c7c",
        "workflow_id.id": 601
      },
      {
        "uid": "0x958c7d",
        "workflow_id.id": 85
      },
.......

As you can see, this doesn’t make sense: the ordered query, which relies on the “int” index on workflow_id.id, returns 0 for every value, while the unordered query returns the correct values. The ordered query was working correctly up until yesterday.
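To confirm how the predicate is defined on the cluster, the stored schema can be inspected with a schema query over alpha’s HTTP port. A sketch, assuming the default port 8080; the predicate name is taken from the queries above:

# Inspect the stored schema for the predicate; an intact index should
# report type int with the int tokenizer.
curl -s localhost:8080/query -H 'Content-Type: application/graphql+-' -d '
schema(pred: workflow_id.id) {
  type
  index
  tokenizer
}'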

Expected behaviour and actual result.

See above.

Files:
Message me for the files, as they are sensitive and private.

nodeworks commented:

I did a bulk import of the exported data I just backed up, and the indexes / reverse references are back. It’s really bizarre. This happens to other “types” within the system too, and it happens spontaneously. Doing an export and then an import of the data seems to fix it, if only temporarily.
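If the full export-and-reimport cycle becomes too disruptive as a workaround, an affected index can also be rebuilt in place by dropping and re-adding it through alpha’s /alter endpoint. A sketch, assuming the default port 8080; the exact schema line for workflow_id.id is an assumption and should match the real schema:

# Drop the index on the predicate, then re-add it, forcing Dgraph to rebuild it.
curl -s localhost:8080/alter -d 'workflow_id.id: int .'
curl -s localhost:8080/alter -d 'workflow_id.id: int @index(int) .'

Reverse edges on other predicates would need the same drop/re-add treatment with @reverse.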

danielmai commented:

Can you try v20.03.1? We made some bug fixes there, including #5255.
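With the compose setup above, that means bumping the image tag, for example:

docker pull dgraph/dgraph:v20.03.1

Note that Dgraph major-version upgrades can require exporting from the old version and re-importing into the new one, so the export/bulk-load flow already described may also serve as the upgrade path.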

Jaeger traces for the failing query and the working one would also be useful to see, if you have them.
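Since the alpha is already started with --jaeger.collector pointing at the jaeger container, traces should be browsable in the Jaeger UI; a sketch, assuming the container publishes Jaeger’s default query port 16686:

open http://localhost:16686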

danielmai commented:

Hey @nodeworks, after upgrading to v20.03.1, are you still seeing this issue?

nodeworks commented:

Hey @danielmai, I haven’t experienced this issue since upgrading to v20.03.1, even after following the steps that previously reproduced it. So I believe we can mark this as fixed!