Missing data after live upload

Report a Dgraph Bug

Hey team,
On the server (one node) I ran:

curl --request POST   --url http://127.0.0.1:8080/admin   --header 'content-type: application/graphql'   --data 'mutation {
  export(input: {
    format: "rdf"
    destination: "/host/dg-export"
  }) {
    response {
      message
      code
    }
  }
}'
Then I:
  1. Downloaded the dump to my dev computer
  2. Imported it using the live loader:
dgraph live --files /import/g01.rdf.gz --schema /import/g01.schema.gz --zero dg-zero:5080
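
One cheap thing to rule out at this step is a damaged transfer. Below is a minimal sketch, not something from the original report, that hashes a file so the digest can be compared between the origin server and the dev machine. The helper name `sha256_of_file` is made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `sha256_of_file("/import/g01.rdf.gz")` on both machines should print the same digest if the copy is intact. A matching digest only rules out the transfer as the cause; it says nothing about the loader itself.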

Then I noticed some of the data is missing, even though the RDF file and the schema file declare it correctly.
I checked the origin server and the data is there. (A query that returns data on it won't return the same on the dev one.)

The relevant type is (copied from the dumped schema):

<tnnt.cn>:string @index(exact) . 
<tnnt.id>:string @index(hash) @upsert .
<tnnt.account>:string @index(exact) . 
<tnnt.isDeleted>:bool @index(bool) . 
<tnnt.isEnabled>:bool . 

type <Tenant> {
        tnnt.id
        tnnt.account
        tnnt.cn
        tnnt.isDeleted
        tnnt.isEnabled
}

The RDF file has the data:

$ grep 0x27c879 g01.rdf

<0x27c879> <tnnt.cn> "nsg42"^^<xs:string> .
<0x27c879> <tnnt.id> "4f31eb6e-16af-4f65-bb43-bf32a8b4ed6f"^^<xs:string> .
<0x27c879> <dgraph.type> "Tenant"^^<xs:string> .
<0x27c879> <tnnt.account> "nsg42"^^<xs:string> .
<0x27c879> <tnnt.isDeleted> "false"^^<xs:boolean> .
<0x27c879> <tnnt.isEnabled> "true"^^<xs:boolean> .

And the missing predicates are:

  • tnnt.account
  • tnnt.isDeleted
  • tnnt.isEnabled
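
To check more than a single uid with grep, the export can be tallied per predicate. This is a rough sketch under my own assumptions: the function names are hypothetical, and the regex only handles the simple one-triple-per-line shape shown in the grep output above.

```python
import gzip
import re
from collections import Counter

# Captures the predicate of a line such as:
#   <0x27c879> <tnnt.cn> "nsg42"^^<xs:string> .
# The subject may be a <uid> or a blank node (_:name).
TRIPLE_RE = re.compile(r'^\s*(?:<[^>]+>|_:\S+)\s+<([^>]+)>\s')


def count_predicates(lines):
    """Count how many triples each predicate has in an iterable of RDF lines."""
    counts = Counter()
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_predicates_in_file(path):
    """Same as count_predicates, for a plain or gzip-compressed RDF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return count_predicates(handle)
```

For example, `count_predicates_in_file("/import/g01.rdf.gz")["tnnt.account"]` gives the number of `tnnt.account` triples in the dump, which can then be compared with what a `has(tnnt.account)` count query returns on each server.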

Any idea?

What version of Dgraph are you using?

Dgraph Version
$ dgraph version
 
[Decoder]: Using assembly version of decoder
Page Size: 4096

Dgraph version   : v20.11.3
Dgraph codename  : tchalla-3
Dgraph SHA-256   : c3c1474369415e74b1a59ec7053cd1e585c9d55fe71243c72c48e313728d995a
Commit SHA-1     : 8d3eb766c
Commit timestamp : 2021-03-31 17:28:12 +0530
Branch           : HEAD
Go version       : go1.15.5
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph     , visit http://discuss.dgraph.io.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.

Have you tried reproducing the issue with the latest release?

Nope

What is the hardware spec (RAM, OS)?

  • Server is Ubuntu 20.04
  • Dev is Fedora release 34

Both running dgraph using docker (same image)

Steps to reproduce the issue (command/config used to run Dgraph).

$ cat docker-compose.yml

version: "3.2"
services:
  zero:
    container_name: dg-zero
    image: dgraph/dgraph:v20.11.3
    volumes:
      - $PWD/data:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    restart: on-failure
    command: dgraph zero --my=zero:5080
  alpha:
    container_name: dg-alpha
    image: dgraph/dgraph:v20.11.3
    volumes:
      - $PWD/data:/dgraph
      - /tmp:/host
    ports:
      - 8080:8080
      - 9080:9080
      - 7080:7080
    restart: on-failure
    command: dgraph alpha --my=server:7080 --lru_mb=2048 --zero=zero:5080 --my=alpha:7080 --whitelist 172.0.0.0/8
  ratel:
    container_name: dg-ratel
    image: dgraph/dgraph:v20.11.3
    ports:
      - 8000:8000

Expected behaviour and actual result.


hi @mbn18 ,

Strange, I’ve tried to live load the schema and RDF data you shared here in Dgraph v20.11.3 and all the data seems to be inserted correctly.

I ran this query:

{
  node(func: uid(0x27c879)) {
    uid
    expand(_all_) {
      uid
      expand(_all_)
    }
  }
}

and it returns the expected data (based on the RDF file you shared):

{
  "data": {
    "node": [
      {
        "uid": "0x27c879",
        "tnnt.id": "4f31eb6e-16af-4f65-bb43-bf32a8b4ed6f",
        "tnnt.account": "nsg42",
        "tnnt.cn": "nsg42",
        "tnnt.isDeleted": false,
        "tnnt.isEnabled": true
      }
    ]
  },

Same for the tnnt.isDeleted and tnnt.isEnabled

Can you try to start from scratch and live load the data again? Once done, can you share the output of the live loader?

Best,

Hey @omar ,

I ran it on a 3rd server and the import worked well with the same steps.
I will try tomorrow on the failed node (though I did test twice, the 1st time with the schema flag and the 2nd time without).

Hey @omar ,

Tested again on the faulty dev server and the data is still missing.
The server spec is:
AMD Ryzen 9 5900X 12-Core Processor
32GB RAM
1GB NVMe SSD drive

Software:
Fedora 34
FS is btrfs

This is the procedure I followed to import the data (on both servers).

  1. Delete the Dgraph data dir
  2. Start the Docker containers
  3. Run dgraph live

The query to validate that the data exists:

{
  q(func:type(Tenant)) {
    expand(_all_)
  }
}
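
To make the comparison between the two servers less manual, something like the sketch below could diff the results. It is only an illustration with assumptions: the function names are hypothetical, the query is the one above with `uid` added so nodes can be matched, it assumes the Alpha accepts a JSON body of the form `{"query": ...}` on `/query`, and it assumes uids are preserved by the live load (as the `uid(0x27c879)` lookup earlier in the thread suggests).

```python
import json
import urllib.request

# Same query as above, plus uid so nodes can be matched across servers.
QUERY = "{ q(func: type(Tenant)) { uid expand(_all_) } }"


def run_query(alpha_url, query=QUERY):
    """POST a DQL query to an Alpha's /query endpoint and return the "q" block.

    Assumption: the endpoint accepts Content-Type application/json
    with a body of the form {"query": "..."}.
    """
    request = urllib.request.Request(
        alpha_url.rstrip("/") + "/query",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["q"]


def missing_predicates(source_nodes, target_nodes):
    """Map uid -> sorted predicates present on the source node but absent on the target.

    A node missing entirely from the target reports all of its keys, including "uid".
    """
    target_by_uid = {node["uid"]: node for node in target_nodes}
    missing = {}
    for node in source_nodes:
        target = target_by_uid.get(node["uid"], {})
        absent = sorted(set(node) - set(target))
        if absent:
            missing[node["uid"]] = absent
    return missing
```

Usage would be along the lines of `missing_predicates(run_query("http://prod-host:8080"), run_query("http://127.0.0.1:8080"))`, where `prod-host` is a placeholder for the origin server.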

Attached:

  • dgraph log when I init the DB
  • dgraph log while dgraph live run
  • dgraph live output

Best,
Miki

dgraph-init.log (26.4 KB)
dgraph-import.log (92.5 KB)
dgraph-live.log (4.1 KB)

Hi @omar ,
Any news regarding this issue?

Are you sure the ones you are looking for are really missing, or have you picked some nodes that wouldn’t have them?

Have you checked the exported RDF file and compared it to the source?

Do you still have the old cluster files? Did you export for some reason other than upgrade?

Hey @MichelDiz ,

Yes, the same query yields different answers on servers that run the same Docker images and follow the same import.

Yes, an example is in the first post.

I still have the exported file.
The reason for the export was that I wanted to copy production data to the dev server.

Thanks