Dgraph data not appearing after successful bulk import

We upgraded to Dgraph v1.1, and after importing our exported data, the data and schema are nowhere to be found.

Here was our process:
We exported our data; shut down and removed all Zero, Alpha, and Ratel nodes; cleared the existing data directories; and started the Zero nodes.
We ran gunzip on the schema file and then ran the bulk import on the Zero leader node, which completed successfully (as far as we can tell). We then started the Alpha and Ratel nodes. When we connect to Dgraph from Ratel, there is no schema, the Explorer shows no data, and queries that previously returned data now return nothing. What could be the issue?
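For reference, the export and unzip steps boiled down to commands like these (a rough sketch; the export directory name is the one Dgraph generated for us):

curl localhost:8082/admin/export                                  # triggered from inside the old Alpha container
gunzip /dgraph/export/dgraph.r1233498.u1021.2005/g01.schema.gz    # leaves g01.schema alongside g01.rdf.gz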

Here is the output from the bulk import that indicates it was successful:

root@dec09:/dgraph# dgraph bulk -f /dgraph/export/dgraph.r1233498.u1021.2005/g01.rdf.gz -s /dgraph/export/dgraph.r1233498.u1021.2005/g01.schema --map_shards=2 --reduce_shards=1 --http localhost:8002 --zero=localhost:5082
[Decoder]: Using assembly version of decoder
I1021 21:47:36.031380 78 init.go:98]

Dgraph version : v1.1.0
Dgraph SHA-256 : 7d4294a80f74692695467e2cf17f74648c18087ed7057d798f40e1d3a31d2095
Commit SHA-1 : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.

{
"DataFiles": "/dgraph/export/dgraph.r1233498.u1021.2005/g01.rdf.gz",
"DataFormat": "",
"SchemaFile": "/dgraph/export/dgraph.r1233498.u1021.2005/g01.schema",
"OutDir": "./out",
"ReplaceOutDir": false,
"TmpDir": "tmp",
"NumGoroutines": 2,
"MapBufSize": 67108864,
"SkipMapPhase": false,
"CleanupTmp": true,
"NumReducers": 1,
"Version": false,
"StoreXids": false,
"ZeroAddr": "localhost:5082",
"HttpAddr": "localhost:8002",
"IgnoreErrors": false,
"CustomTokenizers": "",
"NewUids": false,
"MapShards": 2,
"ReduceShards": 1
}
Connecting to zero at localhost:5082
Processing file (1 out of 1): /dgraph/export/dgraph.r1233498.u1021.2005/g01.rdf.gz
[21:47:37Z] MAP 01s nquad_count:216.0k err_count:0.000 nquad_speed:214.6k/sec edge_count:216.0k edge_speed:214.6k/sec
[21:47:38Z] MAP 02s nquad_count:453.3k err_count:0.000 nquad_speed:225.9k/sec edge_count:453.3k edge_speed:225.9k/sec
[21:47:39Z] MAP 03s nquad_count:615.0k err_count:0.000 nquad_speed:204.5k/sec edge_count:691.4k edge_speed:229.9k/sec
[21:47:40Z] MAP 04s nquad_count:785.9k err_count:0.000 nquad_speed:196.1k/sec edge_count:1.033M edge_speed:257.9k/sec
[21:47:41Z] MAP 05s nquad_count:1.035M err_count:0.000 nquad_speed:205.7k/sec edge_count:1.295M edge_speed:257.3k/sec
[21:47:42Z] MAP 06s nquad_count:1.223M err_count:0.000 nquad_speed:202.7k/sec edge_count:1.656M edge_speed:274.5k/sec
[21:47:43Z] MAP 07s nquad_count:1.426M err_count:0.000 nquad_speed:202.7k/sec edge_count:1.935M edge_speed:275.1k/sec
[21:47:44Z] MAP 08s nquad_count:1.605M err_count:0.000 nquad_speed:199.8k/sec edge_count:2.250M edge_speed:280.0k/sec
[21:47:45Z] MAP 09s nquad_count:1.634M err_count:0.000 nquad_speed:180.9k/sec edge_count:2.308M edge_speed:255.5k/sec
GC: 20. InUse: 927 MB. Idle: 51 MB
[21:47:46Z] MAP 10s nquad_count:1.634M err_count:0.000 nquad_speed:162.9k/sec edge_count:2.308M edge_speed:230.0k/sec
Shard tmp/shards/000 -> Reduce tmp/shards/shard_0/000
Shard tmp/shards/001 -> Reduce tmp/shards/shard_0/001
[21:47:47Z] REDUCE 11s 22.41% edge_count:517.3k edge_speed:517.3k/sec plist_count:517.3k plist_speed:517.3k/sec
Schema for pred options specifies that this is not a list but more than one UID has been found. Forcing the schema to be a list to avoid any data loss. Please fix the data to your specifications once Dgraph is up.
Schema for pred products specifies that this is not a list but more than one UID has been found. Forcing the schema to be a list to avoid any data loss. Please fix the data to your specifications once Dgraph is up.
[21:47:48Z] REDUCE 12s 79.48% edge_count:1.834M edge_speed:1.831M/sec plist_count:1.331M plist_speed:1.329M/sec
[21:47:48Z] REDUCE 12s 100.00% edge_count:2.308M edge_speed:1.251M/sec plist_count:1.804M plist_speed:978.3k/sec
Total: 12s

What could be causing this?

Based on those warnings from the bulk loader, you need to update your schema to use [uid] for the predicates that represent one-to-many uid relationships.
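One way to make that change is a quick shell edit (a rough sketch: the predicate names "options" and "products" come from the warnings above, and the exact spacing of the schema lines in your export may differ):

cp g01.schema g01.schema.bak                                   # keep a backup of the exported schema
grep -E '^(options|products):' g01.schema                      # inspect the affected lines first
sed -i -E 's/^(options|products):( *)uid /\1:\2[uid] /' g01.schema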

See this doc for all the details about migrating to Dgraph v1.1: Get started with Dgraph

We edited the schema file and repeated the import process, and the data is still not appearing.

root@dec07:/dgraph# dgraph bulk -f /dgraph/export/dgraph.r1233498.u1021.2005/g01.rdf.gz -s /dgraph/export/dgraph.r1233498.u1021.2005/g01.schema --map_shards=2 --reduce_shards=1 --http localhost:8002 --zero=localhost:5082
[Decoder]: Using assembly version of decoder
I1021 22:58:14.859411 37 init.go:98]

Dgraph version : v1.1.0
Dgraph SHA-256 : 7d4294a80f74692695467e2cf17f74648c18087ed7057d798f40e1d3a31d2095
Commit SHA-1 : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.

{
"DataFiles": "/dgraph/export/dgraph.r1233498.u1021.2005/g01.rdf.gz",
"DataFormat": "",
"SchemaFile": "/dgraph/export/dgraph.r1233498.u1021.2005/g01.schema",
"OutDir": "./out",
"ReplaceOutDir": false,
"TmpDir": "tmp",
"NumGoroutines": 2,
"MapBufSize": 67108864,
"SkipMapPhase": false,
"CleanupTmp": true,
"NumReducers": 1,
"Version": false,
"StoreXids": false,
"ZeroAddr": "localhost:5082",
"HttpAddr": "localhost:8002",
"IgnoreErrors": false,
"CustomTokenizers": "",
"NewUids": false,
"MapShards": 2,
"ReduceShards": 1
}
Connecting to zero at localhost:5082
Processing file (1 out of 1): /dgraph/export/dgraph.r1233498.u1021.2005/g01.rdf.gz
[22:58:15Z] MAP 01s nquad_count:200.9k err_count:0.000 nquad_speed:199.6k/sec edge_count:200.9k edge_speed:199.6k/sec
[22:58:16Z] MAP 02s nquad_count:430.2k err_count:0.000 nquad_speed:214.4k/sec edge_count:430.2k edge_speed:214.4k/sec
[22:58:17Z] MAP 03s nquad_count:611.6k err_count:0.000 nquad_speed:203.4k/sec edge_count:684.6k edge_speed:227.7k/sec
[22:58:18Z] MAP 04s nquad_count:775.8k err_count:0.000 nquad_speed:193.6k/sec edge_count:1.013M edge_speed:252.8k/sec
[22:58:19Z] MAP 05s nquad_count:1.022M err_count:0.000 nquad_speed:204.1k/sec edge_count:1.280M edge_speed:255.5k/sec
[22:58:20Z] MAP 06s nquad_count:1.201M err_count:0.000 nquad_speed:200.0k/sec edge_count:1.613M edge_speed:268.4k/sec
[22:58:21Z] MAP 07s nquad_count:1.431M err_count:0.000 nquad_speed:204.1k/sec edge_count:1.941M edge_speed:277.0k/sec
[22:58:22Z] MAP 08s nquad_count:1.616M err_count:0.000 nquad_speed:201.8k/sec edge_count:2.271M edge_speed:283.6k/sec
[22:58:23Z] MAP 09s nquad_count:1.634M err_count:0.000 nquad_speed:181.4k/sec edge_count:2.308M edge_speed:256.2k/sec
[22:58:24Z] MAP 10s nquad_count:1.634M err_count:0.000 nquad_speed:163.3k/sec edge_count:2.308M edge_speed:230.6k/sec
GC: 20. InUse: 965 MB. Idle: 35 MB
Shard tmp/shards/000 -> Reduce tmp/shards/shard_0/000
Shard tmp/shards/001 -> Reduce tmp/shards/shard_0/001
[22:58:25Z] REDUCE 11s 14.44% edge_count:333.3k edge_speed:333.3k/sec plist_count:333.3k plist_speed:333.3k/sec
[22:58:26Z] REDUCE 12s 53.32% edge_count:1.231M edge_speed:1.230M/sec plist_count:897.4k plist_speed:897.2k/sec
[22:58:27Z] REDUCE 13s 99.81% edge_count:2.303M edge_speed:1.151M/sec plist_count:1.800M plist_speed:899.8k/sec
[22:58:28Z] REDUCE 13s 100.00% edge_count:2.308M edge_speed:1.002M/sec plist_count:1.804M plist_speed:783.1k/sec
Total: 13s

Can you share the exact steps you took to load the data via the bulk loader? When you start the Alphas, you’ll need to use the same Zero that was used during the bulk load to run the cluster.
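Concretely, the --zero address given to the bulk loader and the one the Alphas start with should point at the same Zero, e.g. (a sketch using the address from your logs; the lru_mb value is illustrative and other flags are omitted):

dgraph bulk -f g01.rdf.gz -s g01.schema --zero=localhost:5082   # bulk load against this Zero
dgraph alpha --zero localhost:5082 --lru_mb 2048                # the Alphas must then talk to that same Zero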

  1. Connected to an Alpha node via docker-standalone exec -it dgraph-alpha /bin/bash
  2. Ran export via curl localhost:8082/admin/export
  3. Exited container
  4. Copied export directory to shared directory
  5. Backed up data files from Alpha and Zero nodes by copying to different shared directory
  6. Stopped Ratel nodes
  7. Stopped Alpha nodes
  8. Stopped Zero nodes
  9. Removed Ratel, Alpha, and Zero containers
  10. Deleted all data files
  11. Started Zero nodes
  12. Identified leader node from logs
  13. Ran gunzip on exported schema
  14. Edited uid to [uid] in all instances in the unzipped schema file
  15. Copied exported schema file and data file to Dgraph zero leader node
  16. Ran this command on the Dgraph zero leader node: dgraph bulk -f /dgraph/export/dgraph.r1233498.u1021.2005/g01.rdf.gz -s /dgraph/export/dgraph.r1233498.u1021.2005/g01.schema --map_shards=2 --reduce_shards=1 --http localhost:8002 --zero=localhost:5082
  17. Started the Alpha containers
  18. Started the Ratel container
  19. Inspected data in Ratel

Between steps 16 and 17, you need to make sure that the Alpha is set up to read the p directory output by the bulk loader.

By default, the bulk loader writes its output to ./out/0/p. You need to move it into the Alpha’s current working directory (or set the --postings flag to point at the p directory) before starting the Alpha process.
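For example (a sketch assuming a single reduce shard and the default ./out output directory; the destination path is a placeholder for wherever your Alpha runs):

# Option 1: move the bulk output into the directory the Alpha uses for its postings
mv ./out/0/p /path/to/alpha/workdir/p
# Option 2: leave it in place and point the Alpha at it explicitly
dgraph alpha --postings ./out/0/p --zero localhost:5082 --lru_mb 2048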

This is what we get when we start our first Alpha node after copying the ./out/0/p files into the Alpha working directory:

2019/10/21 23:40:23 Error while creating badger KV posting store error: manifest has unsupported version: 7 (we support 4)
I1021 23:40:49.627346 1 init.go:88]
Dgraph version : v1.0.16
Commit SHA-1 : 0590ee95
Commit timestamp : 2019-07-11 11:52:54 -0700
Branch : HEAD
Go version : go1.12.5
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
I1021 23:40:49.627513 1 run.go:461] x.Config: {DebugMode:false PortOffset:2 QueryEdgeLimit:1000000 NormalizeNodeLimit:10000}
I1021 23:40:49.627529 1 run.go:462] worker.Config: {ExportPath:export NumPendingProposals:256 Tracing:1 MyAddr:dec07.overstock.com:7082 ZeroAddr:dec07.overstock.com:5082 RaftId:0 ExpandEdge:true WhiteListedIPRanges: MaxRetries:-1 StrictMutations:false SnapshotAfter:10000 AbortOlderThan:5m0s}
I1021 23:40:49.627559 1 run.go:463] edgraph.Config: {PostingDir:p BadgerTables:mmap BadgerVlog:mmap WALDir:w MutationsMode:0 AuthToken: AllottedMemory:5250}
I1021 23:40:49.627599 1 server.go:115] Setting Badger table load option: mmap
I1021 23:40:49.627603 1 server.go:127] Setting Badger value log load option: mmap
I1021 23:40:49.627608 1 server.go:155] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:false TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:0x1f86470 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 managedTxns:false maxBatchCount:0 maxBatchSize:0}
I1021 23:40:49.630394 1 node.go:88] All 0 tables opened in 0s
I1021 23:40:49.634572 1 node.go:88] Replaying file id: 0 at offset: 0
I1021 23:40:49.634604 1 node.go:88] Replay took: 24.612µs
I1021 23:40:49.634672 1 server.go:115] Setting Badger table load option: mmap
I1021 23:40:49.634679 1 server.go:127] Setting Badger value log load option: mmap
I1021 23:40:49.634683 1 server.go:169] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 ReadOnly:false Truncate:true Logger:0x1f86470 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 managedTxns:false maxBatchCount:0 maxBatchSize:0}
2019/10/21 23:40:49 Error while creating badger KV posting store error: manifest has unsupported version: 7 (we support 4)
. . .
[Container is restarting]

The Dgraph versions for your Dgraph Bulk Loader and Dgraph Alpha do not match. They should both be using the same version, e.g., v1.1.0.

Bulk loader version: v1.1.0 (from the bulk loader output above)

Alpha version: v1.0.16 (from the Alpha startup log above)
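A quick way to double-check both sides before re-running (container name taken from your earlier docker exec command; adjust to your setup):

dgraph version                             # wherever you run dgraph bulk
docker exec dgraph-alpha dgraph version    # inside the Alpha container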

That’s definitely the issue. We ran the wrong command to update the Alpha version. Thanks for catching it.
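For anyone else who lands here: in a Docker setup the fix is to run the Alpha containers from the same image version as the bulk loader, e.g. (tag shown as an example; match it to the bulk loader’s version):

docker pull dgraph/dgraph:v1.1.0
# then recreate the Alpha containers from that image before starting them again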
