Bulk loader --> data not accessible

Hi all

This topic is very similar to http://discuss.dgraph.io/t/bulk-loader-no-data-schema-are-imported-into-the-dgraph-server/3138, but the solution suggested there does not work in my case. I went through the usual steps to load data into Dgraph:

  1. dgraph zero
  2. dgraph bulk -r goldendata.rdf -s goldendata.schema --http localhost:8090 --zero localhost:5080
  3. dgraph server --lru_mb 2048 --zero localhost:5080 -o 10

-o 10 in the server command sets the server's HTTP port to 8090, which I also passed to the bulk command.
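The server log further down shows what -o 10 does: HTTP on 8090, gRPC on 9090 and the internal worker on 7090, i.e. the offset is added to each of the default ports 8080, 9080 and 7080. A small sketch of that arithmetic; the function name server_ports is hypothetical:

```shell
# Sketch: ports a Dgraph Server (v1.0.x) listens on for a given -o port offset.
# Defaults: internal worker 7080, HTTP 8080, gRPC 9080; -o shifts all three.
server_ports() {
  local offset="${1:-0}"
  echo "internal=$((7080 + offset)) http=$((8080 + offset)) grpc=$((9080 + offset))"
}

server_ports 10   # prints: internal=7090 http=8090 grpc=9090
```

This matches the "Listening on port 8090", "Listening on port 9090" and "[::]:7090" lines in the server output.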

However, as in the thread linked above, there seems to be no data, neither in the Ratel interface nor in the pydgraph client I am using.

Here is the output I get from the three commands:

zero:

dgraph zero

Dgraph version   : v1.0.8
Commit SHA-1     : 1dd8376f
Commit timestamp : 2018-08-31 10:47:07 -0700
Branch           : HEAD

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.

Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.


Setting up grpc listener at: 0.0.0.0:5080
Setting up http listener at: 0.0.0.0:6080
2018/09/25 14:42:48 raft.go:363: Restarting node for dgraphzero
2018/09/25 14:42:48 pool.go:108: == CONNECTED ==> Setting localhost:7090
2018/09/25 14:42:48 raft.go:567: INFO: 1 became follower at term 11
2018/09/25 14:42:48 raft.go:315: INFO: newRaft 1 [peers: [1], term: 11, commit: 29711, applied: 26647, lastindex: 29711, lastterm: 11]
Running Dgraph zero...
2018/09/25 14:42:51 raft.go:749: INFO: 1 is starting a new election at term 11
2018/09/25 14:42:51 raft.go:594: INFO: 1 became pre-candidate at term 11
2018/09/25 14:42:51 raft.go:664: INFO: 1 received MsgPreVoteResp from 1 at term 11
2018/09/25 14:42:51 raft.go:580: INFO: 1 became candidate at term 12
2018/09/25 14:42:51 raft.go:664: INFO: 1 received MsgVoteResp from 1 at term 12
2018/09/25 14:42:51 raft.go:621: INFO: 1 became leader at term 12
2018/09/25 14:42:51 node.go:301: INFO: raft.node: 1 elected leader 1 at term 12
2018/09/25 14:42:58 pool.go:162: Echo error from localhost:7090. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:7090: connect: connection refused"
2018/09/25 14:43:00 zero.go:365: Got connection request: id:1 addr:"localhost:7090" 
2018/09/25 14:43:00 zero.go:474: Connected: id:1 addr:"localhost:7090" 
2018/09/25 14:43:08 pool.go:173: Connection established with localhost:7090
2018/09/25 14:43:18 oracle.go:91: Purged below ts:100011, len(o.commits):0, len(o.rowCommit):0

bulk:

dgraph bulk -r goldendata.rdf -s goldendata.schema --http localhost:8090 --zero localhost:5080
{
	"RDFDir": "goldendata.rdf",
	"SchemaFile": "goldendata.schema",
	"DgraphsDir": "out",
	"TmpDir": "tmp",
	"NumGoroutines": 8,
	"MapBufSize": 67108864,
	"ExpandEdges": true,
	"SkipMapPhase": false,
	"CleanupTmp": true,
	"NumShufflers": 1,
	"Version": false,
	"StoreXids": false,
	"ZeroAddr": "localhost:5080",
	"HttpAddr": "localhost:8090",
	"MapShards": 1,
	"ReduceShards": 1
}
The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See `man ulimit` for details of how to change the limit.
Current max open files limit: 1024
2018/09/25 13:45:10 loader.go:77: Connecting to zero at localhost:5080
MAP 01s rdf_count:340.0 rdf_speed:335.1/sec edge_count:1.108k edge_speed:1.092k/sec
MAP 02s rdf_count:787.0 rdf_speed:390.6/sec edge_count:2.559k edge_speed:1.270k/sec
MAP 03s rdf_count:1.428k rdf_speed:473.6/sec edge_count:4.623k edge_speed:1.533k/sec
MAP 04s rdf_count:2.723k rdf_speed:678.1/sec edge_count:9.004k edge_speed:2.242k/sec
MAP 05s rdf_count:89.59k rdf_speed:17.86k/sec edge_count:288.1k edge_speed:57.44k/sec
MAP 06s rdf_count:700.9k rdf_speed:116.5k/sec edge_count:2.301M edge_speed:382.5k/sec
MAP 07s rdf_count:1.030M rdf_speed:146.9k/sec edge_count:3.394M edge_speed:483.8k/sec
MAP 08s rdf_count:1.121M rdf_speed:139.8k/sec edge_count:3.695M edge_speed:461.0k/sec
MAP 09s rdf_count:1.121M rdf_speed:124.3k/sec edge_count:3.695M edge_speed:409.8k/sec
2018/09/25 13:45:20 merge_shards.go:36: Shard tmp/shards/000 -> Reduce tmp/shards/shard_0/000
REDUCE 10s [15.20%] edge_count:561.5k edge_speed:561.5k/sec plist_count:373.8k plist_speed:373.8k/sec
REDUCE 11s [77.44%] edge_count:2.862M edge_speed:2.861M/sec plist_count:996.8k plist_speed:996.7k/sec
REDUCE 12s [100.00%] edge_count:3.695M edge_speed:1.847M/sec plist_count:1.429M plist_speed:714.5k/sec
REDUCE 12s [100.00%] edge_count:3.695M edge_speed:1.300M/sec plist_count:1.429M plist_speed:502.8k/sec
Total: 12s
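The bulk log above warns that the open-files limit is 1024 and suggests 100,000 (see `man ulimit`). A minimal bash sketch for raising it before rerunning the bulk load, assuming the hard limit allows it; the function name raise_nofile is hypothetical:

```shell
# Sketch: raise this shell's soft open-files limit to the 100,000 the bulk
# loader suggests, capped at the hard limit. The limit only applies to this
# shell and the processes it starts, so run it in the same shell as `dgraph bulk`.
raise_nofile() {
  local want=100000 cur hard
  cur=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  # Never lower a limit that is already high enough.
  if [ "$cur" = unlimited ] || [ "$cur" -ge "$want" ]; then
    return 0
  fi
  # The soft limit may not exceed the hard limit.
  if [ "$hard" != unlimited ] && [ "$hard" -lt "$want" ]; then
    want=$hard
  fi
  ulimit -Sn "$want"
}

raise_nofile && echo "open files limit: $(ulimit -Sn)"
```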

server:

dgraph server --lru_mb 2048 --zero localhost:5080 -o 10

Dgraph version   : v1.0.8
Commit SHA-1     : 1dd8376f
Commit timestamp : 2018-08-31 10:47:07 -0700
Branch           : HEAD


2018/09/25 14:43:00 server.go:118: Setting Badger option: ssd
2018/09/25 14:43:00 server.go:134: Setting Badger table load option: mmap
2018/09/25 14:43:00 server.go:147: Setting Badger value log load option: none
2018/09/25 14:43:00 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
2018/09/25 14:43:00 gRPC server started.  Listening on port 9090
2018/09/25 14:43:00 HTTP server started.  Listening on port 8090
2018/09/25 14:43:00 groups.go:80: Current Raft Id: 1
2018/09/25 14:43:00 worker.go:86: Worker listening at address: [::]:7090
2018/09/25 14:43:00 pool.go:108: == CONNECTED ==> Setting localhost:5080
2018/09/25 14:43:00 groups.go:107: Connected to group zero. Assigned group: 0
2018/09/25 14:43:00 draft.go:76: Node ID: 1 with GroupID: 1
2018/09/25 14:43:00 draft.go:963: Restarting node for group: 1
2018/09/25 14:43:00 raft.go:567: INFO: 1 became follower at term 16
2018/09/25 14:43:00 raft.go:315: INFO: newRaft 1 [peers: [1], term: 16, commit: 33742, applied: 33710, lastindex: 33742, lastterm: 16]
2018/09/25 14:43:00 mutation.go:141: Done schema update predicate:"_share_hash_" value_type:STRING directive:INDEX tokenizer:"exact" 
2018/09/25 14:43:00 index.go:927: Dropping predicate: [gasPrice]
2018/09/25 14:43:00 groups.go:507: Got address of a Zero server: localhost:5080
2018/09/25 14:43:00 schema.go:63: Deleting schema for predicate: [gasPrice]
2018/09/25 14:43:00 index.go:927: Dropping predicate: [gas]
2018/09/25 14:43:01 schema.go:63: Deleting schema for predicate: [gas]
2018/09/25 14:43:03 raft.go:749: INFO: 1 is starting a new election at term 16
2018/09/25 14:43:03 raft.go:594: INFO: 1 became pre-candidate at term 16
2018/09/25 14:43:03 raft.go:664: INFO: 1 received MsgPreVoteResp from 1 at term 16
2018/09/25 14:43:03 raft.go:580: INFO: 1 became candidate at term 17
2018/09/25 14:43:03 raft.go:664: INFO: 1 received MsgVoteResp from 1 at term 17
2018/09/25 14:43:03 raft.go:621: INFO: 1 became leader at term 17
2018/09/25 14:43:03 node.go:301: INFO: raft.node: 1 elected leader 1 at term 17
2018/09/25 14:43:08 server.go:289: Got schema: [predicate:"_share_hash_" value_type:STRING directive:INDEX tokenizer:"exact" ]
2018/09/25 14:43:08 mutation.go:141: Done schema update predicate:"_share_hash_" value_type:STRING directive:INDEX tokenizer:"exact" 

Thank you for any support!

Remember, you cannot run Dgraph Server at the same time as the Dgraph bulk loader.

You do not need this flag, and you do not need to match the server's port range in the bulk load command either.

Are you doing the following?

It is simple. The bulk loader creates the output folder “out/”, and each numbered subfolder in it is a shard. You can control the number of shards with the “--reduce_shards” flag. Once the output is done, just take the files in “out/0/*”, move them into a Dgraph Server set up from scratch, and run your server.
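One way to read “move them into a Dgraph Server set up from scratch”: copy shard 0 into the p directory of a brand-new working directory and start the server from there, since the server opens a relative p by default (the server log shows Dir:p). A sketch assuming the default out/ output with shard 0 directly in out/0; seed_server_dir and server0 are hypothetical names:

```shell
# Sketch: seed a brand-new server working directory with shard 0 of the bulk output.
# Assumptions: bulk output in ./out (the bulk loader's default DgraphsDir),
# shard 0 stored directly in out/0, "server0" a hypothetical new directory.
seed_server_dir() {
  local bulk_out="$1" server_dir="$2"
  # "From scratch": refuse to mix the bulk output with an existing p directory.
  if [ -e "$server_dir/p" ]; then
    echo "$server_dir/p already exists; use an empty directory" >&2
    return 1
  fi
  # Copy everything in the shard directory (including dotfiles) into server_dir/p.
  mkdir -p "$server_dir/p" && cp -R "$bulk_out/0/." "$server_dir/p/"
}

# Usage: start the server from inside the new directory so its default -p of "p" applies.
#   seed_server_dir out server0 && cd server0 && dgraph server --lru_mb 2048 --zero localhost:5080 -o 10
```

Starting from an empty directory also avoids picking up the Raft state of earlier runs in an old w directory.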

Or copy the Dgraph binary to that path and start Dgraph Server from there.
Ref: After using the latest version, both the client and ratelUI are very slow queries - #14 by MichelDiz

Cheers.

Yes, I am executing the above steps in the specified order, i.e. 2. bulk load until done, 3. start the server.

I use this flag because I get a different error if I start without it. It seems my port 8080 is blocked, but I have no clue why (netstat and ps do not show anything using 8080).

What do you mean by “move to a Dgraph Server”? My dgraph executable is in /usr/local/bin, where it should be. Am I supposed to move the data there, or move the executable to where the data is?
Thanks!

Maybe a Dgraph Server instance is still running in the background. Check for its PID and try to free that port.
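A sketch for finding what holds the port, using standard tools rather than anything Dgraph-specific (lsof if present, otherwise ss on Linux, plus pgrep); the function name who_listens is hypothetical:

```shell
# Sketch: show what is listening on a TCP port (8080 by default) and any running
# dgraph processes, e.g. a Dgraph Server left over from an earlier run.
who_listens() {
  local port="${1:-8080}"
  echo "== listeners on port $port =="
  if command -v lsof >/dev/null 2>&1; then
    # -n/-P: skip DNS and port-name lookups; only sockets in LISTEN state.
    lsof -nP -iTCP:"$port" -sTCP:LISTEN || echo "(none found by lsof)"
  elif command -v ss >/dev/null 2>&1; then
    ss -ltnp "sport = :$port"
  else
    echo "neither lsof nor ss is installed" >&2
  fi
  echo "== dgraph processes =="
  pgrep -fl dgraph || echo "(none)"
}
```

Run it with sudo if the process may belong to another user; without root, lsof and ss generally cannot show other users' processes. Once you have the PID, `kill <PID>` frees the port.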

If you have not installed Dgraph, you may be using the binaries locally; in that case, copying the binary next to the data would be the strategy. But if it is installed globally, just run the command from that same path.

The contents of the bulk load output must be accessible to the Dgraph Server via the flag “-p /mypath/out/0”.
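The server log in the first post reads "Opening postings Badger DB with options: {Dir:p ValueDir:p ...}", so that server opened its default p directory rather than out/0, which may explain the empty results. A sketch of the -p route, using the placeholder path from the post and the flags already used in this thread; the function name and the MANIFEST sanity check are my own assumptions, not something Dgraph requires:

```shell
# Sketch: start Dgraph Server directly on the bulk output instead of its default ./p.
# Assumption: shard 0 holds the Badger files directly, so a MANIFEST file is expected.
start_on_bulk_output() {
  local shard_dir="$1"
  if [ ! -f "$shard_dir/MANIFEST" ]; then
    echo "no Badger MANIFEST in $shard_dir; is this the bulk output directory?" >&2
    return 1
  fi
  dgraph server -p "$shard_dir" --lru_mb 2048 --zero localhost:5080 -o 10
}

# Usage, with the placeholder path from the post:
#   start_on_bulk_output /mypath/out/0
```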