Having trouble with bulk loading

Hello!

I’m trying my hand at using the bulk loader in preparation for migrating to 1.1.0, but I’m having some trouble.

Here are the relevant parts of my docker-compose.yml:

  run:
    image: dgraph/dgraph:v1.0.17
    volumes:
      - ./:/tierraserver
      - ./dgraphExport:/export
      - type: volume
        source: ./dgraph-test
        target: /dgraph
        volume:
          nocopy: true

  zero-test:
    image: dgraph/dgraph:v1.0.17
    volumes:
      - ./dgraphTest:/dgraph
    ports:
      - 5082:5080
      - 6082:6080
    restart: on-failure
    command: dgraph zero --my=zero-test:5080
    
  dgraph-test:
    image: dgraph/dgraph:v1.0.17
    volumes:
      - type: volume
        source: ./dgraph-test
        target: /dgraph
        volume:
          nocopy: true
    ports:
      - 8082:8080
      - 9082:9080
    restart: on-failure
    command: dgraph alpha --my=dgraph-test:7080 --lru_mb=2048 --zero=zero-test:5080

As you can see, I’m making the dgraph volume visible on my host machine to see what’s going on.

So when I run

docker-compose up zero-test

I see the ‘zw’ folder being created.

Then I use my poorly named “run” service (which I use just to run whatever dgraph command I like) to do the bulk load. For now I’m keeping it simple, with the intention of using only a single dgraph alpha. I run this command (bearing in mind I use -r instead of -f because this is still v1.0.17):

docker-compose run run dgraph bulk -r /export/gbbackup/g01.rdf.gz -s /export/gbbackup/g01.schema.gz --zero=zero-test:5080

And something seems to happen! This is the output:

Dgraph version   : v1.0.17
Commit SHA-1     : 42deb299
Commit timestamp : 2019-08-30 12:46:31 -0700
Branch           : HEAD
Go version       : go1.12.7

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.


{
	"RDFDir": "/export/gbbackup/g01.rdf.gz",
	"JSONDir": "",
	"SchemaFile": "/export/gbbackup/g01.schema.gz",
	"DgraphsDir": "out",
	"TmpDir": "tmp",
	"NumGoroutines": 2,
	"MapBufSize": 67108864,
	"ExpandEdges": true,
	"SkipMapPhase": false,
	"CleanupTmp": true,
	"NumShufflers": 1,
	"Version": false,
	"StoreXids": false,
	"ZeroAddr": "zero-test:5080",
	"HttpAddr": "localhost:8080",
	"IgnoreErrors": false,
	"CustomTokenizers": "",
	"MapShards": 1,
	"ReduceShards": 1
}
Connecting to zero at zero-test:5080
badger 2019/09/13 16:25:09 INFO: All 0 tables opened in 0s
Processing file (1 out of 1): /export/gbbackup/g01.rdf.gz
MAP 01s nquad_count:803.0 err_count:0.000 nquad_speed:797.5/sec edge_count:2.409k edge_speed:2.393k/sec
MAP 02s nquad_count:56.01k err_count:0.000 nquad_speed:27.91k/sec edge_count:165.1k edge_speed:82.27k/sec
MAP 03s nquad_count:246.0k err_count:0.000 nquad_speed:81.79k/sec edge_count:1.013M edge_speed:336.7k/sec
MAP 04s nquad_count:392.3k err_count:0.000 nquad_speed:97.87k/sec edge_count:1.394M edge_speed:347.9k/sec
MAP 05s nquad_count:392.3k err_count:0.000 nquad_speed:78.31k/sec edge_count:1.394M edge_speed:278.4k/sec
badger 2019/09/13 16:25:15 DEBUG: Storing value log head: {Fid:0 Len:45 Offset:4667554}
badger 2019/09/13 16:25:15 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
badger 2019/09/13 16:25:15 INFO: Running for level: 0
badger 2019/09/13 16:25:15 DEBUG: LOG Compact. Added 106009 keys. Skipped 0 keys. Iteration took: 26.6421ms
badger 2019/09/13 16:25:15 DEBUG: Discard stats: map[]
badger 2019/09/13 16:25:15 INFO: LOG Compact 0->1, del 1 tables, add 1 tables, took 50.3939ms
badger 2019/09/13 16:25:15 INFO: Compaction for level: 0 DONE
badger 2019/09/13 16:25:15 INFO: Force compaction on level 0 done
MAP 06s nquad_count:392.3k err_count:0.000 nquad_speed:65.26k/sec edge_count:1.394M edge_speed:232.0k/sec
Shard tmp/shards/000 -> Reduce tmp/shards/shard_0/000
badger 2019/09/13 16:25:15 INFO: All 0 tables opened in 0s
REDUCE 07s [91.13%] edge_count:1.271M edge_speed:1.271M/sec plist_count:456.8k plist_speed:456.8k/sec
badger 2019/09/13 16:25:16 DEBUG: Storing value log head: {Fid:0 Len:42 Offset:53099616}
badger 2019/09/13 16:25:16 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
badger 2019/09/13 16:25:16 INFO: Running for level: 0
badger 2019/09/13 16:25:17 DEBUG: LOG Compact. Added 549311 keys. Skipped 0 keys. Iteration took: 141.0213ms
badger 2019/09/13 16:25:17 DEBUG: Discard stats: map[]
badger 2019/09/13 16:25:17 INFO: LOG Compact 0->1, del 1 tables, add 1 tables, took 300.2591ms
badger 2019/09/13 16:25:17 INFO: Compaction for level: 0 DONE
badger 2019/09/13 16:25:17 INFO: Force compaction on level 0 done
REDUCE 07s [100.00%] edge_count:1.394M edge_speed:1.681M/sec plist_count:549.1k plist_speed:662.1k/sec
Total: 07s

So something is loading.

I know the data is there because I can make it work with the live loader (but sadly that means I don’t get to preserve my uids, which would be ideal).

But in my ‘dgraphTest’ folder I see no ‘p’ or ‘w’ folders, nor the ‘out’ folder described on the deploy page.

If I then run an alpha, it’s acting on an empty set of data. Where has that data gone?

So clearly I’m missing something simple that my lack of experience is keeping me blind to. How do I actually use the bulk-loaded data? Where is this legendary ‘out/’ folder kept?

I have a simple bulk script on GitHub that you can work from: OpenDgraph/Dgraph-Bulk-Script (just a simple shell script to use Dgraph’s Bulk Loader).

If you still have any trouble with it, just ping me here. You just need to update the version and it should all work.
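
Roughly, what the script automates boils down to these steps (a minimal sketch, not the script verbatim; paths and addresses are placeholders):

# 1. Start a zero.
dgraph zero --my=localhost:5080 &
# 2. Run the bulk loader against it; it writes the posting lists to ./out by default.
dgraph bulk -r data.rdf.gz -s data.schema.gz --zero=localhost:5080
# 3. Hand the generated posting lists to the alpha as its 'p' directory.
cp -r out/0/p /path/to/alpha/data/p
# 4. Start the alpha from that directory (it looks for ./p by default).
cd /path/to/alpha/data && dgraph alpha --my=localhost:7080 --lru_mb=2048 --zero=localhost:5080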

Hey Michel!

I came across that script myself! I’ll happily use it, but I would also like to actually understand what I’m doing. I feel like I’m 80% there. What step am I missing?

Thanks for the quick reply though!

Also, where can I get more detailed information on how to bulk load? The deploy page is pretty scant…

And where is the “out” folder?!

I’m not sure. I could not fully understand your steps.

As for docs, the deploy page is the main reference for now. Another way would be to check the code, but there is not much to look at there. I believe your issue is related to the Docker context.

The ‘out’ folder should be created in the working directory where you ran the bulk loader, which in your case is inside the container. It doesn’t go anywhere else.
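
If you want the output to land somewhere visible from the host, one option (a sketch based on your compose file, using the bulk loader’s --out flag, which controls the “DgraphsDir” shown in your JSON output) is to write it into a directory that is already bind-mounted:

docker-compose run run dgraph bulk -r /export/gbbackup/g01.rdf.gz -s /export/gbbackup/g01.schema.gz --out /export/out --zero=zero-test:5080

The posting lists should then appear on the host under ./dgraphExport/out/0/p, and you can copy that ‘p’ folder into the alpha’s data directory before starting it.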

Cheers.
