Bulk loading data to a single AWS instance

Hello,

I have followed the instructions to run Docker Compose on a single AWS instance. However, the instructions only show how to get the servers running and don’t provide details on bulk data loading.

I changed the instructions to add bulk loading, but I am having trouble getting the server to use the loaded data.

Below is a summary of the steps I followed. I would appreciate it if somebody could provide some insight into what is missing.

  1. I built an AWS instance using the docker-machine command. This step works: I verified that the instance is running in my EC2 console. After creating it, I ran the commands to set up the Docker environment in my console.

  2. I started dgraph zero using docker-compose. This is the yml file:

version: "3.2"
services:
  zero:
    image: dgraph/dgraph:latest
    volumes:
      - ./MyData:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    restart: on-failure
    command: dgraph zero --my=zero:5080

$ docker-compose -f dc_dgzero.yml up -d

  3. After running the "docker-compose -f dc_dgzero.yml up -d" command, I expected the data in the MyData folder to be mapped and copied to the Docker instance, but that does not seem to have happened.

  4. Since the data was not copied to the Docker instance, I copied the data and schema using the following commands:

$ docker cp MyData/data.rdf.gz dockerstartup_zero_1:/dgraph

$ docker cp MyData/schema.rdf.gz dockerstartup_zero_1:/dgraph

  5. I ran dgraph bulk to load the data. The command ran without any issues and the console showed successful completion.

$ docker exec -it dockerstartup_zero_1 dgraph bulk -r data.rdf.gz -s schema.rdf.gz --map_shards 1 --reduce_shards 1 --zero zero:5080

  6. I started the dgraph server and ratel using docker-compose, with the following yml file and command:

version: "3.2"
services:
  server:
    image: dgraph/dgraph:latest
    volumes:
      - /MyData:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    restart: on-failure
    command: dgraph server --my=server:7080 --memory_mb=2048 --zero=zero:5080 --postings out/0/p
  ratel:
    image: dgraph/dgraph:latest
    ports:
      - 8000:8000
    command: dgraph-ratel

$ docker-compose -f dc_dgserver_ratel.yml up -d

  7. I ran docker-compose logs to see the results, and it seems the dgraph server starts but does not load the data. I opened ratel running on the AWS instance and all my queries return no data. I also checked the schema, and it was not loaded either.

I would appreciate it if somebody could help identify what is missing or what I am doing wrong.

Thanks in advance,
Marcelo

Can you share the logs from when the Dgraph server starts running? I suppose the correct postings directory is not being picked up somehow.

@pawan Following are the logs.

Logs dgraph bulk command:

$ docker exec -it dockerstartup_zero_1 dgraph bulk -r data.rdf.gz -s schema.rdf.gz --map_shards 1 --reduce_shards 1 --zero zero:5080
{
  "RDFDir": "data.rdf.gz",
  "SchemaFile": "schema.rdf.gz",
  "DgraphsDir": "out",
  "TmpDir": "tmp",
  "NumGoroutines": 4,
  "MapBufSize": 67108864,
  "ExpandEdges": true,
  "SkipMapPhase": false,
  "CleanupTmp": true,
  "NumShufflers": 1,
  "Version": false,
  "StoreXids": false,
  "ZeroAddr": "zero:5080",
  "HttpAddr": "localhost:8080",
  "MapShards": 1,
  "ReduceShards": 1
}
The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See man ulimit for details of how to change the limit.
Current max open files limit: 1048576
2018/04/23 15:52:25 loader.go:77: Connecting to zero at zero:5080
MAP 01s rdf_count:3.801k rdf_speed:3.786k/sec edge_count:11.16k edge_speed:11.12k/sec
MAP 02s rdf_count:15.09k rdf_speed:7.531k/sec edge_count:44.37k edge_speed:22.14k/sec
MAP 03s rdf_count:393.7k rdf_speed:131.1k/sec edge_count:1.161M edge_speed:386.5k/sec
MAP 04s rdf_count:843.3k rdf_speed:210.6k/sec edge_count:2.423M edge_speed:605.0k/sec
MAP 05s rdf_count:1.248M rdf_speed:249.4k/sec edge_count:3.552M edge_speed:709.8k/sec
MAP 06s rdf_count:1.576M rdf_speed:262.4k/sec edge_count:4.508M edge_speed:750.7k/sec
MAP 07s rdf_count:1.838M rdf_speed:262.0k/sec edge_count:5.273M edge_speed:751.8k/sec
MAP 08s rdf_count:1.987M rdf_speed:245.2k/sec edge_count:5.710M edge_speed:704.6k/sec
MAP 09s rdf_count:2.139M rdf_speed:234.8k/sec edge_count:6.240M edge_speed:684.8k/sec
MAP 10s rdf_count:2.168M rdf_speed:214.4k/sec edge_count:6.346M edge_speed:627.6k/sec
MAP 11s rdf_count:2.168M rdf_speed:195.1k/sec edge_count:6.346M edge_speed:571.1k/sec
MAP 12s rdf_count:2.168M rdf_speed:179.0k/sec edge_count:6.346M edge_speed:523.9k/sec
MAP 13s rdf_count:2.168M rdf_speed:165.3k/sec edge_count:6.346M edge_speed:483.9k/sec
REDUCE 14s [8.93%] edge_count:566.6k edge_speed:566.6k/sec plist_count:351.1k plist_speed:351.1k/sec
REDUCE 15s [18.84%] edge_count:1.196M edge_speed:1.195M/sec plist_count:887.5k plist_speed:887.3k/sec
REDUCE 16s [28.80%] edge_count:1.828M edge_speed:913.5k/sec plist_count:1.319M plist_speed:659.5k/sec
REDUCE 17s [45.83%] edge_count:2.909M edge_speed:969.3k/sec plist_count:1.877M plist_speed:625.4k/sec
REDUCE 18s [69.85%] edge_count:4.433M edge_speed:1.107M/sec plist_count:2.110M plist_speed:527.0k/sec
REDUCE 19s [94.47%] edge_count:5.995M edge_speed:1.198M/sec plist_count:2.394M plist_speed:478.4k/sec
REDUCE 19s [100.00%] edge_count:6.346M edge_speed:1.079M/sec plist_count:2.616M plist_speed:444.8k/sec
Total: 19s

Logs for zero, server and ratel

$ docker-compose logs
Attaching to dockerstartup_ratel_1, dockerstartup_server_1, dockerstartup_zero_1
ratel_1 | 2018/04/23 15:53:42 Listening on port 8000…
server_1 | 2018/04/23 15:53:42 groups.go:88: Current Raft Id: 0
server_1 | 2018/04/23 15:53:42 worker.go:99: Worker listening at address: [::]:7080
server_1 | 2018/04/23 15:53:42 gRPC server started. Listening on port 9080
server_1 | 2018/04/23 15:53:42 HTTP server started. Listening on port 8080
server_1 | 2018/04/23 15:53:42 pool.go:108: == CONNECT ==> Setting zero:5080
server_1 | 2018/04/23 15:53:42 groups.go:115: Connected to group zero. Assigned group: 1
server_1 | 2018/04/23 15:53:42 draft.go:180: Node ID: 1 with GroupID: 1
server_1 | 2018/04/23 15:53:42 node.go:240: Group 1 found 0 entries
server_1 | 2018/04/23 15:53:42 draft.go:947: New Node for group: 1
server_1 | 2018/04/23 15:53:42 raft.go:567: INFO: 1 became follower at term 0
server_1 | 2018/04/23 15:53:42 raft.go:315: INFO: newRaft 1 [peers: , term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
server_1 | 2018/04/23 15:53:42 raft.go:567: INFO: 1 became follower at term 1
server_1 | 2018/04/23 15:53:42 node.go:118: Setting conf state to nodes:1
server_1 | 2018/04/23 15:53:42 raft.go:749: INFO: 1 is starting a new election at term 1
server_1 | 2018/04/23 15:53:42 raft.go:580: INFO: 1 became candidate at term 2
server_1 | 2018/04/23 15:53:42 raft.go:664: INFO: 1 received MsgVoteResp from 1 at term 2
server_1 | 2018/04/23 15:53:42 raft.go:621: INFO: 1 became leader at term 2
server_1 | 2018/04/23 15:53:42 node.go:301: INFO: raft.node: 1 elected leader 1 at term 2
server_1 | 2018/04/23 15:53:42 groups.go:356: Serving tablet for: predicate
server_1 | 2018/04/23 15:53:42 mutation.go:158: Done schema update predicate:"predicate" value_type:STRING list:true
zero_1 | Setting up grpc listener at: 0.0.0.0:5080
zero_1 | Setting up http listener at: 0.0.0.0:6080
zero_1 | 2018/04/23 15:51:20 node.go:240: Group 0 found 0 entries
zero_1 | 2018/04/23 15:51:20 raft.go:567: INFO: 1 became follower at term 0
zero_1 | 2018/04/23 15:51:20 raft.go:315: INFO: newRaft 1 [peers: , term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
zero_1 | 2018/04/23 15:51:20 raft.go:567: INFO: 1 became follower at term 1
zero_1 | Running Dgraph zero…
zero_1 | 2018/04/23 15:51:20 node.go:118: Setting conf state to nodes:1
zero_1 | 2018/04/23 15:51:23 raft.go:749: INFO: 1 is starting a new election at term 1
zero_1 | 2018/04/23 15:51:23 raft.go:580: INFO: 1 became candidate at term 2
zero_1 | 2018/04/23 15:51:23 raft.go:664: INFO: 1 received MsgVoteResp from 1 at term 2
zero_1 | 2018/04/23 15:51:23 raft.go:621: INFO: 1 became leader at term 2
zero_1 | 2018/04/23 15:51:23 node.go:301: INFO: raft.node: 1 elected leader 1 at term 2
zero_1 | 2018/04/23 15:53:42 zero.go:336: Got connection request: addr:"server:7080"
zero_1 | 2018/04/23 15:53:42 pool.go:108: == CONNECT ==> Setting server:7080
zero_1 | 2018/04/23 15:53:42 zero.go:445: Connected
zero_1 | 2018/04/23 15:53:50 oracle.go:75: purging below ts:1, len(o.commits):0, len(o.aborts):0, len(o.rowCommit):0

Thanks

Can you go to the container running the dgraph server and see if it has the out/0/p folder? I am fairly sure it's an issue with the location of the p folder.
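
One way to run that check is sketched below. The container name dockerstartup_server_1 comes from the logs above and /dgraph is the mount target from the compose file. That the relative path out/0/p resolves under /dgraph assumes the container's working directory is /dgraph, which the thread suggests but does not confirm. The helper name postings_status is hypothetical.

```shell
# Hypothetical helper: prints "found" if <root>/out/0/p is a directory,
# otherwise "missing". out/0/p is the path passed to --postings above.
postings_status() {
  if [ -d "$1/out/0/p" ]; then
    echo "found"
  else
    echo "missing"
  fi
}

# Inside the server container (assumed container name and mount point):
#   docker exec dockerstartup_server_1 ls -l /dgraph/out/0/p
# On the host side of the bind mount (assumed to be ./MyData):
#   postings_status ./MyData
```

If the directory is missing inside the server container but present where dgraph bulk ran, the two containers are probably not seeing the same volume.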

@pawan Thanks for your help. I had a typo in the volumes configuration of the server yml file. After I fixed the typo, the server started as expected.
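
The post does not say what the typo was. One visible difference between the two compose files above is the volume source: ./MyData in the zero file and /MyData in the server file. Assuming that is the kind of mismatch involved, a small sketch like the following lists the volume entries of a compose file so the two can be compared side by side. list_volumes is a hypothetical helper, and it uses a simple line-based match rather than a real YAML parser.

```shell
# Hypothetical helper: print the "- source:target" entries that directly
# follow a "volumes:" key in a compose file, one per line.
list_volumes() {
  awk '
    /^[ \t]*volumes:[ \t]*$/ { in_vol = 1; next }      # start of a volumes list
    in_vol && /^[ \t]*-[ \t]/ {                         # list item under volumes
      sub(/^[ \t]*-[ \t]*/, "")                         # strip the leading "- "
      print
      next
    }
    { in_vol = 0 }                                      # any other line ends the list
  ' "$1"
}

# Usage with the file names from this thread:
#   list_volumes dc_dgzero.yml
#   list_volumes dc_dgserver_ratel.yml
```

If the two outputs differ in the part before the colon, the zero and server containers are mounting different host directories.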
