I’ve started familiarizing myself with dgraph following the docs and the tour.
When I run the dgraph binaries and load 1million.rdf.gz, there’s no problem. The terminal output is:
Processing 1million.rdf.gz
[ 2s] Txns: 41 RDFs: 41000 RDFs/sec: 20459 Aborts: 0
[ 4s] Txns: 91 RDFs: 91000 RDFs/sec: 22748 Aborts: 0
[ 6s] Txns: 164 RDFs: 164000 RDFs/sec: 27322 Aborts: 0
[ 8s] Txns: 233 RDFs: 233000 RDFs/sec: 29114 Aborts: 0
[ 10s] Txns: 300 RDFs: 300000 RDFs/sec: 29999 Aborts: 0
[ 12s] Txns: 368 RDFs: 368000 RDFs/sec: 30661 Aborts: 0
[ 14s] Txns: 436 RDFs: 436000 RDFs/sec: 31143 Aborts: 0
[ 16s] Txns: 497 RDFs: 497000 RDFs/sec: 31062 Aborts: 0
[ 18s] Txns: 555 RDFs: 555000 RDFs/sec: 30829 Aborts: 0
[ 20s] Txns: 625 RDFs: 625000 RDFs/sec: 31250 Aborts: 0
[ 22s] Txns: 679 RDFs: 679000 RDFs/sec: 30860 Aborts: 0
[ 24s] Txns: 745 RDFs: 745000 RDFs/sec: 31038 Aborts: 0
[ 26s] Txns: 803 RDFs: 803000 RDFs/sec: 30883 Aborts: 0
Number of TXs run : 845
Number of RDFs processed : 844056
Time spent : 27.487949534s
RDFs processed per second : 31261
The docs make docker look like the preferred method for deploying dgraph, but I’ve been unable to load data when running in containers. I started with the docker compose example but at first couldn’t figure out how to mount the directory where I’d downloaded 1million.rdf.gz. The example has a “volumes” section under each service that looks like:
volumes:
  - type: volume
    source: dgraph
    target: /dgraph
    volume:
      nocopy: true
Docker documentation does not clarify what any of that means (or I couldn’t find it) and I wasn’t making any headway changing the source or target. But replacing that block with:
volumes:
  - /Users/nfeldman/learn/dgraph:/dgraph
“works” (files in ~/learn/dgraph are visible in the container) for reasons that are unclear. The only other change I made was to set
When I then run
docker exec -it dgraph_zero_1 dgraph live -r 1million.rdf.gz --zero localhost:5080 -d server:9080 -c 1
it fails due to what looks like a connection timeout. It starts out OK:
Processing 1million.rdf.gz
[ 2s] Txns: 35 RDFs: 35000 RDFs/sec: 17497 Aborts: 0
[ 4s] Txns: 48 RDFs: 48000 RDFs/sec: 12000 Aborts: 0
[ 6s] Txns: 78 RDFs: 78000 RDFs/sec: 12999 Aborts: 0
[ 8s] Txns: 132 RDFs: 132000 RDFs/sec: 16500 Aborts: 0
...
showing fewer “RDFs/sec” on each update, then does:
[ 1m34s] Txns: 608 RDFs: 608000 RDFs/sec: 6468 Aborts: 0
Error while mutating Assigning IDs is only allowed on leader.
[ 1m36s] Txns: 608 RDFs: 608000 RDFs/sec: 6333 Aborts: 1
and eventually terminates:
[ 5m38s] Txns: 668 RDFs: 668000 RDFs/sec: 1976 Aborts: 1
2019/02/18 01:30:39 transport is closing
github.com/dgraph-io/dgraph/x.Fatalf
    /ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:115
github.com/dgraph-io/dgraph/dgraph/cmd/live.handleError
    /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:140
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).request
    /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:182
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).makeRequests
    /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:194
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1333
I attempted to load the data in docker multiple times (dropping the db and reloading the schema before each attempt), and it always fails. It always processes fewer triples per update, but it doesn’t always have that “Error while mutating Assigning IDs” message (and sometimes it has that message more than once).
It also fails in the same way if I don’t change the volumes block from what is given in the example and instead copy the archive into the container first, as suggested in this comment.
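For anyone comparing the two volumes blocks, here is my current reading of the Compose file reference, written out as a sketch. The example's long-syntax entry mounts a Docker-managed named volume called dgraph, while my replacement is a bind mount of a host directory, which would explain why my files show up in the container. The service name, image tag, and version line below are assumptions for illustration, not copied from the example file, and I have not confirmed that any of this is related to the load failure.

```yaml
# Sketch only: "server", the image tag, and the version line are assumptions.
version: "3.2"
services:
  server:
    image: dgraph/dgraph:latest
    volumes:
      # Long syntax from the example: mounts the named volume "dgraph"
      # (declared under the top-level "volumes" key) at /dgraph.
      # "nocopy: true" disables copying existing container data into
      # the volume when the volume is first created.
      # - type: volume
      #   source: dgraph
      #   target: /dgraph
      #   volume:
      #     nocopy: true

      # Long-syntax bind mount, which should be equivalent to the short
      # form "/Users/nfeldman/learn/dgraph:/dgraph" that I used.
      - type: bind
        source: /Users/nfeldman/learn/dgraph
        target: /dgraph

# Named volumes are declared here; this one is only used by the
# commented-out entry above.
volumes:
  dgraph:
```

If that reading is right, changing only the source or target of the original entry would not have helped, because with type: volume the source is a volume name rather than a host path.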
Why does it take so long and eventually fail to load in docker?