We ran the bulk loader following the instructions, using one zero node and one bulk instance, and then copied the p directories to each of the alpha nodes. Once we start the cluster, all mutations on the graph fail with a message like the following:
rpc error: code = Unknown desc = Uid: [195656434] cannot be greater than lease: [0]
Before that, when the cluster comes up, there is a message about the GraphQL schema updating and then the indices are deleted:
I0121 14:32:30.032892 1 admin.go:709] namespace: 0. Skipping GraphQL schema update. newSchema.Version: 10010, oldSchema.Version: 0, schemaChanged: true.
I0121 14:32:30.034067 1 mutation.go:204] Max open files limit: 1048576
I0121 14:32:30.034576 1 index.go:783] Deleting indexes for
I0121 14:32:30.034672 1 index.go:783] Deleting indexes for
I0121 14:32:30.034708 1 index.go:783] Deleting indexes for
This seems to be unrecoverable. To be clear, we followed the instructions exactly and copied the p directories to the alpha servers, but we did not copy the zw directory to one of the zero servers (since that is not in the instructions).
To fix the problem, we copied the zw directory from the server where we ran the bulk and zero processes to one of the servers in the zero cluster. We are using Docker Swarm for orchestration, and the service name of the original zero server (used during the bulk load) is the same as that of the zero server we copied the zw directory to. There are some forum posts that hint at this (e.g. Serving bulk-loaded data (HA cluster) - #12 by EnricoMi).
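To illustrate why a missing zw directory would produce this error, here is a toy model in Python. It is not Dgraph's actual code. It assumes (my reading of the error message and of the fix above, not something the docs state) that Zero keeps the highest leased UID in its zw state, so a freshly started Zero begins at a lease of 0. The class name `ToyZero` and the restored lease value are made up for the example.

```python
class ToyZero:
    """Toy stand-in for a Zero node that tracks the highest leased UID."""

    def __init__(self, max_leased_uid=0):
        # A Zero started without the zw directory is assumed to start at 0.
        self.max_leased_uid = max_leased_uid

    def check_uid(self, uid):
        # Reject any UID above the lease, mimicking the observed error text.
        if uid > self.max_leased_uid:
            raise ValueError(
                f"Uid: [{uid}] cannot be greater than lease: "
                f"[{self.max_leased_uid}]"
            )


# Zero started fresh: it knows nothing about UIDs assigned by the bulk load.
fresh = ToyZero()

# Zero started with the zw directory from the bulk load
# (200_000_000 is a hypothetical lease value, not taken from our cluster).
restored = ToyZero(max_leased_uid=200_000_000)
```

In this model, `fresh.check_uid(195656434)` raises the same kind of error we saw, while `restored.check_uid(195656434)` passes, which matches the behaviour before and after we copied zw.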
I have not seen a definitive forum post, and the documentation should be fixed to reflect the fact that the zw directory created during the bulk load process must be present in the final cluster. This should be as easy as inserting a new step (the zw step below) in the list here:
- Run bulk loader only on one server
- Copy (or use rsync) the p directory to the other servers (the servers you will be using to start the other Alpha nodes)
- Copy (or use rsync) the zw directory to one of the zero servers (the servers you will be using to start the Zero nodes); note that its host name must match that of the zero node used during the bulk load
- Now, start all Alpha nodes at the same time
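The copy steps in the list above could be scripted roughly as follows. This is only a sketch: the function name `build_copy_commands`, the host names, and all paths are placeholders I made up, and it assumes rsync over SSH is available between the servers. It only builds and prints the rsync commands, it does not run them.

```python
import shlex


def build_copy_commands(p_dir, zw_dir, alpha_hosts, zero_host, dest_root):
    """Build rsync commands for the p directory (every alpha) and zw (one zero)."""
    dest_root = dest_root.rstrip("/")
    # Trailing slash makes rsync copy the directory contents, not the directory.
    p_src = p_dir.rstrip("/") + "/"
    zw_src = zw_dir.rstrip("/") + "/"

    commands = []
    # Step 2: copy the p directory to each server that will run an Alpha.
    for host in alpha_hosts:
        commands.append(["rsync", "-a", p_src, f"{host}:{dest_root}/p/"])
    # Step 3: copy the zw directory to exactly one of the Zero servers.
    commands.append(["rsync", "-a", zw_src, f"{zero_host}:{dest_root}/zw/"])
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and paths; adjust to your own layout.
    for cmd in build_copy_commands(
        p_dir="/data/bulk/p",
        zw_dir="/data/bulk/zw",
        alpha_hosts=["alpha1", "alpha2", "alpha3"],
        zero_host="zero1",
        dest_root="/dgraph",
    ):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check that zw goes to a single zero server before anything is copied.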
We are using Dgraph v21.03.2, but I think this issue is the same for prior versions as well.