I have a Dgraph setup with a single Zero and a single Alpha, both on v1.1.1. After letting them sit for about five minutes, the Zero instance shows some strange behavior: it messages the Alpha instance to remove a tablet, because it supposedly doesn’t belong to the first group:
```
I0105 15:06:59.027571 6 zero.go:696] Tablet: mailEvent does not belong to group: 1. Sending delete instruction.
W0105 15:07:08.785903 6 zero.go:660] While deleting predicates: rpc error: code = Canceled desc = context canceled
```
`mailEvent` here isn’t a predicate in my schema, but a type. Two things strike me as odd about this:

- Why does the Zero think that `mailEvent` is sitting in the wrong group? With only a single Zero and a single Alpha, surely there is only one group.
- Why does this request time out on the Alpha? The Alpha does receive the command and executes it. However, I noticed that during that time the Alpha is unavailable and essentially ‘locks’ up.
The Alpha’s log during that window:
```
I0105 15:06:59.033492 6 index.go:1000] Dropping predicate: [mailEvent]
I0105 15:06:59.033639 6 log.go:34] Writes flushed. Stopping compactions now...
I0105 15:07:03.223523 6 log.go:34] Got compaction priority: {level:0 score:1.74 dropPrefix:[0 0 9 109 97 105 108 69 118 101 110 116]}
I0105 15:07:03.223615 6 log.go:34] Running for level: 0
I0105 15:07:07.793044 6 log.go:34] LOG Compact 0->1, del 2 tables, add 1 tables, took 4.569400874s
I0105 15:07:07.793110 6 log.go:34] Compaction for level: 0 DONE
W0105 15:07:08.785311 6 draft.go:1107] While sending membership to Zero. Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0105 15:07:12.031669 6 log.go:34] LOG Compact 1->1, del 1 tables, add 1 tables, took 4.238465817s
I0105 15:07:12.031913 6 log.go:34] DropPrefix done
I0105 15:07:12.032020 6 log.go:34] Resuming writes
I0105 15:07:12.032046 6 schema.go:79] Deleting schema for predicate: [mailEvent]
I0105 15:15:56.214144 6 draft.go:383] Creating snapshot at index: 24813. ReadTs: 23369.
```
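As a side note, the `dropPrefix` byte string in the compaction line encodes the type name itself: the tail bytes `109 97 105 108 69 118 101 110 116` are ASCII for `mailEvent` (reading the leading `0 0 9` as a key-type marker plus the length 9 is my assumption, not something the log states):

```shell
# Decode the ASCII tail of the dropPrefix bytes from the compaction log entry.
for b in 109 97 105 108 69 118 101 110 116; do
  printf "\\$(printf '%03o' "$b")"   # print each decimal byte as a character
done
echo   # trailing newline
```

This prints `mailEvent`, so the compaction really is dropping that type’s key prefix.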
For me this is a problem because, as I said, the Alpha isn’t available every time the Zero does this, and Dgraph is under essentially zero load throughout. The instances are running in two separate containers in a single Kubernetes pod. These are the flags I use to run the services:
```
dgraph alpha \
  --my=127.0.0.1:7080 \
  --lru_mb 2048 \
  --zero 127.0.0.1:5080

dgraph zero --my=127.0.0.1:5080
```