Zero UpdateMembership weirdly deletes predicates

I have a dgraph setup with a single zero and alpha, both on v1.1.1. After letting them sit for about 5 minutes, the zero instance shows some odd behavior: it instructs the alpha instance to remove a tablet because it supposedly doesn’t belong to the first group:

I0105 15:06:59.027571       6 zero.go:696] Tablet: mailEvent does not belong to group: 1. Sending delete instruction.
W0105 15:07:08.785903       6 zero.go:660] While deleting predicates: rpc error: code = Canceled desc = context canceled

mailEvent here isn’t a predicate in my schema, but a type. Two things seem odd about this:

  1. Why does the zero think that mailEvent is sitting in the wrong group? With only a single zero and a single alpha, surely there is only one group.
  2. Why does this request time out on the alpha? The alpha does receive the command and executes it. However, I did notice that during that time the alpha is not available and kind of ‘locks up’.
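One way to see which group Zero has assigned each tablet to is Zero's HTTP `/state` endpoint (default port 6080 unless you changed it). As a sketch, with a hypothetical sample response standing in for a live cluster so the extraction runs offline:

```shell
# On a live cluster you would fetch the membership state instead:
#   state=$(curl -s localhost:6080/state)
# Illustrative sample fragment (not from my cluster):
state='{"groups":{"1":{"tablets":{"mailEvent":{"groupId":1,"predicate":"mailEvent"}}}}}'

# List each tablet (predicate) that Zero has assigned to a group:
echo "$state" | grep -o '"predicate":"[^"]*"'
# → "predicate":"mailEvent"
```

In a single-group cluster you would expect every tablet listed under group 1, which makes the “does not belong to group: 1” message all the stranger.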

The dgraph log:

I0105 15:06:59.033492       6 index.go:1000] Dropping predicate: [mailEvent]
I0105 15:06:59.033639       6 log.go:34] Writes flushed. Stopping compactions now...
I0105 15:07:03.223523       6 log.go:34] Got compaction priority: {level:0 score:1.74 dropPrefix:[0 0 9 109 97 105 108 69 118 101 110 116]}
I0105 15:07:03.223615       6 log.go:34] Running for level: 0
I0105 15:07:07.793044       6 log.go:34] LOG Compact 0->1, del 2 tables, add 1 tables, took 4.569400874s
I0105 15:07:07.793110       6 log.go:34] Compaction for level: 0 DONE
W0105 15:07:08.785311       6 draft.go:1107] While sending membership to Zero. Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0105 15:07:12.031669       6 log.go:34] LOG Compact 1->1, del 1 tables, add 1 tables, took 4.238465817s
I0105 15:07:12.031913       6 log.go:34] DropPrefix done
I0105 15:07:12.032020       6 log.go:34] Resuming writes
I0105 15:07:12.032046       6 schema.go:79] Deleting schema for predicate: [mailEvent]
I0105 15:15:56.214144       6 draft.go:383] Creating snapshot at index: 24813. ReadTs: 23369.

This is a problem for me because, as I said, the alpha isn’t available whenever the zero does this. Dgraph is under essentially zero load during all of this. The instances are running in two separate containers in a single k8s pod. These are the flags I use to run the services:

dgraph alpha \
  --my=127.0.0.1:7080 \
  --lru_mb 2048 \
  --zero 127.0.0.1:5080
dgraph zero --my=127.0.0.1:5080

Is that delete resulting in lost data?

The cluster incorrectly logging (and deleting) a type name as if it were a predicate is tracked in this existing GitHub issue: https://github.com/dgraph-io/dgraph/issues/4473.

There’s no data loss happening here. It seems Dgraph internally stores the type name as a predicate as well, and because there’s no data for that predicate (since it’s a type), Dgraph cleans it up, hence the message.
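You can actually see this in the badger log above: the `dropPrefix` bytes spell out the type name. A minimal sketch, assuming the trailing bytes are simply the ASCII codes of the attribute name (the leading `0 0 9` looks like a marker byte plus a big-endian length, 0x0009 = 9, though that framing is my guess):

```shell
# dropPrefix from the compaction log: [0 0 9 109 97 105 108 69 118 101 110 116]
# Decode the trailing bytes as ASCII characters:
bytes="109 97 105 108 69 118 101 110 116"
name=""
for b in $bytes; do
  name="$name$(printf "\\$(printf '%03o' "$b")")"
done
echo "$name"  # → mailEvent
```

So the prefix badger drops really does correspond to the `mailEvent` key, consistent with the type name being stored like any other attribute internally.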


Yes, as @dmai said, there’s no data loss; I never experienced any.
