I set up a Dgraph (v0.9.3) cluster with 3 zero instances and 10 servers, all running in a single Kubernetes pod.
One of the servers is hitting an “Assert failed” error.
This particular container cannot restart properly and always runs into the same error.
Could it be that a predicate was being reassigned while a mutation was received? The error shows up about 10 minutes after ingestion starts.
Is there a way to get around this issue?
It’s strange that only 1 of the 10 Dgraph instances has this problem.
Dgraph zero log:
Groups sorted by size: [{gid:20 size:4554185} {gid:26 size:4657067} {gid:21 size:5071232} {gid:18 size:5787171} {gid:11 size:6685647} {gid:27 size:6792231} {gid:29 size:7302747} {gid:23 size:7506995} {gid:15 size:15740063} {gid:1 size:17982928}]
2017/12/13 08:15:32 tablet.go:170: size_diff 13428743
2017/12/13 08:15:32 tablet.go:87: Going to move predicate _dummy_ from 1 to 20
2017/12/13 08:15:32 node.go:162: SENDING: MsgApp 1-->3
2017/12/13 08:15:32 node.go:485: RECEIVED: MsgAppResp 3-->1
2017/12/13 08:15:32 node.go:162: SENDING: MsgApp 1-->3
2017/12/13 08:15:32 node.go:485: RECEIVED: MsgAppResp 3-->1
2017/12/13 08:15:32 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 08:15:32 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 08:15:32 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 08:15:32 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 08:15:32 node.go:162: SENDING: MsgApp 1-->3
2017/12/13 08:15:32 node.go:485: RECEIVED: MsgAppResp 3-->1
2017/12/13 08:15:32 node.go:162: SENDING: MsgApp 1-->3
2017/12/13 08:15:32 tablet.go:91: Error while trying to move predicate _dummy_ from 1 to 20: rpc error: code = Unavailable desc = transport is closing
2017/12/13 08:15:32 node.go:485: RECEIVED: MsgAppResp 3-->1
2017/12/13 08:15:32 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 08:15:32 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 08:15:32 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 08:15:32 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 08:15:33 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 08:15:35 zero.go:293: Got connection request: id:1 addr:"localhost:7092"
2017/12/13 08:15:35 zero.go:389: Connected
2017/12/13 08:15:38 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 08:15:38 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 08:15:42 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 08:15:43 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 08:15:50 zero.go:293: Got connection request: id:1 addr:"localhost:7092"
2017/12/13 08:15:50 zero.go:389: Connected
2017/12/13 08:15:52 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 08:15:53 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 08:15:57 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 08:15:57 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 08:16:02 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 08:16:03 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
...(a lot of repetitive logs)
2017/12/13 20:10:02 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:10:03 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:10:12 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:10:13 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:10:22 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:10:23 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:10:32 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:10:33 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:10:38 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 20:10:38 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 20:10:42 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:10:43 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:10:52 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:10:53 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:10:57 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 20:10:57 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 20:11:00 zero.go:293: Got connection request: id:1 addr:"localhost:7092"
2017/12/13 20:11:00 zero.go:389: Connected
2017/12/13 20:11:02 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:11:03 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:11:12 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:11:13 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:11:22 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:11:23 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:11:32 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:11:33 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:11:38 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 20:11:38 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 20:11:42 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:11:43 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:11:52 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:11:53 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:11:57 node.go:485: RECEIVED: MsgReadIndex 3-->1
2017/12/13 20:11:57 node.go:162: SENDING: MsgReadIndexResp 1-->3
2017/12/13 20:12:02 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:12:03 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:12:12 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:12:13 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:12:22 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:12:23 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2017/12/13 20:12:32 oracle.go:372: No healthy connection found to leader of group 1
2017/12/13 20:12:33 pool.go:168: Echo error from localhost:7092. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
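
Reading the zero log above, the `size_diff` value looks like the gap between the largest and the smallest group, and the attempted move goes from the largest group to the smallest. Here is a quick Python check of that reading. The helper names `parse_group_sizes` and `size_diff` are my own, and this is only my interpretation of the log output, not something taken from the Dgraph source:

```python
import re

# "Groups sorted by size" line copied verbatim from the zero log above.
LOG_LINE = (
    "Groups sorted by size: [{gid:20 size:4554185} {gid:26 size:4657067} "
    "{gid:21 size:5071232} {gid:18 size:5787171} {gid:11 size:6685647} "
    "{gid:27 size:6792231} {gid:29 size:7302747} {gid:23 size:7506995} "
    "{gid:15 size:15740063} {gid:1 size:17982928}]"
)


def parse_group_sizes(line):
    """Return a {gid: size} dict parsed from a 'Groups sorted by size' line."""
    pairs = re.findall(r"\{gid:(\d+) size:(\d+)\}", line)
    return {int(gid): int(size) for gid, size in pairs}


def size_diff(sizes):
    """Return (smallest_gid, largest_gid, largest_size - smallest_size)."""
    smallest = min(sizes, key=sizes.get)
    largest = max(sizes, key=sizes.get)
    return smallest, largest, sizes[largest] - sizes[smallest]


if __name__ == "__main__":
    smallest, largest, diff = size_diff(parse_group_sizes(LOG_LINE))
    # Assumption: zero picks the largest group as source, smallest as target.
    print(f"size_diff {diff}: candidate move from group {largest} to group {smallest}")
```

The result agrees with the `size_diff 13428743` line and with the attempted move of `_dummy_` from group 1 to group 20, so the failure seems to start right when zero tries to rebalance away from the group whose leader is at `localhost:7092`.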