@mikehawkes I see the following in your logs:
I0811 23:02:09.694392 35 draft.go:523] Creating snapshot at index: 8145811. ReadTs: 8963338.
I0811 23:02:10.417434 36 oracle.go:107] Purged below ts:8963338, len(o.commits):6, len(o.rowCommit):154
runtime/cgo: pthread_create failed: Resource temporarily unavailable
W0811 23:04:05.176556 35 groups.go:835] No membership update for 10s. Closing connection to Zero.
E0811 23:04:06.804167 35 groups.go:796] Unable to sync memberships. Error: rpc error: code = Canceled desc = context canceled. State: <nil>
E0811 23:04:06.869526 35 groups.go:744] While sending membership update: rpc error: code = Unavailable desc = transport is closing
E0811 23:04:06.869825 35 groups.go:896] Error in oracle delta stream. Error: rpc error: code = Unavailable desc = transport is closing
W0811 23:04:06.870025 35 pool.go:254] Connection lost with localhost:5080. Error: rpc error: code = Unavailable desc = transport is closing
W0811 23:04:06.870105 35 draft.go:1211] While sending membership to Zero. Error: rpc error: code = Unavailable desc = transport is closing
E0811 23:04:06.889557 35 groups.go:744] While sending membership update: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"
E0811 23:04:07.290211 35 groups.go:744] While sending membership update: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"
I0811 23:04:08.057900 35 groups.go:856] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I0811 23:04:08.057926 35 groups.go:865] Got Zero leader: localhost:5080
E0811 23:04:08.058262 35 groups.go:877] Error while calling Oracle rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"
E0811 23:04:08.290335 35 groups.go:744] While sending membership update: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"
I0811 23:04:09.058456 35 groups.go:856] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I0811 23:09:09.290847 35 draft.go:1269] Found 1 old transactions. Acting to abort them.
I0811 23:09:09.290872 35 draft.go:1272] Done abortOldTransactions for 1 txns. Error: No connection exists
github.com/dgraph-io/dgraph/worker.init
/tmp/go/src/github.com/dgraph-io/dgraph/worker/draft.go:1218
runtime.doInit
/usr/local/go/src/runtime/proc.go:5414
runtime.doInit
/usr/local/go/src/runtime/proc.go:5409
runtime.doInit
/usr/local/go/src/runtime/proc.go:5409
runtime.doInit
/usr/local/go/src/runtime/proc.go:5409
runtime.main
/usr/local/go/src/runtime/proc.go:190
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1373
I0811 23:10:09.290683 35 draft.go:1269] Found 1 old transactions. Acting to abort them.
I0811 23:10:09.290980 35 draft.go:1272] Done abortOldTransactions for 1 txns. Error: No connection exists
github.com/dgraph-io/dgraph/worker.init
/tmp/go/src/github.com/dgraph-io/dgraph/worker/draft.go:1218
runtime.doInit
/usr/local/go/src/runtime/proc.go:5414
runtime.doInit
/usr/local/go/src/runtime/proc.go:5409
runtime.doInit
/usr/local/go/src/runtime/proc.go:5409
runtime.doInit
/usr/local/go/src/runtime/proc.go:5409
runtime.main
/usr/local/go/src/runtime/proc.go:190
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1373
The message "runtime/cgo: pthread_create failed: Resource temporarily unavailable" might be the reason for your crashes. I’ve never seen this error message before.
From the logs, it looks like there is a cgo crash, after which raft starts having issues and the node can no longer communicate with the other nodes (the connection to Zero at localhost:5080 is refused).
@mikehawkes Have you tried running dgraph in a different environment? I suspect this is caused by something in the environment. You could try running the dgraph binary directly instead of the standalone docker image.
If you can share the details of where and how you’re running dgraph, I can try to reproduce the crash and investigate it further.
@mikehawkes are you running dgraph on macOS?