I have two Alpha instances running in Docker on two separate machines. As far as I can tell, both machines are identical. I started pumping data into them, and everything looked to be working great. However, after a few hours I came back and found that one of the instances had been streaming warnings like these:
W1113 03:35:13.841935 1 draft.go:958] Raft.Ready took too long to process: Timer Total: 204ms. Breakdown: [{sync 204ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1113 03:35:14.678171 1 draft.go:958] Raft.Ready took too long to process: Timer Total: 682ms. Breakdown: [{sync 682ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1113 03:35:16.272393 1 draft.go:958] Raft.Ready took too long to process: Timer Total: 1.594s. Breakdown: [{sync 1.594s} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
And it had finally crashed with a segfault.
Here is a portion of the segfault log dump:
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc030000000 pc=0x11ef2b1]
goroutine 1281391 [running]:
runtime.throw(0x175d0cb, 0x5)
        /usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc00023bb88 sp=0xc00023bb58 pc=0x9d0fd2
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:397 +0x401 fp=0xc00023bbb8 sp=0xc00023bb88 pc=0x9e7151
github.com/dgraph-io/dgraph/vendor/github.com/dgryski/go-groupvarint.Decode4(0xc00023bc30, 0x4, 0x4, 0xc02ffffff0, 0x9, 0x10)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/dgryski/go-groupvarint/decode_amd64.s:11 +0x11 fp=0xc00023bbc0 sp=0xc00023bbb8 pc=0x11ef2b1
github.com/dgraph-io/dgraph/codec.(*Decoder).unpackBlock(0xc015a0b1d0, 0xc01dd16bc0, 0x2, 0x8)
        /tmp/go/src/github.com/dgraph-io/dgraph/codec/codec.go:145 +0x226 fp=0xc00023bc68 sp=0xc00023bbc0 pc=0x11efaa6
github.com/dgraph-io/dgraph/codec.(*Decoder).LinearSeek(0xc015a0b1d0, 0xb5d9, 0x8, 0xc003444458, 0x4a0)
        /tmp/go/src/github.com/dgraph-io/dgraph/codec/codec.go:250 +0x62 fp=0xc00023bc98 sp=0xc00023bc68 pc=0x11f00b2
github.com/dgraph-io/dgraph/algo.IntersectCompressedWithLinJump(0xc015a0b1d0, 0xc00343a000, 0x192b, 0x192b, 0xc00023bd30)
        /tmp/go/src/github.com/dgraph-io/dgraph/algo/uidlist.go:79 +0xcd fp=0xc00023bcf8 sp=0xc00023bc98 pc=0x11f0fdd
github.com/dgraph-io/dgraph/algo.IntersectCompressedWith(0xc01db56b80, 0x0, 0xc01db568c0, 0xc01db56bc0)
        /tmp/go/src/github.com/dgraph-io/dgraph/algo/uidlist.go:64 +0x134 fp=0xc00023bd58 sp=0xc00023bcf8 pc=0x11f0e64
github.com/dgraph-io/dgraph/posting.(*List).Uids(0xc002905bc0, 0x16e36, 0x0, 0xc01db568c0, 0x176abfe, 0x11, 0x0)
        /tmp/go/src/github.com/dgraph-io/dgraph/posting/list.go:903 +0x186 fp=0xc00023bde0 sp=0xc00023bd58 pc=0x11fea96
github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func1(0x0, 0x1, 0xc01176bc80, 0xc01de35900)
        /tmp/go/src/github.com/dgraph-io/dgraph/worker/task.go:674 +0x11ff fp=0xc00023bf80 sp=0xc00023bde0 pc=0x1375a2f
github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func2(0xc002905a40, 0xc01e637c0, 0x0, 0x1)
        /tmp/go/src/github.com/dgraph-io/dgraph/worker/task.go:691 +0x3a fp=0xc00023bfc0 sp=0xc00023bf80 pc=0x1375e8a
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00023bfc8 sp=0xc00023bfc0 pc=0xa010d1
created by github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings
        /tmp/go/src/github.com/dgraph-io/dgraph/worker/task.go:690 +0x3b3
goroutine 1 [semacquire, 1660 minutes]:
sync.runtime_Semacquire(0xc0004127f8)
        /usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc0004127f0)
        /usr/local/go/src/sync/waitgroup.go:130 +0x65
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.setupServer()
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:419 +0x838
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run()
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:565 +0xd4c
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.2.func1(0xc0004a2000, 0xc0003cee40, 0x0, 0x2)
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:87 +0x52
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).execute(0xc0004a2000, 0xc0003cee00, 0x2, 0x2, 0xc0004a2000, 0xc0003cee00)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:702 +0x289
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x22121a0, 0x0, 0x0, 0x0)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:783 +0x2ca
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).Execute(...)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:736
github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:68 +0x85
main.main()
/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/main.go:33 +0x99
goroutine 5 [chan receive]:
github.com/dgraph-io/dgraph/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x22b9b20)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/golang/glog/glog.go:882 +0x8b
created by github.com/dgraph-io/dgraph/vendor/github.com/golang/glog.init.0
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/golang/glog/glog.go:410 +0x72
goroutine 19 [select]:
github.com/dgraph-io/dgraph/vendor/go.opencensus.io/stats/view.(*worker).start(0xc0002880a0)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/go.opencensus.io/stats/view/worker.go:154 +0x100
created by github.com/dgraph-io/dgraph/vendor/go.opencensus.io/stats/view.init.0
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/go.opencensus.io/stats/view/worker.go:32 +0x57
goroutine 20 [syscall, 1660 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
I’ve started it up again, and it almost immediately began streaming the same warnings. There are no apparent problems with the disk, so I’m not sure what else might be causing this.
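In case it helps narrow things down, here is a rough sketch of the kind of check I have in mind for ruling the volume out: it just times fsync against a file on the data mount, since I assume the {sync ...} part of the Raft.Ready breakdown is waiting on the Raft write-ahead-log fsync. The /dgraph path and the iteration count are placeholders, not my actual setup.

package main

import (
        "fmt"
        "os"
        "path/filepath"
        "time"
)

func main() {
        // Placeholder: point this at wherever the Alpha's p/ and w/ directories are mounted.
        dir := "/dgraph"

        f, err := os.Create(filepath.Join(dir, "fsync_probe.tmp"))
        if err != nil {
                panic(err)
        }
        defer os.Remove(f.Name())
        defer f.Close()

        buf := make([]byte, 4<<10) // 4 KiB per write, roughly a small WAL entry
        var worst time.Duration
        for i := 0; i < 100; i++ {
                if _, err := f.Write(buf); err != nil {
                        panic(err)
                }
                start := time.Now()
                // Time the fsync itself; I assume this is what the {sync ...} phase is waiting on.
                if err := f.Sync(); err != nil {
                        panic(err)
                }
                if d := time.Since(start); d > worst {
                        worst = d
                }
        }
        fmt.Println("worst fsync latency over 100 writes:", worst)
}

If the worst-case numbers on the machine that streams the warnings were consistently in the hundreds of milliseconds while the other machine stayed low, that would at least explain the Raft.Ready warnings, even if it doesn't explain the segfault.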