Slow syncs and Segfault

I have two alpha instances running in Docker on two separate machines. As far as I can tell, both machines are identical. I started pumping data into them and everything looked to be working great. However, after a few hours I came back and found that one of the instances had been streaming warnings like these:

W1113 03:35:13.841935       1 draft.go:958] Raft.Ready took too long to process: Timer Total: 204ms. Breakdown: [{sync 204ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1113 03:35:14.678171       1 draft.go:958] Raft.Ready took too long to process: Timer Total: 682ms. Breakdown: [{sync 682ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1113 03:35:16.272393       1 draft.go:958] Raft.Ready took too long to process: Timer Total: 1.594s. Breakdown: [{sync 1.594s} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true

And it had finally crashed with a segfault.
Here is a portion of the segfault log dump:

fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc030000000 pc=0x11ef2b1]

goroutine 1281391 [running]:
runtime.throw(0x175d0cb, 0x5)
        /usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc00023bb88 sp=0xc00023bb58 pc=0x9d0fd2
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:397 +0x401 fp=0xc00023bbb8 sp=0xc00023bb88 pc=0x9e7151
github.com/dgraph-io/dgraph/vendor/github.com/dgryski/go-groupvarint.Decode4(0xc00023bc30, 0x4, 0x4, 0xc02ffffff0, 0x9, 0x10)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/dgryski/go-groupvarint/decode_amd64.s:11 +0x11 fp=0xc00023bbc0 sp=0xc00023bbb8 pc=0x11ef2b1
github.com/dgraph-io/dgraph/codec.(*Decoder).unpackBlock(0xc015a0b1d0, 0xc01dd16bc0, 0x2, 0x8)
        /tmp/go/src/github.com/dgraph-io/dgraph/codec/codec.go:145 +0x226 fp=0xc00023bc68 sp=0xc00023bbc0 pc=0x11efaa6
github.com/dgraph-io/dgraph/codec.(*Decoder).LinearSeek(0xc015a0b1d0, 0xb5d9, 0x8, 0xc003444458, 0x4a0)
        /tmp/go/src/github.com/dgraph-io/dgraph/codec/codec.go:250 +0x62 fp=0xc00023bc98 sp=0xc00023bc68 pc=0x11f00b2
github.com/dgraph-io/dgraph/algo.IntersectCompressedWithLinJump(0xc015a0b1d0, 0xc00343a000, 0x192b, 0x192b, 0xc00023bd30)
        /tmp/go/src/github.com/dgraph-io/dgraph/algo/uidlist.go:79 +0xcd fp=0xc00023bcf8 sp=0xc00023bc98 pc=0x11f0fdd
github.com/dgraph-io/dgraph/algo.IntersectCompressedWith(0xc01db56b80, 0x0, 0xc01db568c0, 0xc01db56bc0)
        /tmp/go/src/github.com/dgraph-io/dgraph/algo/uidlist.go:64 +0x134 fp=0xc00023bd58 sp=0xc00023bcf8 pc=0x11f0e64
github.com/dgraph-io/dgraph/posting.(*List).Uids(0xc002905bc0, 0x16e36, 0x0, 0xc01db568c0, 0x176abfe, 0x11, 0x0)
        /tmp/go/src/github.com/dgraph-io/dgraph/posting/list.go:903 +0x186 fp=0xc00023bde0 sp=0xc00023bd58 pc=0x11fea96
github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func1(0x0, 0x1, 0xc01176bc80, 0xc01de35900)
        /tmp/go/src/github.com/dgraph-io/dgraph/worker/task.go:674 +0x11ff fp=0xc00023bf80 sp=0xc00023bde0 pc=0x1375a2f
github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func2(0xc002905a40, 0xc01e637c0, 0x0, 0x1)
        /tmp/go/src/github.com/dgraph-io/dgraph/worker/task.go:691 +0x3a fp=0xc00023bfc0 sp=0xc00023bf80 pc=0x1375e8a
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00023bfc8 sp=0xc00023bfc0 pc=0xa010d1
created by github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings
        /tmp/go/src/github.com/dgraph-io/dgraph/worker/task.go:690 +0x3b3

goroutine 1 [semacquire, 1660 minutes]:
sync.runtime_Semacquire(0xc0004127f8)
        /usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc0004127f0)
        /usr/local/go/src/sync/waitgroup.go:130 +0x65
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.setupServer()
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:419 +0x838
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run()
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:565 +0xd4c
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.2.func1(0xc0004a2000, 0xc0003cee40, 0x0, 0x2)
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:87 +0x52
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).execute(0xc0004a2000, 0xc0003cee00, 0x2, 0x2, 0xc0004a2000, 0xc0003cee00)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:702 +0x289
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x22121a0, 0x0, 0x0, 0x0)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:783 +0x2ca
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).Execute(...)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:736
github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:68 +0x85
main.main()
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/main.go:33 +0x99

goroutine 5 [chan receive]:
github.com/dgraph-io/dgraph/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x22b9b20)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/golang/glog/glog.go:882 +0x8b
created by github.com/dgraph-io/dgraph/vendor/github.com/golang/glog.init.0
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/golang/glog/glog.go:410 +0x72

goroutine 19 [select]:
github.com/dgraph-io/dgraph/vendor/go.opencensus.io/stats/view.(*worker).start(0xc0002880a0)
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/go.opencensus.io/stats/view/worker.go:154 +0x100
created by github.com/dgraph-io/dgraph/vendor/go.opencensus.io/stats/view.init.0
        /tmp/go/src/github.com/dgraph-io/dgraph/vendor/go.opencensus.io/stats/view/worker.go:32 +0x57

goroutine 20 [syscall, 1660 minutes]:
os/signal.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
        /usr/local/go/src/os/signal/signal_unix.go:29 +0x41

I’ve started it up again and it almost immediately began streaming the above warnings. There are no apparent problems with the disk, so I’m not sure what else might cause this.
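
In case it’s useful to anyone chasing the same warnings, one way to sanity-check the volume is a tiny standalone Go program like the sketch below (nothing Dgraph-specific; the file location and the 4 KB write size are arbitrary choices of mine). It times repeated write+fsync cycles, since my understanding is that the ‘sync’ bucket in those warnings is time spent syncing the Raft write-ahead log to disk. Sustained results in the hundreds of milliseconds would point back at the disk after all.

package main

import (
        "fmt"
        "io/ioutil"
        "os"
        "time"
)

func main() {
        // Write into the current directory so the probe hits the same volume
        // as the alpha's p and w directories when run from there.
        f, err := ioutil.TempFile(".", "fsync-probe-")
        if err != nil {
                panic(err)
        }
        defer os.Remove(f.Name())
        defer f.Close()

        buf := make([]byte, 4096) // roughly one small Raft entry per sync
        var worst time.Duration
        const rounds = 200
        start := time.Now()
        for i := 0; i < rounds; i++ {
                if _, err := f.Write(buf); err != nil {
                        panic(err)
                }
                t := time.Now()
                // fsync the file; this is roughly the operation the sync
                // timings in the warnings should reflect.
                if err := f.Sync(); err != nil {
                        panic(err)
                }
                if d := time.Since(t); d > worst {
                        worst = d
                }
        }
        fmt.Printf("%d write+fsync cycles: avg %v per sync, worst %v\n",
                rounds, time.Since(start)/rounds, worst)
}

Run it from the directory that holds the alpha’s data so it exercises the same disk.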

Both of my alphas had crashed with segfaults this morning. The first alpha began logging the same slow syncs, but not nearly as many.

Thank you for the post. I will assign someone to look at the issue as soon as possible.

Hi,

Would you be able to share your ‘p’ directory (where the data is stored), so we can reproduce this issue and fix it?

Thanks.

Hi,

Could you tell us which version of Dgraph you are running? It looks like we have fixed a similar issue, and the master branch has the fix. If you could upgrade to master and try to reproduce it, that would be very helpful.

Thanks

Hi Shekar,

I can’t share my current ‘p’ directory, though I could potentially try to reproduce with mocked data. I will grab the master branch first and see if the issue manifests overnight.

Sounds good. Thanks for the prompt reply.

Hello Shekar,

No crashes overnight running on master. I will assume it’s fixed, but will let you know if anything changes.

Thanks!

I’m afraid I’ve gotten another crash sometime last night. I assume it was a segfault, but my log backlog does not go back far enough to show the beginning of the stack trace.

I have two alphas running on the master branch (717710a26ba56e5ea05f3b4f4429779c6038f7fc), compiled from source on my host machine. The binaries were running on two separate DigitalOcean droplets. The first was not logging slow-sync warnings, but the second one was.

It was the first one that crashed. The second one is still running and logging the warnings.

If you need my ‘p’ directory, I can try to reproduce with mock data over the weekend. Let me know if there’s anything else I can try.
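
For that attempt, I’m picturing something along these lines. It’s only a rough sketch: the schema, predicate name, and alpha address (localhost:9080) are placeholders rather than my real data. It uses the dgo v2 client to keep small committed mutations flowing for hours, which roughly mimics the write load that preceded both crashes.

package main

import (
        "context"
        "encoding/json"
        "fmt"
        "log"

        "github.com/dgraph-io/dgo/v2"
        "github.com/dgraph-io/dgo/v2/protos/api"
        "google.golang.org/grpc"
)

// item is a stand-in for whatever my real records look like.
type item struct {
        UID  string `json:"uid,omitempty"`
        Name string `json:"name"`
}

func main() {
        conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
        if err != nil {
                log.Fatal(err)
        }
        defer conn.Close()
        dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

        ctx := context.Background()
        // A minimal schema for the mock data; adjust to taste.
        if err := dg.Alter(ctx, &api.Operation{Schema: `name: string @index(term) .`}); err != nil {
                log.Fatal(err)
        }

        // Keep committing small mutations indefinitely, mimicking the steady write load.
        for i := 0; ; i++ {
                pb, err := json.Marshal(item{Name: fmt.Sprintf("mock item %d", i)})
                if err != nil {
                        log.Fatal(err)
                }
                mu := &api.Mutation{SetJson: pb, CommitNow: true}
                if _, err := dg.NewTxn().Mutate(ctx, mu); err != nil {
                        log.Printf("mutation %d failed: %v", i, err)
                }
        }
}

Pointing a second copy at the other alpha’s address should approximate the two-droplet setup.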
