OOM on 1.7 GB snapshot


(Nikita Zaletov) #1

Hi all,

We get this error right after a log message saying that a 1.7 GB snapshot was sent.
Is that size normal? Why is it sending snapshots that large?

What does this number mean? It seems to grow steadily, then reset and start growing again.
Thanks.

Sending SNAPSHOT Time elapsed: 02m28s, bytes sent: 1.7 GB, speed: 12 MB/sec
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x130027f, 0x16)
	/usr/local/go/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xc2b8000000, 0x20000000, 0x1d13038)
	/usr/local/go/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0x1cf9860, 0x20000000, 0x8d418, 0x7fe1b8036cd8)
	/usr/local/go/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0x1cf9860, 0xf697, 0x0)
	/usr/local/go/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0x1cf9860, 0xf697, 0x1d13048, 0x0)
	/usr/local/go/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0x1cf9860, 0xf697, 0x101, 0x14dedc0)
	/usr/local/go/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
	/usr/local/go/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0x1cf9860, 0xf697, 0x2b8010101, 0x84bd40)
	/usr/local/go/src/runtime/mheap.go:758 +0x8a
runtime.largeAlloc(0x1ed2d78a, 0xc000170101, 0x885556)
	/usr/local/go/src/runtime/malloc.go:1019 +0x97
runtime.mallocgc.func1()
	/usr/local/go/src/runtime/malloc.go:914 +0x46
runtime.systemstack(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1229

goroutine 204118513 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:311 fp=0xc00017b888 sp=0xc00017b880 pc=0x881a00
runtime.mallocgc(0x1ed2d78a, 0x1192420, 0x57c601, 0xc2588da000)
	/usr/local/go/src/runtime/malloc.go:913 +0x896 fp=0xc00017b928 sp=0xc00017b888 pc=0x832f66
runtime.makeslice(0x1192420, 0x1ed2d78a, 0x1ed2d78a, 0xc00017b9a0, 0x830f63, 0x11e9aa0)
	/usr/local/go/src/runtime/slice.go:70 +0x77 fp=0xc00017b958 sp=0xc00017b928 pc=0x86a957
github.com/dgraph-io/dgraph/protos/pb.(*KVS).Marshal(0xc05469e380, 0x129c8c0, 0xc05469e380, 0x7fe1ec2bb488, 0xc05469e380, 0x7fe117451001)
	/ext-go/1/src/github.com/dgraph-io/dgraph/protos/pb/pb.pb.go:5826 +0x49 fp=0xc00017b9b0 sp=0xc00017b958 pc=0xcb5ae9
google.golang.org/grpc/encoding/proto.codec.Marshal(0x129c8c0, 0xc05469e380, 0x0, 0x0, 0x0, 0x0, 0x0)
	/ext-go/1/src/google.golang.org/grpc/encoding/proto/proto.go:70 +0x19c fp=0xc00017ba30 sp=0xc00017b9b0 pc=0xc5188c
google.golang.org/grpc/encoding/proto.(*codec).Marshal(0x1d112c0, 0x129c8c0, 0xc05469e380, 0x0, 0x0, 0x0, 0x0, 0x0)
	<autogenerated>:1 +0x46 fp=0xc00017ba78 sp=0xc00017ba30 pc=0xc52006
google.golang.org/grpc.encode(0x7fe238c8c000, 0x1d112c0, 0x129c8c0, 0xc05469e380, 0x129c8c0, 0xc05469e380, 0x3344000, 0xc00017bbd0, 0x832b3c)
	/ext-go/1/src/google.golang.org/grpc/rpc_util.go:487 +0x5e fp=0xc00017baf8 sp=0xc00017ba78 pc=0xc5f6ee
google.golang.org/grpc.(*serverStream).SendMsg(0xc010b76000, 0x129c8c0, 0xc05469e380, 0x0, 0x0)
	/ext-go/1/src/google.golang.org/grpc/stream.go:712 +0xc9 fp=0xc00017bc10 sp=0xc00017baf8 pc=0xc6d8b9
github.com/dgraph-io/dgraph/protos/pb.(*workerStreamSnapshotServer).Send(0xc064710580, 0xc05469e380, 0x1cf2e80, 0xedf01)
	/ext-go/1/src/github.com/dgraph-io/dgraph/protos/pb/pb.pb.go:4272 +0x49 fp=0xc00017bc48 sp=0xc00017bc10 pc=0xcac1e9
github.com/dgraph-io/dgraph/worker.(*streamLists).streamKVs.func1(0xc05469e380, 0xc00017bdcc, 0x3)
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/stream_lists.go:221 +0x2d1 fp=0xc00017bd80 sp=0xc00017bc48 pc=0xf90411
github.com/dgraph-io/dgraph/worker.(*streamLists).streamKVs(0xc0842f6a00, 0x1417da0, 0xc030d9c2a0, 0x12fa31f, 0x10, 0xc0270906c0, 0x0, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/stream_lists.go:252 +0x534 fp=0xc00017bf48 sp=0xc00017bd80 pc=0xf79eb4
github.com/dgraph-io/dgraph/worker.(*streamLists).orchestrate.func2(0xc0270908a0, 0xc0842f6a00, 0x1417da0, 0xc030d9c2a0, 0x12fa31f, 0x10, 0xc0270906c0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/stream_lists.go:77 +0x6a fp=0xc00017bfa8 sp=0xc00017bf48 pc=0xf8faea
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc00017bfb0 sp=0xc00017bfa8 pc=0x883ae1
created by github.com/dgraph-io/dgraph/worker.(*streamLists).orchestrate
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/stream_lists.go:76 +0x238

goroutine 1 [semacquire, 2777 minutes]:
sync.runtime_Semacquire(0xc0004f21b8)
	/usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc0004f21b0)
	/usr/local/go/src/sync/waitgroup.go:130 +0x64
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.setupServer()
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:340 +0x663
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run()
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:439 +0xa4a
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.0.func1(0xc00035a240, 0xc000530000, 0x0, 0x6)
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:71 +0x52
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).execute(0xc00035a240, 0xc0000789c0, 0x6, 0x6, 0xc00035a240, 0xc0000789c0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:702 +0x2d3
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x1c6eb80, 0x7, 0x0, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:783 +0x2dc
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).Execute(0x1c6eb80, 0x8580ad, 0x1c77408)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra/command.go:736 +0x2b
github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:57 +0x36
main.main()
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:33 +0x88

goroutine 19 [chan receive]:
github.com/dgraph-io/dgraph/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x1cf35a0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/golang/glog/glog.go:882 +0x8b
created by github.com/dgraph-io/dgraph/vendor/github.com/golang/glog.init.0
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/golang/glog/glog.go:410 +0x203

goroutine 21 [syscall, 2777 minutes]:
os/signal.signal_recv(0x13379b0)
	/usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/usr/local/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 22 [chan receive]:
github.com/dgraph-io/dgraph/x.init.1.func1()
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/metrics.go:95 +0x7e
created by github.com/dgraph-io/dgraph/x.init.1
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/metrics.go:90 +0x52c

goroutine 28 [chan receive]:
github.com/dgraph-io/dgraph/worker.(*rateLimiter).bleed(0x1d1108c)
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/proposal.go:60 +0x95
created by github.com/dgraph-io/dgraph/worker.init.0
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/proposal.go:45 +0x41

goroutine 121 [chan receive]:
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y.(*WaterMark).process(0xc00053a180)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y/watermark.go:192 +0x22a
created by github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y.(*WaterMark).Init
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y/watermark.go:73 +0xbf

goroutine 122 [chan receive, 2 minutes]:
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y.(*WaterMark).process(0xc00053a1c0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y/watermark.go:192 +0x22a
created by github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y.(*WaterMark).Init
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y/watermark.go:73 +0xbf

goroutine 123 [select]:
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*DB).updateSize(0xc000050e00, 0xc0000a0440)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/db.go:946 +0x125
created by github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.Open
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/db.go:261 +0x7b1

goroutine 124 [select]:
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*levelsController).runWorker(0xc000142000, 0xc0000a0e40)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/levels.go:248 +0x151
created by github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*levelsController).startCompact
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/levels.go:233 +0x82

goroutine 125 [select]:
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*levelsController).runWorker(0xc000142000, 0xc0000a0e40)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/levels.go:248 +0x151
created by github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*levelsController).startCompact
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/levels.go:233 +0x82

goroutine 126 [select]:
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*levelsController).runWorker(0xc000142000, 0xc0000a0e40)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/levels.go:248 +0x151
created by github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*levelsController).startCompact
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/levels.go:233 +0x82

goroutine 127 [chan receive, 13 minutes]:
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*DB).flushMemtable(0xc000050e00, 0xc0000a0e60, 0x0, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/db.go:873 +0x179
created by github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.Open
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/db.go:274 +0xe01

goroutine 111 [select, 2 minutes]:
...
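
To put the hexadecimal arguments in the trace above into readable units, here is a small sketch in Go. The meaning assigned to each argument (bytes requested for `mallocgc`/`makeslice`, pages for `mheap.grow`, bytes for `sysMap`) and the 8 KiB runtime page size are assumptions based on the Go runtime of that era, not something the thread confirms. `humanBytes` is a helper written for this illustration only.

```go
package main

import "fmt"

// humanBytes formats a byte count using binary units (KiB, MiB, GiB, ...).
func humanBytes(n uint64) string {
	const unit = 1024
	if n < unit {
		return fmt.Sprintf("%d B", n)
	}
	div, exp := uint64(unit), 0
	for m := n / unit; m >= unit; m /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(n)/float64(div), "KMGTPE"[exp])
}

func main() {
	// Values copied from the trace. Their interpretation is an assumption.
	const (
		requested = 0x1ed2d78a // size passed to mallocgc/makeslice under KVS.Marshal
		growPages = 0xf697     // page count passed to mheap.grow
		pageSize  = 8192       // assumed Go runtime page size in bytes
		mapped    = 0x20000000 // size passed to sysMap
	)
	fmt.Println("buffer requested for one marshaled message:", humanBytes(requested))
	fmt.Println("heap growth needed for it:", humanBytes(growPages*pageSize))
	fmt.Println("memory the runtime tried to map:", humanBytes(mapped))
}
```

Under those assumptions, the failing request is a single contiguous buffer of roughly 493 MiB for one marshaled `pb.KVS` message, and the runtime tried to map 512 MiB to satisfy it. That is smaller than the 1.7 GB total in the progress line, so the crash does not by itself show that the whole snapshot was held in one buffer.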

(Michel Conrado) #2

Hi Nikita,

Can you share your Dgraph version? It seems you’re on an old version.
Try upgrading: https://docs.dgraph.io/deploy#upgrade-database

Cheers.


(Nikita Zaletov) #3

In which version was it fixed? We use 1.0.10 and can’t upgrade to 1.0.11 or 1.0.12rc5 because of bugs in those versions.


(Michel Conrado) #4

We never recommend updating directly to an RC. Always update to a properly released version.
We only recommend RCs in specific cases, such as when a particular fix was worked on in that RC.

Also, you do not have to migrate completely to a newer version. Make a clone of the original data and test until you feel comfortable.

Cheers.


(Nikita Zaletov) #5

That’s what I did, but 1.0.11 contains a bug, so our queries don’t work there.
So was this issue fixed after 1.0.10, or before? We are getting this OOM error on 1.0.10.


(Michel Conrado) #6

1.0.10 is 3 months old. Since then we have had a few commits related to snapshots. Check https://github.com/dgraph-io/dgraph/search?q=snapshot&type=Commits


(Manish R Jain) #7

After the restart, did it continue to go OOM?

Also, can you paste the logs? Looks like another replica was asking for the snapshot.


(Nikita Zaletov) #8

After the restart it ran fine, but it crashed again a day later.
Please find the log here: https://drive.google.com/file/d/14BfRvWsbcBgtm3RgMDQr6PxdebtxSaFL/view?usp=sharing


(Nikita Zaletov) #9

Just got it again, if it helps: https://drive.google.com/file/d/1wSW2KXaTZ0-ttmRufLuiEsrD7Tr09ARr/view?usp=sharing

(The first row actually contains the OOM error, but Google Sheets hides it for some reason. It is present in the CSV.)


(Manish R Jain) #10

Hmm… Hard to say from the logs. What issue are you seeing in 12-rc? Possible to move to it? We’re cutting a new release today, worth trying out. We haven’t seen any OOM issues in the 12 series.


(Nikita Zaletov) #11

We have issues with rc5, described here: https://github.com/dgraph-io/dgraph/issues/2980, which are apparently fixed, so we will wait for the next RC to check it.
(On 1.0.11 we had issues with vars: https://github.com/dgraph-io/dgraph/issues/2832, which are already fixed as well.)


(Michel Conrado) #12

Hey Nikita,

If you need to speed up testing, you can easily build Dgraph from master.

This works only on Linux or Darwin (macOS), with Go installed and “make” available.

Run:

go get github.com/dgraph-io/dgraph/...

cd to your Go path,

maybe ~/go/… or ~/.go/…

Look for src, and inside it github.com and then dgraph-io.
Go into the Dgraph repo directory itself and then into /dgraph/dgraph/

You will see a “Makefile” there.

Run:

make

When the build finishes, you will have a binary built from master.
You can also check out any other branch and build from it.

Cheers.


(Manish R Jain) #13

I wouldn’t recommend using master. Use the release/v1.0 branch. We’re going to cut an RC this coming week (the final one for v1.0.12), so either build from that branch or just wait for the RC.


(Nikita Kazarian) #14

@makitka did you solve this problem? And can you email me please - elvis1616@gmail.com