Extreme memory usage when constantly querying and mutating data

pjolep · November 11, 2019, 7:25am

It seems that when queries and mutations run constantly, the memory used by Alpha nodes becomes extremely high.
(There is some related discussion in "Can Dgraph do 10 Billion Nodes? - #6 by MichelDiz" as well.)

In our case, handling about 6M nodes needs 30GB of memory on each of the 3 Alphas running in the cluster. This is too much, and it increases as the amount of data increases.

As shown in the picture below, the peak was reached with about 6M nodes in the database. Memory dropped when I dropped the data from the database. This shows that memory keeps climbing as more and more data is stored in the database.

All details and tests can be found here:

Could someone take a look at this and do memory profiling or testing in your test environment?

This is a show stopper for us, as we will have far more data than 6M nodes, and needing 200GB of RAM to handle only 20M nodes would be too expensive and make no sense.

pawan · November 11, 2019, 10:26pm

Thanks for sharing the well-documented test setup, @pjolep. We are investigating this and will have an update for you soon.

pjolep · November 14, 2019, 1:43pm

In addition, this might be useful: I realized that mutations are what use the memory. In the picture you will see point A, where memory dropped when I turned off mutations (same amount of data, same number of queries).

Point B happened when I performed a rolling restart of all Alpha nodes.

dmai · November 20, 2019, 4:53pm

Can you try setting the environment variable GODEBUG=madvdontneed=1 when running the Dgraph binaries?

I asked around in the Gophers performance Slack channel and was pointed to this open issue report about memory not being released by the Go runtime:

github.com/golang/go

runtime: GC: heap idle is not released to linux

opened 03:35AM - 31 Jul 19 UTC

closed 12:29AM - 02 Nov 20 UTC

Anteoy

OS-Linux NeedsInvestigation FrozenDueToAge

### What version of Go are you using (`go version`)?

$ go version go …version go1.12.1 linux/amd64

### Does this issue reproduce with the latest release?

No verification

### What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
local env for build:
GOARCH="amd64"
GOBIN="/home/zhoudazhuang/gobin/"
GOCACHE="/home/zhoudazhuang/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/zhoudazhuang/class100/gtools:/home/zhoudazhuang/goproject"
GOPROXY=""
GORACE=""
GOROOT="/home/zhoudazhuang/usr/local/go1.12.1/go"
GOTMPDIR=""
GOTOOLDIR="/home/zhoudazhuang/usr/local/go1.12.1/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build499587841=/tmp/go-build -gno-record-gcc-switches"

online server:
╰─># uname -a
Linux ll-025048236-FWWG.AppPZFW.prod.bj1 2.6.32-642.el6.x86_64 #1 SMP Tue May 10 17:27:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
╰─># cat /etc/issue
CentOS release 6.8 (Final)
Kernel \r on an \m

### What did you do?

Normal operation, heapIdle is getting bigger and bigger

gc 24015 @413998.746s 23%: 16+3647+0.43 ms clock, 325+52790/18232/0+8.6 ms cpu, 6873->6900->5671 MB, 11404 MB goal, 20 P (forced)
scvg-1: 151 MB released
scvg-1: inuse: 9745, idle: 52574, sys: 62319, released: 52574, consumed: 9745 (MB)
gc end: heapSys->65346568192, heapAlloc->6780874680, heapIdle->55123410944, heapReleased->55123247104
gc 24016 @414005.492s 23%: 7.1+3727+0.16 ms clock, 142+52631/18634/0+3.2 ms cpu, 6722->6838->5787 MB, 11342 MB goal, 20 P (forced)
scvg-1: 149 MB released
scvg-1: inuse: 9730, idle: 52594, sys: 62325, released: 52594, consumed: 9730 (MB)
gc end: heapSys->65356005376, heapAlloc->6708748256, heapIdle->55151747072, heapReleased->55149142016
scvg2759: inuse: 9818, idle: 52491, sys: 62309, released: 52491, consumed: 9818 (MB)
gc 24017 @414011.980s 23%: 21+3679+0.22 ms clock, 438+52629/18393/0+4.4 ms cpu, 6651->6707->5694 MB, 11575 MB goal, 20 P (forced)
scvg-1: 142 MB released
scvg-1: inuse: 9685, idle: 52629, sys: 62315, released: 52629, consumed: 9685 (MB)
gc end: heapSys->65344962560, heapAlloc->6548453928, heapIdle->55188455424, heapReleased->55185702912
gc 24018 @414018.410s 23%: 25+3701+0.37 ms clock, 506+52583/18491/0+7.4 ms cpu, 6494->6558->5660 MB, 11389 MB goal, 20 P (forced)
scvg-1: 177 MB released
scvg-1: inuse: 9599, idle: 52687, sys: 62286, released: 52687, consumed: 9599 (MB)
gc end: heapSys->65312587776, heapAlloc->6380088128, heapIdle->55247020032, heapReleased->55246462976

And the MemStats:

fmtdebug Mem stats: {Alloc:7923150672 TotalAlloc:87802159107216 Sys:83425758664 Lookups:0 Mallocs:1317080204050 Frees:1316977196669 HeapAlloc:7923150672 HeapSys:63562973184 HeapIdle:52234674176 HeapInuse:11328299008 HeapReleased:51822125056 HeapObjects:103007381 StackInuse:16269934592 StackSys:16269934592 MSpanInuse:336623760 MSpanSys:420052992 MCacheInuse:34720 MCacheSys:49152 BuckHashSys:2346567 GCSys:2769477632 OtherSys:400924545 NextGC:12386391312 LastGC:1564479987316535864 PauseTotalNs:368820215369 PauseNs:[1287681 4999082 4236046 35185381 28689620 21532347 4224238 11823736 27419782 32950552 25486288 19654996 31435873 4595087 20290070 1704411 14529638 11940630 6121778 25226722 3802688 1190

### What did you expect to see?

heapIdle should release to os

### What did you see instead?

heapIdle is not released to os , even thougth I periodic call debug.FreeOSMemory()

Go is releasing memory to the OS, but that isn’t reflected in the resident set size calculations. You can check the estimated memory counted as LazyFree by checking /proc/<pid>/smaps. The Linux documentation for LazyFree memory says this:

The memory isn’t freed immediately with madvise(). It’s freed in memory
pressure if the memory is clean.

Below, you’ll see the memory charts for the same workload to a regular dgraph alpha (blue line) and a GODEBUG=madvdontneed=1 dgraph alpha (orange line). The process memory of the orange line goes down.

pjolep · November 22, 2019, 6:03am

After a day of running with GODEBUG=madvdontneed=1, it looks like nothing changed: the memory that Kubernetes sees as used is still about 6-7 GB higher than what the Alpha nodes actually use.

pjolep · February 5, 2020, 9:07am

With v1.1.1 it seemed to behave correctly (not perfectly, but correctly). For instance, you had to be careful about which queries you ran, because a heavy query could really kill the node. It seems Dgraph does not have any mechanism to block or stop a query that takes a lot of memory.

After we upgraded to v1.2.0 we started facing a new issue. It looks like at certain intervals Dgraph does something in the background that takes a lot of memory, and our nodes simply hit their limits, which again kills the nodes.
