Benchmarks using `GODEBUG=madvdontneed` environment variable

Context
Several customers and users reported that Dgraph was still holding on to memory even when their cluster was idle. The issue appears to be related to the Go runtime, which does not release that memory right away, apparently because it will try to reuse it.

This is not an issue with Dgraph itself. This is a known and common question/issue in the Go community:

…and many more…

Purpose of the tests
We wanted to test how the Go runtime releases memory when the environment variable GODEBUG=madvdontneed=1 is set, and to see whether it has any downside for Dgraph. Based on the results and comments, we will decide whether we should advise our users to set this environment variable or not.

Test setup
The test was performed today (2020-06-22) using Dgraph compiled from the master branch:

Dgraph version   : v2.0.0-rc1-425-gad0914a0c
Dgraph SHA-256   : 13bd1272550ca1d20d8126ca2f8058d4b7c958a22eadcbf330b6df35b5f531d5
Commit SHA-1     : ad0914a0c
Commit timestamp : 2020-06-22 13:20:38 +0530
Branch           : master
Go version       : go1.14.4

The test steps were:
  • Create a cluster
  • Bulk load a large dataset
  • Take backups (the alpha uses enough memory during backups for our test case)

Test1: GODEBUG=madvdontneed=1 (turned on)
This test is divided into two parts:

  • 1st part: backups were taken every 4min
  • 2nd part: backups were taken every 10min

Results:


The portion before the red line shows alpha memory usage with backups taken every 4 minutes; the portion after the red line shows memory usage with backups taken every 10 minutes.

Test2: GODEBUG=madvdontneed=0 (turned off)
This test was run for a shorter time period, as the results were already visible after a few backups.
Results:

@dmai FYI


We should be careful about these changes since they could have significant performance implications. Setting madvdontneed should cause a lot of page faults (theoretically). @omar Do you have some metrics about the performance? How long did the backup process take with the flag enabled and disabled?

Let me explain a bit about what the flag you changed does.

Madvise - The memory usage advisor

The madvise https://www.man7.org/linux/man-pages/man2/madvise.2.html system call is used to advise the kernel about how it should treat the memory used by the process (the Go runtime in this case).

The Go runtime uses MADV_FREE by default.

This means the GC marks freed memory spans as free, but it is up to the kernel to decide whether to reclaim them. The kernel reclaims this memory only when it needs to. So if I’m running only Dgraph on a 64 GB machine that has 30 GB of idle memory, the kernel might still not reclaim it, since no other process is requesting memory. This is done to improve performance: the kernel can take the memory away when it needs it, but until then the process (the Go runtime) can hold on to it.
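To watch this from inside a Go process, the runtime's own accounting can be read with `runtime.ReadMemStats`. The following is only a minimal sketch, not something that was part of the test above: the helper name `releasedAfterFree` and the 64 MiB size are made up for illustration. It allocates a buffer, drops it, asks the runtime to return memory with `debug.FreeOSMemory`, and compares `HeapReleased` before and after. Keep in mind that `HeapReleased` counts what the runtime has handed back through madvise; with MADV_FREE the RSS reported by the OS can stay high until the kernel actually reclaims the pages.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/debug"
)

// sink keeps the allocation reachable until we explicitly drop it.
var sink []byte

// releasedAfterFree (hypothetical helper) allocates size bytes, touches every
// page, drops the buffer, and returns HeapReleased before and after
// debug.FreeOSMemory.
func releasedAfterFree(size int) (before, after uint64) {
	sink = make([]byte, size)
	page := os.Getpagesize()
	for i := 0; i < len(sink); i += page {
		sink[i] = 1 // touch each page so it is really backed by memory
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	before = m.HeapReleased

	sink = nil           // drop the only reference
	debug.FreeOSMemory() // force a GC and return as much memory as possible

	runtime.ReadMemStats(&m)
	after = m.HeapReleased
	return before, after
}

func main() {
	before, after := releasedAfterFree(64 << 20)
	fmt.Printf("HeapReleased before: %d MiB, after: %d MiB\n", before>>20, after>>20)
}
```

Running it once with GODEBUG=madvdontneed=1 and once without, while watching the RSS of the process from outside, should show the difference described here: the runtime's numbers look similar, but the RSS only drops promptly in the first case.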

The change you made, GODEBUG=madvdontneed=1 (turned on), makes the runtime advise the kernel with MADV_DONTNEED instead, which tells it to take these spans away immediately. If you were to look at the RSS of the process, you’d see the memory usage drop instantly. This helps when you want to free up the memory instantly, but it causes page faults when the runtime touches that memory again, since we gave away all the memory the runtime had. This should affect the performance by a huge margin (theoretically).
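The effect of MADV_DONTNEED can be seen in isolation with a small Linux-only sketch that bypasses the Go heap entirely. The helper name `dontNeedZeroes` is made up for this example; the calls themselves (`syscall.Mmap`, `syscall.Madvise`, `syscall.Munmap`) are from the standard library. It maps anonymous memory, fills it, applies MADV_DONTNEED, and reads it back. On Linux the pages are dropped immediately, so the next access faults in a fresh zero-filled page.

```go
package main

import (
	"fmt"
	"syscall"
)

// dontNeedZeroes (hypothetical helper, Linux only) maps an anonymous private
// region, fills it with 0xFF, applies MADV_DONTNEED, and returns the first
// byte as seen before and after the madvise call.
func dontNeedZeroes(size int) (before, after byte, err error) {
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Munmap(mem)

	for i := range mem {
		mem[i] = 0xFF
	}
	before = mem[0]

	// Tell the kernel we no longer need these pages; it drops them at once.
	if err := syscall.Madvise(mem, syscall.MADV_DONTNEED); err != nil {
		return before, 0, err
	}

	// This read triggers a page fault and the kernel supplies a zero page.
	after = mem[0]
	return before, after, nil
}

func main() {
	before, after, err := dontNeedZeroes(1 << 20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("before: %#02x, after: %#02x\n", before, after)
}
```

Every page handed back this way costs a fault when it is touched again, which is the theoretical performance concern here.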

This page has a nice explanation https://kernelnewbies.org/Linux_4.5#Add_MADV_FREE_flag_to_madvise.282.29

Also, the madvdontneed setting was added in go1.12 https://golang.org/doc/go1.12#runtime and it won’t have any effect on older versions of golang.
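A quick way to confirm what a given process actually sees is to print the Go version together with the GODEBUG value from its environment. This is just a sketch: `madvDontNeedSet` is a made-up helper that does a simplified exact match on the comma-separated GODEBUG entries, and it says nothing about whether the runtime in use honors the setting.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// madvDontNeedSet (hypothetical helper) reports whether a GODEBUG string,
// which is a comma-separated list of name=value pairs, contains
// madvdontneed=1. This is a simplified exact match.
func madvDontNeedSet(godebug string) bool {
	for _, kv := range strings.Split(godebug, ",") {
		if kv == "madvdontneed=1" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("Go version:    ", runtime.Version())
	fmt.Println("madvdontneed=1:", madvDontNeedSet(os.Getenv("GODEBUG")))
}
```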


Hi @ibrahim, thanks for the feedback.

The 1st test (with GODEBUG=madvdontneed=1) has been running for 5+ hours and the 2nd test (with GODEBUG=madvdontneed=0) has been running for 1+ hour.

If needed I can leave the test running for more time and provide charts accordingly.

If needed I can do another test where I set GODEBUG=madvdontneed=0 and run a parallel process (not Dgraph) that requires enough memory. This way we can verify whether the kernel reclaims some memory from Dgraph to use it for the other process.
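Such a parallel process could be as simple as the sketch below. It was not part of the tests reported here; the name `hog` and the flags `-mib` and `-hold` are made up for illustration, and the defaults are deliberately small. It allocates the requested amount, writes one byte per page so the kernel has to back every page with physical memory, and then holds on to it.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
	"time"
)

// hog (hypothetical helper) allocates mib MiB and writes one byte per page so
// that the kernel must back every page with physical memory.
func hog(mib int) []byte {
	buf := make([]byte, mib<<20)
	page := os.Getpagesize()
	for i := 0; i < len(buf); i += page {
		buf[i] = 1
	}
	return buf
}

func main() {
	mib := flag.Int("mib", 64, "amount of memory to hold, in MiB")
	hold := flag.Duration("hold", time.Second, "how long to hold the memory")
	flag.Parse()

	buf := hog(*mib)
	fmt.Printf("holding %d MiB for %s\n", len(buf)>>20, *hold)
	time.Sleep(*hold)
	runtime.KeepAlive(buf) // keep the buffer alive until the sleep is over
}
```

To create real pressure, the size would need to be close to the idle memory on the machine and the hold time long enough to watch the alpha's RSS react.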

Did you mean GODEBUG=madvdontneed=1 (turned on) there? I’m wondering if it was a typo, as the memory usage does not drop at all when the underlying environment variable is turned off (=0).

The important bit is how long the backup took. The flag will release the memory, but my concern is how much faster or slower it makes everything.
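One way to collect such numbers is to record wall-clock time together with the process's page-fault counters around the operation of interest. The sketch below assumes a Unix-like system where `syscall.Getrusage` is available; `measure` is a made-up helper, and the workload in `main` is only a placeholder that touches fresh memory, not a real backup. Since it reads the counters of the calling process, it would have to run inside the process being measured.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// measure (hypothetical helper) runs work and returns its wall-clock duration
// together with the minor and major page faults this process incurred while
// it ran.
func measure(work func()) (elapsed time.Duration, minor, major int64, err error) {
	var before, after syscall.Rusage
	if err = syscall.Getrusage(syscall.RUSAGE_SELF, &before); err != nil {
		return 0, 0, 0, err
	}
	start := time.Now()
	work()
	elapsed = time.Since(start)
	if err = syscall.Getrusage(syscall.RUSAGE_SELF, &after); err != nil {
		return elapsed, 0, 0, err
	}
	minor = int64(after.Minflt) - int64(before.Minflt)
	major = int64(after.Majflt) - int64(before.Majflt)
	return elapsed, minor, major, nil
}

func main() {
	// Placeholder workload: touching fresh memory forces page faults.
	// In a real comparison this would be the backup itself.
	work := func() {
		buf := make([]byte, 64<<20)
		for i := 0; i < len(buf); i += 4096 { // 4096 is an assumed page size
			buf[i] = 1
		}
	}
	elapsed, minor, major, err := measure(work)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("elapsed=%s minor_faults=%d major_faults=%d\n", elapsed, minor, major)
}
```

Comparing these figures for the same workload with the flag on and off would show whether the extra page faults translate into a measurable slowdown.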

Yes, sorry, there was a typo. I’ll fix my comment. MADV_DONTNEED means to free up the memory instantly, so GODEBUG=madvdontneed=1 will instantly free up the memory.