Export RAM usage in dgraph v20.07.2

Experience Report / Bug

Export RAM usage in dgraph v20.07.2 is ~250% of v20.07.1, with no visible improvement to show for it (export is no faster).

What you wanted to do

Export using curl localhost:8080/admin/export
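
For reference, this is roughly how the export was triggered (default HTTP port and export directory; adjust if your alpha is configured differently):

```sh
# Trigger an export on the alpha's HTTP port (8080 by default).
curl localhost:8080/admin/export

# On success the alpha writes gzipped RDF and schema files into its export
# directory ("export" by default, inside the container in this setup).
```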

Why that wasn’t great, with examples

Tested with 2 datasets, 50MB and 700MB, on a single alpha node (in docker). Export took the same amount of time in both dgraph versions.
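
Memory usage was watched via docker stats while the export ran (a rough measurement, but taken the same way for both versions; the numbers below are approximate peaks):

```sh
# Watch container memory usage while the export runs.
docker stats --format "table {{.Name}}\t{{.MemUsage}}"
```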

| version \ dataset | 50MB  | 700MB |
|-------------------|-------|-------|
| v20.07.1          | 1.5GB | 3.3GB |
| v20.07.2          | 4.0GB | 7.7GB |

Export shouldn't take this much more RAM; this doesn't seem right. It would require running dgraph on more powerful machines just for the export, which isn't ideal.

Hi @ppp225

Sorry to hear this. Let me run some tests and see what's wrong; I will get back to you.

In the meantime, could you re-run the test and collect memory heap profiles while the export is running? This will help us understand which component is using more memory than needed.
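
If it helps, a heap profile can be captured from the alpha's debug endpoint while the export is in progress (this assumes the default HTTP port 8080; the /debug/pprof handlers are exposed by default):

```sh
# Capture a heap profile from the running alpha during the export.
curl -o heap.pb.gz localhost:8080/debug/pprof/heap
```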

Also, to keep our tests as close to yours as possible, could you share the dataset you used? We can supply you with a private folder where you can upload it.

Best,

Hi @ppp225

I ran some tests and was not able to reproduce the memory difference you reported here. Have you had the chance to collect some memory profiles while the export was running?

Best,
Omar

Hi @omar,

I have re-run the tests with Dgraph's 21million dataset (170MB) from the dgraph-io/benchmarks repository on GitHub (benchmarks/data at master)
and observed the same issue during export as before. (My dataset is similar.)

| version  | docker stats mem usage | pprof reported |
|----------|------------------------|----------------|
| v20.07.1 | 2.4GB                  | 1GB            |
| v20.07.2 | 5.6GB                  | 2.9GB          |

I took mem profiles near the end of export:
pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.shuri-1.pb.gz (13.2 KB)
pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.shuri-2.pb.gz (20.2 KB)
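
In case it's useful, the profiles can be inspected with the standard Go pprof tooling, for example:

```sh
# Print the top in-use allocations from a captured heap profile.
go tool pprof -inuse_space -top pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.shuri-2.pb.gz

# Or browse the profile interactively in a web UI on port 8081.
go tool pprof -http=:8081 pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.shuri-2.pb.gz
```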

Thanks for looking at this issue. Please let me know if you need anything else.

Hi @ppp225,

Thank you for providing the memory profiles; I will look into them and get back to you.

Best,
Omar

Hi @ppp225

I discussed this internally: in 20.07.2 compression is enabled by default (as is the cache), hence memory usage is higher, since the data on disk is compressed.

In upcoming releases we will make it possible to disable compression. For the time being, if you want to reduce memory usage, we can help you remove the compression (using the badger stream command).

Best,
Omar

Thanks @omar for your answer.

I understand that the /p/ dir is now compressed (I can see the difference in the discussion here), and that during export the data has to be decompressed, hence the higher memory usage.
I have found the related PR and Discuss thread.

I'm not sure the cache is a factor, as export takes the same amount of time on both versions for me.

What I am missing from those discussions are the performance considerations of this change. As we can see, export now consumes additional memory. Could you elaborate or link me to a discussion about it? I'd like to know more about compression's impact on performance and the CPU/memory/SSD tradeoffs.

In upcoming releases we will make it possible to disable compression

That would be great, as I would be able to fine-tune things to my needs! For now I'll stay on v20.07.1.

Thanks,

You can disable compression in Alpha by setting the flag --badger.compression_level=0. That flag is already available.

If you set it on a fresh cluster, the data isn't compressed at all. If you set it on an existing cluster, the existing data stays compressed, but as new data is written to Dgraph the compressed files will be removed during compactions.
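
For example, a minimal alpha invocation with compression disabled might look like this (a sketch only; other flags should match your existing setup, and the zero address here is just the default):

```sh
# Start the alpha with Badger compression disabled.
dgraph alpha --badger.compression_level=0 --zero=localhost:5080
```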

Thanks @dmai for pointing that out!

Disabling compression saved around 1GB. I looked at the flags, and disabling the cache has a much bigger impact. By disabling both, I was able to get memory usage during export (and queries) similar to before, as seen in the table below.

| version  | docker stats | pprof | time  | notes |
|----------|--------------|-------|-------|-------|
| v20.07.1 | 2.4GB        | 1GB   | 1m45s |       |
| v20.07.2 | 5.6GB        | 2.9GB | 1m45s |       |
| v20.07.2 | 4.6GB        | 2.2GB | 1m45s | --badger.compression_level=0 |
| v20.07.2 | 2.5GB        | 1GB   | 1m45s | --badger.compression_level=0 --cache_mb 0 |

This was tested on a single-node machine. The cache does not seem to affect export times; multi-node clusters may perform differently. I used the live loader, as the bulk loader ignored the compression flag.
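
For completeness, the dataset was loaded roughly like this with the live loader (file names are placeholders for the 21million RDF and schema files; alpha/zero addresses are the defaults):

```sh
# Load the benchmark dataset via the live loader against a running alpha and zero.
dgraph live -f 21million.rdf.gz -s 21million.schema -a localhost:9080 -z localhost:5080
```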

Using both of these options, I am able to continue running dgraph on low-memory machines.
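
For anyone else in a similar situation, a low-memory setup along these lines works for me (a sketch only; the image tag, ports, volume path, and zero address are from my Docker setup and will differ in yours):

```sh
# Alpha with compression and cache disabled, for low-memory environments.
docker run -d --name dgraph-alpha \
  -p 8080:8080 -p 9080:9080 \
  -v ~/dgraph:/dgraph \
  dgraph/dgraph:v20.07.2 \
  dgraph alpha --badger.compression_level=0 --cache_mb=0 --zero=zero:5080
```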


I would like to add that dgraph v20.07.1 (and .0) was impressively stable for me memory-wise; it has run smoothly for months with predictable memory usage.
It seems that cache/compression also adds a layer of unpredictability to RAM usage during queries, and I'd love to see a mention in the deploy docs about running dgraph with low memory (at the cost of performance).
This is a common use case for me when running apps/microservices where the extra performance doesn't matter: development machines, unit tests, or applications where I'd like to keep dgraph's scaling potential for the future but don't want to run 32GB nodes just yet.