Small survey on compression (ZSTD in particular)

MichelDiz · March 30, 2023, 8:07pm

Dgraph supports data compression, using either Snappy or ZSTD compression algorithms, or no compression at all. By default, Snappy compression is used.

Snappy is a compression and decompression library optimized for speed, making it a good choice for real-time data processing applications. ZSTD is another compression algorithm that achieves a higher compression ratio than Snappy, at the cost of slower compression and decompression. Whether to use compression, and which algorithm to choose, depends on the specific needs of the application, with Snappy providing a good balance between compression efficiency and speed.

Please answer the following questions to help us gain a better understanding of your preferences and usage patterns:

Are you currently using ZSTD or Snappy compression in Dgraph?

Yes (which one?)
No (why?)

If you are not currently using ZSTD or Snappy, do you plan to test it in the future?

Yes
No

If you are not using any compression in Dgraph, what is the reason for your preference?

Not necessary for my use case (I disable it)
Concerns about performance impact
Other (please specify)

What is your primary requirement for data compression in Dgraph?

Improved performance
Reduced disk usage (size on disk)
Better performance with large text sizes
Other (please specify)

If you have any experience with ZSTD in Dgraph, please share it.

Thank you for your participation in this survey.

ppp225 · April 3, 2023, 7:28pm

Are you currently using ZSTD or Snappy compression in Dgraph?

Yes and No:

Yes, Snappy, because with enough resources it seems to have a positive impact
No, when I want to limit RAM usage, like in CI pipeline runners for unit tests

If you are not currently using ZSTD or Snappy, do you plan to test it in the future?

No, I don’t plan on using ZSTD, as from the limited benchmarks I found on the forums it was way slower than Snappy.

If you are not using any compression in Dgraph, what is the reason for your preference?

Concerns about performance impact - I couldn’t really find benchmarks comparing resource usage with and without compression in a real world scenario.

What is your primary requirement for data compression in Dgraph?

Improved performance
Reduced disk usage (size on disk)
Better performance with large text sizes