Small survey on compression (ZSTD in particular)

Dgraph supports data compression, using either Snappy or ZSTD compression algorithms, or no compression at all. By default, Snappy compression is used.

Snappy is a compression library optimized for speed, which makes it a good choice for real-time data processing. ZSTD typically achieves a higher compression ratio than Snappy, at the cost of more time spent compressing and decompressing. Whether to use compression, and which algorithm, depends on the needs of the application; Snappy offers a good balance between compression ratio and speed.
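To make the ratio-versus-speed trade-off concrete, here is a minimal sketch for measuring it on your own data. It does not use Snappy or ZSTD: it uses Python's built-in zlib as a stand-in, where level 0 plays the role of "no compression", level 1 a fast setting, and level 9 a high-ratio setting. The `measure` and `sample_payload` helpers and the payload itself are hypothetical, so treat the numbers as illustrative only and not as a Dgraph benchmark.

```python
import time
import zlib


def sample_payload(records: int = 2000) -> bytes:
    """Build a hypothetical, repetitive JSON-like payload (not real Dgraph data)."""
    lines = [
        '{"uid": "0x%x", "name": "user %d", "bio": "lorem ipsum dolor sit amet"}' % (i, i)
        for i in range(records)
    ]
    return "\n".join(lines).encode("utf-8")


def measure(data: bytes, level: int) -> dict:
    """Compress and decompress `data` once at a zlib level; report size and timings."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    # Guard against a broken round trip before trusting any numbers.
    if restored != data:
        raise ValueError("round trip did not reproduce the input")

    return {
        "level": level,
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        "ratio": len(data) / len(compressed),
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


if __name__ == "__main__":
    payload = sample_payload()
    # 0 = stored (no compression), 1 = fastest, 9 = best ratio (zlib levels).
    for level in (0, 1, 9):
        print(measure(payload, level))
```

Timings from a single run are noisy, so for a real comparison it would make sense to repeat each measurement several times and to use a payload that resembles your actual data.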

Please answer the following questions to help us gain a better understanding of your preferences and usage patterns:

  1. Are you currently using ZSTD or Snappy compression in Dgraph?
  • Yes (which one?)
  • No (why?)
  2. If you are not currently using ZSTD or Snappy, do you plan to test it in the future?
  • Yes
  • No
  3. If you are not using any compression in Dgraph, what is the reason for your preference?
  • Not necessary for my use case (I disable it)
  • Concerns about performance impact
  • Other (please specify)
  4. What is your primary requirement for data compression in Dgraph?
  • Improved performance
  • Reduced disk usage (size on disk)
  • Better performance with large text sizes
  • Other (please specify)

If you have any experience with ZSTD in Dgraph, please share it.

Thank you for your participation in this survey.

  1. Are you currently using ZSTD or Snappy compression in Dgraph?

Yes and No:

  • Yes, Snappy, because with enough resources it seems to have a positive impact
  • No, when I want to limit RAM usage, like in CI pipeline runners for unit tests
  2. If you are not currently using ZSTD or Snappy, do you plan to test it in the future?
  • No, I don’t plan on using ZSTD; in the limited benchmarks I found on the forums it was much slower than Snappy
  3. If you are not using any compression in Dgraph, what is the reason for your preference?
  • Concerns about performance impact. I couldn’t find benchmarks comparing resource usage with and without compression in a real-world scenario.
  4. What is your primary requirement for data compression in Dgraph?
  • Improved performance
  • Reduced disk usage (size on disk)
  • Better performance with large text sizes
