As mentioned in #1081 and #1104, BadgerDB should consider switching to a pure Go implementation of zstd (github.com/klauspost/compress) now that it is out of beta.
STABLE - there may always be subtle bugs, but a wide variety of content has been tested and the library is actively used by several projects. It is continuously fuzz-tested, with fuzzing kindly supplied by fuzzit.dev.
I’m also interested in seeing this reopened. I have a package that imports badger/v2, and the indirect dependency on DataDog/zstd is really unfortunate. It takes a long time to build all that C code, and because it’s a single package, it becomes a huge bottleneck in my build time.
Writing 100 million key-values, approx. 22 GB of data
DataDog/zstd
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:23.08
klauspost/compress
Elapsed (wall clock) time (h:mm:ss or m:ss): 7:50.20
DataDog/zstd logs
(run master with compression enabled)
badger 2020/06/19 18:21:19 INFO: Running for level: 0
badger 2020/06/19 18:21:25 INFO: LOG Compact 0->1, del 11 tables, add 10 tables, took 5.635393792s
badger 2020/06/19 18:21:25 INFO: Compaction for level: 0 DONE
badger 2020/06/19 18:21:25 INFO: Force compaction on level 0 done
2020/06/19 18:21:25 DB.Close. Error: <nil>. Time taken to close: 8.799659465s
Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
User time (seconds): 1585.33
System time (seconds): 46.84
Percent of CPU this job got: 426%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:23.08
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3015956
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 18
Minor (reclaiming a frame) page faults: 656889
Voluntary context switches: 11779130
Involuntary context switches: 125223
Swaps: 0
File system inputs: 2832
File system outputs: 107017448
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
badger 2020/06/19 18:10:28 INFO: LOG Compact 1->2, del 9 tables, add 9 tables, took 5.11088109s
badger 2020/06/19 18:10:28 INFO: Compaction for level: 1 DONE
badger 2020/06/19 18:10:28 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/19 18:10:28 DB.Close. Error: <nil>. Time taken to close: 24.191639289s
Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
User time (seconds): 3697.65
System time (seconds): 57.36
Percent of CPU this job got: 798%
Elapsed (wall clock) time (h:mm:ss or m:ss): 7:50.20
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3046388
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 735124
Voluntary context switches: 15336179
Involuntary context switches: 600206
Swaps: 0
File system inputs: 32
File system outputs: 114008192
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
@johanbrandhorst @mvdan I understand that cgo is considered harmful and should be avoided, but do you have any strong reasons for switching to a pure Go library? I’d like to move away from cgo, but the performance difference is what makes me reluctant to make this change. I’d love to know your thoughts.
It might be impossible for a pure Go compressor to beat well-optimized C code, even with the cgo overhead of DataDog/zstd. But we could still use this issue to track progress or reevaluate the situation every now and then. To me, this is a tradeoff: I would gladly take 5% slower code that doesn’t require cgo with all of its drawbacks, for example. I get that you’re seeing a difference larger than a few percent, but I imagine that gap can be made smaller over time.
It’s great to have a real-world test. However, this one doesn’t seem very real-world.
You seem to be writing random (incompressible) data. That is a very, very limited test of a compressor, and I assume the data isn’t really compressed at all?
It will always try to entropy-compress the literals even if no matches can be found. I will, however, try to make this dynamic so it is applied automatically when it helps, even in the fastest mode.
For this test you can use the zstd.WithNoEntropyCompression(true) option. For me that doubles the speed with random 4K blocks. I can’t remember the semantics used by native zstd.
You should also disable CRC. It is disabled by default in DataDog/zstd, IIRC. That is about a 5% improvement for me.
Honestly, overall, if it is disabled by default I don’t really see a problem. It could also at least be used as a fallback: you can use the cgo build tag to enable or disable it.
I made the changes @klauspost suggested and I see very different results now. I made the following two changes:
1. The keys being inserted are 32 bytes long and contain sequential integers (not random, prefixed with zeros):
if err := batch.Set([]byte(fmt.Sprintf("%032d", i)), value); err != nil {
2. Set WithNoEntropyCompression(true) and EOptions.WithEncoderCRC(false).
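The key format change above can be sketched with the standard library alone; `%032d` zero-pads each integer to 32 bytes, so consecutive keys share a long prefix of zeros, which is what makes the data compressible:

```go
package main

import "fmt"

// makeKey zero-pads an integer to a 32-byte key, matching the
// batch.Set call in the benchmark above.
func makeKey(i int) []byte {
	return []byte(fmt.Sprintf("%032d", i))
}

func main() {
	for _, i := range []int{0, 42, 99999999} {
		fmt.Printf("%s (len=%d)\n", makeKey(i), len(makeKey(i)))
	}
}
```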
Here’s what I see now
Writing 100 million key-values, approx. 18 GB of data (earlier it was 22 GB)
DataDog/zstd
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:58.19
klauspost/compress
Elapsed (wall clock) time (h:mm:ss or m:ss): 5:40.24
I’m assuming the performance improvement is mostly because the data is no longer random. In real use cases, the data won’t be random.
On master
badger 2020/06/20 12:33:35 INFO: Compaction for level: 1 DONE
badger 2020/06/20 12:33:35 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/20 12:33:35 DB.Close. Error: <nil>. Time taken to close: 1m36.995748667s
Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
User time (seconds): 1074.89
System time (seconds): 24.20
Percent of CPU this job got: 262%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:58.19
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3437000
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 42
Minor (reclaiming a frame) page faults: 281705
Voluntary context switches: 11946311
Involuntary context switches: 91562
Swaps: 0
File system inputs: 4288
File system outputs: 52720896
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
On the ibrahim/klauspost-compress branch
badger 2020/06/20 12:41:07 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/20 12:41:07 DB.Close. Error: <nil>. Time taken to close: 12.916605732s
Command being timed: "go run main.go benchmark write -m 100 --dir ./100m-k -l"
User time (seconds): 1010.72
System time (seconds): 20.27
Percent of CPU this job got: 303%
Elapsed (wall clock) time (h:mm:ss or m:ss): 5:40.24
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2568748
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 309196
Voluntary context switches: 9775613
Involuntary context switches: 76950
Swaps: 0
File system inputs: 56
File system outputs: 52799208
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I think we should switch over to the pure Go implementation. Badger compactions (which compress data) happen in the background, and they can be affected by multiple factors; the compression algorithm wouldn’t be the bottleneck. @klauspost IIUC, I can decompress data that was created by DataDog/zstd using klauspost/compress (which means they’re compatible), is this correct?
@mvdan I understand your point, but just out of curiosity, why wouldn’t you use cgo in your code?
cgo comes with an associated cost, but if there were a significant performance gain, let’s say 2x from using some C++ library, would you still prefer a Go-based one? If so, please help me understand why. I’m sorry if this question sounds too trivial; I don’t have a lot of experience with cgo.
I also noticed that the number of major page faults (requiring I/O) is much higher in the case of DataDog/zstd. Is this also a side effect of cgo? I would have expected the page faults to be somewhat similar in both cases, since they’re performing the same kind of operation.
IIUC, I can decompress data that was created by DataDog/zstd using klauspost/compress (which means they’re compatible), is this correct?
Yes, they are compatible both ways. The only exception is 0 bytes of input, which will give 0 bytes of output with the Go zstd. But you already have the zstd.WithZeroFrames(true) option, which will wrap 0 bytes in a header so it can be fed to DD zstd. This will of course only be relevant when downgrading.
I am fuzz-testing the change above. It will have much less compression impact than completely disabling entropy coding, but it will handle the random input blocks better. I would probably leave out WithNoEntropyCompression and upgrade to the next version, which will select this automatically when it makes sense.
number of major page faults
The DD library allocates a lot; that could be why. The Go zstd does not allocate for 4K blocks after a few runs.
why wouldn’t you use cgo in your code?
Compiling is much slower. Cross-compilation is a pain to set up, and impossible for some targets. Deployment requires shared-library dependencies, whereas plain Go is just the executable. cgo is also inherently less secure, since none of the C code has the security features the Go runtime provides.
Is switching to (or adding) Go-only zstd compression planned for the near future?
We are evaluating badger as a replacement for goleveldb in Syncthing, and Syncthing runs on plenty of systems and architectures, most of them cross-compiled. Thus we try to keep it pure Go.
Several people have commented that they are eagerly awaiting this being merged, but it’s possible that you don’t have GitHub notifications turned on.
I’m experimenting with integrating Badger into a project; I won’t be enabling cgo, and I would like to have zstd available. If you’re busy with other things, that’s understandable; it would just be nice to know what’s going on with that PR.