Use pure Go zstd implementation

Moved from GitHub badger/1162

Posted by narqo:

What version of Go are you using (go version)?

$ go version
go version go1.13.4 darwin/amd64

What version of Badger are you using?

latest master

Does this issue reproduce with the latest master?

Yes

As mentioned in #1081 and #1104, BadgerDB would consider switching to a pure Go implementation of zstd (github.com/klauspost/compress) once it was out of beta.

Version 1.9.4 is no longer marked as beta; see https://github.com/klauspost/compress/tree/v1.9.4/zstd#status:

STABLE - there may always be subtle bugs, a wide variety of content has been tested and the library is actively used by several projects. This library is being continuously fuzz-tested, kindly supplied by fuzzit.dev.

jarifibrahim commented :

We’re working on this @narqo. I’ll try to make this change as soon as possible.

jarifibrahim commented :

Hi @narqo, we’ve decided not to use the pure Go zstd because of performance issues. Please see https://github.com/dgraph-io/badger/pull/1176#issuecomment-573635053 and the benchmarks in https://github.com/dgraph-io/badger/pull/1176#issue-356949512 .

You can build badger with CGO_ENABLED=0 to use it without CGO, in which case Snappy is the default compression algorithm.

johanbrandhorst commented :

The discussion in https://github.com/dgraph-io/badger/pull/1176 implies to me that this decision should be reconsidered.

mvdan commented :

I’m also interested in seeing this reopened. I have a package that imports badger/v2, and the indirect dependency on DataDog/zstd is really unfortunate. It takes a long time to build all that C code, and because it’s a single package, it becomes a huge bottleneck in my build time.

jarifibrahim commented :

I ran some benchmarks against both libraries using the badger write benchmark tool https://github.com/dgraph-io/badger/blob/master/badger/cmd/write_bench.go (compression is disabled by default, so you’ll have to enable it), and here’s what I found:

Writing 100 million key-value pairs (approx. 22 GB of data):
Datadog/zstd
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:23.08
Klauspost/compress
	Elapsed (wall clock) time (h:mm:ss or m:ss): 7:50.20
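To put these two wall-clock figures in proportion, here is a small Go sketch that converts GNU time's "m:ss" / "h:mm:ss" elapsed strings into seconds and computes how much longer one run took than the other. The helpers parseElapsed and slowdown are hypothetical and not part of Badger's benchmark tool.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseElapsed converts a GNU time wall-clock string such as "6:23.08"
// (m:ss) or "1:02:03" (h:mm:ss) into seconds.
func parseElapsed(s string) (float64, error) {
	var total float64
	for _, field := range strings.Split(s, ":") {
		v, err := strconv.ParseFloat(field, 64)
		if err != nil {
			return 0, fmt.Errorf("bad elapsed time %q: %w", s, err)
		}
		// Each field is worth 60 units of the field to its right.
		total = total*60 + v
	}
	return total, nil
}

// slowdown returns how much longer run b took than run a, in percent of a.
func slowdown(a, b float64) float64 {
	return (b - a) / a * 100
}

func main() {
	dd, err := parseElapsed("6:23.08") // DataDog/zstd
	if err != nil {
		panic(err)
	}
	kp, err := parseElapsed("7:50.20") // klauspost/compress
	if err != nil {
		panic(err)
	}
	fmt.Printf("DataDog/zstd: %.2fs, klauspost/compress: %.2fs\n", dd, kp)
	fmt.Printf("klauspost/compress took %.1f%% longer\n", slowdown(dd, kp))
}
```

With the numbers above (383.08 s vs 470.20 s) the pure Go run took roughly 22.7% longer in this first, random-data benchmark.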

Datadog/zstd logs

(run master with compression enabled)

badger 2020/06/19 18:21:19 INFO: Running for level: 0
badger 2020/06/19 18:21:25 INFO: LOG Compact 0->1, del 11 tables, add 10 tables, took 5.635393792s
badger 2020/06/19 18:21:25 INFO: Compaction for level: 0 DONE
badger 2020/06/19 18:21:25 INFO: Force compaction on level 0 done
2020/06/19 18:21:25 DB.Close. Error: <nil>. Time taken to close: 8.799659465s
	Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
	User time (seconds): 1585.33
	System time (seconds): 46.84
	Percent of CPU this job got: 426%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:23.08
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3015956
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 18
	Minor (reclaiming a frame) page faults: 656889
	Voluntary context switches: 11779130
	Involuntary context switches: 125223
	Swaps: 0
	File system inputs: 2832
	File system outputs: 107017448
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Klauspost/compress logs

(run code in https://github.com/dgraph-io/badger/tree/ibrahim/klauspost-compress with compression enabled)

badger 2020/06/19 18:10:28 INFO: LOG Compact 1->2, del 9 tables, add 9 tables, took 5.11088109s
badger 2020/06/19 18:10:28 INFO: Compaction for level: 1 DONE
badger 2020/06/19 18:10:28 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/19 18:10:28 DB.Close. Error: <nil>. Time taken to close: 24.191639289s
	Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
	User time (seconds): 3697.65
	System time (seconds): 57.36
	Percent of CPU this job got: 798%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 7:50.20
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3046388
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1
	Minor (reclaiming a frame) page faults: 735124
	Voluntary context switches: 15336179
	Involuntary context switches: 600206
	Swaps: 0
	File system inputs: 32
	File system outputs: 114008192
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
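The wall-clock gap understates the CPU difference. GNU time's "Percent of CPU this job got" is (user + system) / elapsed, reported as a truncated whole number. The Go sketch below recomputes it from the two logs and compares total CPU seconds. The timing type and its methods are hypothetical helpers written for this illustration.

```go
package main

import "fmt"

// timing holds the GNU time figures for one benchmark run, in seconds.
type timing struct {
	name      string
	user, sys float64
	elapsed   float64
}

// cpuSeconds is the total CPU time (user + system) the run consumed.
func (t timing) cpuSeconds() float64 { return t.user + t.sys }

// cpuPercent mirrors GNU time's "Percent of CPU this job got":
// (user + system) / elapsed, truncated to a whole percentage.
func (t timing) cpuPercent() int { return int(t.cpuSeconds() / t.elapsed * 100) }

func main() {
	dd := timing{"DataDog/zstd", 1585.33, 46.84, 383.08}       // 6:23.08
	kp := timing{"klauspost/compress", 3697.65, 57.36, 470.20} // 7:50.20
	for _, t := range []timing{dd, kp} {
		fmt.Printf("%-18s CPU time %.2fs, CPU %d%%\n", t.name, t.cpuSeconds(), t.cpuPercent())
	}
	fmt.Printf("CPU time ratio: %.2fx\n", kp.cpuSeconds()/dd.cpuSeconds())
}
```

This reproduces the logged 426% and 798%, and shows the klauspost/compress run used about 2.3x the CPU time for about 1.23x the wall-clock time.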

@johanbrandhorst @mvdan I understand that CGO is considered evil and best avoided, but do you have any strong reasons for switching to a pure Go library? I’d like to move away from CGO, but the performance difference makes me reluctant to make this change. I’d love to know your thoughts.

mvdan commented :

For those wondering what the code looks like, it’s https://github.com/dgraph-io/badger/compare/ibrahim/klauspost-compress. I’ll leave it to @klauspost to comment if the use of his pure Go version could be improved or if it’s equivalent to the cgo version.

It might be impossible for compression to beat well-optimized C code, even with the cgo cost of DataDog/zstd. But still, we could use this issue to track progress or reevaluate the situation every now and then. To me, this is a tradeoff: I would gladly have 5% slower code that doesn’t require cgo with all of its drawbacks, for example. I get that you’re seeing a difference larger than a few percent, but I imagine that gap can be made smaller over time.

klauspost commented :

Great to see a real-world test. However, it doesn’t seem very real-world.

You seem to be writing random (incompressible) data. That is a very, very limited test of a compressor and I assume the data isn’t really compressed at all?
The Go encoder always tries to entropy-compress the literals, even when no matches can be found. I will, however, try to make this dynamic so that it is applied automatically in the fastest mode.

For this test you can use the zstd.WithNoEntropyCompression(true) option. For me that doubles the speed with random 4K blocks. I can’t remember the semantics used by native zstd.

You should also disable the CRC. It is disabled by default in DataDog/zstd, IIRC. That gives about a 5% improvement for me.

Honestly, overall, if it is disabled by default I don’t really see a problem. It could also at least be used as a fallback. You can just use the cgo build tag to enable/disable it.

Edit: 2x faster random data in fastest mode: https://github.com/klauspost/compress/pull/270

jarifibrahim commented :

I made the changes @klauspost suggested and now see very different results. I made the following two changes:

  1. The keys being inserted are 32 bytes long and built from integers (not random; zero-padded).
     if err := batch.Set([]byte(fmt.Sprintf("%032d", i)), value); err != nil { return err }
  2. Set the zstd.WithNoEntropyCompression(true) and zstd.WithEncoderCRC(false) encoder options.

Here’s what I see now

Writing 100 million key-value pairs (approx. 18 GB of data; earlier it was 22 GB):
Datadog/zstd
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:58.19
Klauspost/compress
	Elapsed (wall clock) time (h:mm:ss or m:ss): 5:40.24

I’m assuming the performance improvement is mostly because the data is no longer random. In real use cases, the data won’t be random.

On master

badger 2020/06/20 12:33:35 INFO: Compaction for level: 1 DONE
badger 2020/06/20 12:33:35 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/20 12:33:35 DB.Close. Error: <nil>. Time taken to close: 1m36.995748667s
	Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
	User time (seconds): 1074.89
	System time (seconds): 24.20
	Percent of CPU this job got: 262%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:58.19
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3437000
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 42
	Minor (reclaiming a frame) page faults: 281705
	Voluntary context switches: 11946311
	Involuntary context switches: 91562
	Swaps: 0
	File system inputs: 4288
	File system outputs: 52720896
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

On ibrahim/klauspost-compress branch

badger 2020/06/20 12:41:07 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/20 12:41:07 DB.Close. Error: <nil>. Time taken to close: 12.916605732s
	Command being timed: "go run main.go benchmark write -m 100 --dir ./100m-k -l"
	User time (seconds): 1010.72
	System time (seconds): 20.27
	Percent of CPU this job got: 303%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 5:40.24
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2568748
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 309196
	Voluntary context switches: 9775613
	Involuntary context switches: 76950
	Swaps: 0
	File system inputs: 56
	File system outputs: 52799208
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

I think we should switch over to the pure Go implementation. Badger compactions (which compress data) happen in the background and can be affected by multiple factors, so the compression algorithm wouldn’t be the bottleneck.
@klauspost IIUC, I can decompress data created by DataDog/zstd using klauspost/compress (which means they’re compatible). Is this correct?

@mvdan I understand your point but just out of curiosity, why wouldn’t you use CGO in your code?
CGO comes with an associated cost, but if there’s a significant performance gain, say 2x from some C++ library, would you still prefer a Go one? If so, please help me understand why. Sorry if this question sounds trivial; I don’t have a lot of experience with CGO.

I also noticed that the number of major page faults (requiring I/O) is much higher with DataDog/zstd. Is this also a side effect of CGO? I would’ve expected the page faults to be similar in both cases, since they’re performing the same kind of operation.

klauspost commented :

IIUC, I can decompress the data that was created by DataDog/zstd using klauspost/compress (which means they’re compatible), is this correct?

Yes, they are compatible both ways. The only exception is 0 bytes of input, which gives 0 bytes of output with the Go zstd. But you already use zstd.WithZeroFrames(true), which wraps 0 bytes in a frame header so the output can be fed to DD zstd. This will of course only be relevant when downgrading.
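To make the "header" point concrete: every zstd frame starts with the magic number 0xFD2FB528 (RFC 8878), stored little-endian as the bytes 28 B5 2F FD, while a 0-byte output has no header at all. The Go sketch below is a hypothetical helper that only checks that 4-byte prefix. It is not full frame validation, and neither library exposes a function like it.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// zstdMagic is the magic number that starts every zstd frame (RFC 8878),
// stored little-endian as the bytes 28 B5 2F FD.
const zstdMagic uint32 = 0xFD2FB528

// hasFrameHeader reports whether block begins with the zstd frame magic number.
// A 0-byte block, which the Go encoder produces for empty input unless
// zstd.WithZeroFrames(true) is set, has no header at all.
func hasFrameHeader(block []byte) bool {
	return len(block) >= 4 && binary.LittleEndian.Uint32(block) == zstdMagic
}

func main() {
	fmt.Println(hasFrameHeader(nil))                            // false
	fmt.Println(hasFrameHeader([]byte{0x28, 0xB5, 0x2F, 0xFD})) // true
}
```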

I am fuzz testing the change above. It will have much less compression impact than completely disabling entropy coding, but will handle random input blocks better. I would probably leave out WithNoEntropyCompression and upgrade to the next version, which will select this automatically when it makes sense.

number of major page faults

The DD library allocates a lot. That could be why. Go zstd does not allocate for 4K blocks after a few runs.

why wouldn’t you use CGO in your code?

Compiling is much slower. Cross-compilation is a pain to set up, or impossible for some. Deployment requires dependencies, whereas plain Go is just the executable. cgo is inherently less secure, since none of the C code has the security features the Go runtime provides.

klauspost commented :

FYI, I just released v1.10.10 which will automatically disable entropy coding on likely incompressible data in fastest mode.

Is switching to (or adding) Go-only zstd compression planned for the near future?

We are evaluating badger as a replacement for goleveldb for Syncthing. Syncthing runs on many systems and architectures, most of them cross-compiled, so we try to keep it pure Go.

@imsodin I have a PR for this https://github.com/dgraph-io/badger/pull/1383 and we’ll try to get this merged soon.

Didn’t notice that - thanks for the heads up!
