Use pure Go zstd implementation

Moved from GitHub badger/1162

Posted by narqo:

What version of Go are you using (go version)?

$ go version
go version go1.13.4 darwin/amd64

What version of Badger are you using?

latest master

Does this issue reproduce with the latest master?

Yes

As mentioned in #1081 and #1104, BadgerDB would consider switching to a pure Go implementation of zstd (github.com/klauspost/compress) once it went out of beta.

Version 1.9.4 is no longer marked as beta, refer to compress/zstd at v1.9.4 · klauspost/compress · GitHub

STABLE - there may always be subtle bugs, a wide variety of content has been tested and the library is actively used by several projects. This library is being continuously fuzz-tested, kindly supplied by fuzzit.dev.

jarifibrahim commented :

We’re working on this @narqo. I’ll try to make this change as soon as possible.

jarifibrahim commented :

Hi @narqo , we’ve decided to not use the pure go based ZSTD because of performance issues. Please see Use pure Go based ZSTD implementation by jarifibrahim · Pull Request #1176 · dgraph-io/badger · GitHub and the benchmarks in Use pure Go based ZSTD implementation by jarifibrahim · Pull Request #1176 · dgraph-io/badger · GitHub .

You can build badger with CGO_ENABLED=0 to use it without CGO, in which case Snappy is the default compression algorithm.

johanbrandhorst commented :

The discussion in Use pure Go based ZSTD implementation by jarifibrahim · Pull Request #1176 · dgraph-io/badger · GitHub implies to me that this decision should be reconsidered.

mvdan commented :

I’m also interested in seeing this reopened. I have a package that imports badger/v2, and the indirect dependency on DataDog/zstd is really unfortunate. It takes a long time to build all that C code, and because it’s a single package, it’s a huge bottleneck in my build time.

jarifibrahim commented :

I did some benchmarks against both libraries using the badger benchmark write tool badger/write_bench.go at master · dgraph-io/badger · GitHub (compression is disabled by default; you’ll have to enable it), and here’s what I found:

Writing 100 million key-values, approx. 22 GB of data
Datadog/zstd
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:23.08
Klauspost/compress
	Elapsed (wall clock) time (h:mm:ss or m:ss): 7:50.20

Datadog/zstd logs

(run master with compression enabled)

badger 2020/06/19 18:21:19 INFO: Running for level: 0
badger 2020/06/19 18:21:25 INFO: LOG Compact 0->1, del 11 tables, add 10 tables, took 5.635393792s
badger 2020/06/19 18:21:25 INFO: Compaction for level: 0 DONE
badger 2020/06/19 18:21:25 INFO: Force compaction on level 0 done
2020/06/19 18:21:25 DB.Close. Error: <nil>. Time taken to close: 8.799659465s
	Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
	User time (seconds): 1585.33
	System time (seconds): 46.84
	Percent of CPU this job got: 426%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:23.08
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3015956
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 18
	Minor (reclaiming a frame) page faults: 656889
	Voluntary context switches: 11779130
	Involuntary context switches: 125223
	Swaps: 0
	File system inputs: 2832
	File system outputs: 107017448
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Klauspost/compress

(run code in GitHub - dgraph-io/badger at ibrahim/klauspost-compress with compression enabled)

badger 2020/06/19 18:10:28 INFO: LOG Compact 1->2, del 9 tables, add 9 tables, took 5.11088109s
badger 2020/06/19 18:10:28 INFO: Compaction for level: 1 DONE
badger 2020/06/19 18:10:28 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/19 18:10:28 DB.Close. Error: <nil>. Time taken to close: 24.191639289s
	Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
	User time (seconds): 3697.65
	System time (seconds): 57.36
	Percent of CPU this job got: 798%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 7:50.20
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3046388
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1
	Minor (reclaiming a frame) page faults: 735124
	Voluntary context switches: 15336179
	Involuntary context switches: 600206
	Swaps: 0
	File system inputs: 32
	File system outputs: 114008192
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@johanbrandhorst @mvdan I understand that CGO is considered harmful and should be avoided, but do you have any strong reasons for switching to a pure Go based library? I’d like to move away from CGO, but the performance difference is what makes me reluctant to make this change. I’d love to know your thoughts.

mvdan commented :

For those wondering what the code looks like, it’s Comparing master...ibrahim/klauspost-compress · dgraph-io/badger · GitHub. I’ll leave it to @klauspost to comment if the use of his pure Go version could be improved or if it’s equivalent to the cgo version.

It might be impossible for pure Go compression to beat well-optimized C code, even accounting for the cgo overhead of DataDog/zstd. But still, we could use this issue to track progress or reevaluate the situation every now and then. To me, this is a tradeoff: I would gladly take 5% slower code that doesn’t require cgo with all of its drawbacks, for example. I get that you’re seeing a difference larger than a few percent, but I imagine that gap can be made smaller over time.

klauspost commented :

It’s great to have a real-world test. However, this one doesn’t seem too real-world.

You seem to be writing random (incompressible) data. That is a very, very limited test of a compressor, and I assume the data isn’t really being compressed at all?
The encoder will always try to entropy-compress the literals even if no matches can be found. I will, however, try to make this dynamic so it is applied automatically in the fastest mode.

For this test you can use the zstd.WithNoEntropyCompression(true) option. For me that doubles the speed with random 4K blocks. I can’t remember the semantics used by native zstd.

You should also disable CRC. It is disabled by default in DataDog IIRC. This is about a 5% improvement for me.

Honestly, overall, if it is disabled by default I don’t really see a problem. It could also at least be used as a fallback: you can use the cgo build tag to enable/disable it.

Edit: 2x faster random data in fastest mode: zstd: Skip entropy on random data by klauspost · Pull Request #270 · klauspost/compress · GitHub

jarifibrahim commented :

I made the changes @klauspost suggested, and I now see very different results. I made the following two changes:

  1. The keys being inserted are 32 bytes long and contain zero-padded integers (not random):
     if err := batch.Set([]byte(fmt.Sprintf("%032d", i)), value); err != nil {
  2. Set WithNoEntropyCompression(true) and EOptions.WithEncoderCRC(false).

Here’s what I see now

Writing 100 million key-values, approx. 18 GB of data (earlier it was 22 GB)
Datadog/zstd
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:58.19
Klauspost/compress
	Elapsed (wall clock) time (h:mm:ss or m:ss): 5:40.24

I’m assuming the performance improvement is mostly because the data is no longer random. In real use cases, the data won’t be random.

On master

badger 2020/06/20 12:33:35 INFO: Compaction for level: 1 DONE
badger 2020/06/20 12:33:35 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/20 12:33:35 DB.Close. Error: <nil>. Time taken to close: 1m36.995748667s
	Command being timed: "go run main.go benchmark write -m 100 --dir ./100m -l"
	User time (seconds): 1074.89
	System time (seconds): 24.20
	Percent of CPU this job got: 262%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:58.19
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3437000
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 42
	Minor (reclaiming a frame) page faults: 281705
	Voluntary context switches: 11946311
	Involuntary context switches: 91562
	Swaps: 0
	File system inputs: 4288
	File system outputs: 52720896
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

On ibrahim/klauspost-compress branch

badger 2020/06/20 12:41:07 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
2020/06/20 12:41:07 DB.Close. Error: <nil>. Time taken to close: 12.916605732s
	Command being timed: "go run main.go benchmark write -m 100 --dir ./100m-k -l"
	User time (seconds): 1010.72
	System time (seconds): 20.27
	Percent of CPU this job got: 303%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 5:40.24
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2568748
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 309196
	Voluntary context switches: 9775613
	Involuntary context switches: 76950
	Swaps: 0
	File system inputs: 56
	File system outputs: 52799208
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

I think we should switch over to the pure Go based implementation. Badger compactions (which compress the data) happen in the background and can be affected by multiple factors; the compression algorithm wouldn’t be the bottleneck.
@klauspost IIUC, I can decompress data that was created by DataDog/zstd using klauspost/compress (which means they’re compatible), is this correct?

@mvdan I understand your point, but just out of curiosity, why wouldn’t you use CGO in your code?
CGO comes with an associated cost, but if there were a significant performance gain, let’s say 2x using some C++ library, would you still prefer a Go based one? If so, please help me understand why. I’m sorry if this question sounds too trivial; I don’t have a lot of experience with CGO.

I also noticed that the number of major page faults (requiring I/O) is much higher with datadog/zstd. Is this also a side effect of CGO? I would’ve expected the page faults to be somewhat similar in both cases since they’re performing the same kind of operation.

klauspost commented :

IIUC, I can decompress the data that was created by DataDog/zstd using klauspost/compress (which means they’re compatible), is this correct?

Yes, they are compatible both ways. The only exception is 0 bytes of input, which will give 0 bytes of output with the Go zstd. But you already have the zstd.WithZeroFrames(true) option, which will wrap 0 bytes in a header so it can be fed to DD zstd. This will of course only be relevant when downgrading.

I am fuzz-testing the change above. It will have much less impact on compression than completely disabling entropy coding, but it will handle the random input blocks better. I would probably leave out WithNoEntropyCompression and upgrade to the next version, which will select this automatically when it makes sense.

number of major page faults

The dd library allocates a lot. That could be why. Go zstd does not allocate for 4K blocks after a few runs.

why wouldn’t you use CGO in your code?

Compiling is much slower. Cross-compilation is a pain to set up, or impossible for some. Deployment requires dependencies, whereas plain Go is just the executable. cgo is also inherently less secure, since none of the C code has the security features the Go runtime provides.

klauspost commented :

FYI, I just released v1.10.10, which automatically disables entropy coding on likely incompressible data in the fastest mode.

imsodin commented :

Is switching/adding go-only zstd compression planned in the near future?

We are evaluating badger as a replacement for goleveldb in Syncthing. Syncthing runs on plenty of systems/archs, most of them cross-compiled, so we try to keep it pure Go.

jarifibrahim commented :

@imsodin I have a PR for this (Replace Datadog/zstd with Klauspost/compress by jarifibrahim · Pull Request #1383 · dgraph-io/badger · GitHub) and we’ll try to get it merged soon.

imsodin commented :

Didn’t notice that - thanks for the heads up!


coder543 commented :

@ibrahim Is there any update on that PR?

Several people have commented eagerly awaiting it to be merged, but it’s possible that you don’t have github notifications turned on.

I’m experimenting with integrating Badger into a project, I won’t be enabling CGO, and I would like to have zstd available. If you’re busy with other stuff, that’s understandable, it would just be nice to know what’s going on with that PR.

@coder543 Please see Replace Datadog/zstd with Klauspost/compress by jarifibrahim · Pull Request #1383 · dgraph-io/badger · GitHub. We have merged the changes, and they will be part of the next badger release.


francislavoie commented :

As I noted in Replace Datadog/zstd with Klauspost/compress by jarifibrahim · Pull Request #1383 · dgraph-io/badger · GitHub, I don’t think the merged PR sufficiently addresses the concerns we (the Caddy project) had. Datadog/zstd is still part of the dependency chain, so we’ll continue to have issues with builds where CGO_ENABLED=0 is not explicitly specified.

Were there concerns about backwards compatibility that led you to take this approach instead?

Naman commented :

Hey @francislavoie, we have merged feat(zstd): replace datadog's zstd with Klauspost's zstd by NamanJain8 · Pull Request #1709 · dgraph-io/badger · GitHub. Hope this solves the issue for you. :tada:


francislavoie commented :

Thanks @Naman!

For Caddy specifically, we’ll need smallstep/nosql to update to the latest version of Badger so that we get the updates.

See Upgrade badger dependency · Issue #12 · smallstep/nosql · GitHub (if you have a few minutes, maybe you could answer some of the questions Max has about migrating from v2 to v3?)