Benchmark request: badger v2 vs mongoDB

Moved from GitHub badger/833

Posted by tegk:

Could not find any benchmark.

How is the insert performance compared to MongoDB?

campoy commented :

MongoDB and Dgraph are wildly different databases, so talking about “insert performance” in general is tricky.
Are we inserting new relationships? New nodes? New fields in a document? Those will have quite different performance characteristics on MongoDB.

Do you have a specific use case for which we might be able to come up with a meaningful benchmark?

tegk commented :

I am inserting a key-value pair, e.g. (“1”, “test@test.com”), a couple million times.
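For reference, a minimal sketch of this workload against Badger, using one transaction per pair (the path, key format, and loop count here are illustrative; the `DefaultOptions(path)` form assumes the v2-style API):

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Insert a couple million small key-value pairs,
	// one transaction per pair (the naive approach).
	for i := 0; i < 2000000; i++ {
		key := []byte(fmt.Sprintf("%d", i))
		err := db.Update(func(txn *badger.Txn) error {
			return txn.Set(key, []byte("test@test.com"))
		})
		if err != nil {
			log.Fatal(err)
		}
	}
}
```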

campoy commented :

Hey @tegk,

I just realized that this issue was under the badger repo and not the dgraph one.
This benchmark makes much more sense in that case hehe

We do not have this kind of benchmark at the moment but I do agree it would be an interesting one to perform.
I’ll change the title of this issue and add it to our roadmap, although we could definitely use help with these, so if you’re interested in contributing I’d love to assist you with it.

BTW, even before running the benchmark we suspect Badger to be faster than MongoDB, as we’ve increased the performance of our writers substantially over the last month or so.
A blog post will be coming up soon, but in the meantime you can check out @manishrjain’s twitter thread on it here.

Thanks for the proposal!

tegk commented :

Badger v1.5.3 takes approximately 78% less time than MongoDB to write 1 million key-value pairs for my use case. Waiting for the v2 release and the StreamWriter to test again.

recoilme commented :

You may find some benchmarks in pogreb-bench (Put/Get/Concurrency):

Currently it supports:

  • pogreb Embedded key-value store for read-heavy workloads written in Go
  • goleveldb LevelDB key/value database in Go
  • bolt An embedded key/value database for Go
  • badgerdb Fast key-value DB in Go
  • slowpoke Low-level key/value store in pure Go
  • pudge Fast and simple key/value store written using Go’s standard library

martinmr commented :

Used pogreb-bench to compare badger v1.5 against badger v2:

v1.5

Number of keys: 1000000
Minimum key size: 16, maximum key size: 64
Minimum value size: 128, maximum value size: 512
Concurrency: 2
Running badgerdb benchmark...
Put: 20.065 sec, 49839 ops/sec
Get: 2.813 sec, 355552 ops/sec
Put + Get time: 22.877 sec
File size: 1.94GB

v2

badger 2019/06/20 15:14:11 INFO: All 5 tables opened in 426ms
badger 2019/06/20 15:14:11 INFO: Replaying file id: 4 at offset: 219030900
badger 2019/06/20 15:14:11 INFO: Replay took: 4.796µs
Number of keys: 1000000
Minimum key size: 16, maximum key size: 64
Minimum value size: 128, maximum value size: 512
Concurrency: 2
Running badgerdb benchmark...
Put: 28.065 sec, 35632 ops/sec
Get: 2.860 sec, 349689 ops/sec
Put + Get time: 30.924 sec
File size: 2.41GB

I ran the benchmarks multiple times and the results are consistent: v2 puts are slower. I couldn’t use the StreamWriter because the keys are randomly generated, and the StreamWriter requires that keys be written in sorted order.

martinmr commented :

@recoilme Hi. I opened https://github.com/recoilme/pogreb-bench/pull/1

It contains some minor fixes and adds support for Go modules, which is needed to use version 2 of the badger API.

recoilme commented :

@martinmr lgtm, thank you!

manishrjain commented :

pogreb-bench is alright for naive serial writes that don’t know anything about the usage behavior, but it isn’t a great way to actually benchmark Badger.

It doesn’t use anything in Badger that would make it faster. For example, for serial writes we’d typically use a batch writer. Instead, pogreb-bench uses one txn per write, which is slower because no batching of writes can happen.
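For illustration, a batched version of the same workload using Badger’s WriteBatch API might look like this sketch (v2-style API assumed; exact method signatures may differ slightly between Badger versions):

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One WriteBatch for all entries: Badger groups the writes into
	// large internal transactions instead of committing one txn per key.
	wb := db.NewWriteBatch()
	defer wb.Cancel()

	for i := 0; i < 1000000; i++ {
		key := []byte(fmt.Sprintf("key-%d", i))
		if err := wb.Set(key, []byte("value")); err != nil {
			log.Fatal(err)
		}
	}
	// Flush commits any pending entries before returning.
	if err := wb.Flush(); err != nil {
		log.Fatal(err)
	}
}
```

Unlike the StreamWriter, WriteBatch does not require keys to arrive in sorted order, which is why it fits pogreb-bench’s randomly generated keys.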

Similarly, the values can be colocated with the keys, considering how small they are. So, I’d set ValueThreshold to the max value size (+1, to avoid an off-by-one), so we don’t need to additionally retrieve the values from the value log.
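Concretely, with the v2-style options API that tuning might be sketched as follows (the 512-byte maximum value size comes from the benchmark settings above; the +1 keeps values of exactly the maximum size inline in the LSM tree):

```go
// Config sketch, assuming the badger v2 options API.
opts := badger.DefaultOptions("/tmp/badger").
	WithValueThreshold(513) // max value size (512) + 1: all values stay in the LSM tree
db, err := badger.Open(opts)
```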

@martinmr : We have read/write benchmarks in badger-bench that you can use instead, which have the benefit of aiming for write throughput and read throughput, etc.

martinmr commented :

@recoilme: I opened another PR (https://github.com/recoilme/pogreb-bench/pull/2) to use the WriteBatch in badger instead of using a separate transaction each time. Performance is greatly improved so you might want to rerun your benchmarks and update the results for badger.

@manishrjain I noticed that and tried to change it to use the StreamWriter but it didn’t work because the keys are not in order. I didn’t know about WriteBatch but I’ve changed pogreb-bench to use it. I just thought pogreb-bench was useful to compare badger to other DBs. I’ll use the benchmarks in badger-bench for more accurate testing.

recoilme commented :

Hello, @martinmr ! Thank you for your PR! Please take a look at https://github.com/recoilme/pogreb-bench/pull/2#issuecomment-504889315