Using Badger as a multi-version store

Hey, badger users (and maintainers):

I’m considering using Badger to store keys with multiple versions of their values. There seem to be at least two approaches:

  1. Make a composite key of (key, version), possibly encoding the version as “Uint64Max - version” so that versions sort in reverse-chronological order. This works with an out-of-the-box Badger DB, and Badger has no knowledge of my versions.

  2. Use ManagedDB, which seems to let users specify the version directly, and which will (hopefully) integrate seamlessly with Badger’s internal wiring.
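For reference, option 1 could look roughly like this. It’s a minimal sketch, assuming user keys are prefix-free (e.g. fixed length); encodeVersionedKey and decodeVersionedKey are hypothetical helper names, not Badger APIs. Since Badger orders keys byte-wise, the inverted version has to be written big-endian for newer versions to sort first.

```go
package main

import (
	"encoding/binary"
	"math"
)

// encodeVersionedKey (hypothetical helper) appends MaxUint64-version to the
// user key in big-endian. Keys compare byte-wise, so for the same user key a
// newer (larger) version yields a smaller suffix and sorts first.
// Assumes user keys are prefix-free; otherwise keys like "a" and "ab" could
// mix in prefix scans.
func encodeVersionedKey(key []byte, version uint64) []byte {
	out := make([]byte, len(key)+8)
	copy(out, key)
	binary.BigEndian.PutUint64(out[len(key):], math.MaxUint64-version)
	return out
}

// decodeVersionedKey (hypothetical helper) reverses encodeVersionedKey.
// It assumes k was produced by encodeVersionedKey (so len(k) >= 8).
func decodeVersionedKey(k []byte) (key []byte, version uint64) {
	n := len(k) - 8
	return k[:n], math.MaxUint64 - binary.BigEndian.Uint64(k[n:])
}
```

With this layout, seeking a plain iterator to the user key should land on its newest version first.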

I prefer (2), but I’d like to get some feedback:

  1. Is my understanding correct?
  2. Is there anything I need to pay special attention to?

Thanks.

I agree that option 2 is the best one for this. The one thing you must be aware of is that you have to manage the transaction timestamps yourself. Dgraph uses a managed DB (option 2), and the Zero node handles the timestamp accounting. So although you get more freedom, you also have to do a bit more work. Someone else may know of another caveat, but that’s the one that comes to mind. Using a managed DB also disables a few convenience functions, because they don’t make sense without implicit timestamps.
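To make “manage the timestamps yourself” a bit more concrete, here is a hypothetical single-process timestamp oracle (tsOracle is an illustrative name; it is not what Zero actually implements, and it keeps no state across restarts). It hands out strictly increasing commit timestamps and only advances the read timestamp past a commit once every earlier commit is marked done. A reader using that value as the readTs of a managed transaction should therefore never see a later commit without the earlier ones.

```go
package main

import "sync"

// tsOracle is a hypothetical, in-memory timestamp oracle for a managed DB.
// Illustration only: no persistence, no distribution.
type tsOracle struct {
	mu      sync.Mutex
	last    uint64              // last commit timestamp handed out
	pending map[uint64]struct{} // commit timestamps not yet marked done
}

func newTsOracle(start uint64) *tsOracle {
	return &tsOracle{last: start, pending: make(map[uint64]struct{})}
}

// CommitTs allocates a new, strictly increasing commit timestamp.
func (o *tsOracle) CommitTs() uint64 {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.last++
	o.pending[o.last] = struct{}{}
	return o.last
}

// Done marks a commit timestamp as fully written.
func (o *tsOracle) Done(ts uint64) {
	o.mu.Lock()
	defer o.mu.Unlock()
	delete(o.pending, ts)
}

// ReadTs returns the highest T such that every commit <= T is done, so a
// read at T sees a gap-free prefix of the commit history.
func (o *tsOracle) ReadTs() uint64 {
	o.mu.Lock()
	defer o.mu.Unlock()
	readTs := o.last
	for ts := range o.pending {
		if ts-1 < readTs {
			readTs = ts - 1
		}
	}
	return readTs
}
```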

I’d also suggest looking at our debug command; it has a lot of good hints about keys and the DB.

Thanks, Gus, for the prompt response!

What’s a Zero node? I haven’t looked into the Dgraph code yet, so it would be great if you could give me a pointer.

Also, what’s the debug command? Do you mean the /debug HTTP endpoint, or something else?

Also, while hacking on the Badger code, I noticed that in the current WriteBatch implementation the version is tied to the underlying Txn object (its commitTs), so I cannot include key/value pairs with different versions in a single WriteBatch.

  1. Is that understanding correct?
  2. Of course this is not ideal, since a blind write may need to write key/value pairs with different versions.
    Do you have any thoughts on how to fix this?

Thanks.

The Dgraph Zero node is a server that handles membership and timestamp (transaction) information.

The code for Zero is in dgraph/cmd/zero in the dgraph-io/dgraph repository on GitHub (master branch).

The debug command is the dgraph debug subcommand. It gives you information about the Dgraph posting directory (p-dir), where the Badger DB is stored. Its code is in dgraph/cmd/debug in the dgraph-io/dgraph repository on GitHub.

You are correct: a batch is associated with a specific version. If you want to write data with different versions, I think you’ll need to look at the Stream API. Check the backup code in backup.go in the dgraph-io/badger repository on GitHub.

A backup needs every version of every key, so it may be closer to what you want to do. I personally prefer using the Stream API for any batch-type operations; that code came out of our need to do these kinds of operations in Dgraph.
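Since a batch is tied to one version, a simple workaround when many pairs share a version is to bucket the writes by version and use one batch per version. This isn’t something suggested in the thread; it’s a minimal sketch with a hypothetical versionedKV type, and the Badger calls in the trailing comment (NewWriteBatchAt, Set, Flush) reflect my understanding of the v2 managed-mode API, so double-check them against your version.

```go
package main

import "sort"

// versionedKV is a hypothetical record: a key/value pair plus its version.
type versionedKV struct {
	Key, Value []byte
	Version    uint64
}

// groupByVersion buckets writes by version and returns the versions in
// ascending order, so each bucket can go into a batch bound to one commit ts.
// Pairs keep their input order within a bucket.
func groupByVersion(kvs []versionedKV) ([]uint64, map[uint64][]versionedKV) {
	groups := make(map[uint64][]versionedKV)
	for _, kv := range kvs {
		groups[kv.Version] = append(groups[kv.Version], kv)
	}
	versions := make([]uint64, 0, len(groups))
	for v := range groups {
		versions = append(versions, v)
	}
	sort.Slice(versions, func(i, j int) bool { return versions[i] < versions[j] })
	return versions, groups
}

// Assumed usage with a managed DB (not exercised here):
//
//	for _, v := range versions {
//		wb := db.NewWriteBatchAt(v) // batch committed at version v
//		for _, kv := range groups[v] {
//			_ = wb.Set(kv.Key, kv.Value) // handle errors in real code
//		}
//		_ = wb.Flush()
//	}
```

This only helps when versions repeat; if every pair has its own version, it degenerates into one batch per pair.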

Also have a look at posting/writer.go in the Dgraph codebase. It sets a version per key/value pair by using a new txn for each update and committing with callbacks.
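Here is a minimal sketch of the “one txn per update, commit with a callback” pattern. It’s my own illustration, not Dgraph’s code. managedTxn is a hypothetical interface listing the *badger.Txn methods the sketch relies on (Set, CommitAt, Discard), and memStore/memTxn are an in-memory fake so it runs without a DB. Against a real DB opened with badger.OpenManaged, newTxn would presumably be something like func() managedTxn { return db.NewTransactionAt(math.MaxUint64, true) }, but treat that as an assumption.

```go
package main

import "sync"

// versionedKV is a hypothetical record: one pair and the version to write it at.
type versionedKV struct {
	Key, Value []byte
	Version    uint64
}

// managedTxn is the subset of *badger.Txn methods this sketch needs.
type managedTxn interface {
	Set(key, val []byte) error
	CommitAt(commitTs uint64, callback func(error)) error
	Discard()
}

// writeVersioned writes each pair in its own transaction, committed at the
// pair's own version. Commits finish asynchronously via callbacks; it waits
// for all of them and returns the first error seen. Assumes, as with Badger's
// CommitAt, that a non-nil callback is invoked exactly once.
func writeVersioned(newTxn func() managedTxn, kvs []versionedKV) error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	record := func(err error) {
		mu.Lock()
		defer mu.Unlock()
		if err != nil && firstErr == nil {
			firstErr = err
		}
	}
	for _, kv := range kvs {
		txn := newTxn()
		if err := txn.Set(kv.Key, kv.Value); err != nil {
			txn.Discard()
			record(err)
			break
		}
		wg.Add(1)
		record(txn.CommitAt(kv.Version, func(err error) {
			record(err)
			wg.Done()
		}))
		txn.Discard() // Badger documents Discard as safe to call after a commit
	}
	wg.Wait()
	return firstErr
}

// memStore and memTxn are an in-memory stand-in used only to exercise
// writeVersioned without a real database.
type memStore struct {
	mu       sync.Mutex
	versions map[string][]uint64 // key -> versions committed for it
}

type memTxn struct {
	store *memStore
	key   []byte
}

func (t *memTxn) Set(key, val []byte) error { t.key = key; return nil }
func (t *memTxn) Discard()                  {}

// CommitAt records the version in a goroutine, then reports success,
// mimicking an asynchronous commit.
func (t *memTxn) CommitAt(commitTs uint64, callback func(error)) error {
	go func() {
		t.store.mu.Lock()
		t.store.versions[string(t.key)] = append(t.store.versions[string(t.key)], commitTs)
		t.store.mu.Unlock()
		callback(nil)
	}()
	return nil
}
```

The async callbacks let many small commits be in flight at once instead of waiting on each one in turn, which is presumably why this pattern is used for bulk writes.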