Badger documentation limited, or feature removed?

Hi,

I was looking through the Badger documentation for the reverse iterator option, so that I can fetch values in reverse order (last inserted, first fetched), but I could not find it anywhere.

I tried searching:

Github: https://github.com/dgraph-io/badger/
Badger Docs: https://dgraph.io/docs/badger/get-started

Neither mentions a reverse iterator. Has the feature been removed from Badger?

Thanks

You can have a look at the examples in txn_test.go.
Here is some example code:

func TestReverse(t *testing.T) {
	dir, err := ioutil.TempDir("", "badger-test")
	require.NoError(t, err)
	defer os.RemoveAll(dir)

	ops := getTestOptions(dir).WithNumVersionsToKeep(math.MaxInt32)
	db, err := Open(ops)
	require.NoError(t, err)
	for i := 0; i < 10; i++ {
		require.NoError(t, db.Update(func(txn *Txn) error {
			require.NoError(t, txn.Set([]byte("key"), []byte(fmt.Sprintf("%05d", i))))
			return nil
		}))
	}
	fmt.Println("Reverse")
	require.NoError(t, db.View(func(txn *Txn) error {
		iopts := DefaultIteratorOptions
		iopts.Reverse = true
		iopts.AllVersions = true
		iopts.Prefix = []byte("key")
		it := txn.NewIterator(iopts)
		defer it.Close()
		for it.Rewind(); it.ValidForPrefix(iopts.Prefix); it.Next() {
			x, err := it.Item().ValueCopy(nil)
			require.NoError(t, err)
			fmt.Printf("%+v value:%s\n", it.Item(), x)
		}
		return nil
	}))
	fmt.Println("No reverse")
	require.NoError(t, db.View(func(txn *Txn) error {
		iopts := DefaultIteratorOptions
		iopts.Reverse = false
		iopts.AllVersions = true
		iopts.Prefix = []byte("key")
		it := txn.NewIterator(iopts)
		defer it.Close()
		for it.Rewind(); it.ValidForPrefix(iopts.Prefix); it.Next() {
			x, err := it.Item().ValueCopy(nil)
			require.NoError(t, err)
			fmt.Printf("%+v value:%s\n", it.Item(), x)
		}
		return nil
	}))
	require.NoError(t, db.Close())
}

It outputs:

Reverse
key="key", version=1, meta=40 value:00000
key="key", version=2, meta=40 value:00001
key="key", version=3, meta=40 value:00002
key="key", version=4, meta=40 value:00003
key="key", version=5, meta=40 value:00004
key="key", version=6, meta=40 value:00005
key="key", version=7, meta=40 value:00006
key="key", version=8, meta=40 value:00007
key="key", version=9, meta=40 value:00008
key="key", version=10, meta=40 value:00009
No reverse
key="key", version=10, meta=40 value:00009
key="key", version=9, meta=40 value:00008
key="key", version=8, meta=40 value:00007
key="key", version=7, meta=40 value:00006
key="key", version=6, meta=40 value:00005
key="key", version=5, meta=40 value:00004
key="key", version=4, meta=40 value:00003
key="key", version=3, meta=40 value:00002
key="key", version=2, meta=40 value:00001
key="key", version=1, meta=40 value:00000

ping @docs

@Naman - No, I don’t need reverse order for values, but for KEYS.

I want to store data under keys that contain a unique date and time, and I want to list all values by date in reverse order.

The date-time is part of the key, in the format “2006-01-02-12-06-10”.

To clarify, the example I posted was a demo of iterating over the same key with different versions. By default, only the latest version is fetched. If you set iopts.AllVersions = true, you get all versions (latest -> oldest). If you also set iopts.Reverse = true, you get all versions (oldest -> latest). This has nothing to do with the value; it is all about the version.
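The ordering described here can be sketched with a toy model that does not use Badger at all. The entry type and the iterate function are hypothetical names made up for illustration; they only mimic the visiting order seen in the test output above (key ascending, version descending, with Reverse flipping the whole sequence).

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for one stored version of a key.
type entry struct {
	Key     string
	Version uint64
	Value   string
}

// iterate returns the entries an iterator would visit in this toy model.
// Forward order is key ascending, then version descending (latest first).
// Without allVersions only the latest version of each key is returned.
// With reverse set, the whole sequence is flipped.
func iterate(entries []entry, allVersions, reverse bool) []entry {
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Key != sorted[j].Key {
			return sorted[i].Key < sorted[j].Key
		}
		return sorted[i].Version > sorted[j].Version
	})
	var out []entry
	for i, e := range sorted {
		if !allVersions && i > 0 && sorted[i-1].Key == e.Key {
			continue // an older version of a key that was already emitted
		}
		out = append(out, e)
	}
	if reverse {
		for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
			out[i], out[j] = out[j], out[i]
		}
	}
	return out
}

func main() {
	var entries []entry
	for v := uint64(1); v <= 3; v++ {
		entries = append(entries, entry{Key: "key", Version: v, Value: fmt.Sprintf("%05d", v-1)})
	}
	fmt.Println(iterate(entries, false, false)) // latest version only
	fmt.Println(iterate(entries, true, false))  // all versions, latest -> oldest
	fmt.Println(iterate(entries, true, true))   // all versions, oldest -> latest
}
```

With NumVersionsToKeep left at its default, only the first case matters in practice, since older versions are not retained.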

If your whole DB contains keys in the format “YYYY-MM-DD-hh-mm-ss”, then you can simply iterate over the DB with something like this. It gives you all your keys in ascending order; setting Reverse = true in the iterator options, as in the test above, gives them in descending order.

for itr.Rewind(); itr.Valid(); itr.Next() {
  // Do work
}
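Key order matches date order here because a zero-padded “YYYY-MM-DD-hh-mm-ss” string sorts byte-wise in the same order as the timestamps it encodes. A minimal standard-library sketch of that property, with no Badger involved; the layout constant and the newestFirst helper are assumptions made up for this illustration:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// keyLayout is an assumed Go time layout for keys of the form YYYY-MM-DD-hh-mm-ss.
const keyLayout = "2006-01-02-15-04-05"

// newestFirst returns a copy of keys sorted in descending byte order,
// which is the order a reverse iteration over these keys would follow.
func newestFirst(keys []string) []string {
	out := append([]string(nil), keys...)
	sort.Sort(sort.Reverse(sort.StringSlice(out)))
	return out
}

func main() {
	base := time.Date(2020, time.March, 9, 8, 30, 0, 0, time.UTC)
	keys := []string{
		base.Add(48 * time.Hour).Format(keyLayout),
		base.Format(keyLayout),
		base.Add(90 * time.Minute).Format(keyLayout),
	}
	for _, k := range newestFirst(keys) {
		fmt.Println(k)
	}
}
```

The property holds only while every field is zero-padded to a fixed width and all timestamps share one time zone.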

When you mention “version”, do you mean that each time the same key is updated, the values are saved separately? If so, I don’t want this. I do not want to store multiple (version-based) values, since that just takes up unnecessary storage. How can I avoid it?

You need not. :)
NumVersionsToKeep is 1 by default in Badger. I set it to math.MaxInt32 for the sake of the example.

Hi, it’s confusing to get an answer from the core team without much detail.

You didn’t answer:

  • Each time a key is updated, is the new value stored separately, alongside the previous value?

If so, I don’t want to keep the previous value. I want to overwrite it right away. Is there such an option in Badger?

  • Is there any performance difference when overwriting?
  • How long does the old value remain in my database file if I don’t overwrite it, or if Badger has no overwrite option?
  • Is it cleaned up by an automatic garbage collector, or do I need to run the garbage collector myself?

Thanks

Sorry for the lack of clarity.

Badger is a key-value database inspired by RocksDB, and it uses an LSM tree as the underlying data structure. An LSM tree is optimized for write-heavy workloads, which it achieves by writing sequentially. This means that a new entry (key-value pair) is inserted into the tree even if its key already exists.
To clean up the tree (remove old/stale data), it relies on compaction. The compaction process runs periodically, identifies stale data, and removes it from the tree.

Badger also provides a mechanism to keep multiple versions of the same key (by setting NumVersionsToKeep). This is useful for various purposes; Dgraph uses this property of Badger.

For simplicity let’s assume you want to keep a single version of a Key, i.e., if you update a key with a new value, you don’t care about the older value. You want that older value to be cleaned up.

Each time you set a key-value pair (say key1: value2), a new entry (key-value pair) is inserted into the DB. Since NumVersionsToKeep=1 (the default in Badger), the older entry (key1: value1) for the same key is marked as stale. That value is no longer visible to you and is eventually cleaned up by the compaction process. If you now do txn.Get(key1), you get value2.
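The append-then-compact behaviour can be illustrated with a toy store. This is only a sketch of the idea, not Badger's actual data structure: the store type, its methods, and the in-memory map are all hypothetical.

```go
package main

import "fmt"

// version is one write of a key; a higher Seq means a newer write.
type version struct {
	Seq   int
	Value string
}

// store is a toy append-only model, not Badger's real LSM tree.
type store struct {
	numVersionsToKeep int
	seq               int
	data              map[string][]version // versions per key, oldest first
}

func newStore(numVersionsToKeep int) *store {
	return &store{numVersionsToKeep: numVersionsToKeep, data: map[string][]version{}}
}

// Set never overwrites in place: it appends a new version.
func (s *store) Set(key, value string) {
	s.seq++
	s.data[key] = append(s.data[key], version{Seq: s.seq, Value: value})
}

// Get returns the newest version, so stale versions are invisible to readers.
func (s *store) Get(key string) (string, bool) {
	vs := s.data[key]
	if len(vs) == 0 {
		return "", false
	}
	return vs[len(vs)-1].Value, true
}

// Stored counts every version currently held, including stale ones.
func (s *store) Stored() int {
	n := 0
	for _, vs := range s.data {
		n += len(vs)
	}
	return n
}

// Compact drops versions beyond numVersionsToKeep and reports how many it removed.
func (s *store) Compact() int {
	removed := 0
	for k, vs := range s.data {
		if extra := len(vs) - s.numVersionsToKeep; extra > 0 {
			s.data[k] = append([]version(nil), vs[extra:]...)
			removed += extra
		}
	}
	return removed
}

func main() {
	s := newStore(1)
	s.Set("key1", "value1")
	s.Set("key1", "value2")
	v, _ := s.Get("key1")
	fmt.Println("visible:", v, "stored versions:", s.Stored())
	fmt.Println("removed by compaction:", s.Compact(), "stored versions:", s.Stored())
}
```

In this model the stale version takes up space only between the write and the next compaction, and reads never see it.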

The performance cost is the same as inserting a new key-value pair, and the space is eventually reclaimed. But if your values are very large (>1KB) and you update them frequently, garbage collection will take some time.

There is no time constraint as such, but the old value is eventually cleaned up, which keeps your DB in a healthy state.

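Compaction of the LSM tree runs automatically in the background, so you don’t need to trigger it yourself. Value log GC is different: Badger leaves it to the application, which calls db.RunValueLogGC at a time of its choosing (typically on a periodic timer).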

Please feel free to ask for further clarification.

Now, this should go into either the @badger @docs or the FAQ.

That is some information.

There is just one thing that shook me a little:

1KB is considered a very large value. Your post sharing the information above is larger than 1 KB; is it a very large value?

I have an article-based community, and with Badger my average value size would be 2 KB - 10 KB, or more in some cases. Are these really considered very large values?

Thank you for clarifying my doubts.

Thanks

The default is set to 1KB. You can set ValueThreshold to a higher value as per your needs.

There are no benchmarks for this threshold. It depends on your data access patterns, etc.

With a higher value threshold, the LSM tree will be bigger for the same amount of data and each table will contain fewer entries, so searches will be slightly slower. I suggest you not deviate too much from the default value.
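To make the trade-off concrete, here is a simplified model of what the threshold decides: values below it are stored inline in the LSM tree, and larger values go to the value log with only a pointer kept in the tree. The place function and the placement type are hypothetical, and the exact boundary comparison (strictly less than) is an assumption of this sketch.

```go
package main

import "fmt"

// placement says where a value would live in this simplified model.
type placement string

const (
	inLSM      placement = "LSM tree"
	inValueLog placement = "value log"
)

// place models the threshold rule: values shorter than threshold bytes stay
// inline in the LSM tree, and larger ones go to the value log.
// Whether the real boundary is "<" or "<=" is an assumption here.
func place(valueLen, threshold int) placement {
	if valueLen < threshold {
		return inLSM
	}
	return inValueLog
}

func main() {
	sizes := []int{512, 2 << 10, 10 << 10} // 512 B, 2 KB, 10 KB
	for _, threshold := range []int{1 << 10, 16 << 10} {
		for _, n := range sizes {
			fmt.Printf("threshold=%d value=%d -> %s\n", threshold, n, place(n, threshold))
		}
	}
}
```

For 2 KB - 10 KB articles, a 1KB threshold sends every value to the value log and keeps the tree small, while a 16 KB threshold would pull all of them into the tree, which is the growth described above.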