Why should we keep all versions, and how can we reduce vlog growth?

We are using Dgraph in a production environment to track user actions. The vlog files in the p directory are growing too fast and have filled three 1.8 TB SSDs, so I have to reduce the SSD usage. Given that Badger cleans up duplicate and old-version data through value log GC, I changed the related GC settings and rebuilt, but it did not help.

After reading all the related articles and threads, here are my attempted solutions and my questions. I would appreciate your help:

1. Increase the GC frequency and set the discard ratio as follows

func badgerGC() {
	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
	again:
		err := badgerDB.RunValueLogGC(0.01)
		if err == nil {
			goto again
		}
	}
}

2. Reduce the vlog file size

from 1 GB to 128 MB
WithValueLogFileSize(128 << 20)

3. Finally, why does Dgraph keep every version?

		opt := badger.DefaultOptions(Config.PostingDir).WithValueThreshold(1 << 10 /* 1KB */).
			WithNumVersionsToKeep(math.MaxInt32).WithMaxCacheSize(1 << 30)

What if I set option.NumVersionsToKeep = 1?

Thanks for your help!

Hey @JimWen,

GC is a CPU-intensive operation. If you set a low discard ratio like 0.01, Badger will spend a lot of time moving data around. Things would still work correctly with a low value, but a ratio of 0.01 means a file is rewritten even if only 1% of its keys are invalid. Badger then moves the valid keys to a new file and discards the invalid ones, so you end up moving the 99% of keys that were still valid and didn’t need to be moved.
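To put a rough number on that cost, here is a small back-of-the-envelope sketch. It assumes the discard ratio can be read as the stale fraction of a file, as described above, and it looks at the worst case where a file sits exactly at the threshold. The function name `movedPerReclaimed` is made up for this illustration and is not part of Badger.

```go
package main

import "fmt"

// movedPerReclaimed estimates how many bytes of still-valid data have to be
// rewritten for every byte of space reclaimed, when a value log file is
// collected with exactly discardRatio of its contents stale.
// Hypothetical helper for illustration only; not a Badger API.
func movedPerReclaimed(discardRatio float64) float64 {
	return (1 - discardRatio) / discardRatio
}

func main() {
	// Compare an aggressive ratio with more moderate ones.
	for _, r := range []float64{0.01, 0.1, 0.5, 0.9} {
		fmt.Printf("discard ratio %.2f: ~%.1f bytes moved per byte reclaimed\n",
			r, movedPerReclaimed(r))
	}
}
```

Under this simplified model, a ratio of 0.01 moves roughly 99 bytes for every byte it frees, while a ratio of 0.5 moves about one byte per byte freed.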

A smaller value log file size has a similar effect to a low discard ratio. If you have many small vlog files, the GC will run very often and you would still end up moving keys around. We don’t want to move keys around unless necessary. I think we could have a smaller value log file size by default. We could do some experiments with the file size and see which one gives the best CPU-disk tradeoff.

If you set NumVersionsToKeep to 1 in Dgraph, you would lose data. Dgraph sets the number of versions to keep to the maximum and then discards old versions after a snapshot is done. If you were using only Badger, setting the number of versions to keep to 1 would have worked. But since Dgraph has to ensure the data is replicated to the other nodes in the cluster, it is necessary that we keep all versions and don’t discard them before they’re replicated.

Hope this answers all your questions.


Thank you very much, your answer solved a lot of my confusion.
