Hi Guys,
We are evaluating whether we can use badgerDB here at Mixpanel to replace our file-based manifests. I have a couple of questions regarding db.RunValueLogGC.
Is my understanding correct that value log GC is currently not triggered automatically in any way?
Is it intentional to leave it up to users to call db.RunValueLogGC periodically? Is it possible to automate it?
If it is not intentional, I can help put up a PR to automate it. Let me know your thoughts.
Harshal
I do believe that is correct (as far as my code grepping goes). @ibrahim might be able to chime in a bit more.
Yeah, for more context: we have a system where we add millions of KVs to badgerDB (each key contains the timestamp of when it was added), and every 30 minutes we delete all the KVs that are at least 10 days old (we want to keep track of those KVs for 10 days).
We have never run db.RunValueLogGC. Does that mean we have never actually deleted any values yet? From what I could grep, db.RunValueLogGC is the only thing that truly deletes values from badgerDB. Is that correct?
Lately, the periodic prefix iteration (every 30 minutes) has been using a lot of CPU (see attached image of a profile). I suspect it might be because we never ran db.RunValueLogGC.
Or, if not that, can deleting a lot of KVs while iterating through them cause this?
@naman, thoughts?
Hey @harshalchaudhari, that's correct: vlog GC does not happen automatically. One needs to trigger it via db.RunValueLogGC. That's how dgraph does it as well: https://github.com/dgraph-io/dgraph/blob/832ebb9766f1732186eaecbf3792aff9e3766c6c/x/x.go#L1089-L1127
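As a minimal sketch of what triggering it by hand can look like: a successful db.RunValueLogGC call rewrites at most one value log file, so a common pattern is to call it again until it returns an error. The interface, the fake DB, and the 0.5 discard ratio below are assumptions made so the example runs without Badger; a real *badger.DB has a RunValueLogGC(discardRatio float64) error method and could be passed in instead.

```go
package main

import (
	"errors"
	"fmt"
)

// valueLogGCer captures the single method this sketch needs.
// *badger.DB has a method with this signature.
type valueLogGCer interface {
	RunValueLogGC(discardRatio float64) error
}

// runGCPass keeps calling RunValueLogGC until it returns an error and
// reports how many calls succeeded (each success means one file rewritten).
func runGCPass(db valueLogGCer, discardRatio float64) int {
	rewrites := 0
	for db.RunValueLogGC(discardRatio) == nil {
		rewrites++
	}
	return rewrites
}

// fakeDB is a stand-in for *badger.DB, used only to make the sketch runnable.
type fakeDB struct{ pending int }

// errNothingToRewrite plays the role of badger.ErrNoRewrite here.
var errNothingToRewrite = errors.New("nothing to rewrite")

func (f *fakeDB) RunValueLogGC(discardRatio float64) error {
	if f.pending == 0 {
		return errNothingToRewrite
	}
	f.pending--
	return nil
}

func main() {
	db := &fakeDB{pending: 3}
	fmt.Println("files rewritten:", runGCPass(db, 0.5))
}
```

In real code it is probably worth distinguishing badger.ErrNoRewrite from other errors and logging the latter, rather than treating every error as "done".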
Only values bigger than vlogThreshold go into the value log. So KVs that lie entirely in the LSM tree (with values < vlogThreshold) will eventually be cleaned up via compactions. But values that went into the value log will not be cleaned up and will keep using disk space unless you run the value log GC.
@ibrahim, any thoughts on having a way to run vlog GC automatically? I believe vlog GC was left up to users because the value log was earlier used as a write-ahead log that a user might want to keep. I think it would be good functionality to have, with configurable parameters.