Duplicated Rows?


(Sovereign313) #1

So, I have some code running that updates a k,v pair. When I retrieve the data it’s properly updated, but if I cat the 000000.vlog file, it shows the historical entries as well as the new entry. Is this expected behavior, or is my code screwing up and adding multiple k,v pairs? An example:

,$@753E56AC-1AED-4A8F-81BF-DA6BBE1A1ACA��������{"Key":"753E56AC-1AED-4A8F-81BF-DA6BBE1A1ACA","Tags":{"check_in_time":"1546535641","cpucount":"4","docker0":"172.17.0.1/16","enp0s3":"10.10.13.67/24,fe80::a00:27ff:fedf:cb10/64","hostname":"localhost.localdomain","lo":"127.0.0.1/8,::1/128","memory":"11760295936","virbr0":"192.168.124.1/24"}}��V�!badger!txn��������13��,$@753E56AC-1AED-4A8F-81BF-DA6BBE1A1ACA��������{"Key":"753E56AC-1AED-4A8F-81BF-DA6BBE1A1ACA","Tags":{"check_in_time":"1546536241","cpucount":"4","docker0":"172.17.0.1/16","enp0s3":"10.10.13.67/24,fe80::a00:27ff:fedf:cb10/64","hostname":"localhost.localdomain","lo":"127.0.0.1/8,::1/128","memory":"11760295936","virbr0":"192.168.124.1/24"}}��d�!badger!txn��������2���,$@753E56AC-1AED-4A8F-81BF-DA6BBE1A1ACA��������{"Key":"753E56AC-1AED-4A8F-81BF-DA6BBE1A1ACA","Tags":{"check_in_time":"1546536841","cpucount":"4","docker0":"172.17.0.1/16","enp0s3":"10.10.13.67/24,fe80::a00:27ff:fedf:cb10/64","hostname":"localhost.localdomain","lo":"127.0.0.1/8,::1/128","memory":"11760295936","virbr0":"192.168.124.1/24"}}$�8M�!badger!txn��������3�j��


(Manish R Jain) #2

Each write to the value log is an append. Later, value log files can be GCed.


(Sovereign313) #3

Awesome, thanks. I just wanted to make sure it won’t fill my disk before I spend time looking through my code for a bug. Appreciate it… any idea how long before they can get GCed?


(Daniel Mai) #4

Garbage collection is done manually. The recommendation is to do it periodically, ideally during periods of low activity. See the docs on garbage collection: https://github.com/dgraph-io/badger#garbage-collection


(Sovereign313) #5

Awesome. thank you for your help.


(Sovereign313) #6

So, I’ve built an HTTP server for API requests that uses BadgerDB… I have an endpoint to call GC:
/gc

Which runs this code:

func handleGC(w http.ResponseWriter, r *http.Request) {
        err := db.RunValueLogGC(0.7)
        if err != nil {
                // Fprint, not Fprintf: err.Error() may contain %-verbs
                // and must not be used as a format string.
                fmt.Fprint(w, err.Error())
                return
        }

        fmt.Fprint(w, "success")
}

It responds with:
Value log GC attempt didn’t result in any cleanup.

I have 2 records in the DB with this info:

| Field | Value |
|---|---|
| cpucount | 4 |
| docker0 | 172.17.0.1/16 |
| ens192 | 10.10.36.87/24 |
| hostname | pu-dlrinventorylz-01 |
| lo | 127.0.0.1/8 |
| memory | 8185937920 |
| os | linux |
| registered_time | 1546877824 |
| update_time | 1547047625 |

The vlog is 224K and still contains every “Set” update from the start. Why isn’t /gc cleaning up these entries?


(Daniel Mai) #7

GC does not remove the latest value log. If there’s only one vlog, then GC won’t touch it.