Badger backup takes a long time to back up a 3 GB store

I have a Badger DB of around 3 GB and I want to take a backup of it. I tried using the API as well as the terminal command, but both took so long that I had to stop the process after 20 minutes. And this is just a test store; we have stores much larger than 3 GB. I am sharing the output of badger info below. I am using the master branch of Badger. I want to know if there is any way to make the backup faster.

# badger info --dir /store/
Listening for /debug HTTP requests at port: 8080

[     2019-04-09T13:41:28Z] MANIFEST      24 kB MA
[           5 days earlier] 000303.sst    19 MB L1
[       58 minutes earlier] 000474.sst    70 MB L1
[       58 minutes earlier] 000475.sst    70 MB L1
[       58 minutes earlier] 000476.sst    68 MB L1
[           5 days earlier] 000307.sst    37 MB L2
[           5 days earlier] 000312.sst    70 MB L2
[           5 days earlier] 000314.sst    73 MB L2
[           5 days earlier] 000316.sst    13 MB L2
[           4 days earlier] 000335.sst    70 MB L2
[           4 days earlier] 000342.sst    70 MB L2
[           4 days earlier] 000356.sst    70 MB L2
[           4 days earlier] 000357.sst    27 MB L2
[           4 days earlier] 000358.sst    70 MB L2
[           4 days earlier] 000359.sst    19 MB L2
[           4 days earlier] 000375.sst    11 MB L2
[           4 days earlier] 000401.sst   2.1 MB L2
[           4 days earlier] 000413.sst    70 MB L2
[           4 days earlier] 000417.sst    70 MB L2
[           4 days earlier] 000418.sst    14 MB L2
[           4 days earlier] 000429.sst    70 MB L2
[           4 days earlier] 000430.sst    70 MB L2
[           4 days earlier] 000431.sst   3.8 MB L2
[           4 days earlier] 000443.sst    70 MB L2
[           4 days earlier] 000469.sst    70 MB L2
[           4 days earlier] 000470.sst    70 MB L2
[           4 days earlier] 000471.sst    70 MB L2
[           4 days earlier] 000472.sst    32 MB L2
[           4 days earlier] 000128.vlog   67 MB VL
[           4 days earlier] 000132.vlog   67 MB VL
[           4 days earlier] 000133.vlog   67 MB VL
[           4 days earlier] 000135.vlog   67 MB VL
[           4 days earlier] 000136.vlog   67 MB VL
[           4 days earlier] 000138.vlog   67 MB VL
[           4 days earlier] 000154.vlog   67 MB VL
[           4 days earlier] 000164.vlog   67 MB VL
[           4 days earlier] 000167.vlog   67 MB VL
[           4 days earlier] 000169.vlog   67 MB VL
[           4 days earlier] 000171.vlog   67 MB VL
[           4 days earlier] 000172.vlog   67 MB VL
[           4 days earlier] 000177.vlog   67 MB VL
[           4 days earlier] 000181.vlog   67 MB VL
[           4 days earlier] 000182.vlog   67 MB VL
[           4 days earlier] 000184.vlog   67 MB VL
[           4 days earlier] 000185.vlog   67 MB VL
[           4 days earlier] 000186.vlog   67 MB VL
[           4 days earlier] 000187.vlog   67 MB VL
[           4 days earlier] 000188.vlog   67 MB VL
[           4 days earlier] 000189.vlog   67 MB VL
[           4 days earlier] 000190.vlog   67 MB VL
[           4 days earlier] 000191.vlog   67 MB VL
[           4 days earlier] 000192.vlog   67 MB VL
[         15 minutes later] 000193.vlog   30 MB VL

[Summary]
Level 0 size:          0 B
Level 1 size:       227 MB
Level 2 size:       1.1 GB
Total index size:   1.4 GB
Value log size:     1.6 GB

Abnormalities: None.
0 extra files.
0 missing files.
0 empty files.
0 truncated manifests.

@rohanil If you are using the API, you can try changing NumGo, which increases parallelism during iteration.
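A rough sketch of how that knob can be used (assuming you are on a recent master where Stream.Backup is exposed; DB.Backup builds a Stream internally, so driving the Stream yourself lets you pick the number of goroutines):

func backupWithNumGo(db *badger.DB, w io.Writer, numGo int) error {
	stream := db.NewStream()
	stream.NumGo = numGo           // number of goroutines used to iterate over the DB
	stream.LogPrefix = "DB.Backup" // prefix used in the progress log lines
	_, err := stream.Backup(w, 0)  // since=0 => full backup
	return err
}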
I would also like to know a few things:

  • Which Badger version are you using?
  • What is the system configuration of the machine where you are running the backup?
  • What is the bottleneck (CPU, memory, or disk)?

I am able to run the backup successfully on my machine (4 cores, 8 GB RAM).

@ashishgoswami

  • I am using Badger from the master branch.
  • I am running it in a Docker container: a Linux system with 3 cores and 3 GB RAM.
  • The bottleneck is memory. I think it is running out of memory, so the container hangs and stops responding.

I tried changing the NumGo parameter, but neither increasing it (to 32) nor decreasing it (to 4) helped.
I also tested it on another test DB (storeB) of the same size as the DB in question, and it had no problems. For storeB, it logs “DB.Backup Time elapsed: 29s, bytes sent: 532 MB, speed: 18 MB/sec”. Does that mean the backup of storeB is 532 MB in size? For the store in question, it logs “DB.Backup Time elapsed: 01m08s, bytes sent: 681 MB, speed: 10 MB/sec” before the container hangs. I don’t know if that is of any help.

I am also attaching part of a heap profile for a backup process of the DB in question.

What are you writing the backup to? Looks like you’re writing to something in memory (bytes.Buffer) and that is taking up all the RAM.

Well, yes, I am writing into a bytes.Buffer. I want to copy a Badger DB from one server to another, so I am taking the backup into a bytes.Buffer, sending it over HTTP, and restoring it on the destination server. Does Badger have any tool to achieve this, or do you have any suggestion that would use less RAM? Thanks.
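To make it concrete, the sending side currently looks roughly like this (simplified; handleBackup is just an illustrative name):

// Roughly what I am doing today: buffer the whole backup in memory,
// then send it over HTTP.
func handleBackup(db *badger.DB) http.HandlerFunc {
	return func(rw http.ResponseWriter, r *http.Request) {
		var buf bytes.Buffer
		if _, err := db.Backup(&buf, 0); err != nil { // full backup into memory
			http.Error(rw, err.Error(), http.StatusInternalServerError)
			return
		}
		rw.Write(buf.Bytes())
	}
}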

@rohanil Does it go out of memory if you write the backup to a file?

You can take a backup using the badger backup tool on the first server, transfer the backup file to the destination server, and restore the Badger DB there using the badger restore tool. That would be one simple approach.
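For example (the exact flags can vary between versions; badger backup --help and badger restore --help list them):

# on the source server
badger backup --dir /store/ -f badger.bak

# copy badger.bak to the destination server, then restore it there
badger restore --dir /store-restored/ -f badger.bak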

@ashishgoswami Yes, it goes out of memory even if I write the backup to a file. In that case, it runs out of memory even faster than when writing to a bytes.Buffer.

The badger backup tool takes a huge amount of time, although it doesn’t go out of memory. I ran the command badger backup -f badger.bak --dir /badger/db/ and after 25 minutes the process was still running; the badger.bak file was only 200 MB by then. With the API, I could get a 600 MB backup in a minute. I assume that is because of parallel processing in the API.

The backup tool also processes DB data in parallel. The default number of goroutines used for processing is 16.

Most likely the memory issue is in your code; the heap profile indicates that. To take Badger’s memory usage out of the equation, you can send your output to /dev/null (or just not write to the bytes.Buffer) and see how the memory usage holds up.

Writing to /dev/null worked well in terms of memory usage. Somehow the first attempt took 17 minutes and later attempts took 4-5 minutes each; I don’t know why. Memory usage looked fine: it spiked at the beginning, perhaps because of opening the DB, but later settled at roughly 300 MB. The total number of keys in the DB is around 15 million.

Then I tried writing to a file using a bufio.Writer, and that worked too; it even took less time than writing to /dev/null. That surprised me, because yesterday it ran out of memory so quickly while writing to a file. It turns out that yesterday I was writing to a file on a volume mounted into the Docker container, whereas today I am writing to a file on a non-mounted part of the container. So writing the backup to a file now works without memory problems.
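For reference, the backup code I am using now is roughly this (simplified):

// Stream the backup through a buffered writer to a file instead of
// accumulating everything in a bytes.Buffer.
func backupToFile(db *badger.DB, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	if _, err := db.Backup(w, 0); err != nil { // 0 = full backup
		return err
	}
	return w.Flush()
}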

Now I am trying to restore that backup from a 1.5 GB backup file, and I am again running into an out-of-memory problem. This time I am restoring to a directory on a non-mounted part of the container. Maybe I need to read the big file more efficiently. My code is below:

func restore() {
	// Keep memory usage low: small value log files, value log read via
	// FileIO, and tables memory-mapped.
	opts := badger.DefaultOptions
	path := "/db-restored"
	opts.Dir = path
	opts.ValueDir = path
	opts.ValueLogFileSize = 1 << 26
	opts.ValueLogLoadingMode = options.FileIO
	opts.TableLoadingMode = options.MemoryMap

	kv, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer kv.Close()

	// Read the backup file through a buffered reader and load it into the DB.
	f, err := os.Open("/badger.bak")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	bufferedReader := bufio.NewReader(f)
	if err := kv.Load(bufferedReader); err != nil {
		log.Fatal(err)
	}
}

The heap profile shows memory usage of 700 MB, but runtime.MemStats.HeapAlloc shows around 1.5 GB just before it goes out of memory.

@rohanil Currently we don’t have any throttle on how much data is read into memory while restoring, which could be the reason for the out-of-memory error. I am working on a fix. Please see Throttle the number of pending txns in WriteBatch · Issue #760 · dgraph-io/badger · GitHub.

@rohanil The above issue is fixed and the changes are on the latest master.
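With that change, DB.Load takes an additional maxPendingWrites argument that throttles how many write batches can be pending while restoring (please check the godoc on master for the exact signature and a suitable value). Your restore call would become roughly:

	// Throttled restore: maxPendingWrites caps the number of in-flight write
	// batches, keeping memory bounded while loading a large backup.
	// 16 here is just an example value.
	if err := kv.Load(bufferedReader, 16); err != nil {
		log.Fatal(err)
	}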