Snapshots/Backup of RocksDB

@pawan, thanks for the summary.

Interestingly, a snapshot seems to behave differently from a checkpoint: once we start the snapshot, writes appear to be blocked until the snapshot is done.

Code:

package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"time"

	"github.com/dgraph-io/dgraph/store"
	"github.com/dgraph-io/dgraph/x"
)

const (
	dbPath1 = "/tmp/testdba"
)

func randStr() []byte {
	return []byte(strconv.Itoa(rand.Int()))
}

func main() {
	st, err := store.NewStore(dbPath1)
	x.Check(err)
	fmt.Println("Initial population")
	for i := 0; i < 1000000; i++ {
		st.SetOne(randStr(), randStr())
	}

	go func() {
		fmt.Println("~~~Start updating")
		for i := 0; i < 5000000; i++ {
			if (i % 10000) == 0 {
				fmt.Printf("SetOne %d\n", i)
			}
			if i < 230000 {
				st.SetOne([]byte("aaaaa"), []byte("old"))
			} else {
				st.SetOne([]byte("aaaaa"), []byte("new"))
			}
		}
	}()
	fmt.Println("Start saving snapshot")
	start := time.Now()
	snapshot := st.NewSnapshot()
	fmt.Printf("Done saving snapshot; time elapsed %v\n", time.Since(start))
	time.Sleep(10 * time.Millisecond)

	// Read from the live store, before switching to the snapshot.
	fmt.Println("Before using snapshot")
	result, err := st.Get([]byte("aaaaa"))
	x.Check(err)
	if result != nil {
		fmt.Printf("Result: [%s]\n", string(result))
	} else {
		fmt.Println("Not found")
	}

	// Point reads at the snapshot taken earlier, then read the same key again.
	fmt.Println("After using snapshot")
	st.SetSnapshot(snapshot)
	result, err = st.Get([]byte("aaaaa"))
	x.Check(err)
	if result != nil {
		fmt.Printf("Result: [%s]\n", string(result))
	} else {
		fmt.Println("Not found")
	}
}

Output:

Initial population
Start saving snapshot
Done saving snapshot; time elapsed 6.786µs
~~~Start updating
SetOne 0
Before using snapshot
Result: [old]
After using snapshot
Not found

Notice that snapshotting takes a while, so the goroutine has plenty of time to start running. However, it did not start running until the snapshot was done.
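One way to make this check less dependent on goroutine scheduling would be to take the snapshot only after the writer has signalled that its first write has landed. Here is a minimal sketch of that idea. It assumes a hypothetical in-memory `toyStore` in place of the real store, so `toyStore`, `Snapshot` and `snapshotAfterWriterStarts` are illustrative names and not part of the dgraph or RocksDB API.

```go
package main

import (
	"fmt"
	"sync"
)

// toyStore is a hypothetical in-memory stand-in for the real store.
// It only exists to make the synchronization pattern runnable.
type toyStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newToyStore() *toyStore {
	return &toyStore{data: make(map[string]string)}
}

// SetOne writes a single key under the store lock.
func (s *toyStore) SetOne(k, v string) {
	s.mu.Lock()
	s.data[k] = v
	s.mu.Unlock()
}

// Snapshot returns a copy of the current contents (a point-in-time view).
func (s *toyStore) Snapshot() map[string]string {
	s.mu.Lock()
	defer s.mu.Unlock()
	snap := make(map[string]string, len(s.data))
	for k, v := range s.data {
		snap[k] = v
	}
	return snap
}

// snapshotAfterWriterStarts launches a writer goroutine and takes the
// snapshot only after the writer reports that its first write is done.
// writes must be at least 1, otherwise the signal is never sent.
func snapshotAfterWriterStarts(st *toyStore, writes int) map[string]string {
	started := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < writes; i++ {
			st.SetOne("aaaaa", "old")
			if i == 0 {
				close(started) // signal: at least one write has landed
			}
		}
	}()
	<-started // wait until the writer is definitely running
	snap := st.Snapshot()
	wg.Wait()
	return snap
}

func main() {
	snap := snapshotAfterWriterStarts(newToyStore(), 1000)
	v, ok := snap["aaaaa"]
	fmt.Println(v, ok) // old true
}
```

With the real store, the same `started` channel could be closed inside the updating goroutine and received from just before the snapshot call, so the snapshot timing is measured while writes are known to be in flight.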

Before restoring the snapshot, we query the value and get “old”. Then we restore the snapshot, query again, and get “Not found”. None of the writes made it into the snapshot, which is expected since they didn’t start until the snapshot was done.
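The "old" versus "Not found" behaviour can be reproduced with a toy model in which every write gets a sequence number and a snapshot is just the sequence number at the time it was taken. This is only a sketch to illustrate point-in-time reads: `versionedStore`, `NewSnapshot` and `GetAt` here are hypothetical names, and the code makes no claim about how the dgraph store or RocksDB implement snapshots internally.

```go
package main

import "fmt"

// version is one write to a key, tagged with the sequence number it got.
type version struct {
	seq   uint64
	value string
}

// versionedStore is a hypothetical, single-goroutine toy store that keeps
// every version of every key so that old views remain readable.
type versionedStore struct {
	seq      uint64
	versions map[string][]version
}

func newVersionedStore() *versionedStore {
	return &versionedStore{versions: make(map[string][]version)}
}

// SetOne appends a new version of key with the next sequence number.
func (s *versionedStore) SetOne(key, value string) {
	s.seq++
	s.versions[key] = append(s.versions[key], version{s.seq, value})
}

// NewSnapshot returns the current sequence number; it copies no data.
func (s *versionedStore) NewSnapshot() uint64 {
	return s.seq
}

// GetAt returns the newest value of key written at or before snap.
func (s *versionedStore) GetAt(key string, snap uint64) (string, bool) {
	vs := s.versions[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].seq <= snap {
			return vs[i].value, true
		}
	}
	return "", false
}

// Get reads the latest value of key from the live store.
func (s *versionedStore) Get(key string) (string, bool) {
	return s.GetAt(key, s.seq)
}

func main() {
	st := newVersionedStore()
	snap := st.NewSnapshot() // taken before any write to "aaaaa"
	st.SetOne("aaaaa", "old")

	live, ok := st.Get("aaaaa")
	fmt.Printf("live: %q %v\n", live, ok) // live: "old" true

	old, ok := st.GetAt("aaaaa", snap)
	fmt.Printf("snapshot: %q %v\n", old, ok) // snapshot: "" false
}
```

The live read sees the write made after the snapshot, while the read through the snapshot does not, which mirrors the output above.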