What's the best way to transfer a snapshot to a new member joining an existing cluster

Looking for a faster way to get a new member to sync with the snapshot from the leader.
I was using Badger 1.6.2 and just upgraded to 3.2103.2.
I currently iterate the keys with an iterator and pipe them to the new member.

  1. I came across Badger's Stream framework, which spins off multiple goroutines to transfer the data to the new member. Does it persist progress so it can survive restarts?
  2. Can I directly transfer the snapshot over the network and use it on the new member? (Rough sketch of what I mean after this list.)
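
For option 2, this is roughly what I have in mind, assuming Badger's built-in Backup/Load pair can be pointed straight at the network connection (conn here is just a placeholder for whatever transport the cluster uses):

// Rough sketch, not tested. db is the leader's *badger.DB, newDB is the
// freshly opened DB on the joining member, and conn is a placeholder for
// the network connection between them.

// Leader side: stream a full backup (since = 0) straight over the connection.
if _, err := db.Backup(conn, 0); err != nil {
	return err
}

// New member side: load the stream into the empty DB.
// 256 caps the number of pending writes; tune as needed.
if err := newDB.Load(conn, 256); err != nil {
	return err
}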

@mrjn @MichelDiz any suggestions here?

Sorry, I can’t help you with Badger. But maybe @amanmangal or @ibrahim can.

Hi @scr

Before we solve the problem, could you elaborate on the problem you are trying to solve? Did you try something that you found slow? How slow was it? Do you have any logs that you could share?

Thanks

Thanks @amanmangal.
I use Badger as persistent storage for my Raft protocol.
I currently take a snapshot and iterate through the keys, sending them over a pipe to the member that is trying to catch up with the existing database. This is slow because it's a single goroutine with a small buffer; it took around 60 minutes for a 5 GB database.

That all said, it's custom logic on top of the snapshot.

// sendSnapshot iterates over every key in the read transaction and writes
// length-prefixed, protobuf-encoded BackupEntry records to writer.
func sendSnapshot(txn *badger.Txn, writer io.Writer) error {
	it := txn.NewIterator(badger.DefaultIteratorOptions)
	defer it.Close()

	u32Buf := make([]byte, 4)
	smallBuf := make([]byte, 32*1024) // reused for entries that fit, to avoid per-key allocations
	bw := bufio.NewWriterSize(writer, 16384)

	for it.Rewind(); it.Valid(); it.Next() {
		item := it.Item()

		var entryBuf []byte
		err := item.Value(func(value []byte) error {
			if len(value) == 0 {
				return &kvdb.CorruptDbError{Message: "Empty ValueTuple", Key: item.KeyCopy(nil)}
			}

			entry := kvpb.BackupEntry{
				Key:        item.Key(),
				ValueTuple: value,
			}

			if sz := entry.Size(); sz <= len(smallBuf) {
				entryBuf = smallBuf[:sz]
			} else {
				entryBuf = make([]byte, sz)
			}
			n, err := entry.MarshalTo(entryBuf)
			if err != nil {
				return errors.Wrap(err, "Failed to marshal BackupEntry")
			}
			entryBuf = entryBuf[:n]

			return nil
		})
		if err != nil {
			return errors.Wrap(err, "item.Value failed")
		}

		// Frame each entry with a 4-byte big-endian length prefix.
		binary.BigEndian.PutUint32(u32Buf, uint32(len(entryBuf)))
		if _, err := bw.Write(u32Buf); err != nil {
			return err
		}
		if _, err := bw.Write(entryBuf); err != nil {
			return err
		}
	}

	return bw.Flush()
}
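
For completeness, the receiving side is roughly the mirror image (a sketch, not the exact code; it assumes BackupEntry has a generated Unmarshal method and uses Badger's WriteBatch for bulk writes):

// recvSnapshot is the mirror image of sendSnapshot above: it reads the
// length-prefixed BackupEntry records and applies them with a WriteBatch.
// Sketch only; assumes kvpb.BackupEntry has a generated Unmarshal method.
func recvSnapshot(db *badger.DB, reader io.Reader) error {
	br := bufio.NewReaderSize(reader, 16384)
	wb := db.NewWriteBatch()
	defer wb.Cancel()

	u32Buf := make([]byte, 4)
	for {
		// Read the 4-byte big-endian length prefix; EOF means we're done.
		if _, err := io.ReadFull(br, u32Buf); err != nil {
			if err == io.EOF {
				break
			}
			return err
		}

		entryBuf := make([]byte, binary.BigEndian.Uint32(u32Buf))
		if _, err := io.ReadFull(br, entryBuf); err != nil {
			return err
		}

		var entry kvpb.BackupEntry
		if err := entry.Unmarshal(entryBuf); err != nil {
			return errors.Wrap(err, "Failed to unmarshal BackupEntry")
		}
		if err := wb.Set(entry.Key, entry.ValueTuple); err != nil {
			return err
		}
	}

	return wb.Flush()
}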

I am happy to try out anything Badger supports internally and can test it. I am looking for a really quick way for the new member to catch up on the existing database.

@amanmangal can you suggest what would be the best way for member catch-up? Thanks!

Give me a few days to get back to you on this.


@amanmangal any update on this?
I tried the Badger Stream API, but it isn't much faster and it doesn't survive restarts.

Hi Sumith,

Sorry, I have been busy with the release. There are a few options that you could try. You could use the Stream framework to read the data instead; how long does that take for you? You could also take a look at how we do it in Dgraph here: dgraph/snapshot.go at main · dgraph-io/dgraph · GitHub.
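
Roughly something like this (an untested sketch; I'm assuming the v3 API where Send receives a *z.Buffer, and conn stands for your connection to the new member):

// Rough sketch, untested. Assumes Badger v3 where Send receives a
// *z.Buffer (github.com/dgraph-io/ristretto/z); conn is your connection
// to the new member.
stream := db.NewStream()
stream.NumGo = 16             // number of goroutines reading data in parallel
stream.LogPrefix = "Snapshot" // prefix used in Badger's progress logs
stream.Send = func(buf *z.Buffer) error {
	// buf holds a batch of serialized key-value pairs; forward it as-is.
	// You still need your own framing/decoding on the receiving side.
	_, err := conn.Write(buf.Bytes())
	return err
}
if err := stream.Orchestrate(ctx); err != nil {
	return err
}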

We also have a change coming up in Badger that'd speed things up: opt(stream): add option to directly copy over tables from lower levels (#1700) by mangalaman93 · Pull Request #1872 · dgraph-io/badger · GitHub. We plan to use it in Dgraph as shown in this PR: opt(snapshot): use full table copy when streaming the entire data (#7870) by mangalaman93 · Pull Request #8736 · dgraph-io/dgraph · GitHub. Note that this only works when you are transferring the full Badger data.

Could you also measure your network throughput using iperf or a similar tool? 1 hour for 5 GB of data (roughly 1.4 MB/s) seems like a lot to me; Badger should be able to read the data much faster than that.

Surviving restarts seems a bit more complex to me. When would you consider a key to be transferred to the other side: when you have sent it, or when the other side has acknowledged it? You will have to build a protocol that accounts for acknowledgements if needed. You could store the range of keys that has been sent/acknowledged to disk and modify the ChooseKey function accordingly, as sketched below. But in that case, you won't be able to use the optimization in the PR above.
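
Something along these lines (an untested sketch; lastAcked is a placeholder for whatever progress marker you persist, and since the Stream partitions key ranges across goroutines you would track progress per range in practice):

// Untested sketch: resume a partial transfer by skipping keys the other
// side has already acknowledged. lastAcked is a hypothetical marker
// loaded from disk on restart.
stream := db.NewStream()
stream.ChooseKey = func(item *badger.Item) bool {
	// Only send keys strictly greater than the last acknowledged key.
	return bytes.Compare(item.Key(), lastAcked) > 0
}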

Hope this helps.