DB.Backup/Stream.Backup may duplicate entries in incremental backups if used naively

Moved from GitHub badger/1351

Posted by cannona:

Looking at backup.go I see that both the wrapper function, Backup as well as Stream.Backup accept a since parameter for the purpose of taking an incremental backup. Their first return value is documented as follows:

It returns a timestamp indicating when the entries were dumped which can be passed into a later invocation to generate an incremental dump, of entries that have been added/modified since the last invocation of Stream.Backup().

First of all, this is technically incorrect, as the timestamp is of the last entry in the backup, not of the time the backup was taken. What’s more, reading the code, only entries that are before the since parameter are omitted, but entries that match the since timestamp are not. This means, if I am understanding the code correctly, that simply passing the value returned by the previous call to Backup will duplicate the latest entry/entries in the new Backup.

I recommend either returning maxVersion + 1 from Stream.Backup, or changing line 55:

if item.Version() < since {

to:

if item.Version() <= since {

I would also recommend duplicating much of the documentation from Stream.Backup to DB.Backup, as it’s not immediately obvious that this method returns the same thing that Stream.Backup does. I’m happy to submit a pull request, if that would help.

jarifibrahim commented :

@cannona

First of all, this is technically incorrect, as the timestamp is of the last entry in the backup, not of the time the backup was taken

In badger, we have read timestamp, commit timestamp, discard timestamp, etc which are all just version numbers. In the same sense, backup returns the timestamp which is the version number. Maybe we could fix the documentation so that it clearly states that the timestamp is the version (and not wall-clock time).

if item.Version() <= since {

This looks like a way to fix it. So if they do the following

ts, err := db.Backup(nil, 0) // return ts=2

newts, err := db.Backup(nil, 2)

Then it would work. @cannona feel free to send a PR. We should definitely write a test for it. I can help with that.

I would also recommend duplicating much of the documentation from Stream.Backup to DB.Backup, as it’s not immediately obvious that this method returns the same thing that Stream.Backup does. I’m happy to submit a pull request, if that would help.

Sounds reasonable to me. If you found it confusing, there could be others also who found this confusing. Adding more comments shouldn’t harm anything :slight_smile:

cannona commented :

I fully intend to follow up on this. Just been a bit busy lately.

Thanks.

On 7/3/20, stale[bot] notifications@github.com wrote:

This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/dgraph-io/badger/issues/1351#issuecomment-653406560

Hi @connona,

You suggested a good fix. But we would be having some issues with this fix. I am worried that in case we use any of the approaches suggested by you, maxVersion+1 or if item.Version() <= since { ; the users who are currently using badger with a correct understanding of return value (i.e. they are using the return value after incrementing it themselves) will miss out one of the entry in the backup. So we have a new problem of missing entry than duplicate entry.
Hence, this would be a breaking change. So for now we are considering fixing only the comment.
Thanks.

1 Like