Binary backup of badger files in Dgraph

Hi, I have a Dgraph cluster running and I’m trying to manually backup files that are owned by an active Dgraph alpha node. In your documentation it says

Badger is also rsync-friendly because all files are immutable, barring the latest value log which is append-only. So, rsync can be used as rudimentary way to perform a backup

and provides an rsync script that can be run. However, when I try it locally on one node to test, I see that accessing the copied files raises errors:

[email protected]:/dgraph# cp -Rv p p2
'p' -> 'p2'
'p/LOCK' -> 'p2/LOCK'
'p/MANIFEST' -> 'p2/MANIFEST'
'p/KEYREGISTRY' -> 'p2/KEYREGISTRY'
'p/00001.mem' -> 'p2/00001.mem'
'p/DISCARD' -> 'p2/DISCARD'
'p/000001.vlog' -> 'p2/000001.vlog'
[email protected]:/dgraph# rm -f p2/LOCK
[email protected]:/dgraph# ls p2/
000001.vlog  00001.mem  DISCARD  KEYREGISTRY  MANIFEST

But when I try to open the database in that folder, badger crashes; see the attached info-error.log (4.8 KB)

If I remove the .mem file, the database opens, but why? I'm not sure I understand .mem files. I thought memtables lived only in memory and that .sst files were their flushed/compacted form. Is the .mem file the WAL backing the in-memory memtable, i.e. what other LSM stores call the commit log?

Thanks

qq Are you trying to reverse engineer an open source project? oO lol

I come from Cassandra's world, so I know how to cope with its files and LSM storage, but here I'm kinda lost even following the documentation. Are you saying I need to look at the source code to understand what's going on? :laughing:
I'm trying to follow what the documentation says, and I'm surprised it doesn't seem to be working for some reason :confused:

Certainly. The documentation doesn't go deep; only the paper scratches the surface a little.

Can you point out what parts are blocking for you?

What I mean by that is that it looks like you are analyzing the behavior of the DB, not using it as a regular user. You are studying it through its behavior and not reading the code.

Which makes no sense, it’s easier to read the code than to experiment since it’s an open source database for anyone to audit. So reverse engineering doesn’t make sense here.


I'm trying to do binary backups and test the restore using what the documentation says, so I don't see how you would call that reverse engineering. I gave the link to the documentation in my initial message, but here it is again: Get started — I'm testing the last part of that section. By the way, for some reason section links bring you to the top of the page rather than to the section itself, at least on the get-started page.


I am not sure why you would want to back up the Badger DB, because Dgraph comes with a built-in export feature that dumps the database in a JSON-like format.

I’ve achieved good results by running this query:

mutation {
    export(input: {
        destination: "/path/to/backup/dir/"
        }) {
    response {
        message
        code
        }
    }
}

You can POST this mutation as JSON to the admin endpoint http://localhost:8080/admin
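For example, with curl (assuming an alpha on the default localhost:8080; the destination path is a placeholder):

```shell
# Wrap the GraphQL mutation in a JSON body and POST it to /admin.
curl -s -X POST http://localhost:8080/admin \
  -H 'Content-Type: application/json' \
  -d '{"query":"mutation { export(input: {destination: \"/path/to/backup/dir/\"}) { response { message code } } }"}'
```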

Hope this helps

@mrwunderbar666 Because a binary backup lets you restore a whole cluster quickly, and it is an Enterprise feature: https://dgraph.io/docs/enterprise-features/binary-backups/. So I'm trying to see what can be done without an Enterprise license.
