Binary backup of badger files in Dgraph

Hi, I have a Dgraph cluster running and I’m trying to manually backup files that are owned by an active Dgraph alpha node. In your documentation it says

Badger is also rsync-friendly because all files are immutable, barring the latest value log which is append-only. So, rsync can be used as rudimentary way to perform a backup

and provides an rsync script that can be run. However, when I try it locally on one node to test, I see that accessing the copied files raises errors:

[email protected]:/dgraph# cp -Rv p p2
'p' -> 'p2'
'p/LOCK' -> 'p2/LOCK'
'p/MANIFEST' -> 'p2/MANIFEST'
'p/KEYREGISTRY' -> 'p2/KEYREGISTRY'
'p/00001.mem' -> 'p2/00001.mem'
'p/DISCARD' -> 'p2/DISCARD'
'p/000001.vlog' -> 'p2/000001.vlog'
[email protected]:/dgraph# rm -f p2/LOCK
[email protected]:/dgraph# ls p2/
000001.vlog  00001.mem  DISCARD  KEYREGISTRY  MANIFEST

But when I try to open the database in that folder, badger crashes; see the attached info-error.log (4.8 KB)

If I remove the .mem file, the database opens, but why? I'm not sure I understand .mem files. I thought memtables lived only in memory and that .sst files were their flushed/compacted form. Is the .mem file the WAL backing the in-memory memtable, i.e. what other LSM stores call the commit log?

Thanks

qq Are you trying to reverse engineer an open source project? oO lol

I come from Cassandra's world, so I know how to cope with its files and LSM storage, but here I'm kinda lost even following the documentation. Are you saying I need to look at the source code to understand what's going on? :laughing:
I'm trying to follow what the documentation says, and I'm surprised it doesn't seem to be working for some reason :confused:

Certainly. The documentation doesn't go deep; only the paper scratches the surface a little.

Can you point out what parts are blocking for you?

What I mean by that is that it looks like you are analyzing the behavior of the DB, not using it as a regular user. You are studying it through its behavior and not reading the code.

Which makes no sense, it’s easier to read the code than to experiment since it’s an open source database for anyone to audit. So reverse engineering doesn’t make sense here.


I'm trying to do binary backups and test the restore using what the documentation says, so I don't see how you would call that reverse engineering. I gave the link to the documentation in my initial message, but here it is again: Get started — I'm testing the last part of that section. By the way, for some reason section links bring you to the top of the page rather than to the section itself, at least on the get-started page.


I am not sure why you would want to back up the Badger DB, because Dgraph comes with a built-in export feature that dumps the database in a JSON-like format.

I’ve achieved good results by running this query:

mutation {
    export(input: {
        destination: "/path/to/backup/dir/"
        }) {
    response {
        message
        code
        }
    }
}

You can POST this mutation as JSON to the admin endpoint http://localhost:8080/admin
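For example, with curl (assuming an alpha on the default localhost:8080; the destination path is a placeholder):

```shell
# Wrap the GraphQL mutation in a JSON body and POST it to /admin.
curl -s -X POST http://localhost:8080/admin \
  -H 'Content-Type: application/json' \
  -d '{"query":"mutation { export(input: {destination: \"/path/to/backup/dir/\"}) { response { message code } } }"}'
```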

Hope this helps

@mrwunderbar666 Because a binary backup lets you restore a whole cluster quickly, and it is an Enterprise feature: https://dgraph.io/docs/enterprise-features/binary-backups/. So I'm trying to see what can be done without an Enterprise license.
