For obvious reasons we want to be able to run regular backups of our Dgraph database, to prevent data loss.
To make sure this would work as expected, I took an export of our production database and tried to import it into a local Docker instance, but I saw a lot of data loss.
Below are the steps I performed. Unfortunately I can’t provide the actual data, as it is production data and therefore sensitive.
1. While SSH’d onto the production box, run:
curl localhost:8280/admin/export
// output - {"code": "Success", "message": "Export completed."}
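For reference, the export lands in a timestamped directory (by default under the alpha’s export directory, if I remember correctly). Listing it shows roughly the following (the schema file name is an assumption based on Dgraph’s usual export layout; the RDF file is the one I definitely copied):
ls dgraph.r40015.u0415.0938
// g01.rdf.gz  g01.schema.gz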
2. SCP the created directory (in this case dgraph.r40015.u0415.0938) to my local dgraph directory.
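For completeness, the copy was just a recursive scp along these lines (the host and paths are placeholders, not the real ones):
scp -r user@prod-box:/path/to/export/dgraph.r40015.u0415.0938 /Users/{user}/dgraph/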
Before moving on, this is my alpha config in my docker-compose file:
alpha:
  image: dgraph/dgraph:latest
  container_name: dgraph_alpha
  volumes:
    - type: bind
      source: /Users/{user}/dgraph
      target: /dgraph
      volume:
        nocopy: true
  ports:
    - 8280:8280
    - 9280:9280
  restart: on-failure
  command: dgraph alpha --port_offset 200 --my=alpha:7280 --lru_mb=2048 --zero=zero:5280
So I’ve mounted my local dgraph directory, which means the exported directory is visible inside the container.
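As a sanity check that the mount works, the export directory can be listed from inside the container, something like:
docker exec -it dgraph_alpha ls /dgraph/dgraph.r40015.u0415.0938
// should list g01.rdf.gz plus whatever else the export produced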
3. I then run the following command to import the export into my local Docker container.
docker exec -it dgraph_alpha dgraph live -r /dgraph/dgraph.r40015.u0415.0938/g01.rdf.gz --zero=zero:5280 --dgraph=localhost:9280 -c 1
This produces the following output:
I0415 09:45:45.054696 25 init.go:88]
Dgraph version : v1.0.14
Commit SHA-1 : 26cb2f94
Commit timestamp : 2019-04-12 13:21:56 -0700
Branch : HEAD
Go version : go1.11.5
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
Creating temp client directory at /tmp/x789817426
badger 2019/04/15 09:45:45 INFO: All 0 tables opened in 0s
Processing /dgraph/dgraph.r40015.u0415.0938/g01.rdf.gz
Number of TXs run : 2
Number of RDFs processed : 1705
Time spent : 856.866103ms
RDFs processed per second : 1705
badger 2019/04/15 09:45:45 INFO: Storing value log head: {Fid:0 Len:43 Offset:9408}
badger 2019/04/15 09:45:45 INFO: Force compaction on level 0 done
If I then go onto the UI and run the following query:
{
  q(func: has(_predicate_)) {
    count(uid)
  }
}
- Locally this produces a count of 142
- On the prod server, it also produces a count of 142
However, if I do the following query:
{
  q(func: has(username)) {
    expand(_all_)
  }
}
Locally I get 73 results with the following structure:
{
  "username": "xxxx"
}
However, on production I get the same 73 results, but with the following structure:
{
  "active": true/false,
  "type": "xxx",
  "username": "xxxx",
  "email": "xxx@xxx.xxx",
  "mobile": "+xxxxx"
}
So it looks like there is data loss for the attribute values: everything except username is missing locally.
I can also run the following query locally, but I still only get the username returned:
{
  q(func: has(email)) {
    expand(_all_)
  }
}
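One further check I can do to narrow this down is counting the supposedly missing predicates on both instances (the block names here are arbitrary):
{
  emails(func: has(email)) {
    count(uid)
  }
  mobiles(func: has(mobile)) {
    count(uid)
  }
}
If those counts come back as 0 locally but non-zero on production, the values never made it through the live load.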