Hello,
we noticed a problem with incremental backups. Since Dgraph v24 we have been running tests with vector predicates, and we found a problem after restoring our backups (live).
We observed the following behaviour:
- Created a new Dgraph v24.0.4 instance with an initial schema containing a node type with two predicates, name and vector.
- Set values for the name and vector predicates on some nodes and created an initial backup (full backup).
- After the first backup, changed or deleted some vector predicate values and then created an incremental backup.
- Restored the data from this incremental backup on a new, clean Dgraph v24.0.4 system. The vector predicates were inconsistent: all changes and deletions made before the incremental backup had disappeared, and these vector predicates were back in the state of the full backup. Changes to non-vector predicates (like ‘name’), however, were restored as expected, including those captured by the incremental backup.
- If we instead take a full backup and restore from it, all changes are restored as expected.
Could anybody confirm this behaviour?
We then got an error when we tried to delete and rewrite (renew) some vectors, possibly as a consequence of the incremental backup/restore problem described above (other vectors could be deleted and renewed without problems).
The database then crashed with the following log:
panic: runtime error: makeslice: len out of range
goroutine 278 [running]:
github.com/dgraph-io/dgraph/v24/tok/hnsw.decodeUint64MatrixUnsafe({0xc0001a2d00, 0x84e, 0x3c5b?}, 0xc00b44a8e0)
/home/runner/work/dgraph/dgraph/tok/hnsw/helper.go:482 +0x45
github.com/dgraph-io/dgraph/v24/tok/hnsw.populateEdgeDataFromKeyWithCacheType({0xc00b4f65e0?, 0xc00b44a7a8?}, 0x3c5b?, {0x270eb40?, 0xc000015c20?}, 0xc00b44a8e0)
/home/runner/work/dgraph/dgraph/tok/hnsw/helper.go:326 +0xa5
github.com/dgraph-io/dgraph/v24/tok/hnsw.(*persistentHNSW[...]).fillNeighborEdges(0xc000015d58, 0x3c5b?, {0x270eb40, 0xc000015c20}, 0xc00b44a8e0)
/home/runner/work/dgraph/dgraph/tok/hnsw/persistent_hnsw.go:150 +0xa9
github.com/dgraph-io/dgraph/v24/tok/hnsw.(*persistentHNSW[...]).searchPersistentLayer(0x2736860, {0x270eb40, 0xc000015c20}, 0x0, 0x3c5b, {0xc00ba12000, 0x600, 0x600}, {0xc00b4e0800, 0x600, ...}, ...)
/home/runner/work/dgraph/dgraph/tok/hnsw/persistent_hnsw.go:208 +0x647
github.com/dgraph-io/dgraph/v24/tok/hnsw.(*persistentHNSW[...]).insertHelper(0x2736860, {0x2712ab8, 0xc00b4f2300}, 0xc000015c20, 0x56ff62, {0xc00b4e0800, 0x600, 0x600})
/home/runner/work/dgraph/dgraph/tok/hnsw/persistent_hnsw.go:462 +0x2a7
github.com/dgraph-io/dgraph/v24/tok/hnsw.(*persistentHNSW[...]).Insert(0xc00b48fb00?, {0x2712ab8?, 0xc00b4f2300?}, {0x270eb40?, 0xc000015c20?}, 0xc00b44ac01?, {0xc00b4e0800, 0x600, 0x600})
/home/runner/work/dgraph/dgraph/tok/hnsw/persistent_hnsw.go:422 +0x5f
github.com/dgraph-io/dgraph/v24/posting.(*Txn).addIndexMutations(0xc00b48cd00, {0x2712ab8, 0xc00b4f2300}, 0xc00b44af50)
/home/runner/work/dgraph/dgraph/posting/index.go:178 +0x5e7
github.com/dgraph-io/dgraph/v24/posting.(*List).AddMutationWithIndex(0xc0004ffec0, {0x2712ab8, 0xc00b4f2300}, 0xc00b4da090, 0xc00b48cd00)
/home/runner/work/dgraph/dgraph/posting/index.go:604 +0x585
github.com/dgraph-io/dgraph/v24/worker.runMutation({0x2712a48?, 0x38721c0?}, 0xc00b4da090, 0xc00b48cd00)
/home/runner/work/dgraph/dgraph/worker/mutation.go:125 +0x558
github.com/dgraph-io/dgraph/v24/worker.(*node).applyMutations.func3({0xc00b498780, 0x9, 0xc00b4c76b0?})
/home/runner/work/dgraph/dgraph/worker/draft.go:520 +0x167
github.com/dgraph-io/dgraph/v24/worker.(*node).applyMutations(0x126499dea004f?, {0x2712a48, 0x38721c0}, 0xc00b498680)
/home/runner/work/dgraph/dgraph/worker/draft.go:539 +0x10b5
github.com/dgraph-io/dgraph/v24/worker.(*node).applyCommitted(0xc0003e7480, 0xc00b498680, 0x126499dea004f)
/home/runner/work/dgraph/dgraph/worker/draft.go:584 +0xe32
github.com/dgraph-io/dgraph/v24/worker.(*node).processApplyCh.func1({0xc00b4d6000, 0x3, 0xc00b62e670?})
/home/runner/work/dgraph/dgraph/worker/draft.go:784 +0x57f
github.com/dgraph-io/dgraph/v24/worker.(*node).processApplyCh(0xc0003e7480)
/home/runner/work/dgraph/dgraph/worker/draft.go:825 +0x212
created by github.com/dgraph-io/dgraph/v24/worker.(*node).InitAndStartNode in goroutine 250
/home/runner/work/dgraph/dgraph/worker/draft.go:1884 +0x59e
The affected Dgraph instance was broken afterwards and could not be restarted.
Thanks in advance to the development team for looking into this issue. We would also appreciate recommendations on how to get our system back on track without data loss (export as RDF and import into a new system?).
cheers
Michael