Online restore error

What version of Dgraph are you using?

dgraph version 22.0.1

Command used

curl -H 'content-type: application/graphql' -XPOST localhost:8080/admin -d 'mutation { restore(input:{ location: "s3://s3.us-east-1.amazonaws.com/systemdb-prod/v22.0.1/nightly/"}){ message code } }'

Command response:
{"data":{"restore":{"message":"Restore operation started.","code":"Success"}},"extensions":{"tracing":{"version":1,"startTime":"2022-12-03T03:44:11.755148468Z","endTime":"2022-12-03T03:44:12.127584021Z","duration":372435569}}}--- Online restore complete.

Setup:
6 node cluster (3xA, 3xZ) that has access to an s3 bucket.

The online restore process is started, then the alpha fails at the end of the process with the following output below. Lastly, the alphas get stuck in a crash loop.

I1203 03:24:05.048612      21 run.go:754] x.Config: {PortOffset:0 Limit:normalize-node=10000; query-timeout=0ms; mutations-nquad=1000000; query-edge=1000000; max-pending-queries=10000; disallow-drop=false; txn-abort-after=5m; mutations=allow; max-retries=-1 LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:-1 GraphQL:extensions=true; poll-interval=1s; lambda-url=; introspection=true; debug=false GraphQLDebug:false}
I1203 03:24:05.048659      21 run.go:755] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:systemdb-alpha-0.systemdb-alpha-headless.systemdb-prod.svc.cluster.local:7080 ZeroAddr:[systemdb-zero-0.systemdb-zero-headless.systemdb-prod.svc.cluster.local:5080 systemdb-zero-1.systemdb-zero-headless.systemdb-prod.svc.cluster.local:5080 systemdb-zero-2.systemdb-zero-headless.systemdb-prod.svc.cluster.local:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:snapshot-after-duration=30m; pending-proposals=256; idx=; group=; learner=false; snapshot-after-entries=10000 Badger:{Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0xc000416820 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:1395864371 IndexCacheSize:751619276 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[{Lower:0.0.0.0 Upper:255.255.255.255}] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2022-12-03 03:24:04.469775855 +0000 UTC m=+0.206174361 Ludicrous:concurrency=2000; enabled=false LudicrousEnabled:false Security:whitelist=0.0.0.0/0; token= EncryptionKey:**** LogRequest:0 HardSync:false Audit:false}
I1203 03:24:05.048735      21 run.go:756] worker.Config: {PostingDir:/dgraph/p1 WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:2048 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN;}
I1203 03:24:05.048887      21 log.go:295] Found file: 1 First Index: 1
I1203 03:24:05.048941      21 storage.go:125] Init Raft Storage with snap: 0, first: 1, last: 21
I1203 03:24:05.048955      21 server_state.go:140] Opening postings BadgerDB with options: {Dir:/dgraph/p1 ValueDir:/dgraph/p1 SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x3111300 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:1395864371 IndexCacheSize:751619276 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
I1203 03:24:05.643442      21 log.go:34] All 1480 tables opened in 246ms
I1203 03:24:05.645824      21 log.go:34] Discard stats nextEmptySlot: 0
I1203 03:24:05.646750      21 log.go:34] Set nextTxnTs to 16994037
I1203 03:24:05.650440      21 log.go:34] Deleting empty file: /dgraph/p1/000004.vlog
I1203 03:24:05.651294      21 groups.go:100] Current Raft Id: 0x1
I1203 03:24:05.651311      21 groups.go:116] Sending member request to Zero: id:1 addr:"systemdb-alpha-0.systemdb-alpha-headless.systemdb-prod.svc.cluster.local:7080" 
I1203 03:24:05.651603      21 worker.go:114] Worker listening at address: [::]:7080
I1203 03:24:05.652537      21 run.go:567] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I1203 03:24:05.652554      21 run.go:568] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
E1203 03:24:05.652553      21 groups.go:1224] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I1203 03:24:05.652564      21 run.go:595] gRPC server started.  Listening on port 9080
I1203 03:24:05.652860      21 run.go:596] HTTP server started.  Listening on port 8080
I1203 03:24:05.751814      21 pool.go:164] CONN: Connecting to systemdb-zero-0.systemdb-zero-headless.systemdb-prod.svc.cluster.local:5080
I1203 03:24:05.759907      21 groups.go:135] Connected to group zero. Assigned group: 0
I1203 03:24:05.759924      21 groups.go:137] Raft Id after connection to Zero: 0x1
I1203 03:24:05.759960      21 pool.go:164] CONN: Connecting to systemdb-alpha-2.systemdb-alpha-headless.systemdb-prod.svc.cluster.local:7080
I1203 03:24:05.759975      21 pool.go:164] CONN: Connecting to systemdb-alpha-1.systemdb-alpha-headless.systemdb-prod.svc.cluster.local:7080
I1203 03:24:05.760034      21 pool.go:164] CONN: Connecting to systemdb-zero-2.systemdb-zero-headless.systemdb-prod.svc.cluster.local:5080
I1203 03:24:05.760055      21 pool.go:164] CONN: Connecting to systemdb-zero-1.systemdb-zero-headless.systemdb-prod.svc.cluster.local:5080
I1203 03:24:05.760084      21 draft.go:270] Node ID: 0x1 with GroupID: 1
I1203 03:24:05.760095      21 draft.go:279] RaftContext: id:1 group:1 addr:"systemdb-alpha-0.systemdb-alpha-headless.systemdb-prod.svc.cluster.local:7080" 
I1203 03:24:05.760194      21 node.go:153] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc0000ab300 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x3111300 DisableProposalForwarding:false}
[Sentry] 2022/12/03 03:24:05 Sending fatal event [970f5b301eab485092fc18b30cf5609b] to o318308.ingest.sentry.io project: 1805390
2022/12/03 03:24:05 proto: SchemaUpdate: illegal tag 0 (wire type 7)
Error while loading schema from db
github.com/dgraph-io/dgraph/x.Checkf
	/home/runner/work/dgraph/dgraph/x/error.go:51
github.com/dgraph-io/dgraph/schema.LoadSchemaFromDb.func1
	/home/runner/work/dgraph/dgraph/schema/schema.go:533
github.com/dgraph-io/badger/v3.(*Item).Value
	/home/runner/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.1/iterator.go:100
github.com/dgraph-io/dgraph/schema.LoadSchemaFromDb
	/home/runner/work/dgraph/dgraph/schema/schema.go:529
github.com/dgraph-io/dgraph/schema.LoadFromDb
	/home/runner/work/dgraph/dgraph/schema/schema.go:502
github.com/dgraph-io/dgraph/worker.StartRaftNodes
	/home/runner/work/dgraph/dgraph/worker/groups.go:153
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run.func4
	/home/runner/work/dgraph/dgraph/dgraph/cmd/alpha/run.go:814
runtime.goexit
	/opt/hostedtoolcache/go/1.18.5/x64/src/runtime/asm_amd64.s:1571
james@Jamess-MBP SystemDB % E1202 22:26:24.955626   63609 proxy_server.go:147] Error while proxying request: context canceled

Hey @jirick,

A few questions:

  1. Were you able to complete backup/restores with your graph on v22 prior? Or is this the first time you were attempting?
  2. Were the backups created with v22?
  3. Is multi-tenancy enabled in the graph?
  4. What is the size of the graph (either measured by MBs in your p folders or node/edge counts)?

Hi @matthewmcneely, I was able to restore the database (it took multiple attempts);

  1. Yes, was able to create backup/restores on version prior to v22. This isn’t my first time running backup/restore.
  2. The backup was created with v22. v21 and v22 backups are not compatible.
  3. Multi-tenancy is enabled.
  4. The p1 directory size is 27G.