Dgraph upgrade procedure while running on k8s

Until now we have been using the procedure below to upgrade Dgraph running on our k8s cluster:
1. Identify the leader node of the Alphas; you can use the latest Ratel version to do so, or run the commands below on any Alpha:
   ```
   kubectl exec -it proj-graph-engine-0 -- /bin/bash
   curl http://localhost:8080/state | grep leader
   ```
2. Run `curl localhost:8080/admin/export` on the Alpha leader to export the data.
3. Copy the export to your local machine: `kubectl cp proj-graph-engine-0:/dgraph/export .`
4. Scale down dgraph-alpha: `kubectl scale statefulset proj-graph-engine --replicas=0`
5. Scale down dgraph-zero: `kubectl scale statefulset proj-graph-engine-zero --replicas=0`
6. Remove the dgraph-alpha and dgraph-zero volumes, e.g. list them with `kubectl get pvc`, then `kubectl delete pvc datadir-proj-graph-engine-0`
7. Upgrade to the desired Dgraph version.
8. Scale down dgraph-alpha again: `kubectl scale statefulset proj-graph-engine --replicas=0`; only Zeros should be running.
9. Check which of the Zeros is the leader, as only the leader can assign UIDs (perform the import). Connect to any Zero:
   ```
   kubectl exec -it proj-graph-engine-zero-0 -- /bin/bash
   curl http://localhost:6080/state | grep leader
   ```
10. Copy the files generated by the export command (`g01.rdf.gz` and `g01.schema.gz`) to the Zero leader:
    ```
    kubectl cp g01.rdf.gz proj-graph-engine-zero-0:/dgraph
    kubectl cp g01.schema.gz proj-graph-engine-zero-0:/dgraph
    ```
11. Log into the Zero leader: `kubectl exec -it proj-graph-engine-zero-0 -- /bin/bash`
12. Run the bulk load on dgraph-zero: `dgraph bulk -f /dgraph/g01.rdf.gz -s /dgraph/g01.schema.gz --reduce_shards=1 --map_shards=1` NOTE: `--reduce_shards` and `--map_shards` need to be adjusted based on the number of groups and shards.
13. Copy the files generated in the `out/` directory to your local machine: `kubectl cp proj-graph-engine-zero-0:/dgraph/out .`
14. Scale up dgraph-alpha: `kubectl scale statefulset proj-graph-engine --replicas=3`
15. Copy the `p/` directory from the generated files into `/dgraph` on each dgraph-alpha replica: `kubectl cp ./0/p proj-graph-engine-0:/dgraph`
16. Scale down dgraph-alpha: `kubectl scale statefulset proj-graph-engine --replicas=0`
17. Scale up dgraph-alpha: `kubectl scale statefulset proj-graph-engine --replicas=3`
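As an aside, the leader checks above grep the raw `/state` output for the word "leader", which can match unrelated text; parsing the JSON is less fragile. A minimal Python sketch, assuming the payload shape of Zero's `/state` endpoint (a `zeros` map and a `groups` map whose members carry a `leader` flag and an `addr`; the sample payload below is abbreviated and illustrative, so verify the field names against your Dgraph version):

```python
import json


def find_leaders(state: dict) -> dict:
    """Return a map of {"zero" or group id: leader address} from a parsed /state payload."""
    leaders = {}
    # Zero members live under the top-level "zeros" map.
    for member in state.get("zeros", {}).values():
        if member.get("leader"):
            leaders["zero"] = member["addr"]
    # Alpha members are nested per group under "groups" -> "members".
    for gid, group in state.get("groups", {}).items():
        for member in group.get("members", {}).values():
            if member.get("leader"):
                leaders[gid] = member["addr"]
    return leaders


# Abbreviated, hypothetical /state payload for illustration:
sample = {
    "zeros": {
        "1": {"id": "1", "addr": "proj-graph-engine-zero-0:5080", "leader": True},
        "2": {"id": "2", "addr": "proj-graph-engine-zero-1:5080"},
    },
    "groups": {
        "1": {
            "members": {
                "1": {"id": "1", "addr": "proj-graph-engine-0:7080", "leader": True},
                "2": {"id": "2", "addr": "proj-graph-engine-1:7080"},
            }
        }
    },
}

print(find_leaders(sample))
```

You could pipe `curl -s http://localhost:6080/state` into a script like this instead of grepping.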

Now I was trying to perform the same steps, but migrating from v20.03.1 to v20.07.2 locally using minikube. When I performed step 16, I noticed errors like these in the logs of each Alpha:

```
I1118 11:37:37.338507      16 log.go:34] Storing value log head: {Fid:0 Len:29 Offset:488}
E1118 11:37:37.344179      16 log.go:32] Failure while flushing memtable to disk: : open p/000008.sst: file exists. Retrying...
I1118 11:37:38.347554      16 log.go:34] Storing value log head: {Fid:0 Len:29 Offset:488}
E1118 11:37:38.348338      16 log.go:32] Failure while flushing memtable to disk: : open p/000009.sst: file exists. Retrying...
I1118 11:37:39.348465      16 log.go:34] Storing value log head: {Fid:0 Len:29 Offset:488}
E1118 11:37:39.348833      16 log.go:32] Failure while flushing memtable to disk: : open p/000010.sst: file exists. Retrying...
I1118 11:37:40.349047      16 log.go:34] Storing value log head: {Fid:0 Len:29 Offset:488}
E1118 11:37:40.349436      16 log.go:32] Failure while flushing memtable to disk: : open p/000011.sst: file exists. Retrying...
I1118 11:37:41.315202      16 log.go:34] Storing value log head: {Fid:0 Len:29 Offset:488}
...
```

And when I tried to perform step 17, I got:

```
I1118 11:40:28.973361      16 server_state.go:181] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 ReadOnly:false Truncate:true Logger:0x2ec9288 Compression:2 InMemory:false MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 KeepL0InMemory:false BlockCacheSize:1395864371 IndexCacheSize:536870912 LoadBloomsOnOpen:false NumLevelZeroTables:5 NumLevelZeroTablesStall:15 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 ZSTDCompressionLevel:3 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false managedTxns:false maxBatchCount:0 maxBatchSize:0}
[Sentry] 2020/11/18 11:40:29 Sending fatal event [5537d627f1ec498fb21e490ec2ca7132] to o318308.ingest.sentry.io project: 1805390
2020/11/18 11:40:29 file does not exist for table 4
Error while creating badger KV posting store
github.com/dgraph-io/dgraph/x.Checkf
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:51
github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:185
github.com/dgraph-io/dgraph/worker.InitServerState
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:54
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:733
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.2.func1
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:97
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70
main.main
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78
runtime.main
	/usr/local/go/src/runtime/proc.go:203
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1373
```

How can I upgrade the database in such a case?