Data management best practices

What I want to do

I want to ensure that data is never corrupted or lost.

What I did

While developing an application that uses Dgraph to store data, the data has become corrupted on many occasions, and I do not know why. When I searched for answers, the usual advice was to “delete the /data directory and start from scratch”. That works in development but is not acceptable for a production system: users would never use a product that loses or corrupts their data.

My question: how do I keep the data safe?

Are there some best practices that I could start to use in the development phase, so that when the application is ready for production I will not be losing any data?

As I was reading through the Dgraph Administration documentation, I noticed that there is a “shutdown” procedure. I have never used it. I currently launch local instances with docker-compose and then press Ctrl+C to shut them down. Is that a problem?

That page mentions data backups. I assume that it is a good idea to back up periodically. Are there any suggestions for how often backups should be made?

I also see that when upgrading the database version, I should probably export/back up my data first and then reimport/restore it after the upgrade.

Are there any other best practices for keeping the data safe?


Dgraph metadata

dgraph version

Dgraph version : v23.1.0
Dgraph codename : dgraph
Dgraph SHA-256 : c455c829ccc239e6b4a0624c37a3ffdc082da94d8ae7b71e5bc0801bb23f6624
Commit SHA-1 : 2b18d19
Commit timestamp : 2023-08-17 13:27:10 -0500
Branch : HEAD
Go version : go1.19.12
jemalloc enabled : true

Hey, I think your observations are correct. I also experience data corruption during local development when restarting, and sometimes (though rarely) on OOMs. That is strange, as Dgraph with its append-log design should theoretically be immune to such issues.

In production use with sufficient RAM, Dgraph is extremely stable and I don’t see any issues.

I think the right way to combat data corruption is to have backups, of course, but also to run an HA setup with multiple alphas/replicas.

I’d suggest doing backups as often as possible. However, they are quite intensive on the DB resources, so they need to be timed right.
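As one illustration of a periodic safety copy, here is a minimal sketch that requests an export through the Alpha's GraphQL admin endpoint. Note that an export is not the same thing as Dgraph's binary backup feature. The address localhost:8080/admin is an assumption taken from the Alpha startup logs in this thread, and export_payload is a hypothetical helper name, not part of Dgraph. Scheduling (cron or similar) is deliberately left out.

```shell
#!/bin/sh
# export_payload (hypothetical helper) prints the JSON body for an export
# request to the Alpha's /admin GraphQL endpoint.
# $1 is the export format; "rdf" is used when it is omitted.
export_payload() {
    fmt="${1:-rdf}"
    printf '{"query":"mutation { export(input: {format: \\"%s\\"}) { response { message code } } }"}' "$fmt"
}

# Usage (assumes an Alpha reachable at localhost:8080):
#   curl -s -H 'Content-Type: application/json' \
#        -d "$(export_payload rdf)" http://localhost:8080/admin
```

Running this at a quiet time of day would fit the advice above about timing, since the export competes with normal traffic for resources.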


Hi,

We don’t expect any corruptions to occur after stopping the cluster either via Ctrl+C or via docker compose down or docker compose stop.
Dgraph does handle SIGTERM signals properly, and if the p and w dirs from the Alpha and the zw directory from the Zero are retained, the cluster should start back up without any issues.

Can you please provide us with more info about the corruption?

  1. Were any errors seen (for example during an Alpha startup) indicating a ‘data corruption’?
  2. Were any dirs from the docker volume moved/replaced before the next attempt to start the cluster?
  3. Were any changes made to the service spec in the docker compose file?

Lastly, can you please try the following:

  • Take a copy of your existing p directory: cp -r p p_copy
  • Delete the LOCK file from p_copy: cd p_copy; rm -f LOCK; cd ..
  • Run dgraph debug --postings p_copy --readonly=false --histogram and send us the output here.
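
The first two steps above can be sketched as a small shell function. This is only an illustration: prepare_p_copy is a hypothetical helper name, and the refusal to overwrite an existing copy is an added safety assumption, not something from the steps. The dgraph debug call is left as a comment because it needs a Dgraph binary.

```shell
#!/bin/sh
# prepare_p_copy (hypothetical helper): copy the postings directory and
# remove the LOCK file from the copy, leaving the original untouched.
# $1 is the source directory (default: p), $2 the copy (default: p_copy).
prepare_p_copy() {
    src="${1:-p}"
    dst="${2:-p_copy}"
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        # Added safety assumption: never overwrite an earlier copy.
        echo "refusing to overwrite existing $dst" >&2
        return 1
    fi
    cp -r "$src" "$dst" || return 1
    rm -f "$dst/LOCK"
}

# Usage:
#   prepare_p_copy p p_copy &&
#     dgraph debug --postings p_copy --readonly=false --histogram
```

Working on the copy means the debug tool never needs the lock held by a running Alpha on the original p directory.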

Best,
Rahul


@rarvikar

  1. Yes. There were errors. I am not sure if the errors indicated data corruption, but I do know that when I moved the data directory and let Dgraph create a new one, it was able to run successfully. Here are the errors I see when I use my previous data directory:
$ docker-compose up
[+] Running 3/0
 ✔ Container dgraph-zero-1   Created                                                                                                                                                                                        0.0s
 ✔ Container dgraph-ratel-1  Created                                                                                                                                                                                        0.0s
 ✔ Container dgraph-alpha-1  Created                                                                                                                                                                                        0.0s
Attaching to dgraph-alpha-1, dgraph-ratel-1, dgraph-zero-1
dgraph-ratel-1  | 2023/09/15 14:06:51 Listening on :8000...
dgraph-alpha-1  | [Sentry] 2023/09/15 14:06:51 Integration installed: ContextifyFrames
dgraph-alpha-1  | [Sentry] 2023/09/15 14:06:51 Integration installed: Environment
dgraph-alpha-1  | [Sentry] 2023/09/15 14:06:51 Integration installed: Modules
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Integration installed: ContextifyFrames
dgraph-alpha-1  | [Sentry] 2023/09/15 14:06:51 Integration installed: IgnoreErrors
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Integration installed: Environment
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Integration installed: Modules
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Integration installed: IgnoreErrors
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Integration installed: ContextifyFrames
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Integration installed: Environment
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Integration installed: Modules
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Integration installed: IgnoreErrors
dgraph-alpha-1  | [Sentry] 2023/09/15 14:06:51 Integration installed: ContextifyFrames
dgraph-alpha-1  | [Sentry] 2023/09/15 14:06:51 Integration installed: Environment
dgraph-alpha-1  | [Sentry] 2023/09/15 14:06:51 Integration installed: Modules
dgraph-alpha-1  | [Sentry] 2023/09/15 14:06:51 Integration installed: IgnoreErrors
dgraph-zero-1   | I0915 14:06:51.804349      16 sentry_integration.go:47]
dgraph-alpha-1  | I0915 14:06:51.804347      16 sentry_integration.go:47]
dgraph-zero-1   | This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/dgraph-sentry-integration/#data-handling.
dgraph-alpha-1  | This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/dgraph-sentry-integration/#data-handling.
dgraph-zero-1   | I0915 14:06:51.844637      16 init.go:85]
dgraph-zero-1   |
dgraph-zero-1   | Dgraph version   : v23.1.0
dgraph-zero-1   | Dgraph codename  : dgraph
dgraph-zero-1   | Dgraph SHA-256   : c455c829ccc239e6b4a0624c37a3ffdc082da94d8ae7b71e5bc0801bb23f6624
dgraph-zero-1   | Commit SHA-1     : 2b18d19
dgraph-zero-1   | Commit timestamp : 2023-08-17 13:27:10 -0500
dgraph-zero-1   | Branch           : HEAD
dgraph-zero-1   | Go version       : go1.19.12
dgraph-zero-1   | jemalloc enabled : true
dgraph-zero-1   |
dgraph-zero-1   | For Dgraph official documentation, visit https://dgraph.io/docs.
dgraph-zero-1   | For discussions about Dgraph     , visit http://discuss.dgraph.io.
dgraph-zero-1   | For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.
dgraph-zero-1   |
dgraph-zero-1   | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
dgraph-alpha-1  | I0915 14:06:51.844644      16 init.go:85]
dgraph-zero-1   | Copyright 2015-2023 Dgraph Labs, Inc.
dgraph-zero-1   |
dgraph-zero-1   |
dgraph-zero-1   | I0915 14:06:51.844739      16 run.go:253] Setting Config to: {raft:0x40000121c0 telemetry:0x4000012178 limit:0x40000121c8 bindall:true portOffset:0 numReplicas:1 peer: w:zw rebalanceInterval:480000000000 tlsClientConfig:<nil> audit:<nil> limiterConfig:0x4000429eb0}
dgraph-zero-1   | I0915 14:06:51.844782      16 run.go:143] Setting up grpc listener at: 0.0.0.0:5080
dgraph-zero-1   | I0915 14:06:51.845099      16 run.go:143] Setting up http listener at: 0.0.0.0:6080
dgraph-alpha-1  |
dgraph-alpha-1  | Dgraph version   : v23.1.0
dgraph-alpha-1  | Dgraph codename  : dgraph
dgraph-alpha-1  | Dgraph SHA-256   : c455c829ccc239e6b4a0624c37a3ffdc082da94d8ae7b71e5bc0801bb23f6624
dgraph-alpha-1  | Commit SHA-1     : 2b18d19
dgraph-alpha-1  | Commit timestamp : 2023-08-17 13:27:10 -0500
dgraph-alpha-1  | Branch           : HEAD
dgraph-alpha-1  | Go version       : go1.19.12
dgraph-alpha-1  | jemalloc enabled : true
dgraph-alpha-1  |
dgraph-alpha-1  | For Dgraph official documentation, visit https://dgraph.io/docs.
dgraph-alpha-1  | For discussions about Dgraph     , visit http://discuss.dgraph.io.
dgraph-alpha-1  | For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.
dgraph-alpha-1  |
dgraph-alpha-1  | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
dgraph-alpha-1  | Copyright 2015-2023 Dgraph Labs, Inc.
dgraph-alpha-1  |
dgraph-alpha-1  |
dgraph-alpha-1  | I0915 14:06:51.844690      16 run.go:753] x.Config: {PortOffset:0 Limit:query-timeout=0ms; txn-abort-after=5m; max-pending-queries=10000; shared-instance=false; query-edge=1000000; normalize-node=10000; mutations-nquad=1000000; disallow-drop=false; max-retries=10; mutations=allow LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:10 SharedInstance:false GraphQL:lambda-url=; introspection=true; debug=false; extensions=true; poll-interval=1s GraphQLDebug:false NormalizeCompatibilityMode:}
dgraph-alpha-1  | I0915 14:06:51.844746      16 run.go:754] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:alpha:7080 ZeroAddr:[zero:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:snapshot-after-duration=30m; pending-proposals=256; idx=; group=; learner=false; snapshot-after-entries=10000 Badger:{testOnlyOptions:{syncChan:<nil>} Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0x400037ea90 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 ExternalMagicVersion:0 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[{Lower:0.0.0.0 Upper:255.255.255.255}] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2023-09-15 14:06:51.735352167 +0000 UTC m=+0.047299751 Security:whitelist=0.0.0.0/0; token= EncryptionKey:**** LogDQLRequest:0 HardSync:false Audit:false}
dgraph-alpha-1  | I0915 14:06:51.844846      16 run.go:755] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN; tls=false;}
dgraph-alpha-1  | I0915 14:06:51.849551      16 storage.go:124] Init Raft Storage with snap: 18, first: 19, last: 0
dgraph-alpha-1  | I0915 14:06:51.849880      16 server_state.go:141] Opening postings BadgerDB with options: {testOnlyOptions:{syncChan:<nil>} Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x31706e0 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 ExternalMagicVersion:0 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
dgraph-zero-1   | I0915 14:06:51.850535      16 storage.go:124] Init Raft Storage with snap: 216, first: 217, last: 0
dgraph-zero-1   | I0915 14:06:51.850692      16 node.go:153] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0x40004a6a80 Applied:216 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x31706e0 DisableProposalForwarding:false}
dgraph-zero-1   | I0915 14:06:51.856549      16 node.go:312] Found Snapshot.Metadata: {ConfState:{Nodes:[1] Learners:[] XXX_unrecognized:[]} Index:216 Term:31 XXX_unrecognized:[]}
dgraph-zero-1   | I0915 14:06:51.856645      16 node.go:323] Found hardstate: {Term:35 Vote:1 Commit:228 XXX_unrecognized:[]}
dgraph-zero-1   | I0915 14:06:51.856651      16 node.go:328] Group 0 found 0 entries
dgraph-zero-1   | I0915 14:06:51.856654      16 raft.go:649] Restarting node for dgraphzero
dgraph-zero-1   | I0915 14:06:51.856669      16 node.go:190] Setting conf state to nodes:1
dgraph-zero-1   | I0915 14:06:51.857477      16 pool.go:165] CONN: Connecting to alpha:7080
dgraph-zero-1   | 2023/09/15 14:06:51 1 state.commit 228 is out of range [216, 216]
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Buffer flushed successfully.
dgraph-zero-1   | panic: 1 state.commit 228 is out of range [216, 216]
dgraph-zero-1   |
dgraph-zero-1   | goroutine 1 [running]:
dgraph-zero-1   | log.Panicf({0x1f07612?, 0x16acab0?}, {0x40004a78c0?, 0x1c317a0?, 0x40004a6a01?})
dgraph-zero-1   |   /home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/log/log.go:395 +0x68
dgraph-zero-1   | github.com/dgraph-io/dgraph/x.(*ToGlog).Panicf(0x40007bd258?, {0x1f07612?, 0x90?}, {0x40004a78c0?, 0x1?, 0x60662af7ba1c4ec1?})
dgraph-zero-1   |   /home/ubuntu/actions-runner/_work/dgraph/dgraph/x/log.go:39 +0x38
dgraph-zero-1   | go.etcd.io/etcd/raft.(*raft).loadState(0x4000444640, {0x23, 0x1, 0xe4, {0x0, 0x0, 0x0}})
dgraph-zero-1   |   /home/ubuntu/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/raft.go:1475 +0x1c4
dgraph-zero-1   | go.etcd.io/etcd/raft.newRaft(0x400050c840)
dgraph-zero-1   |   /home/ubuntu/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/raft.go:377 +0x5dc
dgraph-zero-1   | go.etcd.io/etcd/raft.RestartNode(0x400050c840)
dgraph-zero-1   |   /home/ubuntu/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/node.go:242 +0x24
dgraph-zero-1   | github.com/dgraph-io/dgraph/dgraph/cmd/zero.(*node).initAndStartNode(0x400048f1a0)
dgraph-zero-1   |   /home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/zero/raft.go:664 +0x3a0
dgraph-zero-1   | github.com/dgraph-io/dgraph/dgraph/cmd/zero.run()
dgraph-zero-1   |   /home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/zero/run.go:337 +0xc80
dgraph-zero-1   | github.com/dgraph-io/dgraph/dgraph/cmd/zero.init.0.func1(0x4000288600?, {0x4000572d00?, 0x1?, 0x1?})
dgraph-zero-1   |   /home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/zero/run.go:82 +0x64
dgraph-zero-1   | github.com/spf13/cobra.(*Command).execute(0x4000288600, {0x4000572ce0, 0x1, 0x1})
dgraph-zero-1   |   /home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944 +0x5c8
dgraph-zero-1   | github.com/spf13/cobra.(*Command).ExecuteC(0x2d26fa0)
dgraph-zero-1   |   /home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x368
dgraph-zero-1   | github.com/spf13/cobra.(*Command).Execute(...)
dgraph-zero-1   |   /home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
dgraph-zero-1   | github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
dgraph-zero-1   |   /home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/root.go:78 +0x64
dgraph-zero-1   | main.main()
dgraph-zero-1   |   /home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/main.go:98 +0xf4
dgraph-zero-1   | W0915 14:06:51.862539       1 sentry_integration.go:139] unable to read CID from file /tmp/dgraph-zero-cid-sentry open /tmp/dgraph-zero-cid-sentry: no such file or directory. Skip
dgraph-alpha-1  | I0915 14:06:51.863287      16 log.go:33] All 4 tables opened in 1ms
dgraph-alpha-1  | I0915 14:06:51.863814      16 log.go:33] Discard stats nextEmptySlot: 0
dgraph-alpha-1  | I0915 14:06:51.863838      16 log.go:33] Set nextTxnTs to 300005
dgraph-alpha-1  | I0915 14:06:51.866011      16 groups.go:101] Current Raft Id: 0x1
dgraph-alpha-1  | I0915 14:06:51.866142      16 worker.go:114] Worker listening at address: [::]:7080
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:51 Sending fatal event [00def09201814ba0a872a5ad7ff5bcb4] to o318308.ingest.sentry.io project: 1805390
dgraph-alpha-1  | I0915 14:06:51.866024      16 groups.go:117] Sending member request to Zero: id:1 addr:"alpha:7080"
dgraph-alpha-1  | I0915 14:06:51.866879      16 run.go:566] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
dgraph-alpha-1  | I0915 14:06:51.866934      16 run.go:567] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
dgraph-alpha-1  | E0915 14:06:51.866930      16 groups.go:1229] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": unable to find any servers for group: 1. closer err: <nil>
dgraph-alpha-1  | I0915 14:06:51.866944      16 run.go:594] gRPC server started.  Listening on port 9080
dgraph-alpha-1  | I0915 14:06:51.867361      16 run.go:595] HTTP server started.  Listening on port 8080
dgraph-alpha-1  | I0915 14:06:51.968215      16 pool.go:165] CONN: Connecting to zero:5080
dgraph-zero-1   | [Sentry] 2023/09/15 14:06:52 Buffer flushed successfully.
dgraph-zero-1 exited with code 1
  2. As far as I am aware, I did not move or replace any directories from the docker volume before the next attempt to start the cluster.
  3. I did not change anything in the docker compose file before the errors started happening. However, I do think that the computer crashed while Docker Desktop was running. I do not remember whether the Dgraph container was running at the time of the crash.

After running docker-compose up to see the errors I posted above, I did my standard shutdown routine:

^CGracefully stopping... (press Ctrl+C again to force)
Aborting on container exit...
[+] Stopping 3/3
 ✔ Container dgraph-ratel-1  Stopped                                                                                                                                                                                        0.1s
 ✔ Container dgraph-zero-1   Stopped                                                                                                                                                                                        0.0s
 ✔ Container dgraph-alpha-1  Stopped                                                                                                                                                                                        0.1s
canceled

I assumed that the docker containers had stopped. But when I did a docker container ls I found that the docker containers were still running. So I did a docker exec -it <id> bash into the alpha container. Then I ran your debug command. Here is the output:

# dgraph debug --postings p --readonly=false --histogram
Opening DB: p
Listening for /debug HTTP requests at port: 8080
Port busy. Trying another one...
Listening for /debug HTTP requests at port: 8081
2023/09/15 14:37:23 Cannot acquire directory lock on "p".  Another process is using this Badger database. error: resource temporarily unavailable

github.com/dgraph-io/dgraph/x.Check
	/home/ubuntu/actions-runner/_work/dgraph/dgraph/x/error.go:42
github.com/dgraph-io/dgraph/dgraph/cmd/debug.run
	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/debug/run.go:970
github.com/dgraph-io/dgraph/dgraph/cmd/debug.init.0.func1
	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/debug/run.go:85
github.com/spf13/cobra.(*Command).execute
	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944
github.com/spf13/cobra.(*Command).ExecuteC
	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/root.go:78
main.main
	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/main.go:98
runtime.main
	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/proc.go:250
runtime.goexit
	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/asm_arm64.s:1172

I only ever press Ctrl+C in the session that I started the containers in. I never use docker-compose stop. I wonder if perhaps I stopped a session, it did not fully shut down, and then I started it again later and there was a conflict.

I have now realized that I need to check that the docker containers have actually stopped after I press Ctrl+C. Sometimes they do not all get the signal to shut down.

Hi,

I’d be happy to have a chat/meeting to walk you through the process. I’ve sent you a PM; feel free to reply to me.

But indeed, you’d need to ensure that Ctrl+C has properly terminated both containers, for the Zero and Alpha processes respectively, before you attempt to restart the cluster or reuse the p dir to start a new cluster.

As a best practice, you can start a cluster in detached mode as:

docker compose up -d

To stop a cluster and delete the old containers:

docker compose down
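
To confirm that nothing is still running after stopping the cluster, a check along the following lines may help. It is a sketch under assumptions: the "dgraph-" name filter matches the container names shown in the logs in this thread (dgraph-zero-1, dgraph-alpha-1, dgraph-ratel-1) and may differ in other setups, and both function names are hypothetical.

```shell
#!/bin/sh
# running_dgraph_containers (hypothetical helper) prints the IDs of running
# containers whose name contains "dgraph-". docker ps lists only running
# containers unless --all is given.
running_dgraph_containers() {
    docker ps --quiet --filter "name=dgraph-"
}

# all_stopped succeeds only when the listing command prints nothing.
# The command name is a parameter so the check can be tried without Docker.
all_stopped() {
    list_cmd="${1:-running_dgraph_containers}"
    [ -z "$($list_cmd)" ]
}

# Usage:
#   docker compose down
#   all_stopped || echo "some Dgraph containers are still running" >&2
```

Checking before the next docker compose up avoids starting a second cluster against directories that an old Zero or Alpha still holds open.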

If you’d like to re-use an existing p dir to start a new cluster, you’d need to capture the max values from the existing Zero and assign those values to the new Zero, in the new cluster, to reuse the old p dir.

Also, it seems you missed taking a copy of the p dir before running dgraph debug... as per my previous note. Here are the steps again:

  • Take a copy of your existing p directory: cp -r p p_copy
  • Delete the LOCK file from p_copy: cd p_copy; rm -f LOCK; cd ..
  • Run dgraph debug --postings p_copy --readonly=false --histogram and send us the output here.

If there is indeed some corruption, the above command should print it.


Yesterday, I was using a fresh copy of Dgraph. I shut down properly and made sure all the containers were stopped. This morning I tried to start Dgraph again with docker compose up, and the Zero is panicking.

I do not know if the data is corrupt, but this is the panic message:

panic: 1 state.commit 41 is out of range [37, 37]
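
One possible reading of this message, taken together with the Zero log lines in this thread (a snapshot at index 37, a hard state with Commit:41, and "Group 0 found 0 entries"), is that the Zero's Raft hard state records more committed entries than its log still contains. That is an interpretation and has not been confirmed by the Dgraph team here. The sketch below only extracts the numbers from such a line and prints the gap. commit_gap is a hypothetical helper name.

```shell
#!/bin/sh
# commit_gap (hypothetical helper): given a panic line of the form
#   "... state.commit C is out of range [A, B]"
# print C - B, i.e. how far the recorded commit index lies beyond the
# upper bound of the range. Returns 1 if the line does not match.
commit_gap() {
    nums=$(printf '%s\n' "$1" |
        sed -n 's/.*state\.commit \([0-9][0-9]*\) is out of range \[\([0-9][0-9]*\), \([0-9][0-9]*\)].*/\1 \3/p')
    [ -n "$nums" ] || return 1
    commit=${nums% *}   # first number: the recorded commit index
    last=${nums#* }     # second number: upper bound of the range
    echo $((commit - last))
}

# Usage:
#   commit_gap 'panic: 1 state.commit 41 is out of range [37, 37]'
```

A positive gap on every failed start would suggest the same failure mode each time, which is worth including when reporting the issue.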

Here is the output from the Docker session:

$ docker compose up
[+] Running 5/1
 ✔ Network dgraph_default                                                                                                                               Created                                                             0.0s
 ✔ Container dgraph-zero-1                                                                                                                              Created                                                             0.1s
 ✔ Container dgraph-alpha-1                                                                                                                             Created                                                             0.1s
 ✔ Container dgraph-ratel-1                                                                                                                             Created                                                             0.1s
 ! ratel The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested                                                                     0.0s
Attaching to dgraph-alpha-1, dgraph-ratel-1, dgraph-zero-1
dgraph-ratel-1  | 2023/09/20 15:00:56 Listening on :8000...
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Integration installed: ContextifyFrames
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Integration installed: Environment
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Integration installed: ContextifyFrames
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Integration installed: Environment
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Integration installed: Modules
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Integration installed: IgnoreErrors
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Integration installed: Modules
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Integration installed: IgnoreErrors
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Integration installed: ContextifyFrames
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Integration installed: Environment
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Integration installed: Modules
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Integration installed: IgnoreErrors
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Integration installed: ContextifyFrames
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Integration installed: Environment
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Integration installed: Modules
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Integration installed: IgnoreErrors
dgraph-zero-1   | I0920 15:00:56.916043      16 sentry_integration.go:47]
dgraph-zero-1   | This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/dgraph-sentry-integration/#data-handling.
dgraph-alpha-1  | I0920 15:00:56.916043      16 sentry_integration.go:47]
dgraph-alpha-1  | This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/dgraph-sentry-integration/#data-handling.
dgraph-zero-1   | I0920 15:00:56.951344      16 init.go:85]
dgraph-alpha-1  | I0920 15:00:56.951343      16 init.go:85]
dgraph-zero-1   |
dgraph-zero-1   | Dgraph version   : v23.1.0
dgraph-zero-1   | Dgraph codename  : dgraph
dgraph-zero-1   | Dgraph SHA-256   : c455c829ccc239e6b4a0624c37a3ffdc082da94d8ae7b71e5bc0801bb23f6624
dgraph-zero-1   | Commit SHA-1     : 2b18d19
dgraph-zero-1   | Commit timestamp : 2023-08-17 13:27:10 -0500
dgraph-zero-1   | Branch           : HEAD
dgraph-zero-1   | Go version       : go1.19.12
dgraph-zero-1   | jemalloc enabled : true
dgraph-zero-1   |
dgraph-zero-1   | For Dgraph official documentation, visit https://dgraph.io/docs.
dgraph-zero-1   | For discussions about Dgraph     , visit http://discuss.dgraph.io.
dgraph-zero-1   | For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.
dgraph-zero-1   |
dgraph-zero-1   | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
dgraph-zero-1   | Copyright 2015-2023 Dgraph Labs, Inc.
dgraph-zero-1   |
dgraph-zero-1   |
dgraph-zero-1   | I0920 15:00:56.951425      16 run.go:253] Setting Config to: {raft:0x40003ae200 telemetry:0x4000112068 limit:0x40003ae218 bindall:true portOffset:0 numReplicas:1 peer: w:zw rebalanceInterval:480000000000 tlsClientConfig:<nil> audit:<nil> limiterConfig:0x400049a2d0}
dgraph-zero-1   | I0920 15:00:56.951450      16 run.go:143] Setting up grpc listener at: 0.0.0.0:5080
dgraph-zero-1   | I0920 15:00:56.951644      16 run.go:143] Setting up http listener at: 0.0.0.0:6080
dgraph-alpha-1  |
dgraph-alpha-1  | Dgraph version   : v23.1.0
dgraph-alpha-1  | Dgraph codename  : dgraph
dgraph-alpha-1  | Dgraph SHA-256   : c455c829ccc239e6b4a0624c37a3ffdc082da94d8ae7b71e5bc0801bb23f6624
dgraph-alpha-1  | Commit SHA-1     : 2b18d19
dgraph-alpha-1  | Commit timestamp : 2023-08-17 13:27:10 -0500
dgraph-alpha-1  | Branch           : HEAD
dgraph-alpha-1  | Go version       : go1.19.12
dgraph-alpha-1  | jemalloc enabled : true
dgraph-alpha-1  |
dgraph-alpha-1  | For Dgraph official documentation, visit https://dgraph.io/docs.
dgraph-alpha-1  | For discussions about Dgraph     , visit http://discuss.dgraph.io.
dgraph-alpha-1  | For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.
dgraph-alpha-1  |
dgraph-alpha-1  | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
dgraph-alpha-1  | Copyright 2015-2023 Dgraph Labs, Inc.
dgraph-alpha-1  |
dgraph-alpha-1  |
dgraph-alpha-1  | I0920 15:00:56.951366      16 run.go:753] x.Config: {PortOffset:0 Limit:txn-abort-after=5m; max-retries=10; mutations=allow; normalize-node=10000; query-timeout=0ms; max-pending-queries=10000; shared-instance=false; query-edge=1000000; mutations-nquad=1000000; disallow-drop=false LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:10 SharedInstance:false GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false NormalizeCompatibilityMode:}
dgraph-alpha-1  | I0920 15:00:56.951420      16 run.go:754] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:alpha:7080 ZeroAddr:[zero:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:pending-proposals=256; idx=; group=; learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m Badger:{testOnlyOptions:{syncChan:<nil>} Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0x4000420b00 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 ExternalMagicVersion:0 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[{Lower:0.0.0.0 Upper:255.255.255.255}] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2023-09-20 15:00:56.841165834 +0000 UTC m=+0.045222918 Security:whitelist=0.0.0.0/0; token= EncryptionKey:**** LogDQLRequest:0 HardSync:false Audit:false}
dgraph-alpha-1  | I0920 15:00:56.951466      16 run.go:755] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN; tls=false;}
dgraph-zero-1   | I0920 15:00:56.954021      16 log.go:296] Found file: 24 First Index: 0
dgraph-zero-1   | I0920 15:00:56.962269      16 storage.go:124] Init Raft Storage with snap: 37, first: 38, last: 0
dgraph-zero-1   | I0920 15:00:56.962530      16 node.go:153] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0x4000498800 Applied:37 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x31706e0 DisableProposalForwarding:false}
dgraph-zero-1   | I0920 15:00:56.963485      16 node.go:312] Found Snapshot.Metadata: {ConfState:{Nodes:[1] Learners:[] XXX_unrecognized:[]} Index:37 Term:2 XXX_unrecognized:[]}
dgraph-zero-1   | I0920 15:00:56.963505      16 node.go:323] Found hardstate: {Term:2 Vote:1 Commit:41 XXX_unrecognized:[]}
dgraph-zero-1   | I0920 15:00:56.963511      16 node.go:328] Group 0 found 0 entries
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:56 Sending fatal event [045eefda75684dcbbe98569ebb828696] to o318308.ingest.sentry.io project: 1805390
dgraph-zero-1   | I0920 15:00:56.963519      16 raft.go:649] Restarting node for dgraphzero
dgraph-alpha-1  | 2023/09/20 15:00:56 strconv.ParseInt: parsing "00001 2": invalid syntax
dgraph-alpha-1  | while parsing: w/00001 2.wal
dgraph-alpha-1  | github.com/dgraph-io/dgraph/raftwal.getLogFiles
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/raftwal/log.go:284
dgraph-alpha-1  | github.com/dgraph-io/dgraph/raftwal.openWal
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/raftwal/wal.go:426
dgraph-alpha-1  | github.com/dgraph-io/dgraph/raftwal.InitEncrypted
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/raftwal/storage.go:102
dgraph-alpha-1  | github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/worker/server_state.go:123
dgraph-alpha-1  | github.com/dgraph-io/dgraph/worker.InitServerState
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/worker/server_state.go:80
dgraph-zero-1   | I0920 15:00:56.963528      16 node.go:190] Setting conf state to nodes:1
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/alpha/run.go:757
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.1.func1
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/alpha/run.go:92
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).execute
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).ExecuteC
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).Execute
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd.Execute
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Buffer flushed successfully.
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/root.go:78
dgraph-alpha-1  | main.main
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/main.go:98
dgraph-alpha-1  | runtime.main
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/proc.go:250
dgraph-alpha-1  | runtime.goexit
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/asm_arm64.s:1172
dgraph-alpha-1  |
dgraph-alpha-1  | github.com/dgraph-io/dgraph/x.Check
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/x/error.go:42
dgraph-alpha-1  | github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/worker/server_state.go:124
dgraph-alpha-1  | github.com/dgraph-io/dgraph/worker.InitServerState
dgraph-zero-1   | I0920 15:00:56.964076      16 pool.go:165] CONN: Connecting to alpha:7080
dgraph-zero-1   | 2023/09/20 15:00:56 1 state.commit 41 is out of range [37, 37]
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/worker/server_state.go:80
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/alpha/run.go:757
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.1.func1
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/alpha/run.go:92
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).execute
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).ExecuteC
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).Execute
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd.Execute
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/root.go:78
dgraph-alpha-1  | main.main
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/main.go:98
dgraph-alpha-1  | runtime.main
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/proc.go:250
dgraph-alpha-1  | runtime.goexit
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/asm_arm64.s:1172
dgraph-zero-1   | panic: 1 state.commit 41 is out of range [37, 37]
dgraph-zero-1   |
dgraph-zero-1   | goroutine 1 [running]:
dgraph-zero-1   | log.Panicf({0x1f07612?, 0x16acab0?}, {0x4000499c00?, 0x1c317a0?, 0x4000498801?})
dgraph-zero-1   | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/log/log.go:395 +0x68
dgraph-zero-1   | github.com/dgraph-io/dgraph/x.(*ToGlog).Panicf(0x40006bb258?, {0x1f07612?, 0x90?}, {0x4000499c00?, 0x1?, 0xf1999fc986a5938e?})
dgraph-zero-1   | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/x/log.go:39 +0x38
dgraph-zero-1   | go.etcd.io/etcd/raft.(*raft).loadState(0x40003d2640, {0x2, 0x1, 0x29, {0x0, 0x0, 0x0}})
dgraph-zero-1   | 	/home/ubuntu/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/raft.go:1475 +0x1c4
dgraph-zero-1   | go.etcd.io/etcd/raft.newRaft(0x4000416370)
dgraph-zero-1   | 	/home/ubuntu/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/raft.go:377 +0x5dc
dgraph-zero-1   | go.etcd.io/etcd/raft.RestartNode(0x4000416370)
dgraph-zero-1   | 	/home/ubuntu/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/node.go:242 +0x24
dgraph-zero-1   | github.com/dgraph-io/dgraph/dgraph/cmd/zero.(*node).initAndStartNode(0x4000101b00)
dgraph-zero-1   | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/zero/raft.go:664 +0x3a0
dgraph-zero-1   | github.com/dgraph-io/dgraph/dgraph/cmd/zero.run()
dgraph-zero-1   | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/zero/run.go:337 +0xc80
dgraph-zero-1   | github.com/dgraph-io/dgraph/dgraph/cmd/zero.init.0.func1(0x40004f0600?, {0x40004704a0?, 0x1?, 0x1?})
dgraph-zero-1   | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/zero/run.go:82 +0x64
dgraph-zero-1   | github.com/spf13/cobra.(*Command).execute(0x40004f0600, {0x4000470480, 0x1, 0x1})
dgraph-zero-1   | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944 +0x5c8
dgraph-zero-1   | github.com/spf13/cobra.(*Command).ExecuteC(0x2d26fa0)
dgraph-zero-1   | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x368
dgraph-zero-1   | github.com/spf13/cobra.(*Command).Execute(...)
dgraph-zero-1   | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
dgraph-zero-1   | github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
dgraph-zero-1   | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/root.go:78 +0x64
dgraph-zero-1   | main.main()
dgraph-zero-1   | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/main.go:98 +0xf4
dgraph-zero-1   | W0920 15:00:56.969462       1 sentry_integration.go:139] unable to read CID from file /tmp/dgraph-zero-cid-sentry open /tmp/dgraph-zero-cid-sentry: no such file or directory. Skip
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:56 Sending fatal event [bec22f74fb9549a886b753f8764b913b] to o318308.ingest.sentry.io project: 1805390
dgraph-alpha-1 exited with code 1
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:57 Buffer flushed successfully.
dgraph-zero-1 exited with code 1
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Integration installed: ContextifyFrames
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Integration installed: Environment
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Integration installed: Modules
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Integration installed: IgnoreErrors
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Integration installed: ContextifyFrames
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Integration installed: Environment
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Integration installed: Modules
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Integration installed: IgnoreErrors
dgraph-alpha-1  | I0920 15:00:57.444567      15 sentry_integration.go:47]
dgraph-alpha-1  | This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/dgraph-sentry-integration/#data-handling.
dgraph-alpha-1  | I0920 15:00:57.480544      15 init.go:85]
dgraph-alpha-1  |
dgraph-alpha-1  | Dgraph version   : v23.1.0
dgraph-alpha-1  | Dgraph codename  : dgraph
dgraph-alpha-1  | Dgraph SHA-256   : c455c829ccc239e6b4a0624c37a3ffdc082da94d8ae7b71e5bc0801bb23f6624
dgraph-alpha-1  | Commit SHA-1     : 2b18d19
dgraph-alpha-1  | Commit timestamp : 2023-08-17 13:27:10 -0500
dgraph-alpha-1  | Branch           : HEAD
dgraph-alpha-1  | Go version       : go1.19.12
dgraph-alpha-1  | jemalloc enabled : true
dgraph-alpha-1  |
dgraph-alpha-1  | For Dgraph official documentation, visit https://dgraph.io/docs.
dgraph-alpha-1  | For discussions about Dgraph     , visit http://discuss.dgraph.io.
dgraph-alpha-1  | For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.
dgraph-alpha-1  |
dgraph-alpha-1  | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
dgraph-alpha-1  | Copyright 2015-2023 Dgraph Labs, Inc.
dgraph-alpha-1  |
dgraph-alpha-1  |
dgraph-alpha-1  | I0920 15:00:57.480574      15 run.go:753] x.Config: {PortOffset:0 Limit:mutations-nquad=1000000; txn-abort-after=5m; mutations=allow; query-edge=1000000; normalize-node=10000; disallow-drop=false; query-timeout=0ms; max-retries=10; max-pending-queries=10000; shared-instance=false LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:10 SharedInstance:false GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false NormalizeCompatibilityMode:}
dgraph-alpha-1  | I0920 15:00:57.480614      15 run.go:754] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:datadog=; ratio=0.01; jaeger= MyAddr:alpha:7080 ZeroAddr:[zero:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:idx=; group=; learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m; pending-proposals=256 Badger:{testOnlyOptions:{syncChan:<nil>} Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0x400018e810 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 ExternalMagicVersion:0 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[{Lower:0.0.0.0 Upper:255.255.255.255}] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2023-09-20 15:00:57.366782626 +0000 UTC m=+0.041683543 Security:token=; whitelist=0.0.0.0/0 EncryptionKey:**** LogDQLRequest:0 HardSync:false Audit:false}
dgraph-alpha-1  | I0920 15:00:57.480655      15 run.go:755] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN; tls=false;}
dgraph-alpha-1  | [Sentry] 2023/09/20 15:00:57 Sending fatal event [29a8c6fcb5de428fab10f0f1fa5a6ce1] to o318308.ingest.sentry.io project: 1805390
dgraph-alpha-1  | 2023/09/20 15:00:57 strconv.ParseInt: parsing "00001 2": invalid syntax
dgraph-alpha-1  | while parsing: w/00001 2.wal
dgraph-alpha-1  | github.com/dgraph-io/dgraph/raftwal.getLogFiles
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/raftwal/log.go:284
dgraph-alpha-1  | github.com/dgraph-io/dgraph/raftwal.openWal
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/raftwal/wal.go:426
dgraph-alpha-1  | github.com/dgraph-io/dgraph/raftwal.InitEncrypted
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/raftwal/storage.go:102
dgraph-alpha-1  | github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/worker/server_state.go:123
dgraph-alpha-1  | github.com/dgraph-io/dgraph/worker.InitServerState
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/worker/server_state.go:80
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/alpha/run.go:757
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.1.func1
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/alpha/run.go:92
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).execute
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).ExecuteC
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).Execute
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd.Execute
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/root.go:78
dgraph-alpha-1  | main.main
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/main.go:98
dgraph-alpha-1  | runtime.main
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/proc.go:250
dgraph-alpha-1  | runtime.goexit
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/asm_arm64.s:1172
dgraph-alpha-1  |
dgraph-alpha-1  | github.com/dgraph-io/dgraph/x.Check
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/x/error.go:42
dgraph-alpha-1  | github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/worker/server_state.go:124
dgraph-alpha-1  | github.com/dgraph-io/dgraph/worker.InitServerState
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/worker/server_state.go:80
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/alpha/run.go:757
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.1.func1
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/alpha/run.go:92
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).execute
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).ExecuteC
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068
dgraph-alpha-1  | github.com/spf13/cobra.(*Command).Execute
dgraph-alpha-1  | 	/home/ubuntu/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
dgraph-alpha-1  | github.com/dgraph-io/dgraph/dgraph/cmd.Execute
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/cmd/root.go:78
dgraph-alpha-1  | main.main
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/dgraph/dgraph/dgraph/main.go:98
dgraph-alpha-1  | runtime.main
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/proc.go:250
dgraph-alpha-1  | runtime.goexit
dgraph-alpha-1  | 	/home/ubuntu/actions-runner/_work/_tool/go/1.19.12/arm64/src/runtime/asm_arm64.s:1172
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:57 Integration installed: ContextifyFrames
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:57 Integration installed: Environment
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:57 Integration installed: Modules
dgraph-zero-1   | [Sentry] 2023/09/20 15:00:57 Integration installed: IgnoreErrors
dgraph-alpha-1 exited with code 1

Looking more closely at the log output, the problem appears to be not with the data (p) but with the alpha's write-ahead log (w). Dgraph seems to read the name of the WAL file and expect the part before the extension to be a valid integer, but the name "00001 2.wal" has a space in it… :thinking:
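To make the failure concrete, here is a minimal Python sketch of that kind of check. It is not Dgraph's actual Go code; it only assumes, based on the strconv.ParseInt error in the log, that the part of the file name before ".wal" has to parse as an integer. The helper name wal_file_id is hypothetical.

```python
import os


def wal_file_id(path: str) -> int:
    """Return the numeric ID encoded in a WAL file name such as 'w/00001.wal'.

    Raises ValueError if the name is not '<digits>.wal'.
    """
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext != ".wal":
        raise ValueError(f"not a WAL file: {path!r}")
    # Be strict: only plain ASCII digits are accepted, so a space
    # (as in '00001 2') is rejected instead of being silently ignored.
    if not (stem.isascii() and stem.isdigit()):
        raise ValueError(f"invalid WAL file id {stem!r} in {path!r}")
    return int(stem)


if __name__ == "__main__":
    for candidate in ["w/00001.wal", "w/00001 2.wal"]:
        try:
            print(candidate, "->", wal_file_id(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

A file named "00001.wal" parses cleanly, while "00001 2.wal" is rejected, which matches the error shown in the alpha log.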

Here are the contents of my data directory:

p/
  .000001.vlog.icloud
  .00001.mem.icloud
  000001 2.vlog
  000001.sst
  00001 2.mem
  DISCARD
  KEYREGISTRY
  MANIFEST

t/
  tasks.buf

w/
  .00001.wal.icloud
  00001 2.wal
  wal.meta

zw/
  00530.wal
  wal.meta

I am running this on a Mac, so iCloud backup is putting extra files (the .icloud entries) in the directories. Would those cause any issues? I assume not. The spaces in the file names do seem to be a problem, though. Would Dgraph itself create files with spaces in their names?

It seems as though iCloud backup is the culprit. :face_with_raised_eyebrow: That is frustrating.
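If iCloud really is the cause, a small check run before starting Dgraph can flag the affected files. The sketch below assumes only the two patterns visible in the directory listing above: hidden entries ending in ".icloud", and copies whose name ends in a space and a number before the extension (for example "00001 2.wal"). The function name find_suspect_files and the labels it returns are hypothetical. It only reports files; it does not delete or rename anything.

```python
import os
import re

# Matches a trailing " <number>" right before the extension (or the end of
# the name), e.g. "00001 2.wal" or "000001 2.vlog".
DUPLICATE_SUFFIX = re.compile(r" \d+(\.[^.]+)?$")


def find_suspect_files(root: str) -> list[tuple[str, str]]:
    """Walk root and return sorted (path, reason) pairs for suspicious files."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.startswith(".") and name.endswith(".icloud"):
                suspects.append((path, "iCloud placeholder"))
            elif DUPLICATE_SUFFIX.search(name):
                suspects.append((path, "duplicate copy"))
    return sorted(suspects)


if __name__ == "__main__":
    # "data" is an assumed location; point this at your own data directory.
    for path, reason in find_suspect_files("data"):
        print(f"{reason}: {path}")
```

Keeping the Dgraph data directory outside any folder that iCloud manages should avoid the problem in the first place; the check is just a safety net for development machines.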