Failing to connect Alpha with Zero - Need help in setting the docker cluster setup

rajesh · May 31, 2020, 3:04am

I am creating a new cluster with Docker, following https://dgraph.io/docs/deploy/#run-using-docker
I am having trouble connecting the Alpha to the Zero.

Logs from Zero:
[root@0F960 ~]# docker run -it -p 5080:5080 -p 6080:6080 -v /mnt/md0/appdata/zero:/dgraph dgraph/dgraph:latest dgraph zero --my=192.168.1.126:5080
[Decoder]: Using assembly version of decoder
[Sentry] 2020/05/31 02:46:14 Integration installed: ContextifyFrames
[Sentry] 2020/05/31 02:46:14 Integration installed: Environment
[Sentry] 2020/05/31 02:46:14 Integration installed: Modules
[Sentry] 2020/05/31 02:46:14 Integration installed: IgnoreErrors
[Decoder]: Using assembly version of decoder
[Sentry] 2020/05/31 02:46:15 Integration installed: ContextifyFrames
[Sentry] 2020/05/31 02:46:15 Integration installed: Environment
[Sentry] 2020/05/31 02:46:15 Integration installed: Modules
[Sentry] 2020/05/31 02:46:15 Integration installed: IgnoreErrors
I0531 02:46:15.799457 14 init.go:99]

Dgraph version : v20.03.1
Commit timestamp : 2020-04-24 13:53:41 -0700
Branch : HEAD
Go version : go1.14.1

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.

I0531 02:46:15.800297 14 run.go:108] Setting up grpc listener at: 0.0.0.0:5080
I0531 02:46:15.800755 14 run.go:108] Setting up http listener at: 0.0.0.0:6080
badger 2020/05/31 02:46:16 INFO: All 0 tables opened in 1ms
I0531 02:46:16.172124 14 node.go:148] Setting raft.Config to: &{ID:1 peers: learners: ElectionTick:20 HeartbeatTick:1 Storage:0xc0000b4280 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x282e510 DisableProposalForwarding:false}
E0531 02:46:16.172567 14 storage.go:97] deleteRange failed with error: requested index is unavailable due to compaction, from: 0, until: 0
I0531 02:46:16.173199 14 node.go:326] Group 0 found 0 entries
I0531 02:46:16.173663 14 log.go:34] 1 became follower at term 0
I0531 02:46:16.173725 14 log.go:34] newRaft 1 [peers: , term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I0531 02:46:16.173770 14 log.go:34] 1 became follower at term 1
E0531 02:46:16.174223 14 raft.go:516] While proposing CID: Not Zero leader. Aborting proposal: cid:"190ee5c6-65ef-4833-a61c-4eb7bfac71b8" . Retrying...
I0531 02:46:16.174885 14 run.go:307] Running Dgraph Zero...
I0531 02:46:16.213030 14 node.go:185] Setting conf state to nodes:1
I0531 02:46:16.213219 14 raft.go:702] Done applying conf change at 0x1
I0531 02:46:17.175241 14 log.go:34] 1 no leader at term 1; dropping index reading msg
I0531 02:46:18.475158 14 log.go:34] 1 is starting a new election at term 1
I0531 02:46:18.475505 14 log.go:34] 1 became pre-candidate at term 1
I0531 02:46:18.475564 14 log.go:34] 1 received MsgPreVoteResp from 1 at term 1
I0531 02:46:18.475709 14 log.go:34] 1 became candidate at term 2
I0531 02:46:18.475730 14 log.go:34] 1 received MsgVoteResp from 1 at term 2
I0531 02:46:18.475878 14 log.go:34] 1 became leader at term 2
I0531 02:46:18.476068 14 log.go:34] raft.node: 1 elected leader 1 at term 2
I0531 02:46:18.476884 14 raft.go:667] I've become the leader, updating leases.
I0531 02:46:18.477515 14 assign.go:42] Updated Lease id: 1. Txn Ts: 1
W0531 02:46:19.175261 14 node.go:674] [0x1] Read index context timed out
E0531 02:46:19.175422 14 raft.go:516] While proposing CID: Not Zero leader. Aborting proposal: cid:"217ae434-98a3-4a8d-a5f7-0bc19e528e6a" . Retrying...
I0531 02:46:22.270327 14 raft.go:509] CID set for cluster: b9d28391-ab87-43f2-810c-d4baea23b710
I0531 02:46:22.312004 14 license_ee.go:45] Enterprise state proposed to the cluster: key:"z1-11311509396647037717" license:<maxNodes:1844674407370955

Logs from Alpha - Server1
F960 ~]# docker run -it -p 7080:7080 -p 8080:8080 -p 9080:9080 -v ~/server1:/dgraph dgraph/dgraph:latest dgraph alpha --lru_mb=2048 --zero=192.168.1.126:5080 --my=192.168.1.126:7080
[Decoder]: Using assembly version of decoder
[Sentry] 2020/05/31 02:47:03 Integration installed: ContextifyFrames
[Sentry] 2020/05/31 02:47:03 Integration installed: Environment
[Sentry] 2020/05/31 02:47:03 Integration installed: Modules
[Sentry] 2020/05/31 02:47:03 Integration installed: IgnoreErrors
[Decoder]: Using assembly version of decoder
[Sentry] 2020/05/31 02:47:04 Integration installed: ContextifyFrames
[Sentry] 2020/05/31 02:47:04 Integration installed: Environment
[Sentry] 2020/05/31 02:47:04 Integration installed: Modules
[Sentry] 2020/05/31 02:47:04 Integration installed: IgnoreErrors
I0531 02:47:04.491079 14 init.go:99]

Dgraph version : v20.03.1
Commit timestamp : 2020-04-24 13:53:41 -0700
Branch : HEAD
Go version : go1.14.1

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.

I0531 02:47:04.491870 14 run.go:609] x.Config: {PortOffset:0 QueryEdgeLimit:1000000 NormalizeNodeLimit:10000}
I0531 02:47:04.491934 14 run.go:610] x.WorkerConfig: {ExportPath:export NumPendingProposals:256 Tracing:1 MyAddr:192.168.1.126:7080 ZeroAddr:[192.168.1.126:5080] RaftId:0 WhiteListedIPRanges: MaxRetries:-1 StrictMutations:false AclEnabled:false AbortOlderThan:5m0s SnapshotAfter:10000 ProposedGroupId:0 StartTime:2020-05-31 02:47:03.630075952 +0000 UTC m=+0.018112711 LudicrousMode:false BadgerKeyFile:}
I0531 02:47:04.492003 14 run.go:611] worker.Config: {PostingDir:p BadgerTables:mmap BadgerVlog:mmap BadgerKeyFile: BadgerCompressionLevel:3 WALDir:w MutationsMode:0 AuthToken: AllottedMemory:2048 HmacSecret: AccessJwtTtl:0s RefreshJwtTtl:0s AclRefreshInterval:0s}
I0531 02:47:04.492099 14 server_state.go:75] Setting Badger Compression Level: 3
I0531 02:47:04.492123 14 server_state.go:84] Setting Badger table load option: mmap
I0531 02:47:04.492134 14 server_state.go:96] Setting Badger value log load option: mmap
I0531 02:47:04.492196 14 server_state.go:141] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:false TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:0x282e510 Compression:2 InMemory:false MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 KeepL0InMemory:true MaxCacheSize:10485760 MaxBfCacheSize:0 LoadBloomsOnOpen:false NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 ZSTDCompressionLevel:3 VerifyValueChecksum:false EncryptionKey: EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 managedTxns:false maxBatchCount:0 maxBatchSize:0}
I0531 02:47:04.546257 14 log.go:34] All 0 tables opened in 0s
I0531 02:47:04.548453 14 log.go:34] Replaying file id: 0 at offset: 0
I0531 02:47:04.548490 14 log.go:34] Replay took: 5.906µs
I0531 02:47:04.548643 14 server_state.go:75] Setting Badger Compression Level: 3
I0531 02:47:04.548662 14 server_state.go:84] Setting Badger table load option: mmap
I0531 02:47:04.548677 14 server_state.go:96] Setting Badger value log load option: mmap
I0531 02:47:04.548694 14 server_state.go:160] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 ReadOnly:false Truncate:true Logger:0x282e510 Compression:2 InMemory:false MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 KeepL0InMemory:true MaxCacheSize:1073741824 MaxBfCacheSize:0 LoadBloomsOnOpen:false NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 ZSTDCompressionLevel:3 VerifyValueChecksum:false EncryptionKey: EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 managedTxns:false maxBatchCount:0 maxBatchSize:0}
I0531 02:47:04.573671 14 log.go:34] All 0 tables opened in 0s
I0531 02:47:04.575469 14 log.go:34] Replaying file id: 0 at offset: 0
I0531 02:47:04.576056 14 log.go:34] Replay took: 12.37µs
I0531 02:47:04.577360 14 groups.go:107] Current Raft Id: 0x0
I0531 02:47:04.578198 14 worker.go:96] Worker listening at address: [::]:7080
I0531 02:47:04.579385 14 run.go:480] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0531 02:47:04.579532 14 run.go:481] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0531 02:47:04.579719 14 run.go:512] gRPC server started. Listening on port 9080
I0531 02:47:04.579758 14 run.go:513] HTTP server started. Listening on port 8080
I0531 02:47:04.678139 14 pool.go:160] CONNECTING to 192.168.1.126:5080
I0531 02:47:09.582103 14 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0531 02:47:09.582784 14 admin.go:520] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0531 02:47:14.584376 14 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0531 02:47:14.584425 14 admin.go:520] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0531 02:47:24.592850 14 admin.go:520] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
W0531 02:47:24.679725 14 pool.go:254] Connection lost with 192.168.1.126:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.1.126:5080: i/o timeout"
I0531 02:47:29.593193 14 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests

The health endpoint reports an unhealthy status:
192.168.1.126:8080/health?all

[{"status":"unhealthy","lastEcho":1590893948},{"instance":"alpha","address":"192.168.1.126:7080","status":"healthy","group":"0","version":"v20.03.1","uptime":14,"lastEcho":1590893962}]

The Zero state endpoint shows:
192.168.1.126:6080/state
{"counter":"4","zeros":{"1":{"id":"1","addr":"192.168.1.126:5080","leader":true}},"cid":"b9d28391-ab87-43f2-810c-d4baea23b710","license":{"maxNodes":"18446744073709551615","expiryTs":"1593485182","enabled":true}}

MichelDiz · May 31, 2020, 4:08am

This is Docker related. In the “my” flag (the address), you should use the container name instead of localhost or an IP. Since you are going to expose the cluster's ports anyway, it will still be accessible to your app via localhost or the IP. The “my” flag is about the communication between Dgraph instances.

See
https://github.com/dgraph-io/dgraph/issues/4580#issuecomment-574850341
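
As a rough illustration of the advice above, here is a minimal sketch using the Docker CLI. It assumes a user-defined network named dgraph-net and containers named zero and alpha0; these names are hypothetical and not taken from the thread. The dgraph flags and volume paths are the ones from the original post, and --name / --network are standard docker run options. Containers attached to the same user-defined network can resolve each other by container name.

```shell
#!/bin/sh
# Sketch only: the network name and container names below are assumptions.
# Set DOCKER=echo to print the docker arguments instead of executing them.
DOCKER="${DOCKER:-docker}"
NET="${NET:-dgraph-net}"   # hypothetical user-defined network

create_network() {
  "$DOCKER" network create "$NET"
}

start_zero() {
  # --my uses the container name, which other containers on $NET can resolve.
  "$DOCKER" run -d --name zero --network "$NET" \
    -p 5080:5080 -p 6080:6080 \
    -v /mnt/md0/appdata/zero:/dgraph \
    dgraph/dgraph:latest dgraph zero --my=zero:5080
}

start_alpha() {
  # --zero points at the Zero container by name; --my advertises this Alpha's name.
  "$DOCKER" run -d --name alpha0 --network "$NET" \
    -p 7080:7080 -p 8080:8080 -p 9080:9080 \
    -v "$HOME/server1:/dgraph" \
    dgraph/dgraph:latest dgraph alpha --lru_mb=2048 \
    --zero=zero:5080 --my=alpha0:7080
}
```

Calling create_network, start_zero and start_alpha in that order would bring up one Zero and one Alpha. The published ports (-p) remain the way clients outside Docker reach the cluster.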

rajesh · June 1, 2020, 4:55pm

Thanks MichelDiz,
I can now see the connection established between Alpha and Zero. But from Ratel I am not able to communicate with the Alpha endpoint. Am I missing something?

Zero logs.
I0601 16:10:22.539384 14 zero.go:417] Got connection request: cluster_info_only:true
I0601 16:10:22.540196 14 zero.go:435] Connected: cluster_info_only:true
I0601 16:10:22.542762 14 zero.go:417] Got connection request: addr:"Alpha0:7080"
I0601 16:10:22.543551 14 pool.go:160] CONNECTING to Alpha0:7080
W0601 16:10:22.547837 14 pool.go:254] Connection lost with Alpha0:7080. Error: rpc error: code = Unknown desc = No node has been set up yet
I0601 16:10:22.591418 14 zero.go:562] Connected: id:1 group_id:1 addr:"Alpha0:7080"
I0601 16:11:21.220375 14 zero.go:417] Got connection request: cluster_info_only:true
I0601 16:11:21.220788 14 zero.go:435] Connected: cluster_info_only:true
I0601 16:11:21.224273 14 zero.go:417] Got connection request: addr:"Alpha1:7081"
I0601 16:11:21.225011 14 pool.go:160] CONNECTING to Alpha1:7081
W0601 16:11:21.228448 14 pool.go:254] Connection lost with Alpha1:7081. Error: rpc error: code = Unknown desc = No node has been set up yet
I0601 16:11:21.264453 14 zero.go:562] Connected: id:2 group_id:2 addr:"Alpha1:7081"

Alpha Logs:

I0601 16:10:22.434960 14 worker.go:96] Worker listening at address: [::]:7080
I0601 16:10:22.436249 14 run.go:480] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0601 16:10:22.436691 14 run.go:481] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0601 16:10:22.436963 14 run.go:512] gRPC server started. Listening on port 9080
I0601 16:10:22.437315 14 run.go:513] HTTP server started. Listening on port 8080
I0601 16:10:22.535255 14 pool.go:160] CONNECTING to Zero:5080
I0601 16:10:22.592656 14 groups.go:135] Connected to group zero. Assigned group: 1
I0601 16:10:22.593300 14 groups.go:137] Raft Id after connection to Zero: 0x1
I0601 16:10:22.593646 14 pool.go:160] CONNECTING to Alpha0:7080
I0601 16:10:22.594930 14 draft.go:200] Node ID: 0x1 with GroupID: 1
I0601 16:10:22.595032 14 node.go:148] Setting raft.Config to: &{ID:1 peers: learners: ElectionTick:20 HeartbeatTick:1 Storage:0xc0231a45c0 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x282e510 DisableProposalForwarding:false}
I0601 16:10:22.596553 14 node.go:326] Group 1 found 0 entries
I0601 16:10:22.597198 14 draft.go:1567] New Node for group: 1
E0601 16:10:22.595985 14 storage.go:97] deleteRange failed with error: requested index is unavailable due to compaction, from: 0, until: 0
I0601 16:10:22.598389 14 log.go:34] 1 became follower at term 0
I0601 16:10:22.599413 14 log.go:34] newRaft 1 [peers: , term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I0601 16:10:22.599438 14 log.go:34] 1 became follower at term 1
I0601 16:10:22.599539 14 draft.go:147] Operation started with id: opRollup
I0601 16:10:22.599571 14 groups.go:155] Server is ready
I0601 16:10:22.599839 14 draft.go:962] Found Raft progress: 0
I0601 16:10:22.600011 14 groups.go:784] Got address of a Zero leader: Zero:5080
I0601 16:10:22.600234 14 groups.go:797] Starting a new membership stream receive from Zero:5080.
I0601 16:10:22.602737 14 groups.go:814] Received first state update from Zero: counter:5 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"Alpha0:7080" > > > > zeros:<key:1 value:<id:1 addr:"Zero:5080" leader:true > > maxRaftId:1 cid:"b1527d5b-2323-444c-806a-35b9e55f2cab" license:<maxNodes:18446744073709551615 expiryTs:1593619719 enabled:true >

MichelDiz · June 1, 2020, 5:07pm

It says “Connected” - just ignore the ACL login.

rajesh · June 1, 2020, 6:44pm

But I am still not able to execute any command. The UI also shows a lock symbol.
The health?all endpoint on port 8080 (Alpha0) returns:
[
  {
    "status": "unhealthy",
    "lastEcho": 1591032233
  },
  {
    "instance": "alpha",
    "address": "Alpha0:7080",
    "status": "healthy",
    "group": "0",
    "version": "v20.03.1",
    "uptime": 4182,
    "lastEcho": 1591036415
  }
]

MichelDiz · June 1, 2020, 7:56pm

Which browser are you using? If it is Safari, Ratel has issues with it; try Chrome, which works fine here. Or maybe you are using an old version of Dgraph.

cc @paulftw

rajesh · June 1, 2020, 8:02pm

I am using Chrome with the latest version of Dgraph.

MichelDiz · June 1, 2020, 8:05pm

Are you starting Ratel locally or in Docker? It could be a “CORS” issue.

rajesh · June 2, 2020, 1:10am

Ratel is running in Docker on a Linux host (a NAS drive), and I am trying to access it from the Chrome browser on my laptop.

MichelDiz · June 2, 2020, 1:17am

Do the following:

Press F12.
Go to “Application”.
Click “Clear Storage”.
Then click “Clear Site Data”.

This makes sure Ratel starts from scratch.
Now open Ratel again and, on the page “Choose a version of the Ratel interface”, choose “Local Bundle”.

Or open http://localhost:8000/?local

If the issue continues, let me know.

PS. Also make sure the logs of the Dgraph instances look okay. Check the health endpoint whenever you see the locks and errors, and report the results here.

paulftw · June 2, 2020, 11:47am

@rajesh a good first step in troubleshooting connection issues is to open the Alpha or Zero URL in the browser.

On my test cluster, if I open the Alpha URL (http://localhost:8080) in the browser, I see

Dgraph browser is available for running separately using the dgraph-ratel binary

For a zero server (http://localhost:6080) I get

404 page not found

However, http://localhost:6080/health and http://localhost:6080/state contain a fair bit more information.

If you are not able to see these plain text messages, some part of your setup is blocking network access. It may be a firewall, ports not forwarded in Docker, TLS (if you’ve enabled it), and so on.

If you are able to load those pages but Ratel gives an error - it’s either ACL features or something else.

Ratel shows the lock if it cannot run a sample query, which could well be caused by an unhealthy Zero.
A quick test to check that is to use cURL to access the database, e.g.:

curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d '
{ q(func: uid(1)) { uid } }'
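
The browser checks described above can also be scripted. The following is a minimal sketch, assuming curl is installed and that Alpha and Zero are published on localhost:8080 and localhost:6080 as in the examples above; the function names probe and check_cluster are made up for illustration. It prints the HTTP status code for each endpoint, or "unreachable" when no connection could be made.

```shell
#!/bin/sh
# Sketch only: probe and check_cluster are hypothetical helper names.
# Override ALPHA / ZERO with your own host:port values.
ALPHA="${ALPHA:-localhost:8080}"
ZERO="${ZERO:-localhost:6080}"

probe() {
  # -w '%{http_code}' prints the status code; a failed connection makes curl exit non-zero.
  code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1")" || code="unreachable"
  echo "$1 $code"
}

check_cluster() {
  probe "http://$ALPHA/health"
  probe "http://$ZERO/health"
  probe "http://$ZERO/state"
}
```

For the setup in this thread, something like setting ALPHA=192.168.1.126:8080 and ZERO=192.168.1.126:6080 before calling check_cluster would apply. An "unreachable" result points at the network path (firewall, port publishing, Docker network) rather than at Dgraph itself.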

paulftw · June 2, 2020, 11:55am

@rajesh @MichelDiz
scratch that.

The last screenshot posted in this thread shows that Ratel is able to connect to the Alpha and gets responses back.
However, any query Rajesh runs fails due to

"Please retry again, server is not ready to accept requests"

This means the browser, network connections and Ratel are fine, but the servers are not. If you post your docker-compose file, we’ll be able to check it for errors.

You can also try wiping out the data directories/volumes (don’t do that if you have important data stored in your cluster).

rajesh · June 3, 2020, 10:22pm

Thanks @MichelDiz @paulftw.
I figured out the issue: for some reason the network “dgraph_default” was not accessible from my laptop. When I added the containers (Ratel and the Alphas) to the default “Bridge” network, I was able to access Ratel and connect to Alpha0 successfully.

Thanks again for your support.
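
The thread does not show the exact commands used for this fix. One way it could be done with the Docker CLI is sketched below, assuming the default network is named bridge (Docker's default) and using docker network connect; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the thread does not show the actual commands.
# Set DOCKER=echo to print the docker arguments instead of executing them.
DOCKER="${DOCKER:-docker}"

attach_to_bridge() {
  # Attach each named, already-created container to Docker's default bridge network.
  for name in "$@"; do
    "$DOCKER" network connect bridge "$name"
  done
}

show_bridge_members() {
  # Prints the network's JSON description, including the attached containers.
  "$DOCKER" network inspect bridge
}
```

For example, attach_to_bridge Alpha0 Alpha1 ratel would attach the two Alphas named in the logs plus a Ratel container (the name ratel is an assumption). A container can belong to several networks at once, so it stays reachable by name on its original network.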

joaquin · June 3, 2020, 11:02pm

Were you using docker-compose to bring up the Dgraph cluster and Ratel, or the Docker CLI? It is interesting that the network was not accessible. What is the host system?

rajesh · June 4, 2020, 3:00am

I was using the Docker CLI on the Terra Master (a NAS drive running Linux), which was the host system. I was accessing the Dgraph containers from my Windows 10 laptop. Not only Dgraph but also other simple containers were inaccessible if they were not in the host network, so I don’t think it has anything to do with Dgraph. I also tried installing the Dgraph containers on the Windows 10 laptop, and it worked perfectly fine.

joaquin · June 4, 2020, 6:04am

That’s awesome. Assuming Windows 10 Pro for Hyper-V with Docker Desktop? Are you using any particular client? We even have a dotnet client, though I have yet to tinker with it.

rajesh · June 4, 2020, 11:33pm

@joaquin yes, Windows 10 Pro for Hyper-V. I haven’t used any client yet. I have just set things up and started to learn Dgraph.

thewisewolfHoro · June 23, 2020, 12:01pm

This is what I’m looking for!!!
It helps a lot!
