Propose conf Type:EntryConfChange Data

What I want to do

I want to replace an existing Alpha node with a new one and have the new node join the cluster.

What I did

After I removed alpha-4, I deployed a new Alpha on a different physical machine to add it to the cluster, but it failed to start.
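
(For reference, the removal was done through Zero's /removeNode endpoint. Below is a minimal sketch of that call, assuming Zero's default HTTP port 6080 and the old node's Raft ID 9 in group 2, which match the state shown further down.)

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Ask Zero to remove the old alpha-4 (Raft ID 9) from group 2.
	// Zero's HTTP/admin port is assumed to be the default 6080.
	url := "http://dgraph-zero-0.dgraph-zero.crm-test.svc.cluster.local:6080/removeNode?id=9&group=2"

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```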


alpha-4:

I0513 09:44:19.330346      21 run.go:696] x.Config: {PortOffset:0 QueryEdgeLimit:1000000 NormalizeNodeLimit:10000 MutationsNQuadLimit:1000000 PollInterval:1s GraphqlExtension:true GraphqlDebug:false GraphqlLambdaUrl:}
I0513 09:44:19.330378      21 run.go:697] x.WorkerConfig: {TmpDir:t ExportPath:export NumPendingProposals:1024 Tracing:0.01 MyAddr:dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080 ZeroAddr:[dgraph-zero-1.dgraph-zero.crm-test.svc.cluster.local:5080 dgraph-zero-0.dgraph-zero.crm-test.svc.cluster.local:5080 dgraph-zero-2.dgraph-zero.crm-test.svc.cluster.local:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> RaftId:0 WhiteListedIPRanges:[] MaxRetries:-1 StrictMutations:false AclEnabled:false AbortOlderThan:5m0s SnapshotAfter:10000 ProposedGroupId:0 StartTime:2021-05-13 09:44:18.79177103 +0000 UTC m=+0.012082017 LudicrousMode:false LudicrousConcurrency:2000 EncryptionKey:**** LogRequest:0 HardSync:false}
I0513 09:44:19.330421      21 run.go:698] worker.Config: {PostingDir:p PostingDirCompression:1 PostingDirCompressionLevel:0 WALDir:w MutationsMode:0 AuthToken: PBlockCacheSize:3473303142 PIndexCacheSize:1870240153 WalCache:0 HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35,0 CacheMb:0}
I0513 09:44:19.331456      21 storage.go:132] Init Raft Storage with snap: 0, first: 1, last: 0
I0513 09:44:19.331509      21 server_state.go:76] Setting Posting Dir Compression Level: 0
I0513 09:44:19.331523      21 server_state.go:120] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x2e0fef8 Compression:1 InMemory:false MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:3473303142 IndexCacheSize:1870240153 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false managedTxns:false maxBatchCount:0 maxBatchSize:0}
I0513 09:44:19.338546      21 log.go:34] All 0 tables opened in 0s
I0513 09:44:19.338956      21 log.go:34] Discard stats nextEmptySlot: 0
I0513 09:44:19.338988      21 log.go:34] Set nextTxnTs to 0
E0513 09:44:19.339531      21 groups.go:1143] Error during SubscribeForUpdates for prefix "\x00\x00\vdgraph.cors\x00": Unable to find any servers for group: 1. closer err: <nil>
I0513 09:44:19.339558      21 groups.go:99] Current Raft Id: 0x0
I0513 09:44:19.340101      21 groups.go:643] No healthy Zero leader found. Trying to find a Zero leader...
I0513 09:44:19.339655      21 worker.go:104] Worker listening at address: [::]:7080
E0513 09:44:19.340617      21 groups.go:1143] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0513 09:44:19.340626      21 run.go:519] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0513 09:44:19.340646      21 run.go:520] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0513 09:44:19.340672      21 run.go:552] gRPC server started.  Listening on port 9080
I0513 09:44:19.340681      21 run.go:553] HTTP server started.  Listening on port 8080
I0513 09:44:19.340699      21 server.go:114] Starting telemetry data collection for alpha...
2021/05/13 09:44:19 Endpoint has been deprecated. Please use CollectorEndpoint instead.
I0513 09:44:19.440304      21 pool.go:162] CONNECTING to dgraph-zero-1.dgraph-zero.crm-test.svc.cluster.local:5080
I0513 09:44:19.483462      21 groups.go:689] Found connection to leader: dgraph-zero-1.dgraph-zero.crm-test.svc.cluster.local:5080
I0513 09:44:19.495083      21 groups.go:127] Connected to group zero. Assigned group: 2
I0513 09:44:19.495098      21 groups.go:129] Raft Id after connection to Zero: 0xc
I0513 09:44:19.495151      21 pool.go:162] CONNECTING to dgraph-alpha-0.dgraph-alpha.crm-test.svc.cluster.local:7080
I0513 09:44:19.495176      21 pool.go:162] CONNECTING to dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080
I0513 09:44:19.495197      21 pool.go:162] CONNECTING to dgraph-alpha-1.dgraph-alpha.crm-test.svc.cluster.local:7080
I0513 09:44:19.495231      21 pool.go:162] CONNECTING to dgraph-alpha-3.dgraph-alpha.crm-test.svc.cluster.local:7080
I0513 09:44:19.495255      21 pool.go:162] CONNECTING to dgraph-zero-0.dgraph-zero.crm-test.svc.cluster.local:5080
I0513 09:44:19.495282      21 pool.go:162] CONNECTING to dgraph-zero-2.dgraph-zero.crm-test.svc.cluster.local:5080
I0513 09:44:19.495296      21 draft.go:230] Node ID: 0xc with GroupID: 2
I0513 09:44:19.495361      21 node.go:152] Setting raft.Config to: &{ID:12 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc000430230 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x2e0fef8 DisableProposalForwarding:false}
I0513 09:44:19.495444      21 node.go:326] Group 2 found 0 entries
I0513 09:44:19.495453      21 draft.go:1650] Calling IsPeer
I0513 09:44:19.537567      21 draft.go:1655] Done with IsPeer call
I0513 09:44:19.537581      21 draft.go:1712] New Node for group: 2
I0513 09:44:19.537589      21 draft.go:1719] Trying to join peers.
I0513 09:44:19.537596      21 draft.go:1633] Calling JoinCluster via leader: dgraph-alpha-3.dgraph-alpha.crm-test.svc.cluster.local:7080
I0513 09:44:24.340757      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0513 09:44:29.340894      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0513 09:44:34.341029      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.

alpha-3:

I0513 09:52:42.688988      20 node.go:586] Trying to add 0xb to cluster. Addr: dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080
I0513 09:52:42.688997      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\232\277\232\324\266\375\237\341N\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.688999      20 node.go:587] Current confstate at 0x8: nodes:8 nodes:9
I0513 09:52:42.689029      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\361\267\366\223\235\356\365\352\253\001\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.689061      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\341\333\324\331\205\352\263\312\021\020\000\030\013\"H\t\013\000\000\000\000\000\000\000\020\002\032;dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.692147      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\305\363\272\330\203\353\233\313r\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.705148      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\254\227\372\367\345\216\242\377@\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.706212      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\300\261\223\260\306\346\233\252\207\001\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.714992      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\201\335\365\257\353\363\237\214\344\001\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.715021      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\326\312\304\254\202\201\327\217\234\001\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.732274      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\375\225\360\372\213\310\264\3101\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.732303      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\271\226\305\266\244\207\266\273+\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.732319      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\317\303\201\230\346\365\312\206Y\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.734529      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\371\275\361\365\301\373\265\242M\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]
I0513 09:52:42.743330      20 log.go:34] propose conf Type:EntryConfChange Data:"\010\375\376\271\202\205\237\367\226\254\001\020\001\030\t"  ignored since pending unapplied configuration [index 8222686, applied 8222512]

Dgraph metadata

dgraph version
Dgraph version   : v20.11.2
Dgraph codename  : tchalla-2
Commit timestamp : 2021-02-23 13:07:17 +0530
Branch           : HEAD
Go version       : go1.15.5
jemalloc enabled : true

Have you removed it via the admin remove endpoint? If so, you can’t reuse the same Raft ID.

It was indeed removed, but the removed Raft ID was 9 and the new node was assigned 12. Checking Zero's state, ID 12 has indeed never been used:

"removed":[{"id":"3","groupId":1,"addr":"dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"1615991006","clusterInfoOnly":false,"forceGroupId":false},{"id":"4","groupId":1,"addr":"dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false},{"id":"5","groupId":1,"addr":"dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false},{"id":"1","groupId":0,"addr":"dgraph-zero-0.dgraph-zero.crm-test.svc.cluster.local:5080","leader":true,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false},{"id":"1","groupId":1,"addr":"dgraph-alpha-0.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"1615990975","clusterInfoOnly":false,"forceGroupId":false},{"id":"7","groupId":1,"addr":"dgraph-alpha-0.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"1620716707","clusterInfoOnly":false,"forceGroupId":false},{"id":"9","groupId":2,"addr":"dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"1620719808","clusterInfoOnly":false,"forceGroupId":false},{"id":"11","groupId":2,"addr":"dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false}]

Hi @MichelDiz @pawan,
Do you have any suggestions for resolving the current situation?

Let me ping more guys. @graphql and @ibrahim - Do you have any idea what could be happening?

@musiciansLyf are you using multi-tenancy?

Thank you very much.

No, the version we are running (v20.11.2) does not support it yet.

My suspicion: when there are only two Alphas in group 2 and Zero removes one of them, the remaining alpha-3 never learns about the removal. It keeps sending messages to the old Raft member 0x9 and never gets a reply, so the subsequent conf-change proposals are ignored. Is that what is happening?
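
(For context, the "ignored since pending unapplied configuration" lines come from the underlying Raft library, etcd/raft: the leader refuses to propose a new ConfChange while an earlier one is still unapplied. The sketch below is a paraphrase of that guard for explanation only, not the actual library source.)

```go
package main

import "fmt"

type raftNode struct {
	pendingConfIndex uint64 // index of the last proposed (possibly unapplied) ConfChange
	applied          uint64 // highest log index applied to the state machine
}

// proposeConfChange mirrors the decision visible in alpha-3's log: as long as a
// previous ConfChange (index 8222686) has not been applied (applied 8222512),
// every new ConfChange proposal is dropped.
func (r *raftNode) proposeConfChange(index uint64) bool {
	if r.pendingConfIndex > r.applied {
		fmt.Printf("propose conf ignored since pending unapplied configuration [index %d, applied %d]\n",
			r.pendingConfIndex, r.applied)
		return false
	}
	r.pendingConfIndex = index
	return true
}

func main() {
	r := &raftNode{pendingConfIndex: 8222686, applied: 8222512}
	r.proposeConfChange(8222687) // dropped, just like in the alpha-3 log
}
```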

alpha-3 log:

I0514 08:01:06.482733      22 node.go:586] Trying to add 0xc to cluster. Addr: dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080
I0514 08:01:06.482749      22 node.go:587] Current confstate at 0x8: nodes:8 nodes:9
I0514 08:01:06.777035      22 log.go:34] 8 is starting a new election at term 29
I0514 08:01:06.777059      22 log.go:34] 8 became pre-candidate at term 29
I0514 08:01:06.777064      22 log.go:34] 8 received MsgPreVoteResp from 8 at term 29
I0514 08:01:06.777079      22 log.go:34] 8 [logterm: 29, index: 62573422] sent MsgPreVote request to 9 at term 29
I0514 08:01:06.777126      22 node.go:258] RaftComm: [0x8] Sending message of type MsgPreVote to 0x9
I0514 08:01:09.482903      22 node.go:586] Trying to add 0xc to cluster. Addr: dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080
I0514 08:01:09.482930      22 node.go:587] Current confstate at 0x8: nodes:8 nodes:9
I0514 08:01:10.177031      22 log.go:34] 8 is starting a new election at term 29
I0514 08:01:10.177051      22 log.go:34] 8 became pre-candidate at term 29
I0514 08:01:10.177056      22 log.go:34] 8 received MsgPreVoteResp from 8 at term 29
I0514 08:01:10.177078      22 log.go:34] 8 [logterm: 29, index: 62573422] sent MsgPreVote request to 9 at term 29
I0514 08:01:10.177119      22 node.go:258] RaftComm: [0x8] Sending message of type MsgPreVote to 0x9
I0514 08:01:12.483026      22 node.go:586] Trying to add 0xc to cluster. Addr: dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080
I0514 08:01:12.483047      22 node.go:587] Current confstate at 0x8: nodes:8 nodes:9
I0514 08:01:13.577030      22 log.go:34] 8 is starting a new election at term 29
I0514 08:01:13.577054      22 log.go:34] 8 became pre-candidate at term 29
I0514 08:01:13.577059      22 log.go:34] 8 received MsgPreVoteResp from 8 at term 29
I0514 08:01:13.577082      22 log.go:34] 8 [logterm: 29, index: 62573422] sent MsgPreVote request to 9 at term 29
I0514 08:01:13.577132      22 node.go:258] RaftComm: [0x8] Sending message of type MsgPreVote to 0x9
W0514 08:01:14.577278      22 node.go:420] Unable to send message to peer: 0x9. Error: Do not have address of peer 0x9

alpha-4 log:

I0514 08:07:40.271103      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:07:45.271590      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:07:50.271731      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:07:55.271872      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:00.272015      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:05.272163      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:10.272317      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:15.272403      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:20.272545      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:25.272698      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:30.272845      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:35.272985      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:40.273123      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:45.273258      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
I0514 08:08:50.273398      21 admin.go:686] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.

The above is the latest log output from alpha-3 and alpha-4. alpha-3 keeps sending election messages to the old 0x9, but 0x9 has already been removed by Zero, so every election fails and group 2 currently has no leader; query and mutation services are still unavailable. And because the JoinCluster call has no timeout, the new alpha-4 is still stuck waiting on it and has never managed to join the cluster.
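
(That matches the quorum arithmetic: with the confstate still listing voters {0x8, 0x9}, node 0x8 needs two pre-votes to win an election, and since it no longer has an address for 0x9 it can never collect the second one; with no leader, the pending ConfChange can never be applied either. A tiny sketch of that arithmetic:)

```go
package main

import "fmt"

// quorum returns the number of votes needed to win an election
// in a Raft group with n voting members.
func quorum(n int) int { return n/2 + 1 }

func main() {
	voters := []uint64{0x8, 0x9} // confstate on alpha-3: nodes:8 nodes:9
	need := quorum(len(voters))  // 2

	// alpha-3 (0x8) can only count its own pre-vote: 0x9 was removed by Zero
	// and its address is gone ("Do not have address of peer 0x9"), so it never replies.
	got := 1

	fmt.Printf("voters=%d need=%d got=%d -> leader elected: %v\n",
		len(voters), need, got, got >= need)
}
```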

After the alpha-3 pod restarts, it still keeps communicating with 0x9 (the removed alpha-4).

At present the cluster cannot be recovered, so I cleaned up the data and redeployed it from scratch. Even with the data still empty, the query and mutation latency shown in the Ratel interface has gone from around 4 ms to more than 20 ms. Any suggestions on how to fix this?

@musiciansLyf most probably, the older node was not removed correctly and the new node couldn’t join the cluster.
@dmai do you see any reason why this might have failed?