Failed to install Dgraph HA

What I want to do

I want to install Dgraph HA.

What I did

I created a dev cluster with k3d:

k3d cluster create dev --config=dev-cluster-config.yaml

dev-cluster-config.yaml file:

apiVersion: k3d.io/v1alpha2
kind: Simple
kubeAPI:
  hostPort: "6440"
network: hm-network
ports:
  - port: 40000:80
    nodeFilters:
      - loadbalancer
options:
  k3s:
    extraServerArgs:
      - --no-deploy=traefik
      - --cluster-domain=dev.k8s-hongbomiao.com
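To double-check that the custom cluster domain took effect, the DNS search path inside any pod should show it (a quick debugging sketch; the busybox image tag is my assumption):

```shell
# Run a throwaway pod and print its DNS search path; with the config above it
# should list dev.k8s-hongbomiao.com domains instead of cluster.local ones.
kubectl run domain-check --rm -it --restart=Never --image=busybox:1.36 -- \
  cat /etc/resolv.conf
```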

I installed Dgraph HA with:

kubectl create namespace hm
kubectl apply --namespace=hm --filename=https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml

As you can see below, my dgraph-alpha-0 pod has an issue.

Here is my dgraph-alpha-0 log:

++ hostname -f
+ dgraph alpha --my=dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080 --zero dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080,dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080,dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080
[Sentry] 2021/08/05 19:16:15 Integration installed: ContextifyFrames
[Sentry] 2021/08/05 19:16:15 Integration installed: Environment
[Sentry] 2021/08/05 19:16:15 Integration installed: Modules
[Sentry] 2021/08/05 19:16:15 Integration installed: IgnoreErrors
[Sentry] 2021/08/05 19:16:16 Integration installed: ContextifyFrames
[Sentry] 2021/08/05 19:16:16 Integration installed: Environment
[Sentry] 2021/08/05 19:16:16 Integration installed: Modules
[Sentry] 2021/08/05 19:16:16 Integration installed: IgnoreErrors
I0805 19:16:16.205382      19 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0805 19:16:16.396706      19 init.go:110] 

Dgraph version   : v21.03.1
Dgraph codename  : rocket-1
Dgraph SHA-256   : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1     : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.


I0805 19:16:16.396764      19 run.go:752] x.Config: {PortOffset:0 Limit:disallow-drop=false; txn-abort-after=5m; max-pending-queries=10000; query-edge=1000000; mutations-nquad=1000000; query-timeout=0ms; max-retries=-1; mutations=allow; normalize-node=10000 LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:-1 GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false}
I0805 19:16:16.396828      19 run.go:753] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080 ZeroAddr:[dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080 dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080 dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m; pending-proposals=256; idx=; group= Badger:{Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0xc0001cab50 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2021-08-05 19:16:15.798430129 +0000 UTC m=+0.265114410 Ludicrous:enabled=false; concurrency=2000 LudicrousEnabled:false Security:token=; whitelist= EncryptionKey:**** LogRequest:0 HardSync:false Audit:false}
I0805 19:16:16.396923      19 run.go:754] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN;}
I0805 19:16:16.397085      19 log.go:295] Found file: 1 First Index: 0
I0805 19:16:16.399098      19 storage.go:125] Init Raft Storage with snap: 0, first: 1, last: 0
I0805 19:16:16.399128      19 server_state.go:140] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x33e3080 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
I0805 19:16:16.425051      19 log.go:34] All 0 tables opened in 0s
I0805 19:16:16.427041      19 log.go:34] Discard stats nextEmptySlot: 0
I0805 19:16:16.427190      19 log.go:34] Set nextTxnTs to 0
I0805 19:16:16.431935      19 groups.go:99] Current Raft Id: 0x0
I0805 19:16:16.432064      19 worker.go:114] Worker listening at address: [::]:7080
I0805 19:16:16.432003      19 groups.go:115] Sending member request to Zero: addr:"dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080" 
I0805 19:16:16.434244      19 run.go:565] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0805 19:16:16.434300      19 run.go:566] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0805 19:16:16.434320      19 run.go:593] gRPC server started.  Listening on port 9080
I0805 19:16:16.434330      19 run.go:594] HTTP server started.  Listening on port 8080
E0805 19:16:16.434370      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0805 19:16:16.534017      19 pool.go:162] CONNECTING to dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080
W0805 19:16:16.537177      19 pool.go:267] Connection lost with dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-0.dgraph-zero.hm.svc.cluster.local: no such host"
I0805 19:16:16.737992      19 pool.go:162] CONNECTING to dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080
W0805 19:16:16.740397      19 pool.go:267] Connection lost with dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.hm.svc.cluster.local: no such host"
I0805 19:16:17.141647      19 pool.go:162] CONNECTING to dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080
W0805 19:16:17.144077      19 pool.go:267] Connection lost with dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-2.dgraph-zero.hm.svc.cluster.local: no such host"
E0805 19:16:17.435876      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:18.436001      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:19.436210      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:20.437387      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:21.435964      19 admin.go:857] namespace: 0. Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
E0805 19:16:21.438010      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:22.438593      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:23.439503      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:24.440648      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:25.441728      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 19:16:26.436907      19 admin.go:857] namespace: 0. Error reading GraphQL schema: Please retry again, server is not ready to accept requests.

Note the DNS lookup errors inside:

transport: Error while dialing dial tcp: lookup dgraph-zero-0.dgraph-zero.hm.svc.cluster.local: no such host
transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.hm.svc.cluster.local: no such host
transport: Error while dialing dial tcp: lookup dgraph-zero-2.dgraph-zero.hm.svc.cluster.local: no such host

However, these hosts do actually exist:

➜ kubectl get pod --context=k3d-dev -A
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   metrics-server-86cbb8457f-8jzsd           1/1     Running   0          11m
kube-system   local-path-provisioner-5ff76fc89d-t29w8   1/1     Running   0          11m
kube-system   coredns-7448499f4d-w9htt                  1/1     Running   0          11m
hm            dgraph-zero-0                             1/1     Running   0          4m26s
hm            dgraph-zero-1                             1/1     Running   0          3m13s
hm            dgraph-zero-2                             1/1     Running   0          2m50s
hm            dgraph-alpha-0                            0/1     Running   2          4m26s
➜ kubectl get svc --context=k3d-dev -A
NAMESPACE     NAME                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes            ClusterIP   10.43.0.1      <none>        443/TCP                  10m
kube-system   kube-dns              ClusterIP   10.43.0.10     <none>        53/UDP,53/TCP,9153/TCP   10m
kube-system   metrics-server        ClusterIP   10.43.29.122   <none>        443/TCP                  10m
hm            dgraph-zero-public    ClusterIP   10.43.9.50     <none>        5080/TCP,6080/TCP        3m30s
hm            dgraph-alpha-public   ClusterIP   10.43.127.25   <none>        8080/TCP,9080/TCP        3m30s
hm            dgraph-zero           ClusterIP   None           <none>        5080/TCP                 3m30s
hm            dgraph-alpha          ClusterIP   None           <none>        7080/TCP                 3m30s
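Name resolution can also be checked directly from inside the cluster (a debugging sketch; the busybox image tag is my assumption). With the custom cluster domain, the hard-coded default name is expected to fail while the custom-domain name resolves:

```shell
# Try both the hard-coded default name and the real cluster-domain name from a
# throwaway pod in the same namespace.
kubectl run dns-test --rm -it --restart=Never --namespace=hm --image=busybox:1.36 -- sh -c '
  nslookup dgraph-zero-0.dgraph-zero.hm.svc.cluster.local || true
  nslookup dgraph-zero-0.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com
'
```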

My other services in this cluster can talk to each other without any issue.

I also found that if I remove --cluster-domain=dev.k8s-hongbomiao.com, or change it to --cluster-domain=cluster.local, when creating the cluster with k3d, Dgraph HA installs without any issue.

However, I need to set the cluster domain for some cluster-related work.

How can I install Dgraph HA when a custom cluster domain is set? Thanks!

UPDATE:

I found this happens with the Dgraph single-server version (Dgraph Alpha and Dgraph Zero are in the same pod) too, when I installed it with

kubectl create namespace hm
kubectl apply --namespace=hm --filename=https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml
++ hostname -f
+ dgraph alpha --my=dgraph-0.dgraph.hm.svc.dev.k8s-hongbomiao.com:7080 --zero dgraph-0.dgraph.hm.svc.cluster.local:5080
[Sentry] 2021/08/05 22:38:20 Integration installed: ContextifyFrames
[Sentry] 2021/08/05 22:38:20 Integration installed: Environment
[Sentry] 2021/08/05 22:38:20 Integration installed: Modules
[Sentry] 2021/08/05 22:38:20 Integration installed: IgnoreErrors
[Sentry] 2021/08/05 22:38:21 Integration installed: ContextifyFrames
[Sentry] 2021/08/05 22:38:21 Integration installed: Environment
[Sentry] 2021/08/05 22:38:21 Integration installed: Modules
[Sentry] 2021/08/05 22:38:21 Integration installed: IgnoreErrors
I0805 22:38:21.926193      19 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0805 22:38:22.128588      19 init.go:110] 

Dgraph version   : v21.03.1
Dgraph codename  : rocket-1
Dgraph SHA-256   : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1     : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.


I0805 22:38:22.128647      19 run.go:752] x.Config: {PortOffset:0 Limit:mutations=allow; query-edge=1000000; disallow-drop=false; query-timeout=0ms; txn-abort-after=5m; max-pending-queries=10000; normalize-node=10000; mutations-nquad=1000000; max-retries=-1 LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:-1 GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false}
I0805 22:38:22.128782      19 run.go:753] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:dgraph-0.dgraph.hm.svc.dev.k8s-hongbomiao.com:7080 ZeroAddr:[dgraph-0.dgraph.hm.svc.cluster.local:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m; pending-proposals=256; idx=; group= Badger:{Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0xc0003961f0 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2021-08-05 22:38:21.453562539 +0000 UTC m=+0.311585012 Ludicrous:enabled=false; concurrency=2000 LudicrousEnabled:false Security:token=; whitelist= EncryptionKey:**** LogRequest:0 HardSync:false Audit:false}
I0805 22:38:22.129055      19 run.go:754] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN;}
I0805 22:38:22.130677      19 storage.go:125] Init Raft Storage with snap: 0, first: 1, last: 0
I0805 22:38:22.130783      19 server_state.go:140] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x33e3080 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
I0805 22:38:22.145325      19 log.go:34] All 0 tables opened in 0s
I0805 22:38:22.147849      19 log.go:34] Discard stats nextEmptySlot: 0
I0805 22:38:22.147915      19 log.go:34] Set nextTxnTs to 0
I0805 22:38:22.150591      19 groups.go:99] Current Raft Id: 0x0
I0805 22:38:22.150605      19 worker.go:114] Worker listening at address: [::]:7080
I0805 22:38:22.150643      19 groups.go:115] Sending member request to Zero: addr:"dgraph-0.dgraph.hm.svc.dev.k8s-hongbomiao.com:7080" 
I0805 22:38:22.153357      19 run.go:565] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0805 22:38:22.153486      19 run.go:566] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0805 22:38:22.153549      19 run.go:593] gRPC server started.  Listening on port 9080
I0805 22:38:22.153584      19 run.go:594] HTTP server started.  Listening on port 8080
E0805 22:38:22.153632      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0805 22:38:22.251622      19 pool.go:162] CONNECTING to dgraph-0.dgraph.hm.svc.cluster.local:5080
W0805 22:38:22.472690      19 pool.go:267] Connection lost with dgraph-0.dgraph.hm.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-0.dgraph.hm.svc.cluster.local: no such host"
E0805 22:38:23.154976      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 22:38:24.155661      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0805 22:38:25.156576      19 groups.go:1181] Error during SubscribeForUpdates for prefix 

That looks like a k8s issue. Make sure the URL is valid and that k8s propagates the addresses correctly (it has an internal DNS). Either use the default values and don't touch the YAML, or, if the address is valid, edit the YAML with the valid URL/SVC.

Also, make sure to do a full cleanup (start from scratch), because some configs are set in stone in Dgraph. Always go through a checklist before moving forward or changing any config.

Thanks @MichelDiz !
I did try a full cleanup (start from scratch), which is what I posted above.

If I add my other services to this cluster, they can all talk to each other via
my-service.hm or my-service.hm.svc.cluster.local.

dgraph-zero-0.dgraph-zero.hm.svc.cluster.local does look correct to me, but I am not sure why it says “no such host”.

The YAML has a script that verifies the hostname. If there’s no hostname, it will try to use the default values from the SVC, which in general are based on the service name itself. See dgraph/dgraph-ha.yaml at f181a70302c771a2bdcab0f3faf082402c2bec05 · dgraph-io/dgraph · GitHub
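Roughly, the command in the manifest does something like this (a simplified sketch, not the exact script):

```shell
# The pod's own address comes from `hostname -f`, which reflects the real
# cluster domain, but the Zero peer list is assembled with the hard-coded
# .svc.cluster.local suffix.
dgraph alpha \
  --my="$(hostname -f):7080" \
  --zero "dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080"
```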

Thanks @MichelDiz !
I found the issue. I opened the pull request at fix(k8s): allow to deploy to cluster with domain name by Hongbo-Miao ¡ Pull Request #7976 ¡ dgraph-io/dgraph ¡ GitHub

Sorry, but I don’t think that PR is valid for other users. It is the way it is due to general usage. You can use it on your end, but it’s not valid for a merge.

I feel that after removing the hard-coded .svc.cluster.local, it can be deployed to any cluster, with or without a custom cluster domain.
Doesn’t the PR make it more general?
Sorry, I am not an expert on Kubernetes. Would you mind explaining why? Thanks! :grinning:

The address svc.cluster.local is part of the K8s internal DNS. It should work in any k8s implementation. This naming in the DNS server hasn’t changed since then.


Thanks @MichelDiz for the explanation. I agree it is the default, which is why most people skip .svc.cluster.local.

However, just like the namespace, which we can change via POD_NAMESPACE in the YAML file:
if the Dgraph team doesn’t want the solution that skips the hard-coded cluster domain and prefers to set it explicitly, could we support the cluster domain too, maybe with something like POD_CLUSTER if that makes more sense to you?
Does that make sense? Thanks!

BTW, if you check the log above, dgraph-alpha actually picked up the correct cluster domain, which in this case is dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com.
However, the --zero flag, which is hard-set to .svc.cluster.local, causes the issue connecting to dgraph-zero.
The two are not consistent.

++ hostname -f
+ dgraph alpha --my=dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080 --zero dgraph-zero-0.dgraph-zero.hm.svc.cluster.local:5080,dgraph-zero-1.dgraph-zero.hm.svc.cluster.local:5080,dgraph-zero-2.dgraph-zero.hm.svc.cluster.local:5080
[Sentry] 2021/08/05 19:16:15 Integration installed: ContextifyFrames

I guess when you have a custom host address you should edit the YAML. And the general user should just use it as it is.
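One way to edit it without maintaining a fork might be to substitute the domain on the fly before applying (a sketch; CLUSTER_DOMAIN is an assumption for this particular cluster):

```shell
# Rewrite the hard-coded default domain to the cluster's actual domain, then
# pipe the patched manifest straight into kubectl.
CLUSTER_DOMAIN=dev.k8s-hongbomiao.com
curl -fsSL https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml \
  | sed "s/svc\.cluster\.local/svc.${CLUSTER_DOMAIN}/g" \
  | kubectl apply --namespace=hm --filename=-
```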

@dmai do you have anything to add?

I actually tried many Kubernetes modules in my repo; however, Dgraph is the only one that has an issue when I set a custom cluster domain. :sweat_smile:

Sorry, Dgraph Alpha still has an issue. However, now all 3 dgraph-alpha pods can be started.

Will report back if I find the solution.

dgraph-alpha-0 log:

++ hostname -f
+ dgraph alpha --my=dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080 --zero dgraph-zero-0.dgraph-zero.hm:5080,dgraph-zero-1.dgraph-zero.hm:5080,dgraph-zero-2.dgraph-zero.hm:5080
[Sentry] 2021/08/09 23:00:43 Integration installed: ContextifyFrames
[Sentry] 2021/08/09 23:00:43 Integration installed: Environment
[Sentry] 2021/08/09 23:00:43 Integration installed: Modules
[Sentry] 2021/08/09 23:00:43 Integration installed: IgnoreErrors
[Sentry] 2021/08/09 23:00:44 Integration installed: ContextifyFrames
[Sentry] 2021/08/09 23:00:44 Integration installed: Environment
[Sentry] 2021/08/09 23:00:44 Integration installed: Modules
[Sentry] 2021/08/09 23:00:44 Integration installed: IgnoreErrors
I0809 23:00:44.958922      19 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry "sentry=false;" flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0809 23:00:45.279089      19 init.go:110] 

Dgraph version   : v21.03.1
Dgraph codename  : rocket-1
Dgraph SHA-256   : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1     : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.


I0809 23:00:45.279152      19 run.go:752] x.Config: {PortOffset:0 Limit:mutations-nquad=1000000; txn-abort-after=5m; max-pending-queries=10000; mutations=allow; query-edge=1000000; normalize-node=10000; disallow-drop=false; query-timeout=0ms; max-retries=-1 LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:-1 GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false}
I0809 23:00:45.279249      19 run.go:753] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:datadog=; ratio=0.01; jaeger= MyAddr:dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080 ZeroAddr:[dgraph-zero-0.dgraph-zero.hm:5080 dgraph-zero-1.dgraph-zero.hm:5080 dgraph-zero-2.dgraph-zero.hm:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m; pending-proposals=256; idx=; group= Badger:{Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0xc0002160f0 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2021-08-09 23:00:44.266820644 +0000 UTC m=+0.391799266 Ludicrous:enabled=false; concurrency=2000 LudicrousEnabled:false Security:token=; whitelist= EncryptionKey:**** LogRequest:0 HardSync:false Audit:false}
I0809 23:00:45.279388      19 run.go:754] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN;}
I0809 23:00:45.281449      19 log.go:295] Found file: 1 First Index: 1
I0809 23:00:45.282451      19 storage.go:125] Init Raft Storage with snap: 0, first: 1, last: 16
I0809 23:00:45.282493      19 server_state.go:140] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x33e3080 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
I0809 23:00:45.296802      19 log.go:34] All 1 tables opened in 4ms
I0809 23:00:45.299114      19 log.go:34] Discard stats nextEmptySlot: 0
I0809 23:00:45.299278      19 log.go:34] Set nextTxnTs to 1
I0809 23:00:45.303718      19 run.go:565] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0809 23:00:45.303839      19 run.go:566] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0809 23:00:45.303858      19 run.go:593] gRPC server started.  Listening on port 9080
I0809 23:00:45.303909      19 run.go:594] HTTP server started.  Listening on port 8080
I0809 23:00:45.303992      19 groups.go:99] Current Raft Id: 0x1
I0809 23:00:45.304003      19 groups.go:115] Sending member request to Zero: id:1 addr:"dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080" 
I0809 23:00:45.304262      19 worker.go:114] Worker listening at address: [::]:7080
E0809 23:00:45.304329      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0809 23:00:45.405381      19 pool.go:162] CONNECTING to dgraph-zero-0.dgraph-zero.hm:5080
E0809 23:00:46.306093      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
W0809 23:00:46.445283      19 pool.go:267] Connection lost with dgraph-zero-0.dgraph-zero.hm:5080. Error: rpc error: code = Unavailable desc = HTTP Logical service in fail-fast
I0809 23:00:46.646293      19 pool.go:162] CONNECTING to dgraph-zero-1.dgraph-zero.hm:5080
W0809 23:00:46.881360      19 pool.go:267] Connection lost with dgraph-zero-1.dgraph-zero.hm:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.hm on 10.43.0.10:53: no such host"
I0809 23:00:47.282046      19 pool.go:162] CONNECTING to dgraph-zero-2.dgraph-zero.hm:5080
E0809 23:00:47.307009      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
W0809 23:00:47.552877      19 pool.go:267] Connection lost with dgraph-zero-2.dgraph-zero.hm:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-2.dgraph-zero.hm on 10.43.0.10:53: no such host"
E0809 23:00:48.307553      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:49.309011      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:50.305308      19 admin.go:857] namespace: 0. Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
E0809 23:00:50.310089      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:51.310943      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:52.361566      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:53.362674      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:54.363633      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:55.306294      19 admin.go:857] namespace: 0. Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
E0809 23:00:55.364284      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:56.364588      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:57.365577      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:58.365767      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:00:59.367004      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:00.306619      19 admin.go:857] namespace: 0. Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
E0809 23:01:00.367739      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:01.368506      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:02.407731      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:03.408807      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:04.409500      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:05.346306      19 admin.go:857] namespace: 0. Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
E0809 23:01:05.411031      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:06.411502      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:07.412141      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0809 23:01:08.412791      19 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0809 23:01:08.431202      19 pool.go:162] CONNECTING to dgraph-zero-0.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080
I0809 23:01:08.439346      19 groups.go:134] Connected to group zero. Assigned group: 0
I0809 23:01:08.439389      19 groups.go:136] Raft Id after connection to Zero: 0x1
I0809 23:01:08.439470      19 pool.go:162] CONNECTING to dgraph-alpha-2.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080
I0809 23:01:08.439492      19 pool.go:162] CONNECTING to dgraph-alpha-1.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080
I0809 23:01:08.439640      19 pool.go:162] CONNECTING to dgraph-zero-1.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080
I0809 23:01:08.439775      19 pool.go:162] CONNECTING to dgraph-zero-2.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080
I0809 23:01:08.439804      19 draft.go:270] Node ID: 0x1 with GroupID: 1
I0809 23:01:08.439841      19 draft.go:279] RaftContext: id:1 group:1 addr:"dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080" 
I0809 23:01:08.439906      19 node.go:152] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc000096240 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x33e3080 DisableProposalForwarding:false}
I0809 23:01:08.440670      19 node.go:321] Found hardstate: {Term:3 Vote:2 Commit:16 XXX_unrecognized:[]}
I0809 23:01:08.440781      19 node.go:326] Group 1 found 16 entries
I0809 23:01:08.440788      19 draft.go:1827] Restarting node for group: 1
I0809 23:01:08.440803      19 log.go:34] 1 became follower at term 3
I0809 23:01:08.440818      19 log.go:34] newRaft 1 [peers: [], term: 3, commit: 16, applied: 0, lastindex: 16, lastterm: 3]
I0809 23:01:08.440858      19 draft.go:211] Operation started with id: opRollup
I0809 23:01:08.440936      19 draft.go:1208] Found Raft progress: 0
I0809 23:01:08.441567      19 groups.go:820] Got address of a Zero leader: dgraph-zero-0.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080
I0809 23:01:08.442021      19 groups.go:834] Starting a new membership stream receive from dgraph-zero-0.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080.
I0809 23:01:08.441750      19 node.go:189] Setting conf state to nodes:1 
I0809 23:01:08.442693      19 node.go:189] Setting conf state to nodes:1 nodes:2 
I0809 23:01:08.442786      19 node.go:189] Setting conf state to nodes:1 nodes:2 nodes:3 
I0809 23:01:08.446800      19 groups.go:851] Received first state update from Zero: counter:17 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080" last_update:1628549457 > > members:<key:2 value:<id:2 group_id:1 addr:"dgraph-alpha-1.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080" leader:true last_update:1628549908 > > members:<key:3 value:<id:3 group_id:1 addr:"dgraph-alpha-2.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080" > > tablets:<key:"\000\000\000\000\000\000\000\000dgraph.drop.op" value:<group_id:1 predicate:"\000\000\000\000\000\000\000\000dgraph.drop.op" > > tablets:<key:"\000\000\000\000\000\000\000\000dgraph.graphql.p_query" value:<group_id:1 predicate:"\000\000\000\000\000\000\000\000dgraph.graphql.p_query" > > tablets:<key:"\000\000\000\000\000\000\000\000dgraph.graphql.schema" value:<group_id:1 predicate:"\000\000\000\000\000\000\000\000dgraph.graphql.schema" > > tablets:<key:"\000\000\000\000\000\000\000\000dgraph.graphql.xid" value:<group_id:1 predicate:"\000\000\000\000\000\000\000\000dgraph.graphql.xid" > > tablets:<key:"\000\000\000\000\000\000\000\000dgraph.type" value:<group_id:1 predicate:"\000\000\000\000\000\000\000\000dgraph.type" > > checksum:12696972231616318625 > > zeros:<key:1 value:<id:1 addr:"dgraph-zero-0.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080" leader:true > > zeros:<key:2 value:<id:2 addr:"dgraph-zero-1.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080" > > zeros:<key:3 value:<id:3 addr:"dgraph-zero-2.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080" > > maxTxnTs:10000 maxRaftId:3 cid:"21d1222e-2cf3-44aa-b99d-2cff3bd13cbc" license:<maxNodes:18446744073709551615 expiryTs:1631141459 enabled:true > 
W0809 23:01:08.449424      19 pool.go:267] Connection lost with dgraph-alpha-1.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-alpha-1.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com on 10.43.0.10:53: no such host"
W0809 23:01:08.449431      19 pool.go:267] Connection lost with dgraph-alpha-2.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-alpha-2.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com on 10.43.0.10:53: no such host"
W0809 23:01:08.449499      19 pool.go:267] Connection lost with dgraph-zero-2.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-2.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com on 10.43.0.10:53: no such host"
I0809 23:01:09.413380      19 pool.go:162] CONNECTING to dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com:7080
I0809 23:01:09.442269      19 groups.go:166] Server is ready
I0809 23:01:09.442311      19 access_ee.go:408] ResetAcl closed
I0809 23:01:09.442315      19 access_ee.go:318] RefreshAcls closed
I0809 23:01:10.441029      19 log.go:34] 1 is starting a new election at term 3
I0809 23:01:10.441084      19 log.go:34] 1 became pre-candidate at term 3
I0809 23:01:10.441089      19 log.go:34] 1 received MsgPreVoteResp from 1 at term 3
I0809 23:01:10.441115      19 log.go:34] 1 [logterm: 3, index: 16] sent MsgPreVote request to 2 at term 3
I0809 23:01:10.441121      19 log.go:34] 1 [logterm: 3, index: 16] sent MsgPreVote request to 3 at term 3
W0809 23:01:11.441872      19 node.go:420] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0809 23:01:11.441928      19 node.go:420] Unable to send message to peer: 0x3. Error: Unhealthy connection
➜ kubectl get svc -A
NAMESPACE              NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                         AGE
hm                     dgraph-alpha                         ClusterIP      None            <none>         7080/TCP                        7m15s
hm                     dgraph-alpha-public                  ClusterIP      10.43.77.89     <none>         8080/TCP,9080/TCP               7m15s
hm                     dgraph-zero                          ClusterIP      None            <none>         5080/TCP                        7m15s
hm                     dgraph-zero-public                   ClusterIP      10.43.68.220    <none>         5080/TCP,6080/TCP               7m15s
...
➜ kubectl get pod -A
NAMESPACE              NAME                                                 READY   STATUS      RESTARTS   AGE
hm                     dgraph-zero-0                                        1/1     Running     0          8m
hm                     dgraph-alpha-0                                       1/1     Running     0          8m
hm                     dgraph-zero-1                                        1/1     Running     0          7m39s
hm                     dgraph-alpha-1                                       1/1     Running     0          7m29s
hm                     dgraph-zero-2                                        1/1     Running     0          7m19s
hm                     dgraph-alpha-2                                       1/1     Running     0          7m9s
...
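The services above are headless (`dgraph-alpha` and `dgraph-zero` have `CLUSTER-IP: None`), so each pod gets a DNS record of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. Since the cluster was created with `--cluster-domain=dev.k8s-hongbomiao.com`, the default `cluster.local` names hardcoded in `dgraph-ha.yaml` never resolve, which matches the `no such host` errors in the log. A quick way to confirm the mismatch from inside a pod (assuming `nslookup` is available in the Dgraph image):

```shell
# Name built from the custom cluster domain — should resolve:
kubectl exec --namespace=hm dgraph-alpha-0 -- \
  nslookup dgraph-zero-0.dgraph-zero.hm.svc.dev.k8s-hongbomiao.com

# Default name from the upstream manifest — fails with NXDOMAIN here:
kubectl exec --namespace=hm dgraph-alpha-0 -- \
  nslookup dgraph-zero-0.dgraph-zero.hm.svc.cluster.local
```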

Tried again today with @dmai, using the same patch (the pull request I opened) as earlier, and everything works. :sweat_smile:
Thanks @MichelDiz and @dmai !
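For anyone hitting the same symptom: the alpha was started with mixed domains — `--my` used the custom cluster domain (via `hostname -f`) while `--zero` still pointed at `cluster.local`, so the zero lookups failed. The idea behind the patch can be sketched as deriving every peer address from the pod's own FQDN instead of hardcoding `cluster.local` (a sketch only; variable names are illustrative and the actual merged change may differ):

```shell
# Inside the pod, `hostname -f` returns something like
#   dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com
fqdn="dgraph-alpha-0.dgraph-alpha.hm.svc.dev.k8s-hongbomiao.com"

# Strip the pod and service labels to recover the service-domain suffix,
# e.g. "hm.svc.dev.k8s-hongbomiao.com"
domain="${fqdn#*.*.}"

# Build the --zero peer list from the same suffix, so every address
# resolves regardless of the cluster's DNS domain
zeros=""
for i in 0 1 2; do
  zeros="${zeros}${zeros:+,}dgraph-zero-${i}.dgraph-zero.${domain}:5080"
done
echo "$zeros"
```

With this, both `--my` and `--zero` agree on the cluster domain, so the alphas can reach the zeros and each other.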


Your PR is merged now. Thanks @Hongbo-Miao
