Issues with Dgraph running in Kubernetes (K8s load balancing?)

Hi. I have set up Dgraph in Kubernetes (GKE 1.18.6-gke.4801) using the Helm chart. The deployment went through without issues: all pods and services are running and healthy, and there are no errors in the logs.

However, when I port-forward the alpha service and try to update the schema or drop all data by hitting either the /alter or the /admin/schema endpoint, the results are random (since the Kubernetes service is load-balancing across different alpha pods underneath).

For example:

Hitting /admin/schema three times without changing the payload or URL, I get different results:

Dropping data using /alter four times without changing the payload or URL, I get different results:

Is this a bug, or am I doing something wrong? I would expect consistent results even if the service is load-balancing between pods.
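To check whether the Alphas themselves disagree, one option is to bypass the load-balanced Service and send the same request to each Alpha pod directly (or run `kubectl port-forward pod/<alpha-pod> 8080:8080` for one pod at a time). The Go sketch below assumes the pods are reachable through a headless Service under StatefulSet-style names (`<name>-0`, `<name>-1`, ...); the names in `main` are hypothetical placeholders, and `alphaURLs`, `probeEach` and `allSame` are illustrative helpers, not part of Dgraph.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

// alphaURLs builds one base URL per Alpha pod so requests bypass the
// load-balanced Service. ASSUMPTION: pods are named <statefulSet>-<ordinal>
// and resolvable through a headless Service in the given namespace.
func alphaURLs(statefulSet, headlessSvc, namespace string, replicas, port int) []string {
	urls := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		urls = append(urls, fmt.Sprintf("http://%s-%d.%s.%s.svc.cluster.local:%d",
			statefulSet, i, headlessSvc, namespace, port))
	}
	return urls
}

// probeEach POSTs the same body to path on every pod and records
// "<status> <body>" (or the error) per base URL.
func probeEach(baseURLs []string, path, contentType string, body []byte) map[string]string {
	client := &http.Client{Timeout: 10 * time.Second}
	results := make(map[string]string, len(baseURLs))
	for _, base := range baseURLs {
		resp, err := client.Post(base+path, contentType, bytes.NewReader(body))
		if err != nil {
			results[base] = "error: " + err.Error()
			continue
		}
		data, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			results[base] = "error: " + err.Error()
			continue
		}
		results[base] = fmt.Sprintf("%d %s", resp.StatusCode, data)
	}
	return results
}

// allSame reports whether every pod returned an identical response.
func allSame(results map[string]string) bool {
	first, seen := "", false
	for _, r := range results {
		if !seen {
			first, seen = r, true
		} else if r != first {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical StatefulSet, headless Service and namespace names.
	for _, u := range alphaURLs("dgraph-dgraph-alpha", "dgraph-dgraph-alpha-headless", "default", 3, 8080) {
		fmt.Println(u)
	}
}
```

Use a read-only request with `probeEach`, such as a schema query sent to /query; sending a drop-all /alter this way would run it once per pod. Responses may also carry per-request fields such as latency, so in practice you may want to compare only the data portion rather than the raw bodies.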

Also, I am not sure what the error "Only leader can decide to commit or abort" means in this context, since I am just hitting the service and the service decides which pod to hit.

I also get this in the logs, and it recovers automatically: Error while retrieving timestamps: rpc error: code = Unknown desc = Assigning IDs is only allowed on leader. with delay: 10ms. Will retry...

I also noticed these issues, which might be related:

I don’t know if this is relevant, but I am running Dgraph on a cluster with Linkerd as the service mesh and a restricted PodSecurityPolicy (PSP) setup.

This is what my values file looks like:

image: &image
  registry: docker.io
  repository: dgraph/dgraph
  tag: v20.07.1
  pullPolicy: IfNotPresent
  # pullSecrets:
  #   - myRegistryKeySecretName
  debug: false

zero:
  name: zero
  metrics:
    enabled: true
  monitorLabel: zero-dgraph-io
  ## StatefulSet controller supports automated updates. There are two valid update strategies: RollingUpdate and OnDelete
  ## ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets
  ##
  updateStrategy: RollingUpdate

  ## Partition update strategy
  ## https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#partitions
  ##
  # rollingUpdatePartition:

  ## StatefulSet controller supports relaxing its ordering guarantees while preserving its uniqueness and identity guarantees. There are two valid pod management policies: OrderedReady and Parallel
  ## ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#pod-management-policy
  ##
  podManagementPolicy: OrderedReady

  ## Number of dgraph zero pods
  ##
  replicaCount: 3

  ## Max number of replicas per data shard.
  ## i.e., the max number of Dgraph Alpha instances per group (shard).
  ##
  shardReplicaCount: 5

  ## zero server pod termination grace period
  ##
  terminationGracePeriodSeconds: 60

  ## Hard means that by default pods will only be scheduled if there are enough nodes for them
  ## and that they will never end up on the same node. Setting this to soft will do this "best effort"
  antiAffinity: soft

  ## By default this will make sure two pods don't end up on the same node
  ## Changing this to a region would allow you to spread pods across regions
  podAntiAffinitytopologyKey: "kubernetes.io/hostname"

  ## This is the node affinity settings as defined in
  ## https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
  nodeAffinity: {}

  ## Extra environment variables which will be appended to the env: definition for the container.
  extraEnvs: []

  ## Configuration file for dgraph zero used as an alternative to command-line options
  ## Ref: https://dgraph.io/docs/deploy/#config
  configFile:
    config.toml: |
      whitelist = '10.0.0.0/8,172.0.0.0/8,192.168.0.0/16'
      lru_mb = 2048

  ## Kubernetes configuration
  ## For minikube, set this to NodePort, elsewhere use LoadBalancer
  ##
  service:
    type: ClusterIP
    annotations: {}
    ## StatefulSet pods will need to have addresses published in order to
    ## communicate to each other in order to enter a ready state.
    publishNotReadyAddresses: true

  ## dgraph Pod Security Context
  securityContext:
    enabled: true
    fsGroup: 1001
    runAsUser: 1001

  persistence:
    enabled: true
    storageClass: "csi-cephfs"
    accessModes:
      - ReadWriteMany
    size: 32Gi

  ## Node labels and tolerations for pod assignment
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  ## ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
  ##
  nodeSelector: {}
  tolerations: []

  ## Configure resource requests
  ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ##
  resources:
    requests:
      memory: 100Mi

  ## Custom liveness and readiness probes
  customStartupProbe: {}
  customLivenessProbe: {}
  customReadinessProbe: {}

alpha:
  name: alpha
  metrics:
    enabled: true
  monitorLabel: alpha-dgraph-io
  updateStrategy: RollingUpdate
  podManagementPolicy: OrderedReady

  ## Number of dgraph nodes
  ##
  replicaCount: 3

  ## alpha server pod termination grace period
  ##
  terminationGracePeriodSeconds: 600

  ## Hard means that by default pods will only be scheduled if there are enough nodes for them
  ## and that they will never end up on the same node. Setting this to soft will do this "best effort"
  antiAffinity: soft

  ## By default this will make sure two pods don't end up on the same node
  ## Changing this to a region would allow you to spread pods across regions
  podAntiAffinitytopologyKey: "kubernetes.io/hostname"

  ## This is the node affinity settings as defined in
  ## https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
  nodeAffinity: {}

  ## Extra environment variables which will be appended to the env: definition for the container.
  extraEnvs: []
  configFile:
    config.toml: |
      whitelist = '10.0.0.0/8,172.0.0.0/8,192.168.0.0/16'
      lru_mb = 2048

  ## Kubernetes configuration
  ## For minikube, set this to NodePort, elsewhere use LoadBalancer
  ##
  service:
    type: ClusterIP
    annotations: {}
    ## StatefulSet pods will need to have addresses published in order to
    ## communicate to each other in order to enter a ready state.
    publishNotReadyAddresses: true

  ## alpha ingress resource configuration
  ## This requires an ingress controller to be installed into your k8s cluster
  ingress:
    enabled: false
    # hostname: ""
    # annotations: {}
    # tls: {}

  ## dgraph Pod Security Context
  securityContext:
    enabled: true
    fsGroup: 1001
    runAsUser: 1001
  tls:
    enabled: false
    files: {}
  acl:
    enabled: false
  encryption:
    enabled: false
  persistence:
    enabled: true
    storageClass: "csi-cephfs"
    accessModes:
      - ReadWriteMany
    size: 100Gi
    annotations: {}

  ## Custom liveness and readiness probes
  customStartupProbe: {}
  customLivenessProbe: {}
  customReadinessProbe: {}


ratel:
  name: ratel

  ## Enable Ratel service
  enabled: true

  ## Number of ratel pods
  ##
  replicaCount: 1

  # Extra environment variables which will be appended to the env: definition for the container.
  extraEnvs: []

  ## Kubernetes configuration
  ## For minikube, set this to NodePort, elsewhere use ClusterIP or LoadBalancer
  ##
  service:
    type: ClusterIP
    annotations: {}

  ## ratel ingress resource configuration
  ## This requires an ingress controller to be installed into your k8s cluster
  ingress:
    enabled: false

  ## dgraph Pod Security Context
  securityContext:
    enabled: true
    fsGroup: 1001
    runAsUser: 1001

  ## Configure resource requests
  ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ##
  ## resources:
  ##   requests:
  ##     memory: 256Mi
  ##     cpu: 250m

  ## Configure extra options for liveness and readiness probes
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
  ##

  ## Custom liveness and readiness probes
  customLivenessProbe: {}
  customReadinessProbe: {}

global:
  ## Combined ingress resource for alpha and ratel services
  ## This will override existing ingress configurations under alpha and ratel
  ## This requires an ingress controller to be installed into your k8s cluster
  ingress:
    enabled: false
    annotations: {}
    tls: {}
    ratel_hostname: ""
    alpha_hostname: ""
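With these values there are 3 Alphas (alpha.replicaCount) and at most 5 replicas per group (zero.shardReplicaCount, i.e. Zero's replica setting). Zero fills a group up to that replica count before starting a new one, so all three Alphas should land in a single group that replicates the same data via Raft and has one leader; Zero itself also runs three replicas with one leader. The toy Go function below models that placement. It is a simplification that ignores rebalancing, removed nodes and explicitly requested group IDs, and `groupSizes` is a hypothetical helper, not Dgraph code.

```go
package main

import "fmt"

// groupSizes is a simplified model of Zero placing Alphas into groups:
// each new Alpha joins the current group until it holds `replicas`
// members, then a new group is started.
func groupSizes(alphas, replicas int) []int {
	if replicas <= 0 {
		return nil // guard against an infinite loop on invalid input
	}
	var sizes []int
	for remaining := alphas; remaining > 0; remaining -= replicas {
		if remaining < replicas {
			sizes = append(sizes, remaining)
		} else {
			sizes = append(sizes, replicas)
		}
	}
	return sizes
}

func main() {
	// alpha.replicaCount=3, zero.shardReplicaCount=5 from the values file.
	fmt.Println(groupSizes(3, 5)) // [3]
	// For comparison: 6 Alphas with 3 replicas per group.
	fmt.Println(groupSizes(6, 3)) // [3 3]
}
```

Under this model every Alpha holds a full replica of the same data, which is why consistent answers across pods are a reasonable expectation.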

Okay, I made some changes. I disabled Linkerd injection by adding this annotation and reinstalled Dgraph:

annotations:
    linkerd.io/inject: disabled

and it seems to produce consistent results now. I am still not sure why, though.

Linkerd’s load balancing logic differs from that of Kubernetes: https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/

Would that be the reason? Not sure.
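The linked post describes the key difference: a Kubernetes Service balances at the connection level (L4), so every request on one long-lived HTTP/2 (gRPC) connection reaches the same pod, whereas Linkerd balances each request individually (L7). The toy Go model below contrasts the two for four requests across three Alphas. It is illustrative only: the pod names are hypothetical, and the round-robin in `perRequest` stands in for Linkerd's actual latency-aware (EWMA) balancer.

```go
package main

import "fmt"

// perConnection models a Kubernetes Service: the backend is picked once,
// when the connection opens (index `chosen`), and every request on that
// connection goes to the same pod.
func perConnection(backends []string, requests, chosen int) []string {
	if len(backends) == 0 {
		return nil
	}
	out := make([]string, requests)
	for i := range out {
		out[i] = backends[chosen%len(backends)]
	}
	return out
}

// perRequest models an L7 proxy that picks a backend for every request.
// Round-robin is used here only for determinism.
func perRequest(backends []string, requests int) []string {
	if len(backends) == 0 {
		return nil
	}
	out := make([]string, requests)
	for i := range out {
		out[i] = backends[i%len(backends)]
	}
	return out
}

func main() {
	alphas := []string{"alpha-0", "alpha-1", "alpha-2"}
	fmt.Println(perConnection(alphas, 4, 1)) // [alpha-1 alpha-1 alpha-1 alpha-1]
	fmt.Println(perRequest(alphas, 4))       // [alpha-0 alpha-1 alpha-2 alpha-0]
}
```

With per-request balancing, consecutive calls from the same client land on different Alphas, so any difference between pods becomes visible immediately; whether that is the actual cause here is not confirmed in this thread.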

Update: I tried the same in a different cluster with Linkerd and was able to reproduce the issue there as well. Taking Linkerd out of the equation makes it work.

We have not tested with service meshes like Linkerd or Istio yet. Service mesh platforms are complex and have much more depth than ingresses, so I would recommend starting with an ingress until your needs require a service mesh.

While building the Helm chart support for ingress, we tested these:


@joaquin Thanks for the reply. Ingress won’t be relevant for me because my intention is not to expose the database outside the cluster. I use service mesh for communication within the cluster and I have ingress only for the front end services.

I can work with Linkerd disabled for Dgraph for now, but since every microservice will call the Dgraph service for database operations, having it meshed might make sense, at least in the future.

That makes sense. Also, you can make the ingress and/or load balancers private (internal-only). This way, you can reduce the number of load balancers per service and keep the traffic internal.

Service meshes make sense in the long run, especially with microservices: for example, to control which services can talk to the database, and to balance traffic evenly over HTTP/2 when gRPC clients are used.


@joaquin Sure. Btw, wanted to share this link: https://linkerd.io/2/features/http-grpc/

I am not sure which version of grpc-go Dgraph uses, but according to the Linkerd docs, it looks like these bugs have been fixed in the latest versions.

Thanks for the information, that is good to know.