We have deployed a Dgraph cluster in production using the official Helm charts, with 3 Zero and 3 Alpha nodes.
One of our queries looks like this:
{
  stores(func: eq(site, %s)) @filter(eq(state, "active") and not uid(%s)) {
    %s
  }
}
This query returns correct results most of the time, but occasionally returns an empty response. It never fails outright and never returns incorrect data; the only wrong result we see is an empty one. This is not limited to a single query, either: other queries also return empty results occasionally. For example:
{
  inactiveStores(func: type(store)) @filter(eq(site, %s) and lt(createdOn, %s) and eq(state, inactive)) {
    uid
    productCount: count(hasProduct)
  }
}
For example, when I run this query from Ratel 5-6 times, it returns empty results 1 or 2 times. There are also no mutations running that could affect the results.
How do we debug such an issue? Is this a known issue?
This is affecting our production systems, because we can't build reliable systems on top of it.
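To take Ratel out of the picture, the same behavior can be reproduced against an Alpha's HTTP endpoint in a loop. A minimal sketch, assuming a port-forward to one Alpha on localhost:8080 and a simplified query (on v20.03, DQL goes to POST /query with the application/graphql+- content type):

# run the query 20 times and see how often the result set comes back empty
for i in $(seq 1 20); do
  curl -s -H 'Content-Type: application/graphql+-' localhost:8080/query \
    -d '{ q(func: type(store)) { count(uid) } }' | jq -c '.data'
done

Our Helm chart values file is below: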
dgraph:
  ## Global Docker image parameters
  ## Please note that this will override the image parameters, including dependencies, configured to use the global value
  ## Currently available global Docker image parameters: imageRegistry and imagePullSecrets
  ##
  # global:
  #   imageRegistry: myRegistryName
  #   imagePullSecrets:
  #     - myRegistryKeySecretName
  image:
    registry: docker.io
    repository: dgraph/dgraph
    tag: v20.03.0
    ## Specify an imagePullPolicy
    ## Defaults to 'Always' if image tag is 'latest', else set to 'IfNotPresent'
    ## ref: http://kubernetes.io/docs/user-guide/images/#pre-pulling-images
    ##
    pullPolicy: Always
    ## Optionally specify an array of imagePullSecrets.
    ## Secrets must be manually created in the namespace.
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
    ##
    # pullSecrets:
    #   - myRegistryKeySecretName
    ## Set to true if you would like to see extra information in the logs
    ## It turns on BASH and NAMI debugging in minideb
    ## ref: https://github.com/bitnami/minideb-extras/#turn-on-bash-debugging
    ##
    debug: false
  zero:
    name: zero
    monitorLabel: zero-dgraph-io
    ## The StatefulSet controller supports automated updates. There are two valid update strategies: RollingUpdate and OnDelete
    ## ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets
    ##
    updateStrategy: RollingUpdate
    ## Partition update strategy
    ## https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#partitions
    ##
    # rollingUpdatePartition:
    ## The StatefulSet controller supports relaxing its ordering guarantees while preserving its uniqueness and identity guarantees. There are two valid pod management policies: OrderedReady and Parallel
    ## ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#pod-management-policy
    ##
    podManagementPolicy: OrderedReady
    ## Number of dgraph zero pods
    ##
    replicaCount: 3
    ## Max number of replicas per data shard,
    ## i.e., the max number of Dgraph Alpha instances per group (shard).
    ##
    shardReplicaCount: 3
    ## Zero server pod termination grace period
    ##
    terminationGracePeriodSeconds: 60
    ## Hard means that by default pods will only be scheduled if there are enough nodes for them
    ## and that they will never end up on the same node. Setting this to soft will do this "best effort"
    antiAffinity: soft
    # By default this will make sure two pods don't end up on the same node
    # Changing this to a region would allow you to spread pods across regions
    podAntiAffinitytopologyKey: "kubernetes.io/hostname"
    ## These are the node affinity settings as defined in
    ## https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
    nodeAffinity: {}
    ## Kubernetes configuration
    ## For minikube, set this to NodePort, elsewhere use LoadBalancer
    ##
    service:
      type: ClusterIP
    ## dgraph Pod Security Context
    securityContext:
      enabled: false
      fsGroup: 1001
      runAsUser: 1001
    ## dgraph data Persistent Volume Storage Class
    ## If defined, storageClassName: <storageClass>
    ## If set to "-", storageClassName: "", which disables dynamic provisioning
    ## If undefined (the default) or set to null, no storageClassName spec is
    ## set, choosing the default provisioner. (gp2 on AWS, standard on
    ## GKE, AWS & OpenStack)
    ##
    persistence:
      enabled: true
      storageClass: iopsssd
      persistentVolumeReclaimPolicy: Retain
      accessModes:
        - ReadWriteOnce
      size: 10Gi
    ## Node labels and tolerations for pod assignment
    ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
    ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#taints-and-tolerations-beta-feature
    ##
    nodeSelector:
      spotinst.io/node-lifecycle: od
    tolerations: []
    ## Configure resource requests
    ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
    ##
    resources:
      requests:
        memory: 3096Mi
        cpu: 2
    ## Configure extra options for liveness and readiness probes
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
    ##
    livenessProbe:
      enabled: false
      port: 6080
      path: /health
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 6
      successThreshold: 1
    readinessProbe:
      enabled: false
      port: 6080
      path: /state
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 6
      successThreshold: 1
  alpha:
    name: alpha
    monitorLabel: alpha-dgraph-io
    ## The StatefulSet controller supports automated updates. There are two valid update strategies: RollingUpdate and OnDelete
    ## ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets
    ##
    updateStrategy: RollingUpdate
    ## Partition update strategy
    ## https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#partitions
    ##
    # rollingUpdatePartition:
    ## The StatefulSet controller supports relaxing its ordering guarantees while preserving its uniqueness and identity guarantees. There are two valid pod management policies: OrderedReady and Parallel
    ## ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#pod-management-policy
    ##
    podManagementPolicy: OrderedReady
    ## Number of dgraph alpha pods
    ##
    replicaCount: 3
    ## Alpha server pod termination grace period
    ##
    terminationGracePeriodSeconds: 600
    ## Hard means that by default pods will only be scheduled if there are enough nodes for them
    ## and that they will never end up on the same node. Setting this to soft will do this "best effort"
    antiAffinity: soft
    # By default this will make sure two pods don't end up on the same node
    # Changing this to a region would allow you to spread pods across regions
    podAntiAffinitytopologyKey: "kubernetes.io/hostname"
    ## These are the node affinity settings as defined in
    ## https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
    nodeAffinity: {}
    ## Kubernetes configuration
    ## For minikube, set this to NodePort, elsewhere use LoadBalancer
    ##
    service:
      type: ClusterIP
    ## dgraph Pod Security Context
    securityContext:
      enabled: false
      fsGroup: 1001
      runAsUser: 1001
    ## dgraph data Persistent Volume Storage Class
    ## If defined, storageClassName: <storageClass>
    ## If set to "-", storageClassName: "", which disables dynamic provisioning
    ## If undefined (the default) or set to null, no storageClassName spec is
    ## set, choosing the default provisioner. (gp2 on AWS, standard on
    ## GKE, AWS & OpenStack)
    ##
    persistence:
      enabled: true
      storageClass: iopsssd
      persistentVolumeReclaimPolicy: Retain
      accessModes:
        - ReadWriteOnce
      size: 50Gi
      annotations: {}
    ## Node labels and tolerations for pod assignment
    ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
    ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#taints-and-tolerations-beta-feature
    ##
    nodeSelector:
      spotinst.io/node-lifecycle: od
    tolerations: []
    ## Configure resource requests
    ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
    ##
    resources:
      requests:
        memory: 12Gi
        cpu: 8
    ## Configure the value for the lru_mb flag
    ## Typically a third of available memory is recommended; keeping the default value of 2048mb
    # lru_mb: 3096
    ## Configure extra options for liveness and readiness probes
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
    ##
    livenessProbe:
      enabled: false
      port: 8080
      path: /health?live=1
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 6
      successThreshold: 1
    readinessProbe:
      enabled: false
      port: 8080
      path: /health
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 6
      successThreshold: 1
  ratel:
    name: ratel
    ## Number of dgraph ratel pods
    ##
    replicaCount: 1
    ## Kubernetes configuration
    ## For minikube, set this to NodePort, elsewhere use ClusterIP or LoadBalancer
    ##
    service:
      type: ClusterIP
    ## dgraph Pod Security Context
    securityContext:
      enabled: false
      fsGroup: 1001
      runAsUser: 1001
    ## Configure resource requests
    ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
    ##
    ## resources:
    ##   requests:
    ##     memory: 256Mi
    ##     cpu: 250m
    ## Configure extra options for liveness and readiness probes
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
    ##
    livenessProbe:
      enabled: false
      port: 8000
      path: /
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 6
      successThreshold: 1
    readinessProbe:
      enabled: false
      port: 8000
      path: /
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 6
      successThreshold: 1
@vtomar
Can you verify that the cluster is healthy when you are getting the random empty results? One possible reason is that some pods have died, and the cluster isn't able to return all the data because it has lost consensus.
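A minimal sketch of that check, assuming kubectl access (the label selector and pod name depend on your Helm release; Zero's /state endpoint on HTTP port 6080 is standard):

# check that all zero and alpha pods are Running and Ready
kubectl get pods -l app=dgraph

# inspect Raft group membership and leaders via a Zero pod
kubectl port-forward dgraph-dgraph-zero-0 6080:6080 &
curl -s localhost:6080/state | jq '.zeros, .groups'

If a group is missing a member or has no leader, that could explain reads that come back empty.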
It's our production system; we can't start from scratch.
Data is ingested through Dgraph mutations using the Java client. It comes from various external sources (S3 files, REST APIs, etc.), gets transformed into our data model, and is pushed to Dgraph via mutations.
Besides, how would a cleanup help? Wouldn't re-ingesting the data just recreate the same problem?
I see that you have changed this from 5 to 3. Is there a reason why?
That depends. First of all, it's just a guess: starting from scratch avoids a detailed investigation to find the problem, which can take more time than you intended to spend solving it.
Often users run several tests on the same volume, sometimes mixing previous configurations and even writing new posting and WAL files on top of old ones, overwriting files. That can appear to work, but it can generate problems.
A while ago I saw this happen with other users. We never found out what had been done to cause it, but starting from scratch usually helps.
It would be great to understand the steps to reproduce this, and then to see whether it is a problem in Dgraph or in how the database is being used. But the last time I saw someone report this was over nine months ago, and that was a heavy user.
Either way, you will have to stop your production cluster to solve this problem. It is not possible to understand exactly what is happening from logs, YAML, or descriptions alone.
To reproduce the use case, you could do an export and re-import the data into a test cluster. How are Ratel and Alpha accessed? If you are using port-forward to a particular pod, do all the pods show the same frequency of empty results? Sometimes port-forward errors occur, and you have to terminate the tunnel and reconnect.
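A sketch of the export and re-import path on v20.03 (the export endpoint is a plain GET on an Alpha; the export directory name depends on the timestamp, and test-alpha/test-zero are placeholder addresses):

# trigger an export on the production cluster (files land in the alpha's export directory)
curl -s localhost:8080/admin/export

# re-import into a separate test cluster with the live loader
dgraph live -f export/<dir>/g01.rdf.gz -s export/<dir>/g01.schema.gz \
  -a test-alpha:9080 -z test-zero:5080

To rule out a single bad replica, you could also port-forward to each Alpha pod in turn (dgraph-dgraph-alpha-0, -1, -2, adjusting the names to your release) and compare how often each one returns empty results.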
Is the Java client experiencing similar behavior at a similar frequency? Is it running as a pod within the cluster, or outside the cluster?
Which Kubernetes implementation and version are you using? Are you using the default scheduler or another scheduler (I noticed the spotinst label in the nodeSelector)? Were any of the underlying nodes swapped out during testing?
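If the nodes are spot instances, it would be worth checking whether a node was replaced or a Dgraph pod was evicted around the time of the empty results, for example (a sketch):

# node ages reveal recently replaced machines
kubectl get nodes -o wide

# recent events mentioning evictions, kills, or unready nodes
kubectl get events --sort-by=.lastTimestamp | grep -iE 'evict|kill|notready'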
Hi @vtomar, in another thread the simple fix was to upgrade to Dgraph 20.07. I was wondering if you could do the same, to see whether this issue has been fixed?
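If the cluster was installed with the official chart, the upgrade can be as simple as bumping the image tag. A sketch, assuming a release named dgraph and the chart repo from https://charts.dgraph.io (adjust the value path if your values nest everything under a dgraph: key, and check the release notes for any export/import steps required between 20.03 and 20.07 before touching production):

helm repo update
helm upgrade dgraph dgraph/dgraph --set image.tag=v20.07.0 --reuse-values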