Query that crashes a dgraph server

jzhu077 · January 31, 2018, 2:21am

I have set up a dgraph cluster in kubernetes with 5 zeros and 30 servers with replicas set to 3.

Each dgraph server is started with --memory_mb 3036.
I am running a long query that consists of approximately 200 blocks, some of which are expected to return thousands or even hundreds of thousands of results.

About 20 minutes into the query, the dgraph server pod handling it crashed and logged:

2018/01/31 01:48:48 node.go:400: WARN: A tick missed to fire. Node blocks too long!
2018/01/31 01:48:48 node.go:400: WARN: A tick missed to fire. Node blocks too long!
2018/01/31 01:48:48 node.go:400: WARN: A tick missed to fire. Node blocks too long!
2018/01/31 01:48:48 node.go:400: WARN: A tick missed to fire. Node blocks too long!
2018/01/31 01:48:48 node.go:400: WARN: A tick missed to fire. Node blocks too long!

Is this a result of memory usage exceeding the memory allocated? That is, would increasing the memory setting solve this issue?

janardhan · January 31, 2018, 1:45pm

Is the CPU usage very high when you run this query? Can you please share the heap and CPU profiles?
This can happen if the query is eating all of the CPU or if the process becomes too slow (for example, because dgraph is using swap memory, although that shouldn’t happen on kubernetes).
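
For anyone following along, profiles like these can usually be pulled over HTTP. The sketch below only builds the URLs. It assumes the server exposes Go's standard net/http/pprof handlers under /debug/pprof on its HTTP port; the host localhost and port 8080 are assumptions about the deployment, not values taken from this thread, and pprof_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def pprof_url(host: str, port: int, kind: str, seconds: int = 30) -> str:
    """Build a URL for a Go net/http/pprof endpoint.

    kind is "heap" or "profile" (CPU). Host and port are assumptions
    about the deployment and must be adjusted to match it.
    """
    if kind not in ("heap", "profile"):
        raise ValueError(f"unsupported profile kind: {kind}")
    base = f"http://{host}:{port}/debug/pprof/{kind}"
    # Only the CPU profile takes a sampling duration.
    if kind == "profile":
        return f"{base}?{urlencode({'seconds': seconds})}"
    return base


# Assumed host and port, for illustration only.
print(pprof_url("localhost", 8080, "heap"))
print(pprof_url("localhost", 8080, "profile", seconds=30))
```

Either URL can be handed to go tool pprof, or the downloaded file can be attached to a reply. On kubernetes the port would typically need to be forwarded from the pod first.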

jzhu077 · January 31, 2018, 9:04pm

Yes, CPU usage is relatively high while the query runs, spiking to 62.5% (10 of 16 cores).
Each node is an n1-standard-16 (16 vCPUs, 60 GB memory).

This is the CPU usage at the time it crashed (~14.50).

Memory usage remains at a high level even after the ingestion phase: 93% (56 GB of 60 GB).
Green line represents the overall usage.
Yellow line represents usage from default namespace where dgraph pods are deployed.

Does this mean the memory config for the dgraph server has no effect on dgraph's performance? I thought that during a long query, a dgraph server passes the result of one query block to another server that holds the relevant predicate. If the result is too big to be stored in memory, will it crash, or would it log some other error or warning message?

pawan · January 31, 2018, 11:32pm

Yes, that’s true. Dgraph does pass the query to another server to execute it, but the result finally has to be aggregated on the node that received the initial request. If the result is too big to be stored in memory, the server may run out of memory, but the logs should say so. Do you have any logs from the server that crashed?
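
To get a feel for how large the aggregated result could be, here is a rough back-of-the-envelope sketch. The block count and --memory_mb value come from the original post; the 200 bytes per result is purely an assumption for illustration, and treating every block as returning the maximum is a worst case, since only some blocks are expected to be that large. The --memory_mb figure is shown only for scale; this thread does not establish that the flag bounds query results.

```python
def estimate_result_mb(blocks: int, results_per_block: int, bytes_per_result: int) -> float:
    """Rough upper bound, in MB, on the size of the aggregated result."""
    return blocks * results_per_block * bytes_per_result / (1024 * 1024)


MEMORY_MB = 3036        # --memory_mb from the original post
BLOCKS = 200            # approximate block count from the original post
BYTES_PER_RESULT = 200  # assumption: illustrative only, not measured

for per_block in (1_000, 100_000):
    mb = estimate_result_mb(BLOCKS, per_block, BYTES_PER_RESULT)
    print(f"{per_block} results/block -> about {mb:.0f} MB (memory_mb = {MEMORY_MB})")
```

Under these assumed numbers the worst case lands in the same range as the configured memory, which is why the size of the final result on the coordinating node is worth checking.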

More memory should definitely improve performance, but turn off swap space if you have not already done so.

jzhu077 · February 1, 2018, 2:36am

The logs I pasted were the only unusual ones I found when the pod crashed.

Is this a config I can set on the GCE node, or in the dgraph binary? I am not sure how to turn off swap space, or why doing so would help improve performance.

janardhan · February 1, 2018, 3:13am

sudo swapoff -a can be used to turn off swap. I am not sure whether your system is using swap, but if it is, it makes the program very slow, since disk is used instead of RAM. It's better to let the program go OOM than to use swap.
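
If it helps, one way to check whether swap is configured on a Linux node is to read the SwapTotal field from /proc/meminfo. This is a minimal sketch; swap_total_kb is a hypothetical helper name, and the sample text below is made up for illustration rather than taken from the poster's machines.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Return the SwapTotal value, in kB, from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Lines look like "SwapTotal:       0 kB"; the number is the second field.
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


# Made-up sample; on a real node, pass open("/proc/meminfo").read() instead.
sample = "MemTotal:       61837200 kB\nSwapTotal:             0 kB\nSwapFree:              0 kB\n"
kb = swap_total_kb(sample)
print("swap is off" if kb == 0 else f"swap is enabled: {kb} kB")
```

A value of 0 means no swap is configured, so there is nothing to turn off.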

jzhu077 · February 1, 2018, 3:55am

Cheers, the swap is set to 0 by default in GCE VM instances.

