We have an application where we need to perform regular schema alter operations to isolate some predicates for new datasets. Each alter operation creates a new set of predicates. We have observed the following behaviour whenever the alter operation runs:
Increase in pending queries: rate(dgraph_pending_queries_total)
Spike in goroutines
Spike in server latencies: dgraph_latency_bucket, dgraph_grpc_io_server_server_latency_bucket
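For reference, these are roughly the PromQL expressions we watch (the 5m window and the 0.99 quantile are our choices, not Dgraph defaults):

```promql
# Rate of pending queries per alpha
rate(dgraph_pending_queries_total[5m])

# 99th percentile server latency from the histogram buckets
histogram_quantile(0.99, rate(dgraph_latency_bucket[5m]))

# Goroutine count exposed by the Go runtime metrics
go_goroutines
```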
As per the documentation, schema alter ops, when run in the background, should not impact ongoing queries/mutations.
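For context, this is roughly how we submit the alter, using Dgraph's HTTP /alter endpoint with the runInBackground flag from the docs (the alpha address and predicate name below are placeholders):

```shell
# Submit a schema alter whose index builds run in the background
curl -s -X POST 'localhost:8080/alter?runInBackground=true' \
  -d 'new_pred_1: string @index(exact) .'
```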
Welcome to the Dgraph community. We hope to see you around here more often.
Now, when you say a spike in goroutines, where are you seeing that? Goroutines are not like regular threads: you could spawn a million of them and they should have minimal impact on the running of the program if the program is structured well (i.e. no communication deadlock bugs).
More concerning is the spike in latencies. @ibrahim - ideas?
@chewxy The number of goroutines (for the Dgraph alpha pods) is not really an issue, just an observation. It's just that it corresponds to a latency increase in most cases.
@ibrahim While enabling the tracer on production might take some time, I wanted to get a better understanding of how schema updates work.
The same schema updates take ~2s on a fresh Dgraph cluster, whereas they take around 60s-100s on a cluster that has a considerable amount of data and is receiving query/mutation requests.
Considering that these schema updates only add new predicates, they should not impact operations on existing predicates, and the update time should not increase by this much.
@ibrahim I tried to reproduce the same behaviour on a test setup. The alter op response time increased from 600ms to 5.24s.
I also observed increased latencies for ongoing queries/mutations.
The sequence of events seems to be as follows:
Increase in number of mutations (edge creation rate) → consistent increase in pending queries → high server latencies
We were able to recover completely from this only after a rolling restart of the alpha nodes.
Dgraph uses the Raft protocol internally, which runs as a state machine. Hence, at any point in time only one mutation is applied (however, the same mutation is broken into multiple parts, which are applied concurrently).
This could be the reason for the difference in time between running alter on a fresh cluster and on a cluster with running mutations/queries, although the difference you mention is very high (2s vs 60s). Even though the predicates are new, we do some checks for them, such as cancelling in-progress transactions for those predicates.
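To illustrate the apply model described above, here is a minimal sketch of "one proposal at a time, sub-parts concurrent" (all names here are hypothetical; this is not Dgraph's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// proposal is one committed Raft entry; the edges inside it may be
// applied concurrently, but proposals themselves apply one at a time.
type proposal struct {
	id    int
	edges []string
}

// applyAll is the state-machine loop: it applies proposals strictly
// in commit order, fanning out only the parts of a single proposal.
// It returns the order in which proposals finished.
func applyAll(proposals []proposal) []int {
	var order []int
	for _, p := range proposals {
		var wg sync.WaitGroup
		for _, e := range p.edges {
			wg.Add(1)
			go func(edge string) {
				defer wg.Done()
				_ = edge // apply one edge of the mutation here
			}(e)
		}
		wg.Wait() // the next proposal starts only after this one finishes
		order = append(order, p.id)
	}
	return order
}

func main() {
	got := applyAll([]proposal{
		{id: 1, edges: []string{"a", "b"}},
		{id: 2, edges: []string{"c"}},
	})
	fmt.Println(got) // proposals always finish in commit order
}
```

The consequence is that a slow alter proposal can hold up everything queued behind it, which matches the latency spikes seen here.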
Can you share your benchmarks with us? We would like to reproduce this at our end. I am not very sure that the latency difference should be this big. Also, I feel that the increase in mutations/pending queries should only last for some time; eventually everything should run normally.
@ashishgoswami Let me know what details you need regarding the benchmark. The metrics shared are from our production cluster.
Total number of predicates: 4882
Number of predicates per schema alter operation: 16
Most of our mutations span across 3-4 predicates.
Pending queries and latencies recovered once we restarted the pods. This also looks like a problem with the default Helm charts: liveness and readiness probes are disabled. They could have been useful if an alpha process got stuck.
Is there any recommended configuration for readiness/liveness probes that we can use? Not sure of the story behind disabling these.
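In the meantime we are considering something like the following against the alpha's HTTP /health endpoint (port and timing values below are our own guesses to tune, not official recommendations):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
  failureThreshold: 6
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```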
Thanks for providing the above information. As you said, you are able to reproduce this on a test setup; would it be possible to share your test scripts with us? You can share them by personal message.
We reproduced the issue on a test deployment of a microservice responsible for generating mutations and alter ops. I will need to reproduce it using a simpler script which can be shared easily.
@ashishgoswami Meanwhile, is there a recommended configuration for readiness/liveness probes that we can use? I could use the default values after enabling the probes.
@ashishgoswami I have shared the script to reproduce schema alter operations via message.
Running the same set of schema alter ops even on a fresh cluster causes very high memory usage on the alpha pods, and the cluster eventually gets stuck in a state where no more alter ops are possible.
Hi Kesharee, could you please also share the resource usage (RAM, CPU) for each alpha and zero? There are a few switches that have been added recently that might help, and I would like to compare.
From the profile, it looks like there were too many memtables (the newArena allocations). This happens when a lot of Badger instances are opened at the same time. Each re-indexing call opens a new Badger DB, and that's why we're seeing high memory usage.
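For anyone following along, a heap profile like this can be pulled from an alpha's pprof endpoint (the address below is a placeholder for your alpha; this assumes the HTTP port is reachable):

```shell
# Fetch a heap profile and list the biggest allocators
# (e.g. Badger's newArena / memtable allocations)
curl -s http://localhost:8080/debug/pprof/heap > heap.pb.gz
go tool pprof -top heap.pb.gz
```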
v1.2.7 should have the same issue, since we do reindexing the same way in all versions.
We're starting too many builders in the background, and that's creating too many Badger DBs, which leads to heavy memory usage.
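A general way to bound this kind of fan-out is a counting semaphore over the builders; the sketch below shows the standard buffered-channel pattern (hypothetical names, not the actual fix):

```go
package main

import (
	"fmt"
	"sync"
)

// runBounded runs one task per item but allows at most `limit` of
// them in flight at once, so each task's expensive resource (e.g. a
// temporary DB opened for re-indexing) is also bounded.
func runBounded(items []string, limit int, task func(string)) {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for _, it := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks once `limit` tasks are running
		go func(item string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done
			task(item)
		}(it)
	}
	wg.Wait()
}

func main() {
	var mu sync.Mutex
	var done []string
	runBounded([]string{"p1", "p2", "p3", "p4"}, 2, func(p string) {
		mu.Lock()
		done = append(done, p)
		mu.Unlock()
	})
	fmt.Println(len(done)) // all tasks completed, at most 2 at a time
}
```

With a limit like this, memory usage grows with the concurrency cap rather than with the total number of index builders started.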
@killerknv I’ll accept this as a bug and we’ll try to resolve this soon.