High memory utilization on alpha node (use of memory cache)

We have set up Dgraph on EKS. We have loaded ~220 GB of data onto the alphas, which run on r5a.8xlarge EC2 instances, i.e. 256 GB of RAM.
Grafana shows 210 GB of cache memory, while only 4.9 GB out of 256 GB is actually used.
What is this cache memory? I have checked the documentation as well, and as far as I can tell Dgraph does not support caching internally. Link: (Cached Results - GraphQL)

Each alpha is on a different EC2 server. We have a total of 5 alphas and are facing the same issue on every instance.

Below are screenshots of both.
Grafana (ec2 server stats):-

Alpha Memory Stats:-

AWS console:-

Waiting for a response.

What exactly do you mean by “we have loaded ~220GB of data on alpha”? The graph data is stored on disk, while the memory you refer to here is RAM. The cache memory looks like the Unix filesystem cache, which caches recently read files. This looks like a general Unix question, not a Dgraph-specific one.

What is the “problem” here? What is your expectation?

We have set up Dgraph on EKS, and by data load I mean the size of the data in the graph. We are using 256 GB instances for the alpha nodes. The last screenshot shows the AWS EC2 stats, where the instance is consuming only 4.9 GB of memory. But when I check the same instance in Grafana, it shows 210 GB of cached memory and 215 GB consumed out of 246 GB.

Maybe it would help to share which stats from what sources you are talking about. Like, is it Prometheus my_metric_name from the default host agent? Is it wired memory you are talking about? Does it line up with values in top?

So I presume you want to understand what that 210 GB cached memory is. Is that your question?

I reckon this is the filesystem cache of the Linux kernel.

To validate that, can you ssh into that machine and

  1. run top?
  2. run sudo bash -c "sync; echo 1 > /proc/sys/vm/drop_caches"?

The latter will flush the filesystem cache and you should see a significant drop in your metrics. This would show the fraction of the cache memory that is used by the OS filesystem cache.
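To see the same split without flushing anything, the kernel's own counters in /proc/meminfo can be read directly. Below is a minimal Python sketch, not anything Dgraph-specific: the helper names `parse_meminfo` and `memory_breakdown_gib` are made up for illustration, and the "used excluding cache" figure is a simplification (tools such as `free` also account for reclaimable slab memory).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kiB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_breakdown_gib(fields):
    """Split total RAM into page cache and memory used outside the cache (GiB)."""
    kib_per_gib = 1024 * 1024
    total = fields["MemTotal"]
    free = fields["MemFree"]
    # Buffers + Cached approximates the filesystem (page) cache.
    cache = fields.get("Cached", 0) + fields.get("Buffers", 0)
    # MemAvailable is the kernel's estimate of memory usable without swapping;
    # fall back to free + cache on kernels that do not report it.
    available = fields.get("MemAvailable", free + cache)
    return {
        "total": total / kib_per_gib,
        "page_cache": cache / kib_per_gib,
        "used_excluding_cache": (total - free - cache) / kib_per_gib,
        "available": available / kib_per_gib,
    }
```

On the alpha host, passing the contents of /proc/meminfo (for example `open("/proc/meminfo").read()`) to `parse_meminfo` and then to `memory_breakdown_gib` should show a large `page_cache` and a small `used_excluding_cache` if the filesystem cache explanation holds.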

When you start your instance, how do you get the data onto the alpha node? Do you mount a network share or do you copy / bulk load / live load the data?

Also interesting to know: which Dgraph version do you use?

I upload data through the live loader.
Actually, if I use a 128 GB machine, Dgraph consumes 122-124 GB of memory.
Now I have switched to a smaller machine with 16 CPUs and 64 GB of memory. Dgraph consumes 62 GB out of 64 GB, and its response time gradually becomes very high.

I have tried setting lru_mb = 30gb but still didn’t see any improvement.

This is expected, as outlined above. Linux uses otherwise unused RAM for the filesystem cache.
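To put numbers on this, here is a small Python sketch using the figures quoted earlier in the thread for the 256 GB instance (246 GB total, 210 GB cached, 4.9 GB used). It only illustrates how much the "used" percentage depends on whether the cache is counted; how any particular dashboard or alert rule defines "used" is an assumption here, and `used_percent` is a hypothetical helper.

```python
def used_percent(total_gb, used_gb):
    """Return used memory as a percentage of total memory."""
    return 100.0 * used_gb / total_gb


# Figures quoted in this thread for one alpha node.
total_gb = 246.0    # total memory shown in Grafana
cached_gb = 210.0   # "cached" memory shown in Grafana
app_used_gb = 4.9   # memory reported as actually used

# Counting the filesystem cache as "used" makes the node look nearly full.
with_cache = used_percent(total_gb, app_used_gb + cached_gb)

# Excluding the cache, which the kernel can reclaim under memory pressure.
without_cache = used_percent(total_gb, app_used_gb)

print(f"used incl. cache: {with_cache:.1f}%")
print(f"used excl. cache: {without_cache:.1f}%")
```

A monitoring rule built on the first definition would report the node as close to full even though almost all of that memory is reclaimable cache.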

But what about Dgraph's response time? How can I fix that?
Also, I am getting memory-full alerts on the alpha nodes.