Alpha uses a lot of memory when executing a query and ends up OOM

What I want to do

I use Dgraph for query operations, and at the same time real-time data is written through the Java client.

What I did

I built the dgraph cluster in the figure below:

Each machine has 8 CPUs and 16 GB of RAM.

I defined a type and wrote 17 million records. When I execute a query with multiple conditions, it takes 2-3 seconds, which is very slow. I want to know why the query is so slow. My query is as follows:

query q($timeMin: string!, $timeMax: string!, $first: int, $offset: int, $pid: int, $text: string) {
    var(func: eq(pid, $pid)) @filter(between(create_time, $timeMin, $timeMax) and eq(stage, 1) and eq(op_type, 2) and anyoftext(op_content_text, $text)) {
        a as uid
    }
    var(func: eq(pid, $pid)) @filter(anyoftext(field_value, $text) and eq(is_delete, 0)) {
        log_id @filter(between(create_time, $timeMin, $timeMax) and eq(stage, 1) and eq(op_type, 2)) {
            b as uid
        }
    }
    var(func: eq(op_type, 67)) @filter(eq(pid, $pid)) {
        c as business_id
    }
    q(func: uid(a, b), first: $first, offset: $offset, orderdesc: create_time) @filter(not eq(id, val(c))) @cascade(id) {

When I used JMeter for stress testing, I found that Alpha's memory usage immediately rose to 10-14 GB, and it crashed soon after with an OOM. I don't know why the query uses so much memory, which eventually leads to the Alpha crash.

Dgraph metadata


About OOM: https://dgraph.io/docs/deploy/troubleshooting/#running-out-of-memory-oom

Each Alpha needs to be on its own machine, with 16 GB each. Personally, I would suggest more RAM and CPU for your case, since you have 17 million nodes and run a complex query.

Also, you can hit a bottleneck when not using PCI-Express NVMe storage. Every part of your cluster needs to be carefully configured so there are no bottlenecks. If you have 200 GB of RAM and little CPU, the balance is off.

In your query, try as much as possible to break the work into several blocks. Avoid complex filters in a single block. Build a pipeline in which each block passes its result to the next one. That way your query will behave better, because multiple blocks are executed concurrently.
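As a rough sketch of that pipeline idea, the first `var` block of the query above could be split so that each step filters only the UIDs produced by the previous one. The variable names `byPid`, `byStage`, `inRange` and the `result` block are made up for illustration, and the order of the steps is only a guess at which filters are cheapest. Blocks that consume a variable have to wait for the block that defines it, so whether this form actually lowers memory use or latency on this data set is an assumption that needs to be measured.

```
query q($timeMin: string!, $timeMax: string!, $pid: int, $text: string) {
    # Step 1: narrow by pid only (pid must be indexed, as in the original query).
    byPid as var(func: eq(pid, $pid))

    # Step 2: apply the equality filters to the step-1 result.
    byStage as var(func: uid(byPid)) @filter(eq(stage, 1) and eq(op_type, 2))

    # Step 3: apply the time range to the step-2 result.
    inRange as var(func: uid(byStage)) @filter(between(create_time, $timeMin, $timeMax))

    # Step 4: run the full-text match last, on the smallest candidate set.
    a as var(func: uid(inRange)) @filter(anyoftext(op_content_text, $text))

    # Illustrative final block; the returned fields are placeholders.
    result(func: uid(a)) {
        uid
    }
}
```

The same treatment could be applied to the `b` and `c` blocks before combining them in the final block.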

Are you mutating at the same time as querying?

I think your query is too complex. A query this complex is something I would run only from time to time. If your application insists on complex queries every day, you're going to need more resources than that.

In day-to-day running, the vast majority of applications do not make complex queries. Querying for a user, a post, or a message is usually all it takes. If this query is only for some internal dashboard in your company, then the response time matches the complexity and the amount of resources available.

Hi @MichelDiz
Thank you for your reply. I understand that the way the cluster is currently built is not reasonable, but my resources are limited at present, so this is all I can do. I will try splitting the query into multiple blocks as you suggested.

But I still want to know why Dgraph uses so much memory.

From my experience, graphs take a lot of RAM when they are expanded in the cluster (not written). The same happens if you expand JSON in an application: the RAM usage escalates. I once wrote a JavaScript application to deal with gigantic JSON documents, and it consumed a lot of memory, but it freed the RAM shortly afterwards.

RAM management in Go is also tricky. We have already implemented pretty good RAM management with jemalloc, but graphs are heavy RAM consumers, especially when you run very complex queries that add multiple filters and so on. The need for each filter and other parameter has to be evaluated, because all of this is computed in some instance of Dgraph (on the server or in the cloud), not on your machine.

Running complex queries every now and then is normal. But if you run them every day, you need to calculate how many more resources to make available.

Thank you, I will optimize my query or increase the RAM.