Single thread

Dgraph seems to behave as if it were single-threaded: as the number of requests increases, query latency increases too. Can anyone tell me what's going on?

No. Go programs are generally built on Go's concurrency model (goroutines), so a Go program like Dgraph is multithreaded almost by default.

Rising latency under load can still happen in some scenarios. Please share your configuration and stats so I can guide you.


Thank you for your reply!

Here’s how I start:

node1:

nohup ./dgraph zero --idx 200 --my=node1:5080 --jaeger.collector=http://node1:14268 --replicas=3 2>&1 &
nohup ./dgraph alpha --idx 201 --my=node1:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 > alpha.log 2>&1 &

node2:

nohup ./dgraph alpha --idx 202 --my=node2:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 > alpha.log 2>&1 &

node3:

nohup ./dgraph alpha --idx 203 --my=node3:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 > alpha.log 2>&1 &

node4:

nohup ./dgraph alpha --idx 204 --my=node4:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 > alpha.log 2>&1 &

node5:

nohup ./dgraph alpha --idx 205 --my=node5:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 > alpha.log 2>&1 &

node6:

nohup ./dgraph alpha --idx 206 --my=node6:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 > alpha.log 2>&1 &

We access it over HTTP. The QPS is less than 200.

Are you hitting Dgraph serially or concurrently?

Concurrently, using httpclient.

What version of Dgraph are you using, and can you add steps/code to reproduce?

Weird, when I use ludicrous_mode I can reach 400 thousand mutations per second.

Can you use a common config in your cluster?

For example, remove --idx, --snapshot_after, and --max_retries.

Also --cache_mb and --lru_mb are the same thing, stick to --cache_mb.

Are you hitting a single Alpha instance or round-robin between alphas?

What is the version you are using?
Are you using Docker? K8s? Vagrant? KVM?

@shanye80, wondering if you could also share the query(ies) that you are using?

Even if Dgraph were single-threaded, I see that you are running 6 Alphas in your cluster. That’s 6 processes, and at least 6 threads. So single threading is not the issue.

However, to really make sure that’s not the case, can you also run echo $GOMAXPROCS and paste the results? It should be empty.
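For context on why an empty value is the expected answer: when GOMAXPROCS is unset, the Go runtime defaults to the number of logical CPUs, so only an explicitly small value would cap parallelism. Below is a minimal sketch of that rule, written in Python purely for illustration. The helper name effective_gomaxprocs is hypothetical, and the handling of invalid values is an approximation of the Go runtime's behavior, not its actual code.

```python
import os


def effective_gomaxprocs(env, cpu_count):
    """Approximate the Go runtime default (illustrative only).

    A positive integer in GOMAXPROCS is honored; an unset, invalid,
    or non-positive value falls back to the logical CPU count.
    """
    raw = env.get("GOMAXPROCS", "")
    try:
        value = int(raw)
    except ValueError:
        return cpu_count
    return value if value > 0 else cpu_count


if __name__ == "__main__":
    # On the machine that runs the Alpha, this should print the core count
    # when GOMAXPROCS is empty.
    print(effective_gomaxprocs(os.environ, os.cpu_count() or 1))
```

If this prints 1 on an Alpha host, the environment is limiting the process; if it prints the core count, the bottleneck is elsewhere.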

The version is v20.11.0. I start the Dgraph cluster and then send queries to port 8080 on an Alpha.
For example:

{
search(func:eq(typeCode, ["2000000000000","2000300000000","6000700080000"]))
          @filter(eq(typeName,JobType)

          and (eq(name, ["接待","顾客","主动","热情","有无","工作","经验","工作经验","均可","公司","带薪","带薪培训","培训","保证","快速","上岗","面试","入职","安排","上班","无试用期","周到","宾至如归","之感","当天","宿舍","试用","工作","内容","负责","公司","车辆","保养","维修","清洁","协助","处理","车辆保险","保险","办理","不良","驾驶","记录","重大","事故","交通","具有","较强","安全","意识","商务","接待","礼仪","一定","服务","服务意识","为人","踏实","保密","责任","责任心","适应","加班","老实","忠厚","车辆保养","维修和清洁","清洁工作","索赔","年检办理","交通违章","商务接待","接待礼仪","商务接待礼仪","维修和清洁工作","工作","时间","工作时间","公司","待遇","每天","小时","保底","安排","住宿","带薪","带薪休假","休假","免费","带薪培训","培训","空调","洗衣机","热水器","公寓","接待","顾客","主动","热情","有无","经验","工作经验","均可","保证","快速","上岗","面试","入职","福利","提成","月休","6天","宿舍","标配","WIFI","jobSummary","周到","宾至如归","之感","当天","商务","司机","年底","急招","高薪","商务司机","C1"])

          or eq(alias, ["接待","顾客","主动","热情","有无","工作","经验","工作经验","均可","公司","带薪","带薪培训","培训","保证","快速","上岗","面试","入职","安排","上班","无试用期","周到","宾至如归","之感","当天","宿舍","试用","工作","内容","负责","公司","车辆","保养","维修","清洁","协助","处理","车辆保险","保险","办理","不良","驾驶","记录","重大","事故","交通","具有","较强","安全","意识","商务","接待","礼仪","一定","服务","服务意识","为人","踏实","保密","责任","责任心","适应","加班","老实","忠厚","车辆保养","维修和清洁","清洁工作","索赔","年检办理","交通违章","商务接待","接待礼仪","商务接待礼仪","维修和清洁工作","工作","时间","工作时间","公司","待遇","每天","小时","保底","安排","住宿","带薪","带薪休假","休假","免费","带薪培训","培训","空调","洗衣机","热水器","公寓","接待","顾客","主动","热情","有无","经验","工作经验","均可","保证","快速","上岗","面试","入职","福利","提成","月休","6天","宿舍","标配","WIFI","jobSummary","周到","宾至如归","之感","当天","商务","司机","年底","急招","高薪","商务司机","C1"]))

            and eq(isDelete,0)
          )@recurse(depth: 10, loop: true){
            name
            kgId
            conceptType
            conceptTypeName
            topicNameEN
            version
            weight
            typeCode
            level
            alias
            parentNode
        }
}

Mutations are also very fast for us, but queries are very slow. I have tried the common configuration and the result is the same.
The version is v20.11.0.
We run on six physical machines, not virtual machines. CPU: 48 cores, memory: 96 GB.

Okay, add a load balancer in front of your cluster. NGINX can do the job, even for gRPC. I’ve seen users complain about query performance in a similar way, and the load balancer was a game changer.
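Before setting up NGINX, the same idea can be tried on the client side by rotating requests across the Alphas. The sketch below only builds the requests and does not send them. The host names node1 to node6 and port 8080 come from this thread; the /query path and the application/graphql+- content type are my assumptions for v20.11 and should be checked against the docs for your version. The function name build_request is hypothetical.

```python
import itertools
import threading
import urllib.request

# Alpha HTTP endpoints, taken from the cluster layout posted above.
ALPHAS = [f"http://node{i}:8080" for i in range(1, 7)]

_cycle = itertools.cycle(ALPHAS)
_lock = threading.Lock()


def build_request(dql: str) -> urllib.request.Request:
    """Build a POST request for the next Alpha in round-robin order."""
    with _lock:  # keep the rotation consistent across caller threads
        base = next(_cycle)
    return urllib.request.Request(
        url=f"{base}/query",  # assumed query endpoint
        data=dql.encode("utf-8"),
        headers={"Content-Type": "application/graphql+-"},  # assumed for v20.11
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. If latency under load drops once requests are spread out like this, a real load balancer is worth the setup.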

I’m not certain it’s single-threaded; I only suspect it. Look at the Jaeger chart: for the last 200 queries we used 40 threads to make concurrent calls, yet no two points overlap. It’s very strange.

Running echo $GOMAXPROCS does indeed print nothing.

The query is the same one I posted above.

We have also tried querying over gRPC. As with the code above, we ran it inside a Flink UDF with a parallelism of 40. Will multiple DgraphStubs load-balance among themselves?

I believe so. I have tested in JS and the balancing worked great. But I feel that HTTP requests are better.

With the gRPC version of the code above, QPS is still very low. With one calling thread, a result comes back in about 20 ms. With 60 threads, latency can reach 400 to 600 ms. With higher parallelism, latency can exceed 1 s.
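A rough way to read these numbers is Little's law: throughput is approximately concurrency divided by latency. The sketch below plugs in the figures reported above. It assumes a closed loop, meaning each thread sends its next request as soon as the previous one returns; that is my assumption and was not stated in the thread.

```python
def throughput_qps(concurrency: int, latency_ms: float) -> float:
    """Little's law for a closed loop: requests per second = N / latency."""
    return concurrency * 1000 / latency_ms


one_thread = throughput_qps(1, 20)        # 1 thread at about 20 ms
sixty_slow = throughput_qps(60, 600)      # 60 threads at 600 ms
sixty_fast = throughput_qps(60, 400)      # 60 threads at 400 ms

print(one_thread, sixty_slow, sixty_fast)  # 50.0 100.0 150.0
```

Under that assumption, 60 times the concurrency buys only 2 to 3 times the throughput. That pattern points to some shared resource saturating at roughly 100 to 150 QPS rather than to strictly serial execution, but it does not by itself identify the resource.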

I’m confused, are you already testing a load-balancer?

We are currently testing the HTTP approach. We have already tried gRPC, but the results were not good.

Is there a problem with the query itself? Is filtering CPU-intensive? Whichever node holds the predicate sees its CPU utilization exceed 90%.

I ran into the same problem. Has it been solved?

Would you mind changing your query to:

  q1 as var(func:eq(typeCode, "2000000000000"))
  q2 as var(func:eq(typeCode, "2000300000000"))
  q3 as var(func:eq(typeCode, "6000700080000"))
  
  search(func: uid(q1, q2, q3)) @filter(eq(typeName, "JobType")

Using multiple blocks improves performance, because the blocks are executed in parallel within the query.
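If the typeCode values vary per request, the var blocks can be generated instead of written by hand. This is a minimal sketch of that string building. The function name build_query and its parameters are hypothetical, the values are assumed to be trusted (no escaping is done), and the @recurse default is copied from the query posted earlier in the thread.

```python
def build_query(type_codes, filter_expr, fields,
                directive="@recurse(depth: 10, loop: true)"):
    """Emit one var block per typeCode plus a root block over their union."""
    names = [f"q{i}" for i in range(1, len(type_codes) + 1)]
    lines = ["{"]
    for name, code in zip(names, type_codes):
        # One independent block per value, as suggested above.
        lines.append(f'  {name} as var(func: eq(typeCode, "{code}"))')
    root = f'  search(func: uid({", ".join(names)}))'
    lines.append(f"{root} @filter({filter_expr}) {directive} {{")
    lines.extend(f"    {field}" for field in fields)
    lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_query(
        ["2000000000000", "2000300000000", "6000700080000"],
        'eq(typeName, "JobType") and eq(isDelete, 0)',
        ["name", "kgId", "alias"],
    ))
```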

Do you really need this much filtering? Are you going to run this type of query on a daily basis? I tried to analyze the query and found it a bit hard to follow. It is complex: three typeCode values to look up at the query root, a ton of filtering, and a recurse on top. It looks simple, but it pushes the system hard.
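One cheap thing to try on the filter size: the name and alias lists in the posted query repeat several terms (for example "接待" and "带薪培训" each appear more than once). Whether Dgraph deduplicates these internally is not something this thread establishes, so the sketch below removes repeats on the client before building the eq(...) clause. The helper names are hypothetical, and using JSON string quoting for the values is an assumption that fits simple terms like these.

```python
import json


def dedupe(terms):
    """Drop repeated terms while keeping first-seen order."""
    return list(dict.fromkeys(terms))


def eq_filter(predicate, terms):
    """Render eq(predicate, [...]) with each distinct term listed once."""
    values = json.dumps(dedupe(terms), ensure_ascii=False, separators=(",", ":"))
    return f"eq({predicate}, {values})"


if __name__ == "__main__":
    sample = ["接待", "顾客", "带薪培训", "培训", "接待", "带薪培训"]
    print(eq_filter("name", sample))
```

This does not change what the query matches; it only shortens the value lists sent to the server.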

Also, can you confirm that you are using the load-balancer or not? It wasn’t clear to me in your last answers.

Cheers.

Yes, any additional filtering will use resources.

Can you reformulate this question?