I feel that Dgraph behaves like a single-threaded system: as the number of requests increases, query latency increases. Can you tell me what is going on?
No. In general, all Go programs use Go's concurrency model; a Go program is effectively multithreaded by default.
This can happen in some scenarios. Please share your configs and stats so I can guide you.
Thank you for your reply!
Here’s how I start:
node1:
nohup ./dgraph zero --idx 200 --my=node1:5080 --jaeger.collector=http://node1:14268 --replicas=3 2>&1 &
nohup ./dgraph alpha --idx 201 --my=node1:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 \
  > alpha.log 2>&1 &

node2:
nohup ./dgraph alpha --idx 202 --my=node2:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 \
  > alpha.log 2>&1 &

node3:
nohup ./dgraph alpha --idx 203 --my=node3:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 \
  > alpha.log 2>&1 &

node4:
nohup ./dgraph alpha --idx 204 --my=node4:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 \
  > alpha.log 2>&1 &

node5:
nohup ./dgraph alpha --idx 205 --my=node5:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 \
  > alpha.log 2>&1 &

node6:
nohup ./dgraph alpha --idx 206 --my=node6:7080 --ludicrous_mode --max_retries 5 \
  --jaeger.collector=http://node1:14268 --snapshot_after 500000 --lru_mb 8192 \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 --zero node1:5080 \
  > alpha.log 2>&1 &
We access it over HTTP. The QPS is less than 200.
Are you hitting Dgraph serially or concurrently?
It is called concurrently via an HTTP client.
What version of Dgraph are you using, and can you add steps/code to reproduce?
Weird, when I use ludicrous_mode I can reach 400 thousand mutations per second.
Can you use a common config across your cluster?
e.g., remove --idx, --snapshot_after, and --max_retries.
Also, --cache_mb and --lru_mb serve the same purpose, so stick to --cache_mb.
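For illustration, each Alpha could then be started with a single common command like the sketch below (this reuses only the hostnames and flags already shown in your commands, with node2 as the example; adjust --my per node):

nohup ./dgraph alpha --my=node2:7080 --zero node1:5080 --ludicrous_mode \
  --cache_mb 16384 --whitelist 10.0.0.0:10.255.255.255 \
  --jaeger.collector=http://node1:14268 > alpha.log 2>&1 &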
Are you hitting a single Alpha instance or round-robin between alphas?
What is the version you are using?
Are you using Docker? K8s? Vagrant? KVM?
@shanye80, wondering if you could also share the query(ies) that you are using?
Even if Dgraph were single-threaded, I observe that you are running 6 Alphas in your cluster. That's 6 processes, and at least 6 threads, so single-threading itself is not the issue.
However, to really make sure that’s not the case, can you also run echo $GOMAXPROCS
and paste the results? It should be empty.
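For context, GOMAXPROCS is the Go runtime setting that caps how many OS threads execute Go code at once; on each Alpha host (assuming a Bash-like shell) the check looks like this:

# Should print an empty line. If it prints a small number, the Go runtime
# is being limited and will not use all 48 cores; unset it before starting Dgraph.
echo $GOMAXPROCS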
The version used is v20.11.0. The first step is to start the Dgraph cluster, and then we query by calling port 8080 of an Alpha.
e.g.:
{
search(func:eq(typeCode, ["2000000000000","2000300000000","6000700080000"]))
@filter(eq(typeName,JobType)
and (eq(name, ["接待","顾客","主动","热情","有无","工作","经验","工作经验","均可","公司","带薪","带薪培训","培训","保证","快速","上岗","面试","入职","安排","上班","无试用期","周到","宾至如归","之感","当天","宿舍","试用","工作","内容","负责","公司","车辆","保养","维修","清洁","协助","处理","车辆保险","保险","办理","不良","驾驶","记录","重大","事故","交通","具有","较强","安全","意识","商务","接待","礼仪","一定","服务","服务意识","为人","踏实","保密","责任","责任心","适应","加班","老实","忠厚","车辆保养","维修和清洁","清洁工作","索赔","年检办理","交通违章","商务接待","接待礼仪","商务接待礼仪","维修和清洁工作","工作","时间","工作时间","公司","待遇","每天","小时","保底","安排","住宿","带薪","带薪休假","休假","免费","带薪培训","培训","空调","洗衣机","热水器","公寓","接待","顾客","主动","热情","有无","经验","工作经验","均可","保证","快速","上岗","面试","入职","福利","提成","月休","6天","宿舍","标配","WIFI","jobSummary","周到","宾至如归","之感","当天","商务","司机","年底","急招","高薪","商务司机","C1"])
or eq(alias, ["接待","顾客","主动","热情","有无","工作","经验","工作经验","均可","公司","带薪","带薪培训","培训","保证","快速","上岗","面试","入职","安排","上班","无试用期","周到","宾至如归","之感","当天","宿舍","试用","工作","内容","负责","公司","车辆","保养","维修","清洁","协助","处理","车辆保险","保险","办理","不良","驾驶","记录","重大","事故","交通","具有","较强","安全","意识","商务","接待","礼仪","一定","服务","服务意识","为人","踏实","保密","责任","责任心","适应","加班","老实","忠厚","车辆保养","维修和清洁","清洁工作","索赔","年检办理","交通违章","商务接待","接待礼仪","商务接待礼仪","维修和清洁工作","工作","时间","工作时间","公司","待遇","每天","小时","保底","安排","住宿","带薪","带薪休假","休假","免费","带薪培训","培训","空调","洗衣机","热水器","公寓","接待","顾客","主动","热情","有无","经验","工作经验","均可","保证","快速","上岗","面试","入职","福利","提成","月休","6天","宿舍","标配","WIFI","jobSummary","周到","宾至如归","之感","当天","商务","司机","年底","急招","高薪","商务司机","C1"]))
and eq(isDelete,0)
)@recurse(depth: 10, loop: true){
name
kgId
conceptType
conceptTypeName
topicNameEN
version
weight
typeCode
level
alias
parentNode
}
}
Mutations are also very fast, but queries are very slow. I have tried a common configuration, and the result is the same.
The version used is v20.11.0.
We use six physical machines rather than virtual machines, each with 48 CPU cores and 96 GB of memory.
Okay, add a load balancer in front of your cluster. NGINX can do the job, even for gRPC. I've seen users complain about query performance in a similar way, and a load balancer was a game-changer.
I'm not sure whether it's single-threaded; I just suspect it. Look at the Jaeger chart: for the last 200 queries, we used 40 threads to make concurrent calls, yet no two points overlap. It's very strange.
Running echo $GOMAXPROCS does indeed return an empty result.
The query statement is the same one shared above.
We have also tried querying over gRPC. Using the same query as above, we ran it in a Flink UDF with a parallelism of 40. Will multiple Dgraph stubs load-balance among themselves?
I believe so. I have tested it in JS and the balancing worked well. But I feel that HTTP requests are better.
Using the gRPC approach above, QPS is still very low. With a single calling thread, the result comes back in about 20 ms; with 60 threads calling, the latency can reach 400-600 ms, and with higher parallelism it can exceed 1 s.
I’m confused, are you already testing a load-balancer?
The HTTP approach is still being tested; the gRPC approach has been tried, but the results are not ideal.
Is there any problem with the query statement? Is this kind of filter query CPU-intensive? Whichever node a predicate is distributed to, that node's CPU utilization exceeds 90%.
I have met the same problem. Has it been solved?
Would you mind changing your query to:
q1 as var(func:eq(typeCode, "2000000000000"))
q2 as var(func:eq(typeCode, "2000300000000"))
q3 as var(func:eq(typeCode, "6000700080000"))
search(func: uid(q1, q2, q3)) @filter(eq(typeName, "JobType")
Using multiple blocks improves performance, because they are executed in parallel within the query.
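For illustration, the whole query could be restructured along these lines; this is only a sketch, with the long name/alias term lists and the remaining result predicates abbreviated (keep your full lists in the real query):

{
  q1 as var(func: eq(typeCode, "2000000000000"))
  q2 as var(func: eq(typeCode, "2000300000000"))
  q3 as var(func: eq(typeCode, "6000700080000"))

  search(func: uid(q1, q2, q3))
    @filter(eq(typeName, "JobType")
      and (eq(name, ["接待", "顾客"])      # substitute the full term list from the original query
        or eq(alias, ["接待", "顾客"]))    # likewise for alias
      and eq(isDelete, 0))
    @recurse(depth: 10, loop: true) {
    name
    kgId
    typeCode
    alias
    parentNode
    # ... the remaining predicates from the original query
  }
}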
Do you really need filtering of this size? Are you going to run this type of query on a daily basis? I tried to analyze it, and it was a bit hard for me; the query is complex: three types of nodes to find at the query root, a ton of filtering, and a recurse on top. Even though it is structurally simple, it really pushes Dgraph hard.
Also, can you confirm whether or not you are using the load balancer? It wasn't clear to me from your last answers.
Cheers.
Yes, any additional filtering will use resources.
Can you reformulate this question?