What I want to do
I want to understand why throughput dropped after sharding instead of improving.
What I did
Load test method: JMeter sends requests to the HTTP port.
Dgraph query URI: http://{vip}/dgraph-alpha-public/query
Query parameters:
{"query":"{ \nnodes(func:eq(identity_id, distinct_id_lyf_test4)) { \n uid \n identity_id \n tenant_id \n namespace \n type \n create_time \n model_type \n account_reverse : ~account_relation{ \n namespace \n identity_id \n type \n model_type \n create_time \n tenant_id \n uid \n relation{ \n uid \n namespace \n identity_id \n type \n create_time \n model_type \n tenant_id \n source{ \n uid \n source_id \n source_type \n create_time \n primary \n tenant_id \n model_type \n } \n } \n } \n account_relation{ \n namespace \n identity_id \n type \n model_type \n create_time \n tenant_id \n uid \n relation{ \n uid \n namespace \n identity_id \n type \n create_time \n model_type \n tenant_id \n source{ \n uid \n source_id \n source_type \n create_time \n primary \n tenant_id \n model_type \n } \n } \n } \n relation{ \n uid \n identity_id \n tenant_id \n namespace \n type \n create_time \n model_type \n source{ \n uid \n source_id \n source_type \n create_time \n primary \n tenant_id \n model_type \n } \n } \n reverse : ~relation{ \n uid \n identity_id \n tenant_id \n namespace \n type \n create_time \n model_type \n account_reverse : ~account_relation{ \n namespace \n identity_id \n type \n model_type \n create_time \n tenant_id \n uid \n relation{ \n uid \n namespace \n identity_id \n type \n create_time \n model_type \n tenant_id \n source{ \n uid \n source_id \n source_type \n create_time \n primary \n tenant_id \n model_type \n } \n } \n } \n account_relation{ \n namespace \n identity_id \n type \n model_type \n create_time \n tenant_id \n uid \n relation{ \n uid \n namespace \n identity_id \n type \n create_time \n model_type \n tenant_id \n source{ \n uid \n source_id \n source_type \n create_time \n primary \n tenant_id \n model_type \n } \n } \n } \n relation{ \n uid \n identity_id \n tenant_id \n namespace \n type \n create_time\n model_type \n source{ \n uid \n source_id \n source_type \n create_time \n primary \n tenant_id \n model_type \n } \n } \n } \n} 
\n}","variables":{}}
Mutation URI: http://{vip}/dgraph-alpha-public/mutation
Mutation parameters:
{
set{
_:kYEqZY085a <type> "aff_id" .
_:kYEqZY085a <namespace> "cdp" .
_:kYEqZY085a <identity_id> "1505814931620393" .
_:kYEqZY085a <create_time> "2021-02-22T15:33:09.412" .
_:kYEqZY085a <tenant_id> "lyf_test" .
_:kYEqZY085a <dgraph.type> "Identity" .
_:kYEqZYzkli <type> "phone" .
_:kYEqZYzkli <namespace> "phone" .
_:kYEqZYzkli <identity_id> "13366246858_1" .
_:kYEqZYzkli <create_time> "2021-02-09T12:54:55.026" .
_:kYEqZYzkli <tenant_id> "lyf_test" .
_:kYEqZYzkli <dgraph.type> "Identity" .
_:kYEqZY085a <relation> _:kYEqZYzkli .
_:kYEqZYQVtm <source_type> "AFF_CRM" .
_:kYEqZYQVtm <source_id> "-1" .
_:kYEqZYQVtm <create_time> "2021-02-09T12:54:55.026" .
_:kYEqZYQVtm <tenant_id> "lyf_test" .
_:kYEqZYQVtm <primary> "false" .
_:kYEqZYQVtm <dgraph.type> "Source" .
_:kYEqZYzkli <source> _:kYEqZYQVtm .
}
}
{vip} is the VIP host used to reach the k8s Service by its ServiceName.
Deployment configuration:
Three Zeros and three Alphas, all with default startup parameters:
Maximum query throughput: 5000 concurrent queries, looped 3 times, peaked at 3587.
Maximum mutation throughput: 1000 concurrent mutations, looped 3 times, peaked at 1642.
Three Zeros and five Alphas, all with default startup parameters, measured after the predicates were rebalanced:
Maximum query throughput: 5000 concurrent queries, looped 3 times, peaked at 358.
Maximum mutation throughput: 1000 concurrent mutations, looped 3 times, peaked at 1447.
Throughput dropped significantly.
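To put the two runs side by side, the relative change can be computed directly from the figures reported above. This is only a small arithmetic sketch; the function name and the dictionary layout are illustrative, and the post does not state the throughput unit.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative means a drop)."""
    return (after - before) / before * 100.0


# Maximum throughput figures reported above: (three Alphas, five Alphas).
results = {
    "query": (3587, 358),
    "mutation": (1642, 1447),
}

for name, (three_alphas, five_alphas) in results.items():
    change = percent_change(three_alphas, five_alphas)
    print(f"{name}: {three_alphas} -> {five_alphas} ({change:+.1f}%)")
```

With these numbers the query throughput falls by roughly 90%, while mutation throughput falls by roughly 12%, so the regression is concentrated on the read path.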
Metrics:
Memory, CPU, go_goroutines, dgraph_alpha_health_status, and the other metrics we collect all look normal. The exception is the Alpha metric dgraph_pending_proposals_total: only alpha-4 hits the limit of 256, while the other nodes show no significant increase.
After raising that limit to 1024, we used JMeter to simulate 3000 concurrent requests for 90 seconds. Again only alpha-4 reached the limit of 1024 pending proposals. Its CPU usage was very high, and requests failed with processing errors and transaction-aborted errors. The other Alphas showed low CPU usage.
Error messages:
{"errors":[{"message":"Server overloaded with pending proposals. Please retry later","extensions":{"code":"ErrorInvalidRequest"}}],"data":null}
{"errors":[{"message":"rpc error: code = Aborted desc = Transaction has been aborted. Please retry","extensions":{"code":"ErrorInvalidRequest"}}],"data":null}
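Both errors ask the client to retry. The sketch below shows one way a load-test client could detect these two responses and retry with exponential backoff. It is an assumption-laden illustration, not part of the original test setup: `is_retryable`, `send_with_retry`, and the `send` callable are hypothetical names, and retrying only masks the symptom rather than explaining why a single Alpha is overloaded.

```python
import json
import random
import time
from typing import Callable

# Substrings taken from the two error messages quoted above.
RETRYABLE_MARKERS = (
    "Server overloaded with pending proposals",
    "Transaction has been aborted",
)


def is_retryable(body: str) -> bool:
    """Return True if the response body carries one of the 'please retry' errors."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    errors = payload.get("errors") or []
    return any(
        marker in err.get("message", "")
        for err in errors
        for marker in RETRYABLE_MARKERS
    )


def send_with_retry(send: Callable[[], str], max_attempts: int = 5,
                    base_delay: float = 0.1, sleep=time.sleep) -> str:
    """Call send() up to max_attempts times, backing off between retryable errors.

    `send` is a hypothetical callable that performs one HTTP request and
    returns the response body as a string.
    """
    body = send()
    for attempt in range(1, max_attempts):
        if not is_retryable(body):
            break
        # Exponential backoff with jitter so retries do not arrive in lockstep.
        sleep(base_delay * (2 ** (attempt - 1)) * (1 + random.random()))
        body = send()
    return body
```

The last body is returned even if it is still a retryable error, so the caller can count exhausted retries as failures in the test report.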
Dgraph metadata
Deployment environment: k8s
Alpha pod configuration:
Limits:
cpu: 12
memory: 32Gi
Requests:
cpu: 8
memory: 16Gi
Zero pod configuration:
Limits:
cpu: 4
memory: 8Gi
Requests:
cpu: 2
memory: 8G
schema:
<account_relation>: [uid] @reverse .
<create_time>: default .
<dgraph.cors>: [string] @index(exact) @upsert .
<dgraph.drop.op>: string .
<dgraph.graphql.p_query>: string .
<dgraph.graphql.p_sha256hash>: string @index(exact) .
<dgraph.graphql.schema>: string .
<dgraph.graphql.schema_created_at>: datetime .
<dgraph.graphql.schema_history>: string .
<dgraph.graphql.xid>: string @index(exact) @upsert .
<identity_id>: string @index(hash) .
<model_type>: int @index(int) .
<namespace>: string @index(hash) .
<primary>: bool @index(bool) .
<relation>: [uid] @reverse .
<source>: [uid] @reverse .
<source_id>: string @index(hash) .
<source_type>: default .
<tenant_id>: string @index(hash) .
<type>: string @index(hash) .
type <AccountIdentity> {
account_relation
create_time
identity_id
model_type
namespace
relation
source
tenant_id
type
}
type <Identity> {
create_time
identity_id
namespace
relation
source
tenant_id
type
}
type <Source> {
source_type
source_id
primary
create_time
tenant_id
}
type <dgraph.graphql> {
dgraph.graphql.schema
dgraph.graphql.xid
}
type <dgraph.graphql.history> {
dgraph.graphql.schema_history
dgraph.graphql.schema_created_at
}
type <dgraph.graphql.persisted_query> {
dgraph.graphql.p_query
dgraph.graphql.p_sha256hash
}
type <dgraph.type.cors> {
dgraph.cors
}
dgraph version
Dgraph version : v20.11.2
Dgraph codename : tchalla-2
Commit timestamp : 2021-02-23 13:07:17 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true
Current cluster state:
{"counter":"1031133","groups":{"1":{"members":{"2":{"id":"2","groupId":1,"addr":"dgraph-alpha-1.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"1619678374","clusterInfoOnly":false,"forceGroupId":false},"6":{"id":"6","groupId":1,"addr":"dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":true,"amDead":false,"lastUpdate":"1619763968","clusterInfoOnly":false,"forceGroupId":false},"7":{"id":"7","groupId":1,"addr":"dgraph-alpha-0.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"1619763911","clusterInfoOnly":false,"forceGroupId":false}},"tablets":{"account_relation":{"groupId":1,"predicate":"account_relation","force":false,"onDiskBytes":"0","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"0"},"create_time":{"groupId":1,"predicate":"create_time","force":false,"onDiskBytes":"250372977","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"1056268508"},"dgraph.cors":{"groupId":1,"predicate":"dgraph.cors","force":false,"onDiskBytes":"199","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"75"},"dgraph.drop.op":{"groupId":1,"predicate":"dgraph.drop.op","force":false,"onDiskBytes":"0","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"0"},"dgraph.graphql.p_query":{"groupId":1,"predicate":"dgraph.graphql.p_query","force":false,"onDiskBytes":"0","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"0"},"dgraph.graphql.p_sha256hash":{"groupId":1,"predicate":"dgraph.graphql.p_sha256hash","force":false,"onDiskBytes":"0","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"0"},"dgraph.graphql.schema":{"groupId":1,"predicate":"dgraph.graphql.schema","force":false,"onDiskBytes":"0","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"0"},"dgraph.graphql.schema_created_at":{"groupId":1,"predicate":"dgraph.graphql.schema_created_at","force":false,"onDiskBytes":"0","remove":false,"readOnly":fal
se,"moveTs":"0","uncompressedBytes":"0"},"dgraph.graphql.schema_history":{"groupId":1,"predicate":"dgraph.graphql.schema_history","force":false,"onDiskBytes":"0","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"0"},"dgraph.graphql.xid":{"groupId":1,"predicate":"dgraph.graphql.xid","force":false,"onDiskBytes":"0","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"0"},"dgraph.type":{"groupId":1,"predicate":"dgraph.type","force":false,"onDiskBytes":"316320365","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"987516220"},"model_type":{"groupId":1,"predicate":"model_type","force":false,"onDiskBytes":"0","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"0"},"namespace":{"groupId":1,"predicate":"namespace","force":false,"onDiskBytes":"176700607","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"637472108"},"relation":{"groupId":1,"predicate":"relation","force":false,"onDiskBytes":"196587853","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"437145360"},"source":{"groupId":1,"predicate":"source","force":false,"onDiskBytes":"203302443","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"455500437"},"tenant_id":{"groupId":1,"predicate":"tenant_id","force":false,"onDiskBytes":"359673272","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"981317713"}},"snapshotTs":"2561814","checksum":"3340003164330354535","checkpointTs":"0"},"2":{"members":{"8":{"id":"8","groupId":2,"addr":"dgraph-alpha-3.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":true,"amDead":false,"lastUpdate":"1619763902","clusterInfoOnly":false,"forceGroupId":false},"9":{"id":"9","groupId":2,"addr":"dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false}},"tablets":{"identity_id":{"groupId":2,"predicate":"identity_id","force":false,"onDiskBytes":"744577361","remove":false,"readOnly":false,"
moveTs":"1552836","uncompressedBytes":"1274366935"},"primary":{"groupId":2,"predicate":"primary","force":false,"onDiskBytes":"73127349","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"354713747"},"source_id":{"groupId":2,"predicate":"source_id","force":false,"onDiskBytes":"368934064","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"827124029"},"source_type":{"groupId":2,"predicate":"source_type","force":false,"onDiskBytes":"63111162","remove":false,"readOnly":false,"moveTs":"2561815","uncompressedBytes":"245374047"},"type":{"groupId":2,"predicate":"type","force":false,"onDiskBytes":"197447263","remove":false,"readOnly":false,"moveTs":"0","uncompressedBytes":"910255979"}},"snapshotTs":"2561814","checksum":"13039052466481007906","checkpointTs":"0"}},"zeros":{"2":{"id":"2","groupId":0,"addr":"dgraph-zero-1.dgraph-zero.crm-test.svc.cluster.local:5080","leader":true,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false},"3":{"id":"3","groupId":0,"addr":"dgraph-zero-2.dgraph-zero.crm-test.svc.cluster.local:5080","leader":false,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false},"4":{"id":"4","groupId":0,"addr":"dgraph-zero-0.dgraph-zero.crm-test.svc.cluster.local:5080","leader":false,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false}},"maxLeaseId":"17900000","maxTxnTs":"2570000","maxRaftId":"9","removed":[{"id":"3","groupId":1,"addr":"dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"1615991006","clusterInfoOnly":false,"forceGroupId":false},{"id":"4","groupId":1,"addr":"dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false},{"id":"5","groupId":1,"addr":"dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false},{"id":"
1","groupId":0,"addr":"dgraph-zero-0.dgraph-zero.crm-test.svc.cluster.local:5080","leader":true,"amDead":false,"lastUpdate":"0","clusterInfoOnly":false,"forceGroupId":false},{"id":"1","groupId":1,"addr":"dgraph-alpha-0.dgraph-alpha.crm-test.svc.cluster.local:7080","leader":false,"amDead":false,"lastUpdate":"1615990975","clusterInfoOnly":false,"forceGroupId":false}],"cid":"fec6a030-0c07-4fe9-a775-219482e41177","license":{"user":"","maxNodes":"18446744073709551615","expiryTs":"1617290026","enabled":false}}
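The state dump above is hard to read by eye. The following sketch summarizes it per group: which Alphas are members, which one is the leader, which predicates (tablets) the group serves, and the total on-disk size. The embedded `STATE_JSON` is abridged from the dump above (only `addr`, `leader`, and the non-zero `onDiskBytes` values are kept), and `summarize` is a hypothetical helper name; in practice the full state JSON would be loaded instead.

```python
import json

# Abridged from the state JSON above: members keep only addr/leader, and only
# tablets with non-zero onDiskBytes are listed.
STATE_JSON = """
{"groups": {
  "1": {
    "members": {
      "2": {"addr": "dgraph-alpha-1.dgraph-alpha.crm-test.svc.cluster.local:7080", "leader": false},
      "6": {"addr": "dgraph-alpha-2.dgraph-alpha.crm-test.svc.cluster.local:7080", "leader": true},
      "7": {"addr": "dgraph-alpha-0.dgraph-alpha.crm-test.svc.cluster.local:7080", "leader": false}
    },
    "tablets": {
      "create_time": {"onDiskBytes": "250372977"},
      "dgraph.type": {"onDiskBytes": "316320365"},
      "namespace": {"onDiskBytes": "176700607"},
      "relation": {"onDiskBytes": "196587853"},
      "source": {"onDiskBytes": "203302443"},
      "tenant_id": {"onDiskBytes": "359673272"}
    }
  },
  "2": {
    "members": {
      "8": {"addr": "dgraph-alpha-3.dgraph-alpha.crm-test.svc.cluster.local:7080", "leader": true},
      "9": {"addr": "dgraph-alpha-4.dgraph-alpha.crm-test.svc.cluster.local:7080", "leader": false}
    },
    "tablets": {
      "identity_id": {"onDiskBytes": "744577361"},
      "primary": {"onDiskBytes": "73127349"},
      "source_id": {"onDiskBytes": "368934064"},
      "source_type": {"onDiskBytes": "63111162"},
      "type": {"onDiskBytes": "197447263"}
    }
  }
}}
"""


def pod_name(addr: str) -> str:
    """Shorten 'dgraph-alpha-3.dgraph-alpha...:7080' to 'dgraph-alpha-3'."""
    return addr.split(".")[0]


def summarize(state: dict) -> dict:
    """Build a per-group summary of members, leader, tablets and on-disk size."""
    summary = {}
    for group_id, group in state["groups"].items():
        members = group.get("members", {})
        tablets = group.get("tablets", {})
        summary[group_id] = {
            "members": sorted(pod_name(m["addr"]) for m in members.values()),
            "leader": next(
                (pod_name(m["addr"]) for m in members.values() if m.get("leader")),
                None,
            ),
            "tablets": sorted(tablets),
            # onDiskBytes is encoded as a string in the state JSON.
            "on_disk_bytes": sum(int(t["onDiskBytes"]) for t in tablets.values()),
        }
    return summary


if __name__ == "__main__":
    for group_id, info in summarize(json.loads(STATE_JSON)).items():
        print(f"group {group_id}: leader={info['leader']} members={info['members']}")
        print(f"  tablets={info['tablets']} on_disk_bytes={info['on_disk_bytes']}")
```

Read this way, the dump shows that group 2 has only two members (dgraph-alpha-3 and dgraph-alpha-4) and serves identity_id, the predicate used by the root function of the query, along with type, source_id, source_type, and primary.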