Hi Dgraph team,
I want to to count distinct values of a predicate. I have objects with a predicate o.color
and would like to know how often I have green, blue etc.
Here is what I did:
curl -H "Content-Type: application/dql" -X POST localhost:8080/query -d $'{
var(func: eq(o.tenant, "TheOneAndOnly")) {
filtered as uid
}
o.color(func: uid(filtered)) @groupby(o.color){
count(uid)
}
}' | jq
Result:
"data": {
"o.color": [
{
"@groupby": [
{
"o.color": "green",
"count": 82439
},
{
"o.color": "blue",
"count": 82536
},
{
"o.color": "yellow",
"count": 82603
},
{
"o.color": "orange",
"count": 82676
},
{
"o.color": "purple",
"count": 82746
},
{
"o.color": "red",
"count": 83000
}
]
}
]
}
This query gives me the desired results, but on a cluster with 500k objects it took 7 seconds to respond.
Using query blocks, i can significantly reduce the response time:
curl -H "Content-Type: application/dql" -X POST dgraphalpha2:8081/query -d $'{
green(func: eq(o.color, "green")) @filter(eq(o.tenant, "TheOneAndOnly")) {
count(uid)
}
blue(func: eq(o.color, "blue")) @filter(eq(o.tenant, "TheOneAndOnly")) {
count(uid)
}
yellow(func: eq(o.color, "yellow")) @filter(eq(o.tenant, "TheOneAndOnly")) {
count(uid)
}
purple(func: eq(o.color, "purple")) @filter(eq(o.tenant, "TheOneAndOnly")) {
count(uid)
}
red(func: eq(o.color, "red")) @filter(eq(o.tenant, "TheOneAndOnly")) {
count(uid)
}
orange(func: eq(o.color, "orange")) @filter(eq(o.tenant, "TheOneAndOnly")) {
count(uid)
}
}' | jq
Result:
"data": {
"green": [
{
"count": 82439
}
],
"blue": [
{
"count": 82536
}
],
"yellow": [
{
"count": 82603
}
],
"purple": [
{
"count": 82746
}
],
"red": [
{
"count": 83000
}
],
"orange": [
{
"count": 82676
}
]
}
Responds in 135 ms!!
Big BUT: Now, I need to know all distinct values in advance.
Does anybody have an idea how to achieve my goal of counting distinct values of a predicate, in a performant way? Is there a way, performance for this operation could be optimized?
This was run on a 6 alpha cluster (2 shards @ 3 replicas). Dgraph version 21.03.2.