Possible to group by facet value?

dpk · June 2, 2020, 11:49pm

Is it possible to group by the value in a facet, akin to list_edge @groupby(@facets(eq(name, "value"))) { count(uid) }? As far as I can tell it’s definitely not possible with @groupby itself, but I’m wondering if anyone’s found a way to do it with subqueries/value variables/etc.

MichelDiz · June 3, 2020, 3:20am

There’s no way to do this as far as I know. Do you have a use case in mind? You can use facets to order a result tho.

dpk · June 3, 2020, 3:16pm

Edited to add: I found a solution that works without facets. I’ll share it when I have a chance; it might be helpful for others (although it no longer has anything to do with the title, heh).

I’m tracking what amount to user polls. There are users, candidates, and polls. Every user can only vote for one candidate in one poll, but the same candidate may appear in multiple polls. Polls don’t always have the same list of candidates. I need to count the number of votes in each poll, grouped by candidate. I also need to know who voted for what (it’s not a secret ballot), and I need to be able to traverse a graph between users by way of their votes.

I had considered making a fourth type called “UserCandidateVote” (or similar) and then including it as an edge list between the poll and the user[1] but that has some issues. AFAICT Dgraph could not guarantee that every user can only vote once per poll, so I’d need to implement global locking around the vote process, which reduces performance.

I’m trying to use facets to store the candidate’s uid. It’s not ideal, but at least each user in the “votes” list can appear only once and can pick one candidate. There’s no built-in referential integrity but that’s easier to work around than the global locking method mentioned above. Grouping votes by facet value would give me the vote totals.
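For readers unfamiliar with facets, here is a minimal sketch of the layout described above. It assumes `votes` is declared as `[uid]` on the poll; the facet key `candidate` and all uids are placeholders invented for illustration. Facets cannot hold a uid value, so the candidate's uid is kept as a string:

```dql
{
  set {
    # poll 0x10 -> user 0x1; the chosen candidate's uid is stored as a string facet
    <0x10> <votes> <0x1> (candidate="0x2") .
  }
}
```

Because the edge from a poll to a given user can exist only once in the list, each user carries at most one `candidate` facet per poll. Nothing checks that the string actually names a Candidate node, which is the missing referential integrity mentioned above.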

I’m considering generating a query request dynamically, with one subquery per known candidate, and filtering on facet value. I don’t know how performant that would be (especially when there’s 10k+ voters).
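As a rough sketch of what such a generated request might look like for two known candidates, assuming the facet key is `candidate` and using placeholder uids and aliases (none of these names come from a real dataset):

```dql
{
  poll(func: uid(0x10)) {
    # one aliased subquery per known candidate, filtered on the facet value
    candidate_0x2: votes @facets(eq(candidate, "0x2")) {
      count(uid)
    }
    candidate_0x3: votes @facets(eq(candidate, "0x3")) {
      count(uid)
    }
  }
}
```

Each subquery walks the same `votes` list again, so the cost would be expected to grow with the number of candidates times the number of voters. That is worth measuring before relying on this shape.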

Is there a better option?

[1]

type User {
  name
}
name: string .
type Candidate {
  title
}
title: string .
type UserCandidateVote {
  user
  candidate
}
user: uid .
candidate: uid .
type Poll {
  topic
  votes
}
topic: string .
votes: [uid] . # UserCandidateVotes

dpk · June 3, 2020, 10:02pm

For folks looking for a solution using facets (like, if you find this via Google): Sorry, this isn’t it.

Here’s what I came up with: dgraph, using groupby to report votes · GitHub

JatinDevDG · June 4, 2020, 12:03pm

Are you able to enforce the constraint “every user can only vote once per poll” without facets?
As far as I can understand, you have the schema below.

type Candidate {
	candidate_name
}
candidate_name: string @index(exact) .

type Voter {
	voter_name
}
voter_name: string @index(exact) .

type Vote {
	vote_voter
	vote_candidate
}
vote_voter: uid @reverse .
vote_candidate: uid @reverse .

type Poll {
	poll_title
	poll_votes
}
poll_title: string @index(exact) .
poll_votes: [uid] @count @reverse .
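Against a schema like this one, the per-candidate totals can be read with a plain `@groupby` on the `vote_candidate` edge. The following is a sketch only, not the query from the linked gist; the poll title "Lunch" is a made-up value:

```dql
{
  poll(func: eq(poll_title, "Lunch")) {
    poll_title
    # one group per distinct candidate uid, with the number of Vote nodes in each
    poll_votes @groupby(vote_candidate) {
      count(uid)
    }
  }
}
```

This works because `vote_candidate` is a uid predicate, which `@groupby` can group on directly, whereas a facet value cannot be used as the grouping key.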

First you set the voters and candidates, then assign the poll titles and poll votes.
But how you enforced the above constraint (if you were able to) is not clear to me.

dpk · June 4, 2020, 6:41pm

I could well be wrong here, but as I understand how upserts work, the conditionals I set in the two mutations ensure that either:

a) An existing vote is found and is itself modified – no new Vote is created and thus the list isn’t updated
b) An existing vote is not found – a new Vote is created and appended to the list

The upsert conditionals are mutually exclusive: at most one of them can be true. (Neither will be true if the voter or candidate is not found.)
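A sketch of an upsert block with two such conditionals, written against the predicate names in the schema quoted earlier in the thread. This is a reconstruction under assumptions, not the gist's actual code; the names "Lunch", "Alice", and "Pizza" are placeholders:

```dql
upsert {
  query {
    # Vote nodes already attached to the poll
    poll as var(func: eq(poll_title, "Lunch")) {
      pv as poll_votes
    }
    # Vote nodes cast by this voter, in any poll
    voter as var(func: eq(voter_name, "Alice")) {
      vv as ~vote_voter
    }
    candidate as var(func: eq(candidate_name, "Pizza"))
    # this voter's vote in this poll, if there is one
    existing as var(func: uid(pv)) @filter(uid(vv))
  }

  # (a) an existing vote is found: repoint it, leaving the list untouched
  mutation @if(eq(len(existing), 1) AND eq(len(candidate), 1)) {
    set {
      uid(existing) <vote_candidate> uid(candidate) .
    }
  }

  # (b) no existing vote: create one and append it to the poll's list
  mutation @if(eq(len(existing), 0) AND eq(len(poll), 1) AND eq(len(voter), 1) AND eq(len(candidate), 1)) {
    set {
      _:vote <dgraph.type> "Vote" .
      _:vote <vote_voter> uid(voter) .
      _:vote <vote_candidate> uid(candidate) .
      uid(poll) <poll_votes> _:vote .
    }
  }
}
```

The two `@if` conditions test `len(existing)` against 1 and 0, so they cannot both hold in the same transaction. Whether two concurrent transactions can both take branch (b) is the open question discussed in the rest of the thread.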

I think the only way you could end up with a double-vote is if a second transaction could modify the poll_votes list while the first transaction is in process. Is that possible? If so I’ll need to make some changes.

Edited to add: I see now[0] that “changes to [uid] predicates can be concurrently written”. Interestingly, I’m not seeing duplicates in my tests, even when running the same node script hundreds of times in parallel. Most of the transactions are aborted, as hoped. I am running this on a three-node cluster; however, the nodes are all very close to each other, so I don’t expect there to be enough latency to naturally result in Raft conflicts. Could I just be lucky?

[0] https://dgraph.io/docs/master/clients/

JatinDevDG · June 8, 2020, 7:33am

OK, yeah, the upsert works the way you described.

A second transaction can’t modify the poll list while the first transaction is in progress. They follow ACID properties.

Regarding “write to a singular uid predicate of the same node (changes to [uid] predicates can be concurrently written)”: it means that a list of uids ([uid]) can be written concurrently, but a singular uid cannot.

If multiple transactions modify a singular uid, they will conflict, and one will be aborted if another conflicting transaction has already committed. So you won’t get duplicate values.
