DQL group by query

Luca_Trazzi · October 16, 2021, 7:52am

Hi boys, I’ve been using dgraph in the last two year in a big pet project that we’re about to launch live.

However, I’m struggling with some complex queries, I really think that the weak point of this amazing product is a decent IDE, ratel lacks some stuff, linting, formatting, autocompletes, making really hard to build queries.

Anyway, back to my question.

I have the following schema.

<id>: string @index(hash) @upsert .
<createdWhen>: datetime .
<dislikedBy>: uid @reverse .
<dislikedFeedback>: uid @reverse .
<likedBy>: uid @reverse .
<likedFeedback>: uid @reverse .

type <DislikeFeedbackGraphModel> {
	id
	createdWhen
	dislikedBy
	dislikedFeedback
}
type <LikeFeedbackGraphModel> {
	id
	createdWhen
	likedBy
	likedFeedback
}
type <FeedbackGraphModel> {
	id
	createdWhen
}
type <PersonGraphModel> {
	id
	createdWhen
}

So, think about facebook.

You have a feedback, a feedback can be liked / disliked by a person.

As I want some metadata of these actions (example, I liked two days ago, I disliked yesterday, and I liked again) I had to make the middle entity LikeGraphModel & DislikeGraphModel.

Now, in my UI, I just want to show only the last activity of this person on this feedback, meaning the Like.

So starting by a feedback, I need to grab all his like/dislikes, group by person (likedBy / dislikedBy), sort by createdWhen, and return the last one.

Is that even possible in DQL? I tried creating 3 vars, merging the stuff, but I could not succeed, unluckly. My current solution is just to grab all likes, dislikes, and do it in memory, but of course, I cannot keep this solution if we go big.

MichelDiz · October 16, 2021, 3:35pm

I would create this schema design a little differently. I would use intermediate nodes (I noticed it already is, but I would reduce it to a single entity). And that intermediate node would be both likes and dislikes and any other reaction. I would differentiate them by the edge only.

The Groupby function works only on edges.

Not sure what you mean, but doesn’t make much sense. We would group “Bobs, Alices and Lucas”? e.g. I wanna know how many Alices, Bob have liked? Is that?

The rest:

{
  q(func: type(User)) @filter(eq(name, "Bob")) {
    name
    age
    reactions {
      likedUser { UID }
      dislikedUser { UID }
      created_at
      closed_at
    }
  }
}

This above will show all likes and dislikes related to Bob

{
  q(func: type(User)) @filter(eq(name, "Bob")) {
    name
    age
    reactions_like : reactions (orderdesc: created_at, first: 1){
      likedUser { UID }
      created_at
      closed_at
    }
    reactions_dislike : reactions (orderdesc: closed_at, first: 1){
      dislikedUser { UID }
      created_at
      closed_at
    }
  }
}

Show what you have tried.

Luca_Trazzi · October 16, 2021, 6:47pm

Let’s try with an example.

<0x01> <dgraph.type> "Person" .
<0x01> <id> "luca" .

<0x02> <dgraph.type> "Person" .
<0x02> <id> "michel" .

<0x03> <dgraph.type> "Feedback" .
<0x03> <id> "some-feedback" .

<0x04> <dgraph.type> "LikeFeedback" .
<0x04> <likedBy> "0x01" .
<0x04> <likedFeedback> "0x03" .

<0x05> <dgraph.type> "LikeFeedback" .
<0x05> <likedBy> "0x02" .
<0x05> <likedFeedback> "0x03" .

<0x06> <dgraph.type> "DislikeFeedback" .
<0x06> <likedBy> "0x01" .
<0x06> <likedFeedback> "0x03" .

So, we have luca and michael.
we have 1 feedback.
Luca, I like, and then I dislike.
Michael, instead, will just dislike.

my feedback query should return, 1 like (the michael one) and 1 dislike (luca one) so it should discard the Luca like because it’s older than dislike.

You see what I mean now?

About having the same Like/Dislike type. totally makes sense, but I did this way in order to have an easier reverse query.

person {
 likedFeedbacks ~likedBy {}
 dislikedFeedbacks ~dislikedBy {}
}

Does this sound reasonable to you? Would you still use the same schema and instead filter the reverse with the type?

MichelDiz · October 16, 2021, 9:38pm

This RDF feels odd. e.g:

This
<0x06> <likedFeedback> "0x03" .
Should be like this
<0x06> <likedFeedback> <0x03> .

Nope, put an extra hop creates more work to traverse in a query.

Can you share the query itself?

You can reverse from any step. The problem is creating unnecessary hops between the objects/Types.

Humm, so this is like a “post”?

Share what you are actually doing and let’s talk.

Dave_Aitel · October 16, 2021, 9:38pm

What is @upsert in this context? I tried a quick google and couldn’t figure it out
-0dave

MichelDiz · October 16, 2021, 9:41pm

The upsert directive is used to avoid concurrent mutations on the fly. Won’t work in another context tho.

Luca_Trazzi · October 17, 2021, 5:29pm

Sorry, I did some mistakes with the RDF, now I’m able to share a replicable schema & data.

<createdWhen>: datetime .
<dgraph.drop.op>: string .
<dgraph.graphql.p_query>: string @index(sha256) .
<dgraph.graphql.schema>: string .
<dgraph.graphql.xid>: string @index(exact) @upsert .
<dislikedBy>: uid @reverse .
<dislikedFeedback>: uid @reverse .
<id>: string @index(hash) @upsert .
<likedBy>: uid @reverse .
<likedFeedback>: uid @reverse .
type <DislikeFeedbackGraphModel> {
	id
	createdWhen
	dislikedBy
	dislikedFeedback
}
type <FeedbackGraphModel> {
	id
	createdWhen
}
type <LikeFeedbackGraphModel> {
	id
	createdWhen
	likedBy
	likedFeedback
}
type <PersonGraphModel> {
	id
	createdWhen
}
type <dgraph.graphql> {
	dgraph.graphql.schema
	dgraph.graphql.xid
}
type <dgraph.graphql.persisted_query> {
	dgraph.graphql.p_query
}

{
  set {
    _:Luca <dgraph.type> "Person" .
    _:Luca <id> "luca" .

    _:Michel <dgraph.type> "Person" .
    _:Michel <id> "michel" .

    _:Feedback <dgraph.type> "Feedback" .
    _:Feedback <id> "some-feedback" .

    _:Like1 <dgraph.type> "LikeFeedback" .
    _:Like1 <id> "liked-by-luca" .
    _:Like1 <createdWhen> "1985-06-08"^^<xs:dateTime> .
    _:Like1 <likedBy> _:Luca .
    _:Like1 <likedFeedback> _:Feedback .

    _:Like2 <dgraph.type> "LikeFeedback" .
    _:Like2 <id> "liked-by-michel" .
    _:Like2 <createdWhen> "1990-06-08"^^<xs:dateTime> .
    _:Like2 <likedBy> _:Michel .
    _:Like2 <likedFeedback> _:Feedback .

    _:Dislike1 <dgraph.type> "DislikeFeedback" .
    _:Dislike1 <id> "disliked-by-luca" .
    _:Dislike1 <createdWhen> "2021-06-08"^^<xs:dateTime> .
    _:Dislike1 <dislikedBy> _:Luca .
    _:Dislike1 <dislikedFeedback> _:Feedback .
  }
}

now, let’s try this sample query:

query getFeedbackById(){ 

  feedback (func: type(Feedback)) @filter(eq(id, "some-feedback")) {
    id
  likesCount: count (~likedFeedback)
   liked: ~likedFeedback {
   createdWhen
  likedBy { id }
    }
    
  dislikesCount: count (~dislikedFeedback)
  disliked: ~dislikedFeedback {	
  createdWhen
  dislikedBy { id }
    }
  }
}

that will return

{
        "id": "some-feedback",
        "likesCount": 2,
        "liked": [
          {
            "createdWhen": "1985-06-08T00:00:00Z",
            "likedBy": {
              "id": "luca"
            }
          },
          {
            "createdWhen": "1990-06-08T00:00:00Z",
            "likedBy": {
              "id": "michel"
            }
          }
        ],
        "dislikesCount": 1,
        "disliked": [
          {
            "createdWhen": "2021-06-08T00:00:00Z",
            "dislikedBy": {
              "id": "luca"
            }
          }
        ]
      }

which is ok.

The only problem is that, I would like to see likesCount: 1 instead of 2.

Why? because Luca has a more recent Dislike than the Like, so the Dislike should “override” the like in terms of counting them.

So I’m trying these paths


                 
query getFeedbackById(){ 
  
   var(func:has(createdWhen), orderdesc: createdWhen)  {
    @groupBy(likedBy / dislikedBy)
    {
    likesAndDislikes as uid  
    }
  }
  
  likesAndDislikes(func: uid(likesAndDislikes)) {
		id
    dgraph.type
    createdWhen
    likedBy { id }
    dislikedBy { id }
  }
  
}

this is indeed an invalid query, but it’s to give the idea to you.

And yes, a “feedback” is like a “post”.

amaster507 · October 17, 2021, 7:52pm

Can you change the schema to fit your use case? Do you need historic likes and dislikes? If not, then use a single edge and a connecting node. In the connector node add a predicate for reaction type and track there if it was like or dislike. Then when someone changes, it is changing that predicate of the connector node instead of adding new nodes and new edges. You can still count using edge aliases and filters to get total likes and dislikes.

Luca_Trazzi · October 18, 2021, 7:21am

Unfortunately, I need to track history changes, that’s why I added middle nodes, it was a plain array of likes / dislikes before that.

But yes, I can change the schema to whatever it needs, I just need to meet this criteria.

Topic		Replies	Views
Groupby on multiple fields using DQL Dgraph kind:question	20	1624	September 16, 2022
Using custom DQL with groupby GraphQL kind:enhancement , status:accepted , ticket:created	5	1360	May 27, 2021
Select distinct values / deduplicate Dgraph kind:question , dql	2	1502	April 27, 2021
Groupby on top of Sorting on Dgraph GraphQL kind:question	0	533	February 12, 2022
DQL: Why Dgraph groupby query slow? Dgraph dgraph	0	499	March 14, 2022

DQL group by query

Related topics