Max(Facet attribute) after doing groupby

Nidhi_Shah · June 24, 2023, 12:45pm

I have certain nodes with predicates like name, protocol etc. Nodes may or may not have same name. These nodes are connected to each other via edges named link. Link has a facet called timestamp. Given a uid, I want to find max(timestamp) for nodes with a particular name connected to it. On doing groupby (name), I am able to find “distinct name” but the facet properties of these nodes are getting lost.

I want result similiar to output of the following query of neo4j:
match (n:node)-[r:links]-(m:node) where uid={uid} return n.name,m.name,max(r.time).

I have attached a photo of the the graph and the kind of output I want.

Thanks in advance.

MichelDiz · June 24, 2023, 6:47pm

Can you share the queries you have tried?

Nidhi_Shah · June 24, 2023, 7:02pm

query="""
            {
                    find(func:uid("0x9"))
                    {
                           name
                           link @groupby(name)
                           {
                                  count(uid)
                           }
                     }
               }"""

This returns :
A{link:{name:B count:3},{name:C, count:2},{name:D,count:1}

But I want to find max(timestamp) of all 3 links of different nodes with name=B that A is connected to , max(timestamp) of all 2 links of different nodes of name C that A is connected to.

Also, if I am trying to use a variable for the link it shows an error that variables cant be used when grouped by non uid values.

query="""
            {
                    find(func:uid("0x9"))
                    {
                           name
                           edges as link @groupby(name)
                           {
                                  count(uid)
                           }
                     }
               }"""

Could you please suggest a suitable query for this problem. Thanks in advance.

I tried finding all the distinct names using the above query and then used a for loop in python to send one name at a time and running the query below:

query-"""
           {
                 find(func:uid("0x9"))
                 {
                       links @facets(orderdesc:timestamp,first:1) @filter(eq(name,$name))
                       {
                              timestamp
                        }
                  }
             }""""

This is feasible if number of nodes A is connected with is less. But on increasing the number of nodes A is connected to to 25k, the time complexity would increase drastically.

MichelDiz · June 25, 2023, 12:10am

Alright. What is the purpose behind using the groupby directive? The manner in which you are employing it is not recommended. This approach will yield a raw value from the directive, which might prove useful, but it is important to note that our documentation does not endorse users utilizing this result. Typically, the groupby directive is intended to be paired with a secondary block, which in turn applies the structure devised by the directive. It’s also crucial to understand that this directive exclusively operates with Edges. As of now, it’s not compatible with scalar values.

I am not entirely clear on the nature of the complexity issue.

Would it be possible for you to provide a sample of your dataset in JSON format? This would be extremely helpful to have a better understanding of the situation.

I did this before leaving home, but had not completed. I did this by trying to figure out the point of your case.

{
	  var(func: eq(name, "A"))  {
	    name
	    links @facets(a as timestamp) {
	      name
	    }
	  }

	   q(func: uid(a), orderasc:val(a)) {
			name
			val(a)
		}

		a(){
			max(val(a))
		}
	}

But give me a sample of your data so I can explore it.

One thing you need to keep in mind is that in Dgraph the responses are not flat like in Neo4J. We try to keep the graph structure in JSON format at all costs.

Nidhi_Shah · June 25, 2023, 1:20am

Lets say this is our sample data :

 {
   "set": [
     {
       "name":"A",
       "dgraph.type": "node" ,
       "links" : [
         {
           "name": "B" ,
           "dgraph.type": "node",
            "links|timestamp" :1
         },
         {
             "name": "B" ,
           "dgraph.type": "node",
            "links|timestamp" :2
         },
        {
            "name": "C" ,
           "dgraph.type": "node",
            "links|timestamp" :3
        },
        {
            "name": "D" ,
           "dgraph.type": "node",
            "links|timestamp" :4
        },
         {
           "name": "B" ,
           "dgraph.type": "node",
            "links|timestamp" :5
         },
         {
            "name": "C" ,
           "dgraph.type": "node",
            "links|timestamp" :6
        }
       ]
     }]

We can see that A is connected to nodes having 3 distinct names: B,C,D. 
Out of all links that connect A to nodes with name B, I want to find the max value of the facet timestamp. 
Similarly, out of all links that connect A to nodes with name C, I want to find the max value of the facet timestamp and same for nodes with name D.

If I had the above dataset in neo4j,
match (n:node)-[r:links]-(m:node) where uid={uid} return n.name,m.name,max(r.time).
This query would have returned:
A,B,5
A,C,6
A,D,4

But how do I write this in Dgraph??
Thanks in advance.

Nidhi_Shah · June 25, 2023, 1:37am

The query that you have written will give an output:
q :[
{B,1}, {B,2},{C,3},{D,4},{B,5},{C,6}
]

a: {6}

But this is not what I want,

Output I want:
q :[
{B,5}, {C,6},{D,4},
]

MichelDiz · June 25, 2023, 2:08am

This should work

{
	  var(func: eq(name, "A"))  {
	    name
	    links @facets(a as timestamp, orderdesc: timestamp, first:1)
	  }

    # q(func: uid(a), orderasc:val(a)) {
    q(func: uid(a), orderasc:name) {
       uid
       name
       val(a)
	 }

	}

The nodes should be unique. In your sample they are not.

Mutations

{
  "set": [
    {
      "uid": "_:A",
      "name": "A",
      "dgraph.type": "node",
      "links": [
        {
          "uid": "_:B",
          "name": "B",
          "dgraph.type": "node",
          "links|timestamp": 1
        },
        {
          "uid": "_:B",
          "name": "B",
          "dgraph.type": "node",
          "links|timestamp": 2
        },
        {
          "uid": "_:C",
          "name": "C",
          "dgraph.type": "node",
          "links|timestamp": 3
        },
        {
          "uid": "_:D",
          "name": "D",
          "dgraph.type": "node",
          "links|timestamp": 4
        },
        {
          "uid": "_:B",
          "name": "B",
          "dgraph.type": "node",
          "links|timestamp": 5
        },
        {
          "uid": "_:C",
          "name": "C",
          "dgraph.type": "node",
          "links|timestamp": 6
        }
      ]
    }
  ]
}

Nidhi_Shah · June 25, 2023, 2:21am

Thanks for helping.

A follow up question to this.

Suppose we edited the dataset like this:

{
set: [
{
“uid”: “:A",
“name”: “A”,
“dgraph.type”: “node”,
“links”: [
{
“uid”: ":B”,
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 1
},
{
“uid”: “:B",
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 2
},
{
“uid”: ":C”,
“name”: “C”,
“dgraph.type”: “node”,
“links|timestamp”: 3
},
{
“uid”: “:D",
“name”: “D”,
“dgraph.type”: “node”,
“links|timestamp”: 4
},
{
“uid”: ":B”,
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 5
},
{
“uid”: “:C",
“name”: “C”,
“dgraph.type”: “node”,
“links|timestamp”: 6
}
]
},
{
“uid”: ":A”,
“name”: “A”,
“dgraph.type”: “node”,
“links”: [
{
“uid”: “:B",
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 10
},
{
“uid”: ":B”,
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 3
},
{
“uid”: “:C",
“name”: “C”,
“dgraph.type”: “node”,
“links|timestamp”: 1
}]
},
{
“uid”: ":B”,
“name”: “B”,
“dgraph.type”: “node”,
“links”: [
{
“uid”: “_:C”,
“name”: “C”,
“dgraph.type”: “node”,
“links|timestamp”: 7
}
]
}
}]}

Corresponding to this dataset,
The output I want :
q :[
{A,B,10}, {A,C,6},{A,D,4},{B,C,7)
]

This is similar to the below query in neo4j if I give start=1 and end=19.
Match (n:node)-[r:links]-(m:node)
Where {start} <= r.timestamp<= {end}
Return n.name,m.name,max(r.timestamp)

How do I approach this?

Nidhi_Shah · June 25, 2023, 2:55am

links @facets(a as timestamp, orderdesc: timestamp, first:1)
Why did we do “first:1” here?? What does it mean?

#q(func: uid(a), orderasc:val(a))

“#” what does this mean? And what does val(a) contain?

Thanks in advance.

MichelDiz · June 25, 2023, 3:26am

That is paggination. Pagination - Query language

let me explain what I did

@facets(a as timestamp, orderdesc: timestamp, first:1)

So, in those params. I’m setting timestamp in a variable and ordering the facets results by timestamp. And using the paggination to kind of “filter out” the other results. Note that it is in descending order. That is, when filtering, there will only be the result that I need that is at the top. This is a “hack” way of filtering something.

This is a comment. I have commented it out hoping you would get it as an example of other possibility of query params.

Nidhi_Shah · June 25, 2023, 3:38am

Thanks!!
Could you suggest an approach for the follow up question too.
Thanks in advance

Nidhi_Shah · June 25, 2023, 3:56pm

I tried doing this the output I got:

{
q:[ {‘uid’:0x32, ‘name’:‘B’, ‘val(a)’:1} , {‘uid’:0x33, ‘name’:‘B’, ‘val(a)’:5} , {‘uid’:0x34,
‘name’:‘B’, ‘val(a)’:2}, {‘uid’:0x35, ‘name’:‘C’, ‘val(a)’:3}, {‘uid’:0x36, ‘name’:‘C’, ‘val(a)’:6},
{‘uid’:0x37, ‘name’:‘D’, ‘val(a)’:4}]

what I wanted:
q:[ {‘uid’:0x33, ‘name’:‘B’, ‘val(a)’:5} , {‘uid’:0x36, ‘name’:‘C’, ‘val(a)’:6},{‘uid’:0x37,
‘name’:‘D’, ‘val(a)’:4}]

Could you please help.

MichelDiz · June 25, 2023, 6:41pm

I am currently not in the office. On Monday, I will be able to review this thoroughly. However, regarding your latest inquiry, it is imperative that you clear the database prior to attempting to use the modified sample I sent you. The presence of any outdated data in the DB could adversely affect the output. I tested it before sending it to you. It should output correctly.

Nidhi_Shah · June 26, 2023, 2:30am

Could you send your output?
I tried it after deleting outdated data but didn’t get expected output.

MichelDiz · June 27, 2023, 12:14am

Topic		Replies	Views
Possible to group by facet value? Dgraph	6	593	June 8, 2020
Aggregating only top x nodes Dgraph	9	698	June 15, 2018
Suggestions to Improve GroupBy Dgraph proposal , discussion	6	2223	December 10, 2021
Query Help: Summarising paths Dgraph kind:question	5	650	October 4, 2020
How to find duplicate nodes based on a scalar attribute value? Users	10	2352	October 20, 2018

Max(Facet attribute) after doing groupby

Related topics