Max(Facet attribute) after doing groupby

I have certain nodes with predicates like name, protocol etc. Nodes may or may not have same name. These nodes are connected to each other via edges named link. Link has a facet called timestamp. Given a uid, I want to find max(timestamp) for nodes with a particular name connected to it. On doing groupby (name), I am able to find “distinct name” but the facet properties of these nodes are getting lost.

I want result similiar to output of the following query of neo4j:
match (n:node)-[r:links]-(m:node) where uid={uid} return n.name,m.name,max(r.time).

I have attached a photo of the the graph and the kind of output I want.

Thanks in advance.

Can you share the queries you have tried?

query="""
            {
                    find(func:uid("0x9"))
                    {
                           name
                           link @groupby(name)
                           {
                                  count(uid)
                           }
                     }
               }"""

This returns :
A{link:{name:B count:3},{name:C, count:2},{name:D,count:1}

But I want to find max(timestamp) of all 3 links of different nodes with name=B that A is connected to , max(timestamp) of all 2 links of different nodes of name C that A is connected to.

Also, if I am trying to use a variable for the link it shows an error that variables cant be used when grouped by non uid values.

query="""
            {
                    find(func:uid("0x9"))
                    {
                           name
                           edges as link @groupby(name)
                           {
                                  count(uid)
                           }
                     }
               }"""

Could you please suggest a suitable query for this problem. Thanks in advance.

I tried finding all the distinct names using the above query and then used a for loop in python to send one name at a time and running the query below:

query-"""
           {
                 find(func:uid("0x9"))
                 {
                       links @facets(orderdesc:timestamp,first:1) @filter(eq(name,$name))
                       {
                              timestamp
                        }
                  }
             }""""

This is feasible if number of nodes A is connected with is less. But on increasing the number of nodes A is connected to to 25k, the time complexity would increase drastically.

Alright. What is the purpose behind using the groupby directive? The manner in which you are employing it is not recommended. This approach will yield a raw value from the directive, which might prove useful, but it is important to note that our documentation does not endorse users utilizing this result. Typically, the groupby directive is intended to be paired with a secondary block, which in turn applies the structure devised by the directive. It’s also crucial to understand that this directive exclusively operates with Edges. As of now, it’s not compatible with scalar values.

I am not entirely clear on the nature of the complexity issue.

Would it be possible for you to provide a sample of your dataset in JSON format? This would be extremely helpful to have a better understanding of the situation.

I did this before leaving home, but had not completed. I did this by trying to figure out the point of your case.

{
	  var(func: eq(name, "A"))  {
	    name
	    links @facets(a as timestamp) {
	      name
	    }
	  }

	   q(func: uid(a), orderasc:val(a)) {
			name
			val(a)
		}

		a(){
			max(val(a))
		}
	}

But give me a sample of your data so I can explore it.

One thing you need to keep in mind is that in Dgraph the responses are not flat like in Neo4J. We try to keep the graph structure in JSON format at all costs.

Lets say this is our sample data :

 {
   "set": [
     {
       "name":"A",
       "dgraph.type": "node" ,
       "links" : [
         {
           "name": "B" ,
           "dgraph.type": "node",
            "links|timestamp" :1
         },
         {
             "name": "B" ,
           "dgraph.type": "node",
            "links|timestamp" :2
         },
        {
            "name": "C" ,
           "dgraph.type": "node",
            "links|timestamp" :3
        },
        {
            "name": "D" ,
           "dgraph.type": "node",
            "links|timestamp" :4
        },
         {
           "name": "B" ,
           "dgraph.type": "node",
            "links|timestamp" :5
         },
         {
            "name": "C" ,
           "dgraph.type": "node",
            "links|timestamp" :6
        }
       ]
     }]

We can see that A is connected to nodes having 3 distinct names: B,C,D. 
Out of all links that connect A to nodes with name B, I want to find the max value of the facet timestamp. 
Similarly, out of all links that connect A to nodes with name C, I want to find the max value of the facet timestamp and same for nodes with name D.

If I had the above dataset in neo4j,
match (n:node)-[r:links]-(m:node) where uid={uid} return n.name,m.name,max(r.time).
This query would have returned:
A,B,5
A,C,6
A,D,4

But how do I write this in Dgraph??
Thanks in advance.

The query that you have written will give an output:
q :[
{B,1}, {B,2},{C,3},{D,4},{B,5},{C,6}
]

a: {6}

But this is not what I want,

Output I want:
q :[
{B,5}, {C,6},{D,4},
]

This should work

{
	  var(func: eq(name, "A"))  {
	    name
	    links @facets(a as timestamp, orderdesc: timestamp, first:1)
	  }

    # q(func: uid(a), orderasc:val(a)) {
    q(func: uid(a), orderasc:name) {
       uid
       name
       val(a)
	 }

	}

The nodes should be unique. In your sample they are not.

Mutations

{
  "set": [
    {
      "uid": "_:A",
      "name": "A",
      "dgraph.type": "node",
      "links": [
        {
          "uid": "_:B",
          "name": "B",
          "dgraph.type": "node",
          "links|timestamp": 1
        },
        {
          "uid": "_:B",
          "name": "B",
          "dgraph.type": "node",
          "links|timestamp": 2
        },
        {
          "uid": "_:C",
          "name": "C",
          "dgraph.type": "node",
          "links|timestamp": 3
        },
        {
          "uid": "_:D",
          "name": "D",
          "dgraph.type": "node",
          "links|timestamp": 4
        },
        {
          "uid": "_:B",
          "name": "B",
          "dgraph.type": "node",
          "links|timestamp": 5
        },
        {
          "uid": "_:C",
          "name": "C",
          "dgraph.type": "node",
          "links|timestamp": 6
        }
      ]
    }
  ]
}

Thanks for helping.

A follow up question to this.

Suppose we edited the dataset like this:

{
set: [
{
“uid”: “:A",
“name”: “A”,
“dgraph.type”: “node”,
“links”: [
{
“uid”: "
:B”,
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 1
},
{
“uid”: “:B",
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 2
},
{
“uid”: "
:C”,
“name”: “C”,
“dgraph.type”: “node”,
“links|timestamp”: 3
},
{
“uid”: “:D",
“name”: “D”,
“dgraph.type”: “node”,
“links|timestamp”: 4
},
{
“uid”: "
:B”,
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 5
},
{
“uid”: “:C",
“name”: “C”,
“dgraph.type”: “node”,
“links|timestamp”: 6
}
]
},
{
“uid”: "
:A”,
“name”: “A”,
“dgraph.type”: “node”,
“links”: [
{
“uid”: “:B",
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 10
},
{
“uid”: "
:B”,
“name”: “B”,
“dgraph.type”: “node”,
“links|timestamp”: 3
},
{
“uid”: “:C",
“name”: “C”,
“dgraph.type”: “node”,
“links|timestamp”: 1
}]
},
{
“uid”: "
:B”,
“name”: “B”,
“dgraph.type”: “node”,
“links”: [
{
“uid”: “_:C”,
“name”: “C”,
“dgraph.type”: “node”,
“links|timestamp”: 7
}
]
}
}]}

Corresponding to this dataset,
The output I want :
q :[
{A,B,10}, {A,C,6},{A,D,4},{B,C,7)
]

This is similar to the below query in neo4j if I give start=1 and end=19.
Match (n:node)-[r:links]-(m:node)
Where {start} <= r.timestamp<= {end}
Return n.name,m.name,max(r.timestamp)

How do I approach this?

links @facets(a as timestamp, orderdesc: timestamp, first:1)
Why did we do “first:1” here?? What does it mean?

#q(func: uid(a), orderasc:val(a))

“#” what does this mean? And what does val(a) contain?

Thanks in advance.

That is paggination. Pagination - Query language

let me explain what I did

@facets(a as timestamp, orderdesc: timestamp, first:1)

So, in those params. I’m setting timestamp in a variable and ordering the facets results by timestamp. And using the paggination to kind of “filter out” the other results. Note that it is in descending order. That is, when filtering, there will only be the result that I need that is at the top. This is a “hack” way of filtering something.

This is a comment. I have commented it out hoping you would get it as an example of other possibility of query params.

1 Like

Thanks!!
Could you suggest an approach for the follow up question too.
Thanks in advance

I tried doing this the output I got:

{
q:[ {‘uid’:0x32, ‘name’:‘B’, ‘val(a)’:1} , {‘uid’:0x33, ‘name’:‘B’, ‘val(a)’:5} , {‘uid’:0x34,
‘name’:‘B’, ‘val(a)’:2}, {‘uid’:0x35, ‘name’:‘C’, ‘val(a)’:3}, {‘uid’:0x36, ‘name’:‘C’, ‘val(a)’:6},
{‘uid’:0x37, ‘name’:‘D’, ‘val(a)’:4}]

what I wanted:
q:[ {‘uid’:0x33, ‘name’:‘B’, ‘val(a)’:5} , {‘uid’:0x36, ‘name’:‘C’, ‘val(a)’:6},{‘uid’:0x37,
‘name’:‘D’, ‘val(a)’:4}]

Could you please help.

I am currently not in the office. On Monday, I will be able to review this thoroughly. However, regarding your latest inquiry, it is imperative that you clear the database prior to attempting to use the modified sample I sent you. The presence of any outdated data in the DB could adversely affect the output. I tested it before sending it to you. It should output correctly.

Could you send your output?
I tried it after deleting outdated data but didn’t get expected output.