In my DB I have inserted multiple Offices, which have multiple jobs and multiple employees.
An employee can be assigned multiple jobs.
I am interested in a query that returns the office with the maximum employee head count. Note that, since employees can be assigned multiple jobs, a unique (distinct) count at the office level is required.
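For illustration, a naive traversal (with placeholder predicates office.jobs, job.employees, and name standing in for my actual schema) counts an employee once per job rather than once per office:

```
{
  offices(func: type(Office)) {
    name
    office.jobs {
      count(job.employees)  # an employee holding two jobs in this office is counted twice
    }
  }
}
```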
Unfortunately no, this would give me all employees, and I am primarily interested in the number of employees per office. With the query you provided I lose the information about which employee works in which office. Even if I went from there using reverse edges, I would still have to hop over the job edge, again arriving at a different answer than the one I am looking for.
Getting the cardinality of the @groupby array (Unique count of edges - #3 by the42) is what I am looking for: it represents the number of distinct employees per office.
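Something along these lines (again with my placeholder predicate names): grouping the jobs of one office by their employee edge yields one group per distinct employee, so the cardinality of the returned @groupby array is exactly the distinct head count:

```
{
  office(func: eq(name, "Office A")) {
    name
    office.jobs @groupby(job.employees) {
      count(uid)  # one group per distinct employee; the number of groups is the head count
    }
  }
}
```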
Michel’s query will work well per office, so if you query a single office at root, you can get employees per job.
What you want is a groupby of employees by office. For that to work you’d need an additional Office --> Employee edge. I feel it makes logical sense to have this edge anyway, and it will probably simplify a lot of queries.
So this won’t work as is. And I’m not sure it’s possible to do in a single query; probably not.
Maybe you can keep an officeEmployees predicate, and then use it in queries.
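With such a predicate in place, a sketch of the max-head-count query (officeEmployees as suggested above; name is again a placeholder) could be:

```
{
  var(func: type(Office)) {
    c as count(officeEmployees)  # distinct by construction: an edge is stored at most once
  }

  topOffice(func: uid(c), orderdesc: val(c), first: 1) {
    name
    headCount: val(c)
  }
}
```

Since Dgraph stores an edge between two nodes only once, officeEmployees gives you the distinct employee count for free.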
Coming from an SQL background, graph thinking is still new to me. I am evaluating a graph database as a means to implement some sort of “data lake” for my organisation. It feels natural to use a graph database, as it readily supports interlinking between entities, sparse objects, and schema-free operation.
However, performing operations on data structures does not come naturally with Dgraph. So instead of ELT I would have to do ETL, i.e. transform the data before loading it in order to get answers. In my opinion this violates two “big data” principles:

- Extract - Load - Transform = load the data first (instead of ETL in a BI world), transform while querying.
- The algorithm should come to the data, not the data to the algorithm.

Sure, I can always extract data into a third-party tool and perform further aggregations there. But I think it is better for the algorithm to come to the data than to send gobs of data over the network.
The problem with SQL is that adding a new relation can be problematic. In Dgraph you can add such an edge quickly, at essentially no cost.
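For instance, it is one schema alter plus one triple per office/employee pair (officeEmployees as above; the uids are placeholders):

```
# Alter the schema to add the new predicate:
officeEmployees: [uid] .

# Then link each office to its employees with a set mutation:
{
  set {
    <0xf11> <officeEmployees> <0xf21> .
  }
}
```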
I think your problem could also be solved with a foreach loop, which is only a proposal for now. You could then loop over each office, get the employees for each one, and aggregate/range at the end.
You would basically be looping a query like this, which I think gets what you need for a single office.
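A minimal sketch of such a per-office query, assuming placeholder predicates office.jobs and job.employees and an index on name, could look like this:

```
{
  var(func: eq(name, "Office A")) {
    office.jobs {
      e as job.employees  # uid variables are sets, so duplicates collapse
    }
  }

  uniqueEmployees(func: uid(e)) {
    count(uid)
  }
}
```

Because e is a set of uids, an employee reached through several jobs is collected only once, which gives the distinct count the question asks for.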