Best way to count unique values?


(Ashwin) #1

Given a simple schema:

type Entity {
    entity.type: string
    entity.name: string
}

<entity.name>: string @index(hash) .
<entity.type>: string @index(hash) .

What would be the best way to count the number of unique <entity.type> values?
There are only 3 types (organisation, person and product) but I’d like to know if there is a way to get that number by way of a query.


(Michel Conrado) #2
{
  type1(func: eq(entity.type, "myType1")){
   total: count(uid) 
  }
  type2(func: eq(entity.type, "myType2")){
   total: count(uid) 
  }  
  type3(func: eq(entity.type, "myType3")){
   total: count(uid) 
  }  
}

(Ashwin) #3

Thanks, but that’s not quite the count I’m looking for. There are ~250k Entity nodes, each with an entity.value and an entity.type. I’m less interested in how many nodes there are of a specific type than in how many unique entity.type string values there are. Suppose we have 5 Entity nodes with the following values:

{
    uid = 0x?
    entity.type = "person"
    entity.value = "Bob"
}

{
    uid = 0x?
    entity.type = "person"
    entity.value = "Alice"
}

{
    uid = 0x?
    entity.type = "organisation"
    entity.value = "Google"
}

{
    uid = 0x?
    entity.type = "product"
    entity.value = "Dgraph"
}

{
    uid = 0x?
    entity.type = "organisation"
    entity.value = "Apache"
}

There are only 3 unique values for entity.type: “person”, “organisation” and “product”. Is there any way of having these three values returned from a query?


(Michel Conrado) #4

Maybe this approach works for you; let me know if it doesn’t.

You can get a “unique” count using @groupby(name):


    {
      var(func: anyofterms(name, "Alexei Jack Jose Zoe Ivy")) @groupby(works_for) {
       a as count(uid)
      }
    
      q(func: uid(a), orderdesc: val(a)) {
        name
        total_workers : val(a)
        workHere: ~works_for @groupby(name){
          count(uid)
        }
      }
    }

This query is not perfect: it works around something GroupBy does not implement yet. You can use it as is, or wait for GroupBy to be improved.

The result:

    {
      "data": {
        "q": [
          {
            "name": "CompanyABC",
            "total_workers": 6,
            "workHere": [
              {
                "@groupby": [
                  {
                    "name": "Ivy",
                    "count": 2
                  },
                  {
                    "name": "Jack",
                    "count": 2
                  },
                  {
                    "name": "Zoe",
                    "count": 2
                  }
                ]
              }
            ]
          },
          {
            "name": "The other company",
            "total_workers": 4,
            "workHere": [
              {
                "@groupby": [
                  {
                    "name": "Alexei",
                    "count": 2
                  },
                  {
                    "name": "Jose",
                    "count": 2
                  }
                ]
              }
            ]
          }
        ]
      }
    }

Using this dataset as reference: https://tour.dgraph.io/master/schema/2/

I’ve edited it and it looks like this:

    {
      set {
        _:company1 <name> "CompanyABC" .
        _:company1 <dgraph.type> "Company" .
        _:company2 <name> "The other company" .
        _:company2 <dgraph.type> "Company" .
    
        _:company1 <industry> "Machinery" .
    
        _:company2 <industry> "High Tech" .
    
        _:jack <works_for> _:company1 .
        _:jack <dgraph.type> "Person" .
    
        _:ivy <works_for> _:company1 .
        _:ivy <dgraph.type> "Person" .
    
        _:zoe <works_for> _:company1 .
        _:zoe <dgraph.type> "Person" .
    
        _:jack <name> "Jack" .
        _:ivy <name> "Ivy" .
        _:zoe <name> "Zoe" .
        _:jose <name> "Jose" .
        _:alexei <name> "Alexei" .
        
        # duplicated
        
        _:jack2 <name> "Jack" .
        _:jack2 <works_for> _:company1 .
        _:jack2 <dgraph.type> "Person" .
        
        _:ivy2 <name> "Ivy" .
        _:ivy2 <works_for> _:company1 .
        _:ivy2 <dgraph.type> "Person" .
        
        _:zoe2 <name> "Zoe" .
        _:zoe2 <works_for> _:company1 .
        _:zoe2 <dgraph.type> "Person" .
        
        _:jose2 <name> "Jose" .
        _:jose2 <works_for> _:company2 .
        _:jose2 <dgraph.type> "Person" .
        
        _:alexei2 <name> "Alexei" .
        _:alexei2 <works_for> _:company2 .
        _:alexei2 <dgraph.type> "Person" .
        
        # duplicated end
    
        _:jose <works_for> _:company2 .
        _:jose <dgraph.type> "Person" .
        _:alexei <works_for> _:company2 .
        _:alexei <dgraph.type> "Person" .
    
        _:ivy <boss_of> _:jack .
    
        _:alexei <boss_of> _:jose .
      }
    }
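
Applied to the original question, the same idea might look like the sketch below. It is not something tested in this thread. The query string groups on entity.type directly, which is an assumption based on @groupby(name) working on a string predicate in the example above. The root-level response shape is also assumed to mirror the nested "@groupby" list shown in the result. The names DISTINCT_TYPES_QUERY and distinct_values are made up for illustration, and the sample response is hand-written from the five nodes in post #3, not real server output.

```python
import json

# Assumed DQL query (not from the thread): group every node that has an
# entity.type by that value, so each group stands for one distinct value.
DISTINCT_TYPES_QUERY = """
{
  q(func: has(entity.type)) @groupby(entity.type) {
    count(uid)
  }
}
"""


def distinct_values(response_data, block="q", predicate="entity.type"):
    """Collect the grouped values from the "data" part of a @groupby response."""
    values = []
    for item in response_data.get(block, []):
        for group in item.get("@groupby", []):
            if predicate in group:
                values.append(group[predicate])
    return values


# Hand-written sample in the assumed response shape, built from the five
# example nodes in post #3.
sample = json.loads("""
{
  "q": [
    {
      "@groupby": [
        {"entity.type": "organisation", "count": 2},
        {"entity.type": "person", "count": 2},
        {"entity.type": "product", "count": 1}
      ]
    }
  ]
}
""")

types = distinct_values(sample)
print(len(types), sorted(types))
```

The number of groups is the number of unique entity.type values, and each group's count gives the per-type totals from post #2 in the same round trip. The query string would be sent with whichever Dgraph client you already use.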