SELECT DISTINCT in Dgraph

MichelDiz · June 6, 2020, 9:19pm

PS. I’m using v20.03.3

In SQL <SLC DIST> is limited to selecting values from a row that are not the same. In Dgraph, this concept changes a little bit. Since we don’t have columns and rows, not every SQL function makes sense in a GraphDB. In the case of SLC DIST, it is possible, but with some modifications in the dataset.

We can use Dgraph’s GroupBy to achieve a result very similar to the desired one with SLC DIST. But for that, when migrating your dataset from SQL to Dgraph, you need to transform the value you want to select as “DISTINCT” as an entity instead of a value. GroupBy does not group by value for now.

To properly migrate a value to an entity I suggest using Upsert Block. Or if you have a huge dataset you could use OpenRefine software that you can make data mining and fix the data itself.

e.g.:

Let’s consider this dataset:

{
   "set": [
      {
         "userID": "1",
         "name": "Thomas A. Anderson",
         "Country": "Matrix"
      },
      {
         "userID": "2",
         "name": "Agent Smith",
         "Country": "Matrix"
      },
      {
         "userID": "3",
         "name": "Arthur Schopenhauer",
         "Country": "Germany"
      },
      {
         "userID": "4",
         "name": "Till Lindemann",
         "Country": "Germany"
      },
      {
         "userID": "5",
         "name": "Michel",
         "Country": "Brazil"
      },
      {
         "userID": "6",
         "name": "Andreas Rudolf Kisser",
         "Country": "Brazil"
      },
      {
         "userID": "7",
         "name": "Derrick Green",
         "Country": "United States"
      }
   ]
}

Run this Upsert 7 times:

upsert {
  query {
    q(func: has(Country), first:1) @filter(NOT has(migrated)) {
      v as uid
      C as Country
    }
    q2(func: eq(name, val(C))) {
      val(C)
      u2 as uid
    }
    tmp(){
    Ctmp as sum(val(C))
    }
  }

  mutation @if(eq(len(u2), 0) AND eq(len(v), 1)) {
    set {
      _:New <name> val(Ctmp) .
      _:New <dgraph.type> "Country" .
      uid(v) <livesIN> _:New .
      uid(v) <migrated> "" .
    }
  }

  mutation @if(eq(len(u2), 1) AND eq(len(v), 1)) {
    set {
      uid(v) <livesIN> uid(u2) .
      uid(v) <migrated> "" .
    }
  }
}

You can use this query to check the dataset

{
  me(func: has(name)) @filter(NOT type(Country)) {
    uid
    name
    Country
    livesIN {
      name
    }
  }
}

The comparison

Considering the previous dataset, in SQL it would do like:

SELECT DISTINCT Country FROM something;

Since we don’t have Tables, we could consider “FROM” related to Dgraph Type. Anyway, we are not working wih type for now. In Dgraph this SQL would be converted to (Of course, after migrating the value to the entity as mentioned previously):

{
  var(func: has(name)) @groupby(livesIN) {
      a as count(uid)
  }

  SELECT_DISTINCT(func: uid(a), orderdesc: val(a)) {
    name
    total : val(a)
    total_unique : count(uid)
  }
}

The total_unique value would be in SQL

SELECT COUNT(DISTINCT Country) FROM something;

Result

{
  "data": {
    "SELECT_DISTINCT": [
     {
        "total_unique": 4
      },
      {
        "name": "Matrix",
        "total": 2
      },
      {
        "name": "Germany",
        "total": 2
      },
      {
        "name": "Brazil",
        "total": 2
      },
      {
        "name": "United States",
        "total": 1
      }
    ]
  },
}

dmai · June 18, 2020, 4:31pm

Three ideas we can explore:

Support group by with values
A function called val_distinct(a) or distinct(a) that operates on a value variable
A function called distinct(val(a)) that operators on values.

amaster507 · January 15, 2021, 5:23pm

Is there any update on this? I am trying to generate some dynamic option fields based on queries. I have a query that returns 27K nodes, but only has a few distinct values for the predicate I am using. Trying to do all of this in GraphQL which doesn’t seem possible for this one yet. Has this even made it into DQL yet?

amaster507 · April 18, 2021, 1:03am

Deja vu, I find myself back here again.

@dmai, has any of these ideas been explored?

UPDATE(for myself next time searching for this): too much to remember sometimes…

Here is the syntax query for the above data:

{
  distinctCountries(func: has(Country)) @groupby(Country) {
    # no need to select any fields, Country will be selected in the group by results
    # if you want to count the duplicates, use: count(uid)
  }
}

This syntax would return the following results:

{
  "data": {
    "distinctCountries": [
      {
        "@groupby": [
          {
            "Country": "Matrix"
          },
          {
            "Country": "Germany"
          },
          {
            "Country": "Brazil"
          },
          {
            "Country": "United States"
          }
        ]
      }
    ]
  }
}

And for reference, this can then be included as a custom GraphQL query using the @remoteResponse directive

Topic		Replies	Views
Select distinct values / deduplicate Dgraph kind:question , dql	2	1505	April 27, 2021
Remove Duplicates from Results GraphQL graphql , dgraph	3	3066	September 1, 2020
How to get the distinct node property Dgraph	1	980	July 1, 2018
Is it possible to select distinct non-uid predicate? Dgraph	9	1877	August 7, 2018
Get unique values for a predicate? GraphQL	2	2237	October 3, 2022

SELECT DISTINCT in Dgraph

The comparison

Result

Related topics