PS. I’m using v20.03.3
In SQL <SLC DIST
> is limited to selecting values from a row that are not the same. In Dgraph, this concept changes a little bit. Since we don’t have columns and rows, not every SQL function makes sense in a GraphDB. In the case of SLC DIST
, it is possible, but with some modifications in the dataset.
We can use Dgraph’s GroupBy to achieve a result very similar to the desired one with SLC DIST
. But for that, when migrating your dataset from SQL to Dgraph, you need to transform the value you want to select as “DISTINCT” as an entity instead of a value. GroupBy does not group by value for now.
To properly migrate a value to an entity I suggest using Upsert Block. Or if you have a huge dataset you could use OpenRefine software that you can make data mining and fix the data itself.
e.g.:
Let’s consider this dataset:
{
"set": [
{
"userID": "1",
"name": "Thomas A. Anderson",
"Country": "Matrix"
},
{
"userID": "2",
"name": "Agent Smith",
"Country": "Matrix"
},
{
"userID": "3",
"name": "Arthur Schopenhauer",
"Country": "Germany"
},
{
"userID": "4",
"name": "Till Lindemann",
"Country": "Germany"
},
{
"userID": "5",
"name": "Michel",
"Country": "Brazil"
},
{
"userID": "6",
"name": "Andreas Rudolf Kisser",
"Country": "Brazil"
},
{
"userID": "7",
"name": "Derrick Green",
"Country": "United States"
}
]
}
Run this Upsert 7 times:
upsert {
query {
q(func: has(Country), first:1) @filter(NOT has(migrated)) {
v as uid
C as Country
}
q2(func: eq(name, val(C))) {
val(C)
u2 as uid
}
tmp(){
Ctmp as sum(val(C))
}
}
mutation @if(eq(len(u2), 0) AND eq(len(v), 1)) {
set {
_:New <name> val(Ctmp) .
_:New <dgraph.type> "Country" .
uid(v) <livesIN> _:New .
uid(v) <migrated> "" .
}
}
mutation @if(eq(len(u2), 1) AND eq(len(v), 1)) {
set {
uid(v) <livesIN> uid(u2) .
uid(v) <migrated> "" .
}
}
}
You can use this query to check the dataset
{
me(func: has(name)) @filter(NOT type(Country)) {
uid
name
Country
livesIN {
name
}
}
}
The comparison
Considering the previous dataset, in SQL it would do like:
SELECT DISTINCT Country FROM something;
Since we don’t have Tables, we could consider “FROM” related to Dgraph Type. Anyway, we are not working wih type for now. In Dgraph this SQL would be converted to (Of course, after migrating the value to the entity as mentioned previously):
{
var(func: has(name)) @groupby(livesIN) {
a as count(uid)
}
SELECT_DISTINCT(func: uid(a), orderdesc: val(a)) {
name
total : val(a)
total_unique : count(uid)
}
}
The total_unique value would be in SQL
SELECT COUNT(DISTINCT Country) FROM something;
Result
{
"data": {
"SELECT_DISTINCT": [
{
"total_unique": 4
},
{
"name": "Matrix",
"total": 2
},
{
"name": "Germany",
"total": 2
},
{
"name": "Brazil",
"total": 2
},
{
"name": "United States",
"total": 1
}
]
},
}