Is it possible to select distinct non-uid predicate?


(Ondrej Slinták) #1

Given following schema:

user: string .
group_owner: uid @reverse .
group_type: string .

and following dataset:

{
    set {
        _:user1 <user> "user1" .
        _:user2 <user> "user2" .
        _:a1 <group_type> "a" .
        _:a1 <group_owner> _:user1 .
        _:a2 <group_type> "a" .
        _:a2 <group_owner> _:user2 .
        _:b1 <group_type> "b" .
        _:b1 <group_owner> _:user1 .
        _:b2 <group_type> "b" .
        _:b2 <group_owner> _:user2 .
        _:c <group_type> "c" .
    }
}

Is it possible to return distinct group_type from groups owned by multiple users and then paginate them? I tried the following:

{
  var(func: uid(0x10a90e, 0x10a90f)) {
    ~group_owner {
    	gt as group_type
    }  
  }
  
  q(func: uid(gt), first: 2) {
    group_type
  }
}

Which lists even duplicates. I also tried to @groupby(group_type) and then count/max, but that doesn’t seem to be a way forward if I want to also paginate the results. Is this even currently possible in Dgraph?


(Michel Conrado (Support Engineer)) #2

To do reverse you need to set your schema as @reverse

here => group_owner: uid @reverse .

Your mutation is confusing to read just do:

{
    set {
        _:user1 <user> "user1" .
        _:user2 <user> "user2" .
        _:a1 <group_type> "a1" .
        _:a1 <group_owner> _:user1 .
        _:a2 <group_type> "a2" .
        _:a2 <group_owner> _:user2 .
        _:b1 <group_type> "b1" .
        _:b1 <group_owner> _:user1 .
        _:b2 <group_type> "b2" .
        _:b2 <group_owner> _:user2 .
        _:c <group_type> "c1" .
    }
}

Actually I did not quite understand what u wanted, I spent some time trying to understand and simulate at the same time. But I do not think I’ve reached to any conclusion.

This was what I tried:

{
  GET_USER as var(func: has(user))

 test(func: uid(GET_USER)) {
    uid
    user
    ~group_owner (first:2) {
    	group_type    } 
}
}

other Q.

{
  test(func: has(group_owner)){
    uid
    group_type
    group_owner (first:2) { 
       uid 
       user
    }
  }
}

(Ondrej Slinták) #3

I wanted to achieve the same as SELECT DISTINCT would accomplish in SQL. I want to select all groups where given users are group_owner, take all group_type from these groups and return ones with unique name. So with my original dataset (don’t change it, else it doesn’t make sense), I want to return only a, b, c. With the query from original post, it returns a, a, b, b, c.


(Michel Conrado (Support Engineer)) #4

Can you check this discussion to see if help you? Get output of uid(a,b,...) as intersection not union or digg more about SQL here on discuss.

From your example " _:c <group_type> “c1” ." isn’t linked with any user. So you’ll never get C.

In fact from your example I can not think of anything in relation to what you say. In other cases I would try to use “uid_in”, but in your Set I can not use it as you need. Then a set closer to reality would help to follow the logic and analyze whether can or can’t do.

So, your q returns this:

"data": {
    "q": [
      {
        "group_type": "a2"
      },
      {
        "group_type": "b1"
      },
      {
        "group_type": "b2"
      },
      {
        "group_type": "a1"
      }
    ]

The q I presentend returns this (the obvious):

"test": [
      {
        "uid": "0x2723",
        "user": "user1",
        "~group_owner": [
          {
            "group_type": "b1"
          },
          {
            "group_type": "a1"
          }
        ]
      },
      {
        "uid": "0x2724",
        "user": "user2",
        "~group_owner": [
          {
            "group_type": "a2"
          },
          {
            "group_type": "b2"
          }
        ]
      }
    ]
  }

Cheers


(Ondrej Slinták) #5

Something is being lost in translation, so I’ll try to rephrase the question.

I have nodes with people. Each person can have assigned one or more skills through skill_owner. Skill name is always a string.

person: string @index(exact) .
skill_name: string .
skill_owner: uid @reverse .

Let’s assign skills to 2 people. Michel with Java and Dgraph. Ondrej with Python and Dgraph.

{
  set {
    _:michel <person> "Michel" .

    _:m_skill_1 <skill_name> "Java" .
    _:m_skill_1 <skill_owner> _:michel .
    
    _:m_skill_2 <skill_name> "Dgraph" .
    _:m_skill_2 <skill_owner> _:michel .
    
    _:ondrej <person> "Ondrej" .
    
    _:o_skill_1 <skill_name> "Python" .
    _:o_skill_1 <skill_owner> _:ondrej .
    
    _:o_skill_2 <skill_name> "Dgraph" .
    _:o_skill_2 <skill_owner> _:ondrej .
  }
}

Now I’m assigning a team and need to know what unique skills it’s gonna have. My other requirement is that those unique skills have to be paginable.

In SQL the query I need would be:

SELECT DISTINCT skill.name
FROM skill
LEFT JOIN person
ON person.id = skill.owner
WHERE person.name = 'Ondrej' OR person.name = 'Michel'
LIMIT 10

Which returns:

Dgraph
Java
Python

In Dgraph, I tried this:

{
  var(func: eq(person, ["Ondrej", "Michel"])) {
    ~skill_owner {
    	s as skill_name
    }  
  }
  
  q(func: uid(s), first: 10) {
    skill_name
  }
}

Which returns:

"data": {
  "q": [
    {
      "skill_name": "Java"
    },
    {
      "skill_name": "Dgraph"
    },
    {
      "skill_name": "Python"
    },
    {
      "skill_name": "Dgraph"
    }
  ]
},

As you can see, it returns Dgraph twice. I need only unique skill names, so I can paginate them properly.


(Michel Conrado (Support Engineer)) #6

Now I get it!

So:


{
  set {
    
 # First we set Unique nodes for Skills
    _:Skill_Java <skill_name> "Java" .
    _:Skill_Dgraph <skill_name> "Dgraph" .
    _:Skill_Python <skill_name> "Python" .

 #Now we assign the users to that skills using your way to relate those
    _:michel <person> "Michel" .
    _:Skill_Java <skill_owner> _:michel .
    _:Skill_Dgraph <skill_owner> _:michel .
    
    _:ondrej <person> "Ondrej" .
    _:Skill_Python <skill_owner> _:ondrej .
    _:Skill_Dgraph <skill_owner> _:ondrej .

  }
}

And the same q:

{
  var(func: eq(person, ["Ondrej", "Michel"])) {
    ~skill_owner {
    	s as skill_name
    }  
  }
  
  q(func: uid(s), first: 10) {
    skill_name
  }
}
{
  "data": {
    "q": [
      {
        "skill_name": "Dgraph"
      },
      {
        "skill_name": "Python"
      },
      {
        "skill_name": "Java"
      }
    ]
  }


(Ondrej Slinták) #7

Thanks. That was my backup plan, but hoped it’d be possible to do this with strings as well. Are there are any plans to introduce this functionality in the future?


(Michel Conrado (Support Engineer)) #8

Can you exemplify what you mean? We have several types of indexing for strings, how would that use, that you mention?


(Ondrej Slinták) #9

With your solution, it means I have to create unique nodes with skill names and use uids in the query to get the uniqueness. It means a slight complication which could be prevented if I could just list all unique strings (without creating any extra nodes) as I originally wanted. The same as SELECT DISTINCT would do in SQL.


(Michel Conrado (Support Engineer)) #10

Okay, I see. But this is how the paradigm of graphs databases works. I believe that this logic you want would never be implemented other than via user application (means your app).

Graphs DBs are completely distinct from what SQL would be. The idea is that something that isn’t related via edges is a “new/diff thing/obj”. That is, you would always have two “dgraphs” in the query response. Dgraph understands that each result is unique, even though the string is the same.

This could even be implemented, but has no strength to be. For it is another paradigm. This could even generate a lot of useless chunks in the way. Making a little hell to clean the DB or unnecessary disk usage.