hi all.
this issue is addressed more to the developer team than to support, so if @mrjn or @dmai have time to take a look, that would be great.
we have a large number of items and need to explore relationships while applying filters to the nodes reached through outgoing edges. we are actually building a library management system for the US/Europe, but the problem is easier to explain with a person/friends test schema.
so, for example, we have billions of persons with a structure like:
{
  set {
    _:person1 <person_id> "person1" .
    _:person1 <name> "John" .
    _:person1 <age> "17" .
    _:person1 <sex> "M" .
    _:person2 <person_id> "person2" .
    _:person2 <name> "Martin" .
    _:person2 <age> "19" .
    _:person2 <sex> "M" .
    _:person3 <person_id> "person3" .
    _:person3 <name> "Peter" .
    _:person3 <age> "22" .
    _:person3 <sex> "M" .
    _:person4 <person_id> "person4" .
    _:person4 <name> "Melissa" .
    _:person4 <age> "17" .
    _:person4 <sex> "F" .
    _:person1 <has_friend> _:person2 .
    _:person1 <has_friend> _:person3 .
    _:person1 <has_friend> _:person4 .
  }
}
we create an index on the person_id predicate so that we can find persons by their ids:
person_id: string @index(hash) .
age: int .
now we need to find the person with person_id “person1” (John) and display all his friends. this works fine:
{
  get_friends(func: eq(person_id, "person1")) {
    has_friend {
      person_id
      name
    }
  }
}

{
  "data": {
    "get_friends": [
      {
        "has_friend": [
          {
            "person_id": "person2",
            "name": "Martin"
          },
          {
            "person_id": "person3",
            "name": "Peter"
          },
          {
            "person_id": "person4",
            "name": "Melissa"
          }
        ]
      }
    ]
  }
}
now we want to filter his friends by age and display only friends with age >= 18. all friends are already found through the outgoing has_friend edge, so we just need to filter the returned nodes:
{
  get_friends(func: eq(person_id, "person1")) {
    has_friend @filter(ge(age, 18)) {
      person_id
      name
    }
  }
}

{
  "errors": [
    {
      "code": "ErrorInvalidRequest",
      "message": ": Attribute age is not indexed."
    }
  ],
  "data": null
}
and we get this error. why do we need an index here? we don’t need to retrieve all age edges with value >= 18, because there are billions of them; we just need to filter the nodes already found. an index is completely redundant here, and i am afraid that even if i add it, all nodes with age >= 18 will be fetched (otherwise why would it be needed?). the same error appears if i use eq or the other inequality filters.
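a minimal python sketch of the behaviour expected from a filter here, assuming the four sample persons above are held in plain in-memory dicts. the names persons, has_friend and filter_friends are hypothetical stand-ins for illustration, not dgraph internals: the filter reads the age value only of the nodes already reached through has_friend.

```python
# Hypothetical in-memory stand-in for the sample data above (not Dgraph internals).
persons = {
    "person1": {"name": "John", "age": 17, "sex": "M"},
    "person2": {"name": "Martin", "age": 19, "sex": "M"},
    "person3": {"name": "Peter", "age": 22, "sex": "M"},
    "person4": {"name": "Melissa", "age": 17, "sex": "F"},
}
has_friend = {"person1": ["person2", "person3", "person4"]}


def filter_friends(person_id, predicate):
    """Follow has_friend from one person, then test each reached node directly."""
    return [fid for fid in has_friend.get(person_id, []) if predicate(persons[fid])]


# Only John's three friends are inspected; no lookup over all persons is involved.
adult_friends = filter_friends("person1", lambda p: p["age"] >= 18)
print([persons[fid]["name"] for fid in adult_friends])  # ['Martin', 'Peter']
```

the work done here is proportional to the number of friends, not to the number of persons with age >= 18.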
if i want to find John’s male friends (sex “M”), i also need an index, and i am afraid that if i create it, all persons with sex “M” will be retrieved instead of just John’s friends being filtered:
{
  get_friends(func: eq(person_id, "person1")) {
    has_friend @filter(eq(sex, "M")) {
      person_id
      name
    }
  }
}

{
  "errors": [
    {
      "code": "ErrorInvalidRequest",
      "message": ": Attribute sex is not indexed."
    }
  ],
  "data": null
}
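the concern can be put in numbers with a toy model. the sketch below only illustrates the fear described above; it does not claim to describe how dgraph actually evaluates @filter. the data set (100,000 persons with alternating sex, one person with 3 friends) and the function names are made up for illustration.

```python
# Toy data, invented for illustration: uid -> sex, alternating "M" / "F".
N = 100_000
sex = ["M" if uid % 2 == 0 else "F" for uid in range(N)]

# A value index: sex -> set of all uids having that value.
sex_index = {"M": set(), "F": set()}
for uid, value in enumerate(sex):
    sex_index[value].add(uid)

friends = [1, 2, 3]  # nodes already reached through has_friend


def filter_via_index(friend_uids, value):
    """Feared strategy: read the whole index entry, then intersect with the friends."""
    entry = sex_index[value]
    return [uid for uid in friend_uids if uid in entry], len(entry)


def filter_direct(friend_uids, value):
    """Expected strategy: read the value of each already-found node only."""
    return [uid for uid in friend_uids if sex[uid] == value], len(friend_uids)


print(filter_via_index(friends, "M"))  # ([2], 50000)
print(filter_direct(friends, "M"))     # ([2], 3)
```

both strategies return the same node, but the second number in each result (entries read) differs by several orders of magnitude in this model.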
the same problem occurs when i want to find all of John’s friends whose name starts with the letter “M”. there is an additional complication here: even if i create a trigram index (which shouldn’t be needed in a filter at all), i cannot use regular expressions with fewer than 3 letters, because the trigram index cannot be used for them:
{
  get_friends(func: eq(person_id, "person1")) {
    has_friend @filter(regexp(name, /^M.*/)) {
      person_id
      name
    }
  }
}

{
  "errors": [
    {
      "code": "ErrorInvalidRequest",
      "message": ": Predicate name is not indexed"
    }
  ],
  "data": null
}
after adding a trigram index on name:
name: string @index(trigram) .
the error changes to:
{
  "errors": [
    {
      "code": "ErrorInvalidRequest",
      "message": ": Regular expression is too wide-ranging and can't be executed efficiently."
    }
  ],
  "data": null
}
but it’s just a filter: i only need to filter the 3 nodes i already have.
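a simplified python sketch of why a trigram index has nothing to work with for /^M.*/, and of how cheap the check is on three names. the trigrams helper is hypothetical and much simpler than a real regexp-to-trigram analysis; it only shows that a literal part shorter than 3 characters yields no trigram at all.

```python
import re


def trigrams(literal):
    """Return all 3-character substrings of a literal string."""
    return [literal[i:i + 3] for i in range(len(literal) - 2)]


# The only literal part of /^M.*/ is "M", which is too short to form a trigram.
print(trigrams("M"))       # []
print(trigrams("Martin"))  # ['Mar', 'art', 'rti', 'tin']

# Matching the pattern directly against the three names already found needs no index.
friend_names = ["Martin", "Peter", "Melissa"]
pattern = re.compile(r"^M.*")
matching = [name for name in friend_names if pattern.match(name)]
print(matching)            # ['Martin', 'Melissa']
```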
in my opinion a filter should be just a filter. it shouldn’t use an index and retrieve all nodes matching the filter condition (we have billions of them).
i hope this problem can be fixed.
thank you