Using "regex function"/"trigram index" on sets (lists)

RafaARV · October 18, 2017, 4:57pm

I can’t find the right way to use this index on arrays. On top of that, the regular expressions it accepts seem so restricted that they lose the flexibility regular expressions are supposed to offer (they end up looking more like a oneofterms or an anyofterms, so they lose their purpose). For example:

I have this schema:

mutation {
    schema {
        group: string @index(exact,term) .
        name: string @index(exact,term) .
        array_str: [string] @index(trigram) .
    }
}

Then I add this data (I don’t know whether there is a shortcut for adding all the “array_str” elements at once; please tell me if there is one):

mutation {
  set {
    _:root <group> "test" .
    _:root <name> "root" .
    _:root <array_str> "first" .
    _:root <array_str> "second" .
    _:root <array_str> "third" .
    _:root <array_str> "fourth" .
  }
}

But queries like the following don’t work at all:

{
  root(func:regexp(array_str, /.*rd$/))@filter(eq(group,"test") and regexp(array_str,/.*rd$/)){
    expand(_all_)
  }
}

RETURNS: Regular expression is too wide-ranging and can’t be executed efficiently.

{
  root(func:regexp(array_str, /[a-z]+rd$/))@filter(eq(group,"test") and regexp(array_str,/[a-z]+rd$/)){
    expand(_all_)
  }
}

RETURNS: Regular expression is too wide-ranging and can’t be executed efficiently.

{
  root(func:regexp(array_str, /third/))@filter(eq(group,"test") and regexp(array_str,/third/)){
    expand(_all_)
  }
}

RETURNS: NOTHING - Showing 0 nodes and 0 edges (I would expect the node I just inserted, since it contains “third”)

Any suggestions for making this work? Right now this index seems not to provide full regex functionality; it looks like just another way of implementing “anyofterms” and “oneofterms”.

Thank you!
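One possible reading of the “too wide-ranging” error on the first two queries: a trigram index can generally only narrow a regex search using literal runs of at least three characters, and `/.*rd$/` and `/[a-z]+rd$/` contain only the two-character literal `rd`. The Go sketch below illustrates that idea with the standard `regexp/syntax` parser. It is a simplification under that assumption, not Dgraph's code: `literalTrigrams` is a hypothetical helper, and it only inspects literals at the top level of the pattern.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// literalTrigrams is a hypothetical helper: it parses a regular expression and
// returns the three-character substrings of its top-level literal runs.
// Anything else (., [a-z], *, +, $) is treated as giving a trigram index
// nothing to look up. This simplifies how trigram-based regex indexes work
// and is not Dgraph's implementation.
func literalTrigrams(pattern string) ([]string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}

	// Collect literal runs, either the whole pattern or parts of a concatenation.
	var literals []string
	switch re.Op {
	case syntax.OpLiteral:
		literals = append(literals, string(re.Rune))
	case syntax.OpConcat:
		for _, sub := range re.Sub {
			if sub.Op == syntax.OpLiteral {
				literals = append(literals, string(sub.Rune))
			}
		}
	}

	// Slide a three-rune window over each literal; shorter literals yield nothing.
	var grams []string
	for _, lit := range literals {
		r := []rune(lit)
		for i := 0; i+3 <= len(r); i++ {
			grams = append(grams, string(r[i:i+3]))
		}
	}
	return grams, nil
}

func main() {
	for _, p := range []string{`.*rd$`, `[a-z]+rd$`, `third`, `[a-z]+ird$`} {
		g, err := literalTrigrams(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-12s -> %v\n", p, g)
	}
}
```

Under this assumption, `.*rd$` and `[a-z]+rd$` yield no trigrams, `third` yields `thi`, `hir`, `ird`, and `[a-z]+ird$` yields `ird`. So `*` and `+` would not be banned as such; what would matter is whether the pattern contains a literal of three or more characters.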

mrjn · October 19, 2017, 10:00pm

Hey @RafaARV,

This error is thrown if your regexp returns more than a million results. Executing a query like this can cause a huge memory spike, something better avoided.

I don’t know if there is a shortcut to massively add all the “array_str” elements, please tell me if there is one

Not in this version, but from the upcoming v0.9 release onwards we’re moving all mutations to JSON, and then you will be able to set all the elements of an array directly.

Make the regexp more specific so that it matches fewer than a million results.
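With JSON mutations, the data from the question could be written as a single object whose `array_str` value is a list. The Go sketch below only builds that payload with the standard `encoding/json` package and sends nothing to a server. It assumes the JSON mutation format accepts a list for a `[string]` predicate and a `uid` key holding the blank node `_:root`; the `rootNode` type and `buildMutation` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rootNode is a hypothetical struct mirroring the schema in the question.
// Only the JSON keys are assumed to matter to the server.
type rootNode struct {
	UID      string   `json:"uid"` // assumed: blank node label goes in "uid"
	Group    string   `json:"group"`
	Name     string   `json:"name"`
	ArrayStr []string `json:"array_str"` // all list elements set in one go
}

// buildMutation marshals the node into a JSON mutation body (hypothetical helper).
func buildMutation() ([]byte, error) {
	n := rootNode{
		UID:      "_:root",
		Group:    "test",
		Name:     "root",
		ArrayStr: []string{"first", "second", "third", "fourth"},
	}
	return json.Marshal(n)
}

func main() {
	b, err := buildMutation()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Printed, the payload is `{"uid":"_:root","group":"test","name":"root","array_str":["first","second","third","fourth"]}`, replacing the four separate `<array_str>` triples.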

RafaARV · October 23, 2017, 9:47pm

Thank you, Manish,

This error is thrown if your regexp returns more than a million results.

Make the regexp more specific so it generates less than a million results.

But this was a clean database with only one node (“_:root”); it is nowhere near a million results, and with a single node in the database that isn't even possible yet.
Making that decision purely by prediction is too arbitrary, because most regex operators would simply end up banned, for example:

Operators like “*” or “+” would be banned because they can repeat without bound.

Shouldn’t the efficiency of queries rely on the user's knowledge of how to write efficient queries against their data, rather than on simply banning them?

Thank you!

mrjn · October 24, 2017, 7:35pm

There’s no prediction in the system. It runs the regexp and throws this error only if it ends up matching more than a million results. If there’s only one result, something weird is going on. Can you please file a bug?

system · November 23, 2017, 7:36pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
