Not to use trigrams when filtering by regexp

makitka · September 12, 2018, 4:15pm

It’s strange that using a regexp in the filter section still requires a trigram index.
A filter only runs on data that has already been fetched; it isn’t a search by some condition. In the filter section it makes sense to simply apply the regular expression to each string value and drop the values that don’t match.

As of now, a trigram index is required, which takes additional space to store and, more importantly, makes it impossible to use regular expressions like "^f.*" to keep only the values starting with the letter “f”, even though all the nodes have already been found via outgoing edges from some other node.

I found one workaround: prepend 2 junk letters to every string value I need to filter with a “starts with” condition, so "^f.*" becomes "^AAf.*". It works, but it looks really ugly.

So, my question is: why not use Go’s regular “regexp” package when a regular expression appears in the “filter” section, and use the trigram index only when it appears in the “func” one?
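To illustrate the idea, here is a minimal sketch in Go of post-filtering already-fetched string values with the standard `regexp` package and no index. The function name `filterByRegexp` and the sample values are hypothetical and are not taken from Dgraph's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterByRegexp is a hypothetical helper: it keeps only the values that
// match pattern, scanning every value without consulting any index.
func filterByRegexp(values []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var kept []string
	for _, v := range values {
		if re.MatchString(v) {
			kept = append(kept, v)
		}
	}
	return kept, nil
}

func main() {
	// Pretend these values were already fetched by following edges.
	names := []string{"foo", "bar", "fizz", "baz"}
	kept, err := filterByRegexp(names, "^f.*")
	if err != nil {
		panic(err)
	}
	fmt.Println(kept) // [foo fizz]
}
```

A short pattern such as "^f.*" works here because nothing depends on extracting a 3-character trigram from it.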

MichelDiz · September 12, 2018, 4:19pm

@gus what do you think about it?

gus · September 14, 2018, 11:22pm

The reason you can bypass the check with junk is a bug in the 3rd-party package. We could add support to match all instead of exact matching; I don’t think we need to change regexp packages yet. I’ll check.

makitka · September 15, 2018, 5:21pm

Actually, my question was not about why the junk check passes, but about why we need a trigram index and the 3-character limit for regexp in the filter section, when we are just filtering already-fetched predicate values.

makitka · September 22, 2018, 8:25pm

mmmmmmmm… any updates here?

gus · October 2, 2018, 11:43pm

Sorry I couldn’t get back to you sooner. I’m reopening the original issue and will investigate. I don’t know why regexp isn’t used for filters, but I think it’s worth checking. My gut feeling is that regexps are very slow, not to mention that they can use lots of memory.

If others could weigh in on this, it would be great.

Ref: Not to use trigrams when filtering by regexp · Issue #2565 · dgraph-io/dgraph · GitHub

makitka · October 3, 2018, 11:52am

Good news! Since it’s filtering only, I don’t see any performance issues here: almost every database (Elasticsearch, for example) can post-filter returned data by regexp.

gus · January 18, 2019, 4:11am

I’ve made a PR to change the regexp behavior to use an index if one is found; otherwise it will run without an index, as discussed.
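As a rough illustration of that behavior (not the code from the PR), the sketch below narrows candidates with a trigram index when one is available and otherwise falls back to scanning every value. Everything here is an assumption made for the example: the `trigramIndex` type, the `buildTrigramIndex` and `regexpFilter` names, and the `required` argument, which stands in for a literal substring that a real implementation would derive from the pattern itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// trigramIndex is a hypothetical in-memory index: it maps each 3-byte
// substring to the positions of the values that contain it.
type trigramIndex map[string]map[int]bool

// buildTrigramIndex indexes every 3-byte substring of every value.
func buildTrigramIndex(values []string) trigramIndex {
	idx := trigramIndex{}
	for i, v := range values {
		for j := 0; j+3 <= len(v); j++ {
			tri := v[j : j+3]
			if idx[tri] == nil {
				idx[tri] = map[int]bool{}
			}
			idx[tri][i] = true
		}
	}
	return idx
}

// regexpFilter keeps the values matching pattern. If idx is non-nil and
// required (a literal every match must contain; supplied by the caller in
// this sketch) has at least 3 bytes, values lacking its first trigram are
// skipped without running the regexp. Otherwise every value is checked.
func regexpFilter(values []string, idx trigramIndex, pattern, required string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	useIndex := idx != nil && len(required) >= 3
	var kept []string
	for i, v := range values {
		if useIndex && !idx[required[:3]][i] {
			continue // the index rules this value out
		}
		if re.MatchString(v) {
			kept = append(kept, v)
		}
	}
	return kept, nil
}

func main() {
	values := []string{"foobar", "barfoo", "fizz", "baz"}

	// With an index: candidates are narrowed by the trigram "foo".
	withIdx, _ := regexpFilter(values, buildTrigramIndex(values), "foo.*", "foo")
	fmt.Println(withIdx) // [foobar barfoo]

	// Without an index: plain scan, so a short pattern is fine.
	noIdx, _ := regexpFilter(values, nil, "^f.*", "")
	fmt.Println(noIdx) // [foobar fizz]
}
```

Both paths return the same matches for a given pattern; the index only reduces how many values the regexp has to run against.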

makitka · January 18, 2019, 7:33am

Thank you! This is a very important feature for us.
