I have created a schema for a Profile and want to create efficant indexes for partial text search. My goal is to return relevant profiles through a search on the username, firstname, or lastname.
For example with the query string jo, only Nodes should be returned which username, firstname, or lastname start with jo. Later maybe rank them on a score that takes into consideration the number of times the query is included.
After my research I think this is only possible through the trigram index but based on this answer it needs a regex of atleast 3 characters. Does that mean that a query string with only 2 characters would not work with the trigram index?
Or is there another strategy that would return a similar result, maybe just check if the username, firstname, or lastname contain the query characters?
My basic schema:
type Profile {
id // will store reference to id from MongoDB
username
firstname
lastname
phone
picture
}
id: string @index(exact) .
username: string @index(trigram)
firstname: string @index(trigram)
lastname: string @index(trigram)
That is the current limitation. You cannot search for jo in a regex, but you should be able to search for jo. because that has a third character in it. Quite a bad limitation, but probably one of the more complicated functions in dgraph if I had to guess.
Also let it be known that regex searches are really slow in Dgraph compared to any other filter functions.
Yea regex searches are mostly slower. MongoDB’s regex search is also quite slow. I tried Neo4j before trying DGraph now and Neo4j had the STARTS WITH operator which was very useful for my case.
I probably wont try to mess with regex and just try to build a search with the term argument
This was probably one of the biggest challenges in development for us. How do we do “global searches” that we are use to doing with MySQL like operator and wild cards across dynamically concatenated fields.
SELECT
*
FROM
user
LEFT JOIN address ON ...
WHERE
CONCAT_WS(user.username, user.first_name, user.last_name, address.city, address.state) LIKE `%jo%`
Sorry, I wasn’t responding to the original question, but your SQL equivalent. Obviously two letters can’t work with trigrams. The match would work otherwise though, correct?
query {
block1 as var(func: regexp(User.username, /foo/i))
block2 as var(func: regexp(User.first_name, /foo/i))
block3 as var(func: regexp(User.last_name, /foo/i))
var(func: regexp(City.name)) {
City.hasAddresses {
block4 as Address.forUser
}
}
searchUser(func: uid(block1, block2, block3, block4)) {
id: uid
username: User.username
first_name: User.first_name
# more fields/edges
}
}
But again, very slow. I removed most places where I allowed regexp indexes in my app by the user just because how poorly they performed.
But where this fails is when you want to search for full names like: “oo ba” should match user with first name “foo” and last name “bar”, but you simply can’t do that without then splitting every string on spaces and do even more blocks, that gets really complex really quick and is not worth the effort for the poor query performance in the end.
But to give Dgraph credit,
That is what the above Dgraph query does, just not the same equivalence of concatenating a string and searching across that concatenated string. Even for the same size of data in MySQL to Dgraph, MySQL handled this without any noticable query performance degredation while Dgraph was very poor performance with any regexp query from my experience. I use it now only for administration and restrict if from users.
Yeah, that is what I have been talking about where you can’t do fuzzy phrase searches. I don’t even know if what you just described is possible, as it would guarantee the individual words, but not necessarily in the desired order.
I wonder if it is possible to speed this up to equal or better performance of the SQL equivalent.