Dgraph already supports fulltext search via the DQL function anyoftext, but it does not currently expose a score to the user. It would be very useful for anyone building search functionality on top of Dgraph to have some metric to measure relevance of each result.
Instead of vanilla TF-IDF, we will use a variant called BM25, currently employed by Lucene (and, by extension, Solr and ElasticSearch), which addresses the shortcomings of TF-IDF.
User Impact
The user would be able to access the score of a result like so:
{
movie(func:alloftext(name@en, "the dog which barks")) {
name@en
score: dgraph.score
}
}
The proposed predicate dgraph.score would be consistent with the existing reserved predicate dgraph.type.
Ideally, the user should also be able to filter and sort by dgraph.score.
Architecture
Dgraph already utilizes Bleve for tokenizing, multi-language stemming and stop-word removal.
Computing TF-IDF itself would require some additional indexing.
Assume the following documents:
doc1: “Rain, rain, go away!”
doc2: “She hates rain.”
The values we need are:
N: how many documents are there in total?
Dgraph’s count index already does this
docCount = 2
n: for a term t, how many documents contain t?
Dgraph’s current fulltext index can give us the list of documents with t, we can just use the length of it
[away:1 go:1 hate:1 rain:2 she:1]
freq: for a term t and document d, how many times does t occur in d?
Would this also support scoring for multiple fulltext searches in the same level?
{
movie(func:type(movie)) @filter(alloftext(name@en, "the dog which barks") OR alloftext(description, "the dog which barks")) {
name@en
score: dgraph.score
}
}
@amaster507 you’re right, a global dgraph.score within a query block would not support multiple alloftext queries. Perhaps this should be made available as a Facet instead.
@amaster507 here’s a solution I’ve worked out. This is a sample of what a DQL query would look like, compare this with an ElasticSearch Boolean query:
{
# must(Princess): must match, contributes to score
var(func: alloftext(name@en, "Princess")) {
scores1 as score()
}
# should(Bride): may or may not match, contributes to score
var(func: alloftext(name@en, "Bride")) {
scores2 as score()
}
# filter(The): must match, doesn't contribute to score
# must_not(Terminator) must not match, doesn't contribute to score
var(func: uid(scores1, scores2)) @filter(alloftext("The") AND NOT anyoftext("Terminator")) {
scores as math(max(scores1, 0) + max(scores2, 0))
}
# sort and display the results
search(func: uid(scores), orderdesc: val(scores)) {
name@en
roles: val(scores)
}
}
TL;DR:
If you use fulltext search as a func, it will give you a score
If you use fulltext search as a @filter, it will not give a score
You can use value variables to combine / filter / sort scores.