Cannot search `*` with term index

Hi,

our schema is:

<text>: string @index(term) .

and following query does not return expected results:

{
  all(func: has(text)) {
    uid
    text
  }
  
  w(func: eq(text,"*")) {
    uid
    text
  }
}
{
  "data": {
    "all": [
      {
        "uid": "0x2",
        "text": "a"
      },
      {
        "uid": "0x3",
        "text": "*"
      }
    ],
    "w": []
  }
}

if we change schema to:

<text>: string @index(exact) .

then the query returns what is expected:

{
  "data": {
    "all": [
      {
        "uid": "0x2",
        "text": "a"
      },
      {
        "uid": "0x3",
        "text": "*"
      }
    ],
    "w": [
      {
        "uid": "0x3",
        "text": "*"
      }
    ]
  }
}

Backend info:
Dgraph Version - v20.11.0-11-gb36b4862
and the same problem on
Dgraph Version v20.11.0-rc2

Hmm, I think there’s no issue here. Not sure. The term index is based on “terms” (words), not symbols. The exact index fits your case better.

Hi,

we are running an old 1.0.x cluster and it works there.

The problem is more complicated because we need the term index for our queries, but Ratel does not allow creating exact and term indexes on the same predicate.

I think there are good reasons why term and exact indexes should not be created at the same time.

You can force it via Bulk Edit or via an Alter command with curl.
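
For example, a minimal sketch of that Alter call, assuming a self-hosted Alpha reachable at localhost:8080 (adjust the host, port, and any auth for your setup):

curl -X POST localhost:8080/alter -d '<text>: string @index(term, exact) .'

This puts both tokenizers on the predicate, so eq should be able to use the exact index while anyofterms keeps using the term index.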

The exact index is not a solution.
Here is another problem.
Let’s have a node with text = "text with * in middle".

How can I find it when I don’t know the exact match?

{
  all(func: has(text)) {
    uid
    text
  }
  
  w(func: anyofterms(text,"*")) {
    uid
    text
  }
}

results:

{
  "data": {
    "all": [
      {
        "uid": "0x2",
        "text": "a"
      },
      {
        "uid": "0x3",
        "text": "*"
      },
      {
        "uid": "0x4",
        "text": "text with * in middle"
      }
    ],
    "w": []
  }
}

Again, almost all indexes are based on words and stop words. So you won’t be able to query for symbols.

For a single asterisk, you can use the hash index.

Maybe you could request an index that can index symbols too. I think such an index would be bigger and consume more resources to process. So it makes sense for it to be a separate one.
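
To sketch the hash suggestion (assuming the same data as above): with

<text>: string @index(hash) .

a query like

{
  w(func: eq(text, "*")) {
    uid
    text
  }
}

should return the node whose whole value is "*", but it still won’t match an asterisk sitting inside a longer string.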

Hmm, this is very confusing, because when I change the text to "text with 1 in middle" I can now search for 1 with the same query. And 1 has the same size in UTF-8 as *. There are no special requirements for indexing it, in my opinion.

{
  all(func: has(text)) {
    uid
    text
  }
  
  w(func: anyofterms(text,"1")) {
    uid
    text
  }
}

results:

{
  "data": {
    "all": [
      {
        "uid": "0x2",
        "text": "a"
      },
      {
        "uid": "0x3",
        "text": "*"
      },
      {
        "uid": "0x4",
        "text": "text with 1 in middle"
      }
    ],
    "w": [
      {
        "uid": "0x4",
        "text": "text with 1 in middle"
      }
    ]
  }
}

Yeah, numbers are indexable by any index type we have, especially when your predicate is an Int.

Numbers are meaningful to humans in text; characters like *&%$!@#( aren’t. Unless you are indexing code. But why not? Make a feature request. :stuck_out_tongue:

So, what should I say to our customers who have a business use case where they want to find * in text?

Should I say that SlashQL is not able to find a simple * and that they are not human because they want to find something that is not useful for humans?

SlashQL does not allow installing a custom tokenizer.

And this is a really simple task in MongoDB, Elastic, and also SQL databases.


This isn’t Slash related.

Make a feature request.

Hey @selmeci

We’ll get this looked at and get back to you. v1.0.x is a pretty old cluster, so the code has changed quite a lot since then, but I agree this should work with the term tokenizer.
