Question about String indexing

I was wondering about the efficiency of string indexing in Dgraph.
Currently I store every node in the database with a generated id (normally a hash of its properties), and I use that id to perform upserts.
Since integrating GraphQL, the generated id has started to feel like useless overhead, in particular for nodes containing localized text.

Given the schema:

interface Metadata {
  id: String! @id @search(by: [hash])
  createdAt: DateTime!
  modifiedAt: DateTime!
  generation: Int!
  version: Int!
}


type Text implements Metadata {
  text: String! @search(by: [trigram, term, fulltext])
  """
  en-US, ...
  """
  localization: [String!]!
}

Would it affect performance to change the id field to the text itself?

type Text {
  text: String! @search(by: [trigram, term, fulltext]) @id
  """
  en-US, ...
  """
  localization: [String!]!
}

Does Dgraph slow down when a string indexed with trigram, term, and fulltext is also used as the unique id?

Hi @Luscha,

As I understand it, each index is stored separately, so you should not see any slowdown. From a domain point of view, however, you might not want to use the text itself as the ID, because you would have a tough time updating the text later.

Tejas

Mutations take more time as more indexes are added to a field, because every index has to be kept up to date. In exchange, queries that use those indexes become faster. Are you running all of those kinds of queries against your id field (regex, allofterms, alloftext), so that you actually need all of those indexes?
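
To make the write cost a bit more concrete, here is a rough Python sketch that counts how many index tokens a single string value could produce per index. The tokenizers below are simplified stand-ins that I am assuming for illustration; they are not Dgraph's real tokenizers, and fulltext is left out because it depends on stemming and stop words.

```python
def hash_tokens(text: str) -> set:
    # A hash index keeps a single token per value (assumption: one digest).
    return {str(hash(text))}


def term_tokens(text: str) -> set:
    # Simplified term tokenizer: lowercase and split on whitespace.
    return set(text.lower().split())


def trigram_tokens(text: str) -> set:
    # Every run of three consecutive characters becomes a token.
    return {text[i:i + 3] for i in range(len(text) - 2)}


def tokens_per_index(text: str) -> dict:
    # Each index has to be updated for every token it derives from the value.
    return {
        "hash": len(hash_tokens(text)),
        "term": len(term_tokens(text)),
        "trigram": len(trigram_tokens(text)),
    }


if __name__ == "__main__":
    print(tokens_per_index("hello world"))
```

The trigram count grows with the length of the string, so long localized texts would likely be the most expensive values to write under this model.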

I would like to build something like a search engine, so I guess I need all of those indexes to run complete and reliable search queries.
My question was more about using an indexed string as the @id of a node than about the efficiency of the indexing itself:

Option 1:
Get the text → compute a hash → use the hash as the id → upsert / query the node.
Option 2:
Get the text → use the full text to look up the node associated with it → upsert / query the node.
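
The two options can be sketched client-side in Python roughly as follows. This is only an illustration under assumptions: SHA-256 stands in for whatever hash is actually used, `hashed_id` and `lookup_payload` are hypothetical helper names, and the `getText(id: ...)` / `getText(text: ...)` queries assume Dgraph generates a get query keyed on whichever field carries @id.

```python
import hashlib
import json


def hashed_id(text: str) -> str:
    # Option 1: derive a deterministic id from the text.
    # SHA-256 is an assumption; any stable hash would do.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def lookup_payload(text: str, use_hash: bool) -> dict:
    # Build the GraphQL request body used to find the node before an upsert.
    if use_hash:
        # Option 1: the hash is the @id field.
        query = "query ($id: String!) { getText(id: $id) { text localization } }"
        variables = {"id": hashed_id(text)}
    else:
        # Option 2: the text itself is the @id field.
        query = "query ($text: String!) { getText(text: $text) { text localization } }"
        variables = {"text": text}
    return {"query": query, "variables": variables}


if __name__ == "__main__":
    for use_hash in (True, False):
        print(json.dumps(lookup_payload("hello world", use_hash), indent=2))
```

Either way the client sends a single string lookup; the only extra step in option 1 is computing the hash before the request.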

Using an indexed string as an @id should be fine for what you want to do here. The runtime of the two approaches would be similar, since both require searching a string field.