To clarify, if I do:
```graphql
type Cat {
  birthDate: DateTime @search(by: [day])
}
```
and then try to find a Cat whose birthDate equals some date D, will it only compare up to the day and ignore hours, minutes, seconds, etc.?
No. An index basically tells the search function where to search, so your searches happen faster.
For example, let’s say you have a word that you want to find in a dictionary. Say the word is “quaff”. What is the fastest way to find the word in the dictionary? You jump to the part of the book that starts with “Q”. How do you know on which page Q starts? Most dictionaries have an alphabet index, which tells you things like “Q - page 324”. So you know to start searching at page 324.
The dictionary example from above is indexed by first letter.
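To make the analogy concrete, here is a toy first-letter index in Python. It only illustrates the dictionary idea and is not how Dgraph stores its indexes; the function names are made up for this example.

```python
from collections import defaultdict

def build_first_letter_index(words):
    """Map each first letter to the sorted words starting with it,
    like the letter tabs of a printed dictionary."""
    index = defaultdict(list)
    for word in sorted(words):
        index[word[0].lower()].append(word)
    return dict(index)

def lookup(index, word):
    # Jump straight to the section for the word's first letter,
    # then scan only that section instead of the whole book.
    section = index.get(word[0].lower(), [])
    return word in section

words = ["apple", "quaff", "quail", "zebra", "queue"]
index = build_first_letter_index(words)
print(index["q"])              # ['quaff', 'quail', 'queue']
print(lookup(index, "quaff"))  # True
```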
You can index by day, so you are saying “Day 1: page 0, Day 2: page 5…”. It doesn’t change the comparison. It just changes the organization of the data so your query can return results faster.
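A minimal sketch of the same point, assuming a simplified in-memory model rather than Dgraph's actual implementation: a day index groups timestamps into per-day buckets, but an equality lookup still compares the full timestamp. `day_token`, `build_day_index` and `find_equal` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone

def day_token(dt):
    # Bucket key: the date up to the day, e.g. "2020-01-01".
    return dt.strftime("%Y-%m-%d")

def build_day_index(records):
    """records: list of (id, datetime). Returns {day_token: [(id, dt), ...]}."""
    index = defaultdict(list)
    for rec_id, dt in records:
        index[day_token(dt)].append((rec_id, dt))
    return index

def find_equal(index, target):
    # The index only narrows the search to one day's bucket...
    candidates = index.get(day_token(target), [])
    # ...the comparison itself is still on the full timestamp.
    return [rec_id for rec_id, dt in candidates if dt == target]

cats = [
    ("tom", datetime(2020, 1, 1, 9, 30, tzinfo=timezone.utc)),
    ("felix", datetime(2020, 1, 1, 18, 0, tzinfo=timezone.utc)),
    ("garfield", datetime(2020, 1, 2, 9, 30, tzinfo=timezone.utc)),
]
index = build_day_index(cats)
print(find_equal(index, datetime(2020, 1, 1, 9, 30, tzinfo=timezone.utc)))  # ['tom']
```

Here "felix" lands in the same bucket as the target but is filtered out, because the hours and minutes still differ.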
The docs aren’t very clear on what @search(by: [day]) actually means. Is it by day of the week (Monday, Tuesday, etc.), by day of the month (1, 2, 3…31), or by date up to the day (2020-01-01, 2020-01-02…)? I will try to get that answered and amend the docs ASAP.
Thanks for the quick response!
Does that mean it would be more performant to have more indexes?
“Applications, such as the movies examples in these docs, that require searching over dates but have relatively few nodes per year may prefer the year tokenizer”
Wouldn’t the month tokenizer be faster than year, since there are fewer movies in a single month than in the whole year?
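One way to see the trade-off is to count bucket sizes. The sketch below derives year, month and day tokens from a date and reports the largest bucket for a hypothetical set of 12 release dates; the helper names and the data are illustrative only.

```python
from collections import Counter
from datetime import date

FORMATS = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d"}

def token(d, granularity):
    # Year, month and day tokens, e.g. "2020", "2020-01", "2020-01-15".
    return d.strftime(FORMATS[granularity])

def largest_bucket(dates, granularity):
    # Size of the biggest bucket = worst-case number of candidates
    # a lookup has to re-check inside a single bucket.
    counts = Counter(token(d, granularity) for d in dates)
    return max(counts.values())

# Hypothetical release dates: 12 movies spread evenly over one year.
releases = [date(2020, m, 15) for m in range(1, 13)]
print(largest_bucket(releases, "year"))   # 12
print(largest_bucket(releases, "month"))  # 1
```

Finer buckets do mean fewer candidates per bucket. One plausible reading of the docs' advice is that when each year holds only a few nodes, year buckets are already small, and a coarser index has fewer distinct keys to store and to walk through in range queries.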
Since there is no way to search by day of the month, and no good way to do day of the week without some math, it is probably date up to the day. More specifically, I am thinking it indexes every 60x60x24 seconds.
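If the day tokenizer really works in whole days, the "every 60x60x24 seconds" idea can be sketched as integer division of a Unix timestamp. This is an assumption for illustration (days measured in UTC), not a description of Dgraph's internals; the function names are hypothetical.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 60 * 60 * 24  # 86400

def day_bucket(ts):
    # Floor-divide the Unix timestamp: every 86400-second window
    # (one UTC calendar day) gets its own bucket number.
    return int(ts.timestamp() // SECONDS_PER_DAY)

def bucket_to_date(bucket):
    # Map a bucket number back to the UTC calendar day it covers.
    return datetime.fromtimestamp(bucket * SECONDS_PER_DAY, tz=timezone.utc).date()

a = datetime(2020, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
b = datetime(2020, 1, 1, 23, 59, 59, tzinfo=timezone.utc)
c = datetime(2020, 1, 2, 0, 0, 0, tzinfo=timezone.utc)
print(day_bucket(a) == day_bucket(b))  # True: same UTC day
print(day_bucket(b) == day_bucket(c))  # False: next day
print(bucket_to_date(day_bucket(a)))   # 2020-01-01
```

Because the buckets are fixed 86400-second windows, they line up with calendar days only in UTC; in other time zones the day boundaries would shift.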
Aha, I’ve never seen those docs before.
But it’s now clear that @search(by: [day]) means date up to the day (2020-01-01, 2020-01-02, …).
Now to answer your question:
Up to a certain point, and usually that point is a single index. Beyond that, the cost of maintaining multiple indexes outweighs the gain in search performance (i.e. every time a mutation adds or changes data, every index has to be updated).
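A toy illustration of that maintenance cost, assuming a hypothetical store that keeps one bucket index per granularity: every insert touches every index, so write work grows with the number of indexes. `MultiIndex` is made up for this example and does not model Dgraph's storage.

```python
from collections import defaultdict
from datetime import datetime

FORMATS = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d", "hour": "%Y-%m-%dT%H"}

class MultiIndex:
    """Toy store keeping one bucket index per granularity (hypothetical)."""

    def __init__(self, granularities):
        self.indexes = {g: defaultdict(list) for g in granularities}
        self.index_writes = 0

    def insert(self, node_id, dt):
        # Every new value must be added to every index that exists.
        for g, index in self.indexes.items():
            index[dt.strftime(FORMATS[g])].append(node_id)
            self.index_writes += 1

one = MultiIndex(["day"])
four = MultiIndex(["year", "month", "day", "hour"])
for i in range(1000):
    dt = datetime(2020, 1, 1 + i % 28, i % 24)
    one.insert(i, dt)
    four.insert(i, dt)
print(one.index_writes, four.index_writes)  # 1000 4000
```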
Probably true. Again, I will look into amending the docs.
I see, I took it from here: https://dgraph.io/docs/query-language/schema/#datetime-indices
Dgraph handles this, right? From this, I’m guessing that the cost of more precise indexes is the extra memory/storage they require.
Yes.
There are runtime processing costs too. If a query can use multiple indices, then the query will be run with each index, essentially doing N times the work.
As I understand it, the index should be chosen based on the range of data your queries normally ask for.
If my queries usually ask for data within a month range, then I would be better off with the month index. If the normal use is to look at data in chunks of days, then day would be the best index, and so on.
I understand that an index works like the guide words in a dictionary: it quickly divides the data into chunks. If I use a month index and then apply a gt filter, the query first gets the chunk for the month containing the threshold and does additional processing on that chunk to find the entries that match the filter. It then gets all of the chunks for later months, which match entirely, and appends them to the result.
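The process described above can be sketched as follows, under a simplified model where a month index maps "YYYY-MM" keys to lists of records and all datetimes share one time zone. The names are hypothetical and this is not Dgraph's actual query planner.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime

def month_token(dt):
    # "YYYY-MM" strings sort in chronological order for 4-digit years.
    return dt.strftime("%Y-%m")

def build_month_index(records):
    """records: list of (id, datetime) -> ({month: [(id, dt)]}, sorted month keys)."""
    index = defaultdict(list)
    for rec_id, dt in records:
        index[month_token(dt)].append((rec_id, dt))
    return index, sorted(index)

def query_gt(index, months, threshold):
    start = month_token(threshold)
    # 1. Boundary chunk: only the threshold's own month needs a per-item check.
    result = [rid for rid, dt in index.get(start, []) if dt > threshold]
    # 2. Every later month is entirely > threshold, so append those chunks as-is.
    for m in months[bisect_right(months, start):]:
        result.extend(rid for rid, _ in index[m])
    return result

records = [
    ("a", datetime(2020, 1, 10)),
    ("b", datetime(2020, 1, 25)),
    ("c", datetime(2020, 2, 3)),
    ("d", datetime(2020, 3, 1)),
    ("e", datetime(2019, 12, 31)),
]
index, months = build_month_index(records)
print(query_gt(index, months, datetime(2020, 1, 20)))  # ['b', 'c', 'd']
```

Only the boundary month is compared item by item; whole later months are taken without any per-item comparison, which is where the index saves work.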
The index narrows down the amount of data that has to be checked item by item, compared to processing everything without an index.
Another way to look at it: the closer together your data points are, the finer-grained your index may need to be.