When to index a string with hash instead of exact?


#1

I’m aware that the documentation here mentions that hash does the same thing as exact, but hashes the string first.

Am I correct in assuming that the hashing is done using MD5, and that we should therefore prefer the exact index if we know the string will be shorter than 32 characters, and the hash index if it will be longer than 32 characters?


(Manish R Jain) #2

Dgraph takes care of dealing with collisions, so functionally the two are exactly the same. Where they differ is in storage and performance. If you expect strings longer than, say, 16 bytes, I’d suggest using hash. In general, hash is a good idea; exact is useful only if you already have a string whose length is confined to a small limit (like a username).
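
To make the storage trade-off concrete, here is a minimal Python sketch. It is not Dgraph's code: `exact_token` and `hash_token` are hypothetical helpers, and BLAKE2b with a 16-byte digest is only an illustrative stand-in for whatever hash function Dgraph actually uses. The point is that an exact-style key grows with the string, while a hash-style key stays the same size.

```python
import hashlib


def exact_token(value: str) -> bytes:
    # Exact-style index key: the whole string is stored as the key.
    return value.encode("utf-8")


def hash_token(value: str, digest_size: int = 16) -> bytes:
    # Hash-style index key: only a fixed-size digest is stored.
    # BLAKE2b with 16 bytes is an assumption made for illustration,
    # not a statement about Dgraph's internals.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=digest_size).digest()


if __name__ == "__main__":
    samples = [
        "alice",
        "a much longer value, such as a URL or a full paragraph of text",
    ]
    for value in samples:
        # Prints the key size in bytes for each index style.
        print(len(exact_token(value)), len(hash_token(value)))
```

For a short value like a username, the exact key is smaller than the digest; once the string is longer than the digest size, the hashed key is the smaller one.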


#3

Thanks, appreciate it :slight_smile:


(Manish R Jain) #4

Update: For better performance, Dgraph no longer does collision detection for hashes. However, I think we use a 128-bit hash to make collisions significantly less likely (it’s been a while, I need to verify).
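
As a rough sanity check on why a wide hash makes skipping collision detection tolerable, the sketch below uses the standard birthday-bound approximation. It assumes a uniformly distributed hash; the 128-bit width is the unverified figure from this post, and `collision_probability` is a hypothetical helper, not part of Dgraph.

```python
def collision_probability(num_values: int, hash_bits: int) -> float:
    # Birthday-bound approximation: p ~= n * (n - 1) / 2^(bits + 1).
    # Only meaningful while the result is small (well below 1).
    return num_values * (num_values - 1) / 2 ** (hash_bits + 1)


if __name__ == "__main__":
    n = 10**9  # one billion distinct indexed strings, an arbitrary example
    for bits in (64, 128):
        print(bits, collision_probability(n, bits))
```

With a billion distinct values, the approximation gives a few percent for a 64-bit hash and a vanishingly small number for a 128-bit one, which is the gap the update is pointing at.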