UUID, efficiency and indices

mbn18 · May 6, 2020, 3:56pm

We need an external unique ID for each entity type, for example Device, User, Measurement.
The ID is a UUID4, e.g. 437e8c3b-0b92-4e9f-b111-f361b22a5888

Our first thought was:

uuid: string @index(exact) @upsert .

Our concern for performance raised the following questions:

Is string the most efficient type? Would a list of bytes be better (is that even possible)?
Which index has the smaller footprint, hash or exact? The uuid is 128 bits, but it is stored as a UTF-8 string, which is more bloated than binary.
Should we split the uuid predicate by type (user.uuid, device.uuid, etc.) to avoid one overly bloated index?
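
To put a number on the "more bloated than binary" concern, here is a minimal Go sketch that compares the canonical text form of the UUID above with its raw 128-bit form. The helper name uuidToBytes is made up for this illustration and is not part of Dgraph or any client library.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// uuidToBytes (hypothetical helper) strips the dashes from a canonical
// UUID string and decodes the remaining 32 hex digits into 16 raw bytes.
func uuidToBytes(s string) ([]byte, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil {
		return nil, err
	}
	if len(raw) != 16 {
		return nil, fmt.Errorf("expected 16 bytes, got %d", len(raw))
	}
	return raw, nil
}

func main() {
	id := "437e8c3b-0b92-4e9f-b111-f361b22a5888"
	raw, err := uuidToBytes(id)
	if err != nil {
		panic(err)
	}
	// Text form vs. binary form, in bytes.
	fmt.Println(len(id), len(raw)) // prints: 36 16
}
```

So the text form costs 36 bytes per value against 16 for the binary form, before any index overhead is counted.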

Other thoughts (less urgent, but interesting nonetheless):

Split the uuid, search by the prefix, and filter by the suffix:
1. uuid_prefix => 437e8c3b
2. uuid_postfix => 0b92-4e9f-b111-f361b22a5888
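
A rough sketch of that split on the client side, assuming the cut is made at the first dash as in the example values above. splitUUID is a hypothetical helper; how the two parts would then be indexed and queried is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitUUID (hypothetical helper) cuts a canonical UUID string at its
// first dash. The first part is the 8-hex-digit (32-bit) prefix to search
// on; the rest is the suffix used to filter the candidates.
func splitUUID(s string) (prefix, suffix string, ok bool) {
	i := strings.Index(s, "-")
	if i < 0 {
		return "", "", false
	}
	return s[:i], s[i+1:], true
}

func main() {
	prefix, suffix, ok := splitUUID("437e8c3b-0b92-4e9f-b111-f361b22a5888")
	if !ok {
		panic("not a dashed UUID")
	}
	fmt.Println(prefix) // prints: 437e8c3b
	fmt.Println(suffix) // prints: 0b92-4e9f-b111-f361b22a5888
}
```

Since the prefix carries only 32 of the 128 bits, several entities can share it, which is why the suffix filter is still needed.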

Or maybe write a custom tokenizer that specializes in UUIDs?

mrjn · May 6, 2020, 9:42pm

String is fine. We thought about having blob, but then realized it’s the same thing as string in Go.
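
To illustrate the "same thing as string in Go" remark: a Go string is just a sequence of bytes, so the 16 raw bytes of the UUID from the question fit in one without any encoding step. This is a sketch of the Go side only. Whether Dgraph accepts such a value in a string predicate is not covered in this thread, and note that these particular bytes are not valid UTF-8. asString is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// uuidBytes is the raw 128-bit form of 437e8c3b-0b92-4e9f-b111-f361b22a5888.
var uuidBytes = []byte{
	0x43, 0x7e, 0x8c, 0x3b, 0x0b, 0x92, 0x4e, 0x9f,
	0xb1, 0x11, 0xf3, 0x61, 0xb2, 0x2a, 0x58, 0x88,
}

// asString (hypothetical helper) copies raw bytes into a Go string as-is.
func asString(raw []byte) string {
	return string(raw)
}

func main() {
	s := asString(uuidBytes)
	// The string is 16 bytes long, but it is not valid UTF-8 text.
	fmt.Println(len(s), utf8.ValidString(s)) // prints: 16 false
}
```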

Hash is slightly smaller, I think. We’d hash the string and store that hash as key for the index internally.
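
A minimal sketch of the idea described here: the indexed value is hashed and the digest, not the string itself, serves as the index key, so the key size is fixed no matter how long the value is. The thread does not say which hash function Dgraph uses, so SHA-256 below is purely a stand-in, and indexKey is a hypothetical helper rather than Dgraph code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// indexKey (hypothetical helper) returns a fixed-size digest of the value.
// SHA-256 is a stand-in; the real hash used by Dgraph may differ.
func indexKey(value string) [32]byte {
	return sha256.Sum256([]byte(value))
}

func main() {
	short := indexKey("437e8c3b-0b92-4e9f-b111-f361b22a5888")
	long := indexKey(strings.Repeat("x", 1000))
	// Both keys have the same size regardless of the input length.
	fmt.Println(len(short), len(long)) // prints: 32 32
}
```

With a fixed-size key, the saving over an exact index grows with the length of the indexed strings; for a 36-character UUID the difference is small, which fits the "slightly smaller" estimate.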

You could split it by type, if you expect each of these to become quite big.

That’s a possibility too. But hash should be able to do the job.

system · June 5, 2020, 9:42pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
