Is there a significant advantage to short field names?

eugaia · April 18, 2021, 8:58pm

Hi,

I’m wondering if there is a significant argument for choosing field names that are very short (ignoring arguments of clarity, since that would be irrelevant in my particular case). I have two questions:

When saving nodes/edges internally, is the field name stored independently for each new node, or is the name stored as a string only once?

For example, with the following two types:

type Type1 {
  sn: Int
}

type Type2 {
  longName: Int
}

If you have 1 million nodes of Type1 and 1 million nodes of Type2, will the storage used by the Type2 nodes be 6 million bytes more (1 million × 6 bytes) or just 6 bytes more (approximately, excluding things like rounding up of memory blocks, paging, etc.)?

Assuming you’re getting data from the same field of 1000 nodes, will the field name be checked with something equivalent to strcmp (in C/C++) for each node (i.e. 1000 strcmp operations in total), or would the strcmp operation just be done once (presumably when parsing the DQL or just after), and some kind of efficient offset/lookup done that isn’t based on the actual field name as defined in the schema (perhaps using a uint64, say)?

Basically with the above questions I’m trying to determine if there is a multiplicative effect on memory and/or the processing based on field length and the number of nodes.
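The two storage outcomes in question can be written out as simple arithmetic. This is only a back-of-the-envelope sketch of the two hypothetical cases (name stored with every node vs. name stored once); it is not a measurement of Dgraph, and the variable names are made up for illustration.

```python
# Back-of-the-envelope arithmetic for the two hypothetical cases.
# This does not measure Dgraph; it only restates the question in numbers.
NODES = 1_000_000
SHORT_NAME = "sn"
LONG_NAME = "longName"

# Extra bytes needed to spell the longer name once (8 - 2 = 6).
extra_per_name = len(LONG_NAME.encode("utf-8")) - len(SHORT_NAME.encode("utf-8"))

# Case A (assumption): the field name is stored again for every node.
extra_if_repeated = NODES * extra_per_name

# Case B (assumption): the field name is stored exactly once.
extra_if_stored_once = extra_per_name

print(f"repeated per node: {extra_if_repeated} extra bytes")
print(f"stored once:       {extra_if_stored_once} extra bytes")
```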

Thanks.

MichelDiz · April 19, 2021, 6:54pm

Under the hood, the values are stored like this.

key = <follower, 0x01>
value = <0xab, 0xbc, 0xcd, ...>

key = <follower, 0x02>
value = <0xba8, 0xbc, 0x35, ...>

key = <follower, 0x03>
value = <0x01, 0x35, 0xa453, ...>

So, depending on the type of the predicate, it will be repeated in storage. As for the impact, I personally have no clue; maybe @ibrahim, @mrjn, or @Anurag can help.
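To make the key layout above concrete, here is a minimal sketch of a store where every key is a (predicate, uid) pair. The byte encoding used here (UTF-8 predicate name followed by a big-endian uint64) is an assumption for illustration, not Dgraph's actual key format, and it ignores any compression the underlying storage engine may apply. `make_key` and `total_key_bytes` are hypothetical helpers.

```python
import struct


def make_key(predicate: str, uid: int) -> bytes:
    # Hypothetical encoding: predicate name as UTF-8, then the uid as a
    # big-endian unsigned 64-bit integer. Not Dgraph's real key format.
    return predicate.encode("utf-8") + struct.pack(">Q", uid)


def total_key_bytes(predicate: str, uids) -> int:
    # Sum of raw key sizes when the predicate name appears in every key.
    return sum(len(make_key(predicate, uid)) for uid in uids)


uids = range(1, 1001)
short_total = total_key_bytes("sn", uids)
long_total = total_key_bytes("longName", uids)

print(make_key("follower", 1))
print(f"raw key bytes, 'sn':       {short_total}")
print(f"raw key bytes, 'longName': {long_total}")
```

Under these assumptions the raw key size grows with the predicate length once per key, which is the repetition described above.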

See more in Graph Database White Paper | Dgraph

About the naming

Well, I personally would not be concerned about this. I like the idea of having a human-friendly DB. Shortening the name isn’t friendly.

mrjn · April 19, 2021, 7:55pm

The length of the predicate name won’t have much impact on storage. @Naman did an analysis of replacing predicates with uint64s or even uint32s, and found a 1% difference in storage / computation optimization.
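For readers unfamiliar with the idea of replacing predicate names with integers, a rough sketch follows. It is not the analysis mentioned above and not Dgraph code; `PredicateTable` and the in-memory `store` are hypothetical. It only shows the general technique: resolve the name to an integer ID once, then use the ID for every per-node lookup.

```python
class PredicateTable:
    """Hypothetical table that maps predicate names to small integer IDs."""

    def __init__(self):
        self._ids = {}     # name -> id
        self._names = []   # id -> name

    def intern(self, name: str) -> int:
        # Return the existing ID for this name, or assign the next free one.
        pid = self._ids.get(name)
        if pid is None:
            pid = len(self._names)
            self._ids[name] = pid
            self._names.append(name)
        return pid

    def name_of(self, pid: int) -> str:
        return self._names[pid]


table = PredicateTable()
long_id = table.intern("longName")
short_id = table.intern("sn")

# Hypothetical store keyed by (predicate ID, uid) instead of (name, uid).
store = {(long_id, uid): uid * 10 for uid in range(1, 1001)}

# The string is resolved once, e.g. when the query is parsed...
pid = table.intern("longName")
# ...and each of the 1000 per-node lookups uses only the integer ID.
values = [store[(pid, uid)] for uid in range(1, 1001)]

print(pid, table.name_of(pid), len(values))
```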

eugaia · April 19, 2021, 8:01pm

@MichelDiz @mrjn - Thank you both for your replies.
