Is there a significant advantage to short field names?

Hi,

I’m wondering if there is a significant argument for choosing field names that are very short (ignoring arguments of clarity, since that would be irrelevant in my particular case). I have two questions:

  1. When saving nodes/edges internally, is the field name stored independently for each new node, or is the name stored as a string only once?

For example, with the following two types:

type Type1 {
  sn: Int
}

type Type2 {
  longName: Int
}

If you have 1 million nodes of Type1 and 1 million nodes of Type2, will the Type2 nodes need 6 million bytes more storage (1m * 6 bytes) or just 6 bytes more? This is approximate, excluding things like rounding up of memory blocks, paging, etc.
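To make the two scenarios concrete, here is a quick back-of-the-envelope calculation. It assumes one byte per character and one such field per node, and it ignores any per-key overhead the storage engine may add.

```python
# Back-of-the-envelope comparison of the two storage scenarios.
# Assumes 1 byte per character and ignores engine overhead.

NODES = 1_000_000
extra_chars = len("longName") - len("sn")  # 8 - 2 = 6

# Scenario A: the field name is stored again with every node.
per_node_total = NODES * extra_chars

# Scenario B: the field name is stored only once.
stored_once_total = extra_chars

print(per_node_total)     # 6000000
print(stored_once_total)  # 6
```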

  2. Assuming you’re getting data from the same field of 1000 nodes, will the field name be checked with something equivalent to strcmp (in C/C++) for each node (i.e. 1000 strcmp operations in total), or would the strcmp operation just be done once (presumably when parsing the DQL or just after), and some kind of efficient offset/lookup done that isn’t based on the actual field name as defined in the schema (perhaps using a uint64, say)?
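The "resolve once, then look up by ID" alternative in the second question could look roughly like the sketch below. It is purely illustrative and says nothing about how Dgraph actually works: `intern`, `name_to_id`, and the node layout are hypothetical names made up to show the idea.

```python
# Illustrative only: resolve a field name to an integer ID once,
# then read that field from many nodes without string comparisons.
# All names here are hypothetical, not Dgraph internals.

name_to_id = {}  # field name -> small integer ID


def intern(name):
    """Return a stable integer ID for a field name, assigning one if new."""
    if name not in name_to_id:
        name_to_id[name] = len(name_to_id)
    return name_to_id[name]


# Each node stores its values keyed by integer field ID.
long_name_id = intern("longName")
nodes = [{long_name_id: i} for i in range(1000)]

# Query time: one name lookup, then 1000 integer-keyed reads.
field_id = intern("longName")
values = [node[field_id] for node in nodes]

print(len(values), values[:3])  # 1000 [0, 1, 2]
```

With this kind of scheme the length of the field name matters only at the single `intern` call, not once per node.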

Basically with the above questions I’m trying to determine if there is a multiplicative effect on memory and/or the processing based on field length and the number of nodes.

Thanks.

Under the hood, the values are stored like this.

key = <follower, 0x01>
value = <0xab, 0xbc, 0xcd, ...>

key = <follower, 0x02>
value = <0xba8, 0xbc, 0x35, ...>

key = <follower, 0x03>
value = <0x01, 0x35, 0xa453, ...>

So, depending on the type of the predicate, the name will be repeated in storage. As for the impact, I personally have no idea; maybe @ibrahim, @mrjn, or @Anurag can help.
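A rough way to see what repeating the predicate in each key costs is sketched below. The encoding is an assumption made for illustration (UTF-8 predicate name followed by an 8-byte big-endian UID), `make_key` and `total_key_bytes` are hypothetical helpers, and the real on-disk format will differ.

```python
import struct


def make_key(predicate, uid):
    """Hypothetical key: predicate bytes followed by an 8-byte big-endian UID."""
    return predicate.encode("utf-8") + struct.pack(">Q", uid)


def total_key_bytes(predicate, num_keys):
    """Sum of key sizes for UIDs 1..num_keys under the layout above."""
    return sum(len(make_key(predicate, uid)) for uid in range(1, num_keys + 1))


short = total_key_bytes("sn", 1000)
long_ = total_key_bytes("longName", 1000)

print(short, long_, long_ - short)  # 10000 16000 6000
```

Under this assumed layout the difference grows with the number of keys, but it measures raw key bytes only and does not model anything the storage engine does afterwards.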

See more in the Dgraph Graph Database White Paper.

About the naming

Well, I personally would not be concerned about this. I like the idea of having a human-friendly DB. Shortening the names isn’t friendly.

The length of the predicate name won’t have much impact on storage. @Naman did an analysis of replacing predicates with uint64s or even uint32s, and found a 1% difference in storage and computation.

@MichelDiz @mrjn - Thank you both for your replies.