Where does "1-balance" come from?

Hey,

I’m reading https://dgraph.io/docs/clients/raw-http/ and after the mutation

  set {
    <0x1> <balance> "110" .
    <0x1> <dgraph.type> "Balance" .
    <0x2> <balance> "60" .
    <0x2> <dgraph.type> "Balance" .
  }

I get more keys than the example shows:

"keys": [
        "1v5ti2a0lowz5",
        "1ypxvqw9cramz",
        "63oai4wqy425",
        "63oai4wqy426",
        "ejk4l0yqefmi",
        "zpxhtex53lym"
      ]

Can you explain the difference?

Thanks

Where does “1-balance” come from?

The number in front of the predicate is related to its namespace.

Those keys will never be the same from one run to the next, and how many of them there are doesn’t matter. Each one is like a “hash” tied to the transaction, so that the transaction can be found and committed.

And can you explain what they refer to exactly?

To a transaction.

Do you have any documentation I could look at to learn more about it? For example, why does a transaction that supposedly touches 4 keys (balance/1, dgraph.type/1, balance/2, dgraph.type/2) actually touch 6 keys?

Thanks

Unfortunately no, there are no docs specifically about it. You would only learn about it by digging into the code or reading the paper.

From the paper:
See Dgraph Whitepapers: In-Depth Insights and Analysis

Dgraph follows a lock-free transaction model. Each transaction pursues its
course concurrently, never blocking on other transactions, while reading the
committed data at or below its start timestamp. As mentioned before, Zero leader
maintains an Oracle which hands out logical transaction timestamps to Alphas.
Oracle also keeps track of a commit map, storing a conflict key → latest commit
timestamp. As shown in algorithm commit, every transaction provides the
Oracle the list of conflict keys, along with the start timestamp of the
transaction. Conflict keys are derived from the modified keys, but are not the
same. For each write, a conflict key is calculated depending upon the schema.
When a transaction requests a commit, Zero would check if any of those keys has
a commit timestamp higher than the start timestamp of the transaction. If the
condition is met, the transaction is aborted. Otherwise, a new timestamp is
leased by the Oracle, set as the commit timestamp and conflict keys in the map
are updated.
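
The commit check in the excerpt above can be sketched in a few lines. This is only a toy model for reading along, not Dgraph's code: the `Oracle` class, its method names, and the sample key strings are all made up for illustration, and it assumes a single in-memory map with a simple counter for timestamps.

```python
class Oracle:
    """Toy model of the commit check described in the paper excerpt."""

    def __init__(self):
        self.next_ts = 1        # logical clock (assumption: a plain counter)
        self.commit_map = {}    # conflict key -> latest commit timestamp

    def lease_timestamp(self):
        """Hand out the next logical timestamp."""
        ts = self.next_ts
        self.next_ts += 1
        return ts

    def commit(self, start_ts, conflict_keys):
        """Return a commit timestamp, or None if the transaction is aborted."""
        # Abort if any conflict key was committed after this transaction started.
        for key in conflict_keys:
            if self.commit_map.get(key, 0) > start_ts:
                return None
        # Otherwise lease a new timestamp and record it for every conflict key.
        commit_ts = self.lease_timestamp()
        for key in conflict_keys:
            self.commit_map[key] = commit_ts
        return commit_ts


oracle = Oracle()
t1 = oracle.lease_timestamp()   # first transaction starts
t2 = oracle.lease_timestamp()   # second transaction starts concurrently

first = oracle.commit(t1, ["key-a", "key-b"])    # no prior commits: succeeds
second = oracle.commit(t2, ["key-b", "key-c"])   # key-b committed after t2: aborted
third = oracle.commit(oracle.lease_timestamp(), ["key-b"])  # started later: succeeds
```

The point to take from it is that the Oracle only ever compares opaque conflict keys and timestamps, so the list of keys returned by a mutation does not have to match the list of predicates written one to one.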

I think the keys look different now because we have namespaces, and that example has not been updated to reflect the current state. But this is not of high relevance.

I’m not sure what you mean by it’s not of high relevance… From my POV, trying to understand what’s happening, it is relevant. I’ve read the paper but I didn’t make any connection to the conflicting keys, so it seems you’re saying some keys there are conflicting keys and it then makes sense that there are more keys than I expected :thinking:

I guess I’ll have to dig deeper into the code to better understand what’s going on, especially the part where you say namespaces are involved. I would expect namespaces to just separate keys, so there wouldn’t be any conflicting keys between two namespaces, as they are completely isolated from one another.

What I mean is that these details are irrelevant for an ordinary user. Unless you are auditing the code or want to contribute to it, there is no need to go that deep.

No, there are only “KV keys” and “Conflict keys”. Nothing more as far as I know.

“Conflict keys” are keys that contain the state of the transaction. Through them, Dgraph checks a series of things that must hold for the transaction to succeed under an ACID model.

Through these keys Dgraph can also detect conflicts between concurrent transactions that try to modify the same data at the same time, using the upsert directive (not to be confused with an upsert query).

I think your point has been addressed. The difference is that the example is outdated and Dgraph has changed a lot over the years. It’s likely that the difference has to do with the addition of namespaces, because namespaces add a string of zeros in front of the predicate, which logically separates the data. This may be the main source of the difference.

I believe that physically they are not 100% separate, but logically they are. They are separated by an extra prefix that cannot be accessed by a query executed in another namespace.
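
To picture the logical separation, here is a rough sketch of how a namespace marker placed in front of a predicate keeps two namespaces apart. It rests on assumptions: that the namespace is stored as a fixed 8-byte big-endian integer ahead of the predicate name, and that the readable form is the namespace in hex, a dash, then the predicate. The helper names are hypothetical, and the exact layout should be checked against the Dgraph source.

```python
import struct


def namespaced_predicate(namespace: int, predicate: str) -> bytes:
    """Hypothetical helper: prefix the predicate with an 8-byte namespace."""
    # Assumption: fixed-width, big-endian unsigned 64-bit namespace ID.
    return struct.pack(">Q", namespace) + predicate.encode("utf-8")


def display_name(namespace: int, predicate: str) -> str:
    """Hypothetical helper: readable form such as "1-balance"."""
    return f"{namespace:x}-{predicate}"


default_ns = namespaced_predicate(0, "balance")   # prefix is eight zero bytes
other_ns = namespaced_predicate(1, "balance")     # same predicate, other namespace

label = display_name(1, "balance")
```

Under these assumptions the same predicate name yields different stored keys in different namespaces, which would also explain why namespace 0 shows up as a run of zeros in front of the predicate.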

I haven’t studied in depth how namespaces work, but from reading the code, this is how it appears to me. I don’t think the paper talks about namespaces, since they are a recent addition.