Incremental index building for eventual consistency

Hey @jchiu,

Here’s a working solution to our indexing problem.

To build index, we derive deltas based on current version of the data.

So, when we get:

Current Val: uid -name-> A
Update: uid -name-> B

We’ll create two updates to the index:

A -DEL-> uid
B -SET-> uid

If there are no crashes, this works great. But, in the case of a crash, the original PL might have been updated to B, but the indices might not.

The data left would be something like this:

Data: uid -name-> B
Index: A → uid

Problem

On a mutation replay (uid -name-> B), nothing would happen to Data. So, no delta would be generated, and hence the index would stay out-of-sync.

Solution

Irrespective of the existing value of the data, we always send a SET instruction to the index. Hence, doing:

B -SET-> uid

This would leave us with both:
A -SET-> uid, &
B -SET-> uid

Periodically, we run a thread, which runs over all the indices, and verifies that the uid pointed has the index token. So, we encounter B, and check if (uid, name) has B. It won’t. In that case, we delete the uid from B.

Race condition

There might be a rare race condition here, when this periodic thread, runs the B -DEL-> uid instruction, but before it could, another data update instruction comes in, which does uid -name-> B, and goes on to set B → uid. However, this is really rare, because we acquire mutex locks over the posting list data, and therefore, the uid -name->B instruction can only happen after we’ve already read (uid, name). Otherwise, we’d have read the B in (uid, name).

So, after reading, and before doing the write to index, this new update instruction would have to read the data, update the data, generate the delta, and apply both the DEL and SET to the index; before we could set the index. That would be very rare.

Potential solution to rare race condition

We could acquire one lock over the original data, and all the indices before doing any reads or writes. This would solve the above rare race condition, because any new update would have to wait before the periodic thread can finish it’s operation. Alternatively, the update would have already happened, and then the periodic thread would see A as the extra token, and not B.

Generating new indices

We still need a way to generate new indices, once we start having dynamic schemas. Or, if we introduce a new index to existing dgraph instance. That is a separate use case though; and we need to find a way to determine that a new index has been added, that we need to regenerate.

Alternatively, the user may give us a regenerate instruction for a particular predicate, in which case we should do it.

Is it possible to mark the operation as failed if the index mutation failed, such that during the mutation replay, we will do all three operations again? (data SET, index DEL and index SET)?

Regarding generating new indices, let’s go for the “user gives regenerate instruction” first for two reasons. (1) The code can be reused. (2) The other solution is better but would require more thought, e.g., maybe each predicate should maintain some timestamp or state of its index. @mrjn

I think there’s some misunderstanding. During mutation replay, we would only know data SET. The index DEL and SET are generated when we do data SET; based on the previous state. And the problem arises when data’s previous state has already caught up to current, but the index has not. RAFT doesn’t do index mutations; they happen beyond RAFT; so there’s no replay for indices.

Yeah, we can do the user specified regenerate index instruction; so we can support the addition of new indices without having to reload the whole data. Just use it the code you have for those purposes.

Note that the user would have to specify which predicate to regenerate the index for; and the server which receives the request would have to forward it to the relevant servers serving those predicates.

Regarding fixing the index, I think the approach I propose above is better. The race condition is going to be very rare; and if it does, we can introduce multiple posting list spanning locks; which would completely fix that issue.

The proposed method sounds good, and we can go for that.

I was hoping that we could bundle these mutations (data SET and index DEL and index SET) somehow as a single unit of “mutation” to RAFT, but that is easier said than done.

One concern is: if the index DEL is not made, some queries results might be wrong. I suppose that is ok (?) as it will eventually be correct? Or we have to query attribute values more often?

True. Each RAFT mutation contains 1000s of invididual mutations, all of which need to be applied before we know what the deltas would be. And I thought about ways how we could maybe store these deltas, and then use them on recovery; but that’s very tricky. RAFT could instruct the node to pick up a snapshot from the leader, instead of using what it has; and then the deltas would be rendered ineffective. Also, we have to worry about the mutations in RAFT logs, which don’t have corresponding deltas; it’s a mess that way.

I think that’s the best we can do for now. Eventual consistency should take care of this. If it becomes a problem, we can probably run further queries to ensure that we can weed out the uids which no longer have that value.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.