Index consistency during shard population

Hey @minions, @jchiu,

I’m trying to figure out whether our shard move would keep both the shard data and the index consistent.

  • When we copy over a shard, we copy the RocksDB data directly. This means some PLs (posting lists) might still be in memory, not yet flushed to RocksDB, when we do the copy. However, when the mutations flow through afterwards, they should rectify the missing data and bring the shard back to the right state. So, I think the shard move itself works.


There are three ways to handle the index.


  • We copy over the indexing data as well. Note that the index is generated via goroutines, so index entries might be written out later than the data; the copied indexing data wouldn’t be consistent either. Now, when the mutations flow through, they would modify the index; but I’m not sure those modifications would be idempotent. For example:

Data: SET [1, name, Tom hanks]
Now assume that this part of the data was copied over just fine during the RocksDB copy, but the index entry wasn’t. So, the PL exists with “Tom hanks” as the value.

Now we run AddMutationWithIndex with this fact: it would see that the value already exists and not mutate it, which means it would also not update the index.

This then leads to an inconsistent index, which would never be fixed.
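The failure mode above can be modeled with a toy in-memory store (hypothetical names and structure, not the actual Dgraph API): if the value edge was copied but its index entry was not, a replayed mutation that short-circuits on “value unchanged” never repairs the index.

```go
package main

import "fmt"

// Toy stores: data maps uid -> value; index maps value -> set of uids.
type store struct {
	data  map[int]string
	index map[string]map[int]bool
}

func newStore() *store {
	return &store{data: map[int]string{}, index: map[string]map[int]bool{}}
}

// addMutationWithIndex mimics the skip-if-unchanged behavior described above:
// if the posting list already holds the value, neither data nor index is touched.
func (s *store) addMutationWithIndex(uid int, val string) {
	if s.data[uid] == val {
		return // value unchanged -> the index update is skipped too
	}
	if old, ok := s.data[uid]; ok {
		delete(s.index[old], uid)
	}
	s.data[uid] = val
	if s.index[val] == nil {
		s.index[val] = map[int]bool{}
	}
	s.index[val][uid] = true
}

func main() {
	s := newStore()
	// Simulate a shard copy where the data edge arrived but the index entry didn't.
	s.data[1] = "Tom hanks"

	// Replaying the original SET does not repair the index.
	s.addMutationWithIndex(1, "Tom hanks")
	fmt.Println(s.index["Tom hanks"][1]) // false: the index entry is still missing
}
```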

The second way is

  • We ignore all the indexing data and force everything through AddMutationWithIndex. That would require us to not write directly to RocksDB, but pass everything as a mutation, which would be a lot more expensive than what we’re currently doing.

A third way is to have a syncIndex function which can walk over all the data and sync up the index. We could then run it after the move, and also as a periodic thread, to ensure no inconsistencies remain between data and index.
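One way to read this third option is that syncIndex regenerates the index purely from the data, so any inconsistency left by the copy disappears. A minimal sketch over the same toy representation (hypothetical names; a real implementation would iterate the RocksDB posting lists of each indexed predicate):

```go
package main

import "fmt"

// syncIndex rebuilds the value -> uids index from scratch by scanning the
// data. Running this after the shard move leaves the index consistent
// regardless of what the copy contained.
func syncIndex(data map[int]string) map[string]map[int]bool {
	index := map[string]map[int]bool{}
	for uid, val := range data {
		if index[val] == nil {
			index[val] = map[int]bool{}
		}
		index[val][uid] = true
	}
	return index
}

func main() {
	data := map[int]string{1: "Tom hanks", 2: "Bradley Cooper"}
	index := syncIndex(data)
	fmt.Println(index["Tom hanks"][1], index["Bradley Cooper"][2]) // true true
}
```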


Quote: When the mutations flow through, they should be able to rectify the missing data.

I wonder if these mutations can somehow include index mutations as well. In other words, can index mutations be replayed just like other mutations?

For that to happen, we’d have to append the index mutation to the data mutation before proposing them together to the cluster. The problem with that approach is that we’d have to do the read in advance to determine what the index mutations should be; which means if two data mutations come in quickly:

Existing: Robert de-niro
SET -> 1, name, Tom hanks
SET -> 1, name, Bradley Cooper

Then, both of these would read [Robert de-niro] and attempt to delete it from the index; and each would add uid 1 under “Tom hanks” and “Bradley Cooper” respectively.

That would still cause inconsistent behavior.
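The race can be made concrete with the same toy model (hypothetical names): both mutations plan their index delta from the same stale read before either applies, so the second one deletes from the wrong index key and uid 1 ends up indexed under both names.

```go
package main

import "fmt"

// indexDelta is an index mutation computed ahead of proposing: delete the
// uid from delVal's entry, add it under addVal.
type indexDelta struct {
	uid    int
	delVal string
	addVal string
}

// planDelta is the read-in-advance step: it reads the current value to
// decide what the index mutation should be.
func planDelta(data map[int]string, uid int, newVal string) indexDelta {
	return indexDelta{uid: uid, delVal: data[uid], addVal: newVal}
}

func apply(data map[int]string, index map[string]map[int]bool, d indexDelta) {
	delete(index[d.delVal], d.uid) // no-op if the entry is already gone
	if index[d.addVal] == nil {
		index[d.addVal] = map[int]bool{}
	}
	index[d.addVal][d.uid] = true
	data[d.uid] = d.addVal
}

func main() {
	data := map[int]string{1: "Robert de-niro"}
	index := map[string]map[int]bool{"Robert de-niro": {1: true}}

	// Both mutations plan before either applies: both read "Robert de-niro".
	d1 := planDelta(data, 1, "Tom hanks")
	d2 := planDelta(data, 1, "Bradley Cooper")

	apply(data, index, d1)
	apply(data, index, d2) // deletes from "Robert de-niro", not "Tom hanks"

	// uid 1 is now indexed under both names, though only one is the value.
	fmt.Println(index["Tom hanks"][1], index["Bradley Cooper"][1]) // true true
}
```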

For the third way, we don’t copy the index data and just rebuild it? If so, this seems like the most direct approach and the hardest to get wrong. Do we let “subsequent mutations flow through” only after fully syncing the index?

Yeah, this shard copy is a preparation step for the server before it serves any real queries; so there’s no issue with any pending mutations.

So, what we could do is copy over the index data as well, and then run syncIndex; starting from a mostly correct index might speed up the sync. Also, for syncIndex to run periodically, it would have to deal with existing index data anyway.
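A syncIndex that tolerates pre-existing (possibly stale) index data could reconcile in place rather than rebuild: drop entries that no longer match the data, add the ones the copy missed. A hedged sketch over the toy model (hypothetical names, not Dgraph code):

```go
package main

import "fmt"

// reconcileIndex brings a possibly stale index in line with the data.
// Because most copied entries are already correct, this does less work
// than a full rebuild, and it is safe to run periodically.
func reconcileIndex(data map[int]string, index map[string]map[int]bool) {
	// Drop index entries whose uid no longer holds that value.
	for val, uids := range index {
		for uid := range uids {
			if data[uid] != val {
				delete(uids, uid)
			}
		}
		if len(uids) == 0 {
			delete(index, val)
		}
	}
	// Add entries the copy missed.
	for uid, val := range data {
		if index[val] == nil {
			index[val] = map[int]bool{}
		}
		index[val][uid] = true
	}
}

func main() {
	data := map[int]string{1: "Tom hanks"}
	// The copied index is stale: it still points uid 1 at the old value.
	index := map[string]map[int]bool{"Robert de-niro": {1: true}}

	reconcileIndex(data, index)
	fmt.Println(index["Tom hanks"][1], len(index["Robert de-niro"])) // true 0
}
```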

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.