Upserts in ludicrous mode

Question about normal vs. ludicrous mode.

I implemented an upsert (query + mutation) using the Go client. My implementation follows this example very closely: dgraph/main.go (master branch) in the dgraph-io/dgraph repository on GitHub. Following the example code, the upsert logic uses Txn. The upserts are executed in parallel and work fine in normal mode.

Now, to speed up my indexing process, I enabled ludicrous mode, and the upserts fail with the error message “StartTs mismatch”. From the docs, I take it I cannot use Txn while running Dgraph in ludicrous mode. Unfortunately, all the other examples I can find for doing upserts with the Go client (for example, the README of dgraph-io/dgo, the official Dgraph Go client on GitHub) use Txn.

How would I go about implementing upserts using the Go client while running Dgraph in ludicrous mode?

Thanks for the advice.

UPDATE: Ignore my comment. I was wrong: ludicrous mode can run the Upsert Block.

Indeed, it is not possible to use functions/features that require transactions. I can’t imagine what a solution to that would look like. Feel free to open an issue on GitHub.

So if I understand correctly, ludicrous mode covers the same or similar use cases as the bulk loader / live loader, i.e. one-time bulk data ingestion. It is not intended for incrementally updating an existing graph, i.e. adding, updating, and deleting (existing) nodes?

My use case requires both one time data ingestion and incrementally updating an existing graph. Unless I can somehow change ludicrous mode programmatically from the client at runtime (after the ingestion phase that is), that sounds like ludicrous mode does not fit my usage scenario.

I can open a GitHub issue, of course. Do you think transactional upsert support in ludicrous mode is a feature request or a bug?

Ludicrous mode gives you eventual consistency guarantees. Also, there are no transactions in ludicrous mode. If you need linearizable reads and transactions, then you shouldn’t use this mode.

That said, you can still do queries and updates for existing data in Dgraph with ludicrous mode. It wouldn’t be transactional, but you can still do updates.

If you need to do a one-time load to bootstrap a new cluster, then bulk loader would be the fastest way to load the data.

Thanks for the clarification.

If those updates are done concurrently, and my code can’t use transactions, there is the risk of a race condition. So that is not an option, I believe.

@dmai, @MichelDiz: We should be able to do upserts in ludicrous mode. In fact, we had a task to benchmark how many upserts we can run in ludicrous mode; and the numbers were looking great. Wasn’t there a topic about that?


Not sure, I don’t remember participating in that topic.

But yeah, I think we should make the upsert block work in ludicrous mode. We should have a ticket for it, though, because we would have to remove the transaction code from that mutation procedure.

I have noticed that there is some confusion about Upserts.

This link is about the old upsert, which is different from the Upsert Block.

And this one is about Upsert Block.

I think these names confuse users. We should seriously consider renaming them.

And we need to check whether the old upsert procedure works in ludicrous mode too.

Yes, when I started implementing upserts in my app, I found the code in the upsert acceptance test first. Only later did I find the Upsert Block example in the second link.

The Upsert Block example looks very clean; I plan on refactoring my code to use that style. But the “old” upsert style is working fine as well (without ludicrous mode). Is there a problem with this “old” style that I am not aware of?

No problem, but it is different from the Upsert Block, which is a somewhat more sophisticated way to upsert data.

Yeah, I was referencing the upsert block approach. That should definitely work with ludicrous mode and in fact, speeding those up was the motivation behind it.


@mrjn You were right: ludicrous mode can run the Upsert Block, since there is no transaction involved there. I have tested it via Ratel and it works just fine. I still need to check the old approach. Since that approach does its checks inside a transaction, in theory it shouldn’t work.

@dmai Maybe the Upsert Block doesn’t work well on the dgo side? I tested it on Ratel, so the problem may be related to the way dgo needs to mutate (e.g. opening a txn? Not sure; since we skip transactions and commit right away, dgo should work). I could be wrong. I’m going to test the Upsert Block on dgo tomorrow.

@mvcatsifma You may have another problem on your side if it is not related to dgo. Can you do a sanity check on your code and share a gist?

If I understand correctly, I should refactor my code to use the “new” style of upserts, called “Upsert Block”, using this example as a reference: the dgraph-io/dgo README (Official Dgraph Go client) on GitHub.

This “new” style of upserts should work in ludicrous mode.

Is this conclusion correct?

What confuses me is that the “new” style Upsert Block uses transactions as well.
From the example:

// Update email only if matching uid found.
// Note: the request is still sent through a transaction created with dg.NewTxn().
if _, err := dg.NewTxn().Do(ctx, req); err != nil {
  log.Fatal(err)
}

@MichelDiz Please share the result of your test of Upsert Block on Dgo.

Wanted to chime in and also express confusion over the limitations of the upsert block (as well as the upsert directive) and how both behave in ludicrous mode.

An example of what is safe vs. not safe would be appreciated.

For example (in ludicrous mode), if two upserts attempt to modify the same predicate do they conflict? If one upsert queries for a variable that changes before the mutation part of the request completes, will it abort?

In ludicrous mode, we don’t do conflict detection, so there are NO aborts. The downside of that is, if two concurrent upserts ask for the same node which doesn’t exist at the time, both might end up creating it. But, once present, it would be found and returned.
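To make the duplicate-node case concrete, here is a toy, single-process model of two “create if missing” upserts racing with no conflict detection. The `store` type and its methods are hypothetical stand-ins, not Dgraph internals, and the fixed interleaving (both queries run before either mutation) is just one ordering that concurrent clients can hit. In normal mode, as we understand it, an `@upsert` directive on the indexed predicate lets Dgraph detect this conflict and abort one of the transactions; the toy simply has no such check.

```go
package main

import "fmt"

// store is a hypothetical stand-in for a predicate index: personId -> uids.
// It performs no conflict detection, mimicking the behavior described above.
type store struct {
	nextUID int
	byID    map[string][]int
}

func newStore() *store { return &store{nextUID: 1, byID: map[string][]int{}} }

// lookup plays the role of the query half of an upsert.
func (s *store) lookup(id string) []int { return s.byID[id] }

// createIfMissing plays the role of the mutation half: it trusts the
// result of the earlier lookup instead of re-checking current state.
func (s *store) createIfMissing(id string, seen []int) {
	if len(seen) > 0 {
		return // the node existed when the query ran; nothing to create
	}
	s.byID[id] = append(s.byID[id], s.nextUID)
	s.nextUID++
}

func main() {
	s := newStore()
	// Both queries see "no node" before either mutation is applied,
	// and nothing aborts the second mutation.
	seenA := s.lookup("47")
	seenB := s.lookup("47")
	s.createIfMissing("47", seenA)
	s.createIfMissing("47", seenB)
	fmt.Println(len(s.byID["47"])) // 2 (duplicate nodes)

	// Once the node exists, a later upsert finds it and creates nothing.
	s.createIfMissing("47", s.lookup("47"))
	fmt.Println(len(s.byID["47"])) // still 2
}
```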


I have tested it using the Go test files. All tests pass except TestTxnErrAborted, which is expected.

Thanks @mrjn. Specifically I’m interested in the behavior of the mutations across concurrent upsert blocks. Is the entire upsert block considered one “unit” of work? Or can the mutations within an upsert block get interlaced with mutations from another upsert block happening simultaneously?

Consider the following two upserts executed at the same time, on a person node that is known to exist (ludicrous mode):

upsert {
  query {
    person as var(func: eq(personId, "47"))
  }
  mutation {
    set {
      uid(person) <favoriteColor> "blue" .
      uid(person) <favoriteAnimal> "cat" .
    }
  }
}

and

upsert {
  query {
    person as var(func: eq(personId, "47"))
  }
  mutation {
    set {
      uid(person) <favoriteColor> "green" .
      uid(person) <favoriteAnimal> "dog" .
    }
  }
}

Is the resulting state either
{ favoriteColor: "blue", favoriteAnimal: "cat" }
or
{ favoriteColor: "green", favoriteAnimal: "dog" }

… or could you possibly end up with
{ favoriteColor: "green", favoriteAnimal: "cat" } ?

Additionally, does the behavior differ if the mutations were across multiple mutation blocks within the upserts? Imagine the previous example but instead of the changes all falling within one set block for each upsert, they are split across multiple mutations:

  mutation @if(someTrueCond) {
    set {
      uid(person) <favoriteAnimal> "dog" .
    }
  }
  mutation @if(someOtherTrueCond) {
    set {
      uid(person) <favoriteColor> "green" .
    }
  }

Thanks!

What you’re looking for is transactionality. While ludicrous mode does not provide those guarantees, the internal workings of Dgraph are such that all mutations go single-file (via the Raft WAL). In normal mode, they get applied single-file too. In ludicrous mode, however, they get multiplexed by (subject, predicate) combination. Within the same (subject, predicate) pair, the mutations are still serialized. Therefore, most likely, you will end up with either “blue cat” or “green dog”.

But again, it is hard to know in advance, considering that ludicrous mode is NOT giving you transactions. Instead, it is more akin to Cassandra-style writes.
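A toy model of the single-WAL case described above, assuming each upsert’s mutation lands in the log as one entry and every (subject, predicate) pair is applied in log order. The `edit` type and `apply` function are hypothetical illustrations, not Dgraph code, and they capture only the “most likely” outcome described here, not a guarantee.

```go
package main

import "fmt"

// edit is one write of a value to a (subject, predicate) pair.
type edit struct{ subj, pred, val string }

// apply replays a single ordered log. Every write to a given
// (subject, predicate) pair is applied in log order, so the last writer
// wins per pair.
func apply(log [][]edit) map[string]string {
	state := map[string]string{}
	for _, mutation := range log {
		for _, e := range mutation {
			state[e.subj+"."+e.pred] = e.val
		}
	}
	return state
}

func main() {
	blueCat := []edit{{"0x47", "favoriteColor", "blue"}, {"0x47", "favoriteAnimal", "cat"}}
	greenDog := []edit{{"0x47", "favoriteColor", "green"}, {"0x47", "favoriteAnimal", "dog"}}

	// With one WAL, each upsert is ordered as a whole, so whichever entry
	// comes last determines both values.
	s := apply([][]edit{blueCat, greenDog})
	fmt.Println(s["0x47.favoriteColor"], s["0x47.favoriteAnimal"]) // green dog
	s = apply([][]edit{greenDog, blueCat})
	fmt.Println(s["0x47.favoriteColor"], s["0x47.favoriteAnimal"]) // blue cat
}
```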


Excellent, thanks! “Serialization of the mutations” was what I was homing in on, since my use case is mostly focused on eventual consistency. So, given that ludicrous mode still abides by a serialized WAL for (subject, predicate) mutations, is it a fair assessment to say “ludicrous provides serializability, but only to the extent that the WAL/cluster stays healthy”?

It is not a factor of cluster being healthy. The general single-file idea works when there are no shards.

In a sharded setup, where a mutation runs across multiple groups of Alphas, then that mutation would be part of multiple WALs – each corresponding to that Alpha group. In that case, two mutations may interleave depending upon the order in which they ended up on the corresponding WAL.

In your above example, if color predicate is in a different group than animal predicate, then there are two different WALs involved. So, it is possible that blue < green in WAL1, while dog < cat in WAL2. So, the eventual result would be “green cat”.
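The sharded scenario can be sketched the same way. Assuming `favoriteColor` is served by one Alpha group and `favoriteAnimal` by another, each group replays its own WAL and the last write per predicate wins; the `write` type and `lastWins` function are hypothetical. With the orders from the example (blue before green in WAL1, dog before cat in WAL2), the result is the mixed “green cat”.

```go
package main

import "fmt"

// write is one value written to a predicate on a fixed subject,
// tagged with the upsert it came from.
type write struct{ upsert, val string }

// lastWins replays one group's WAL in order and returns the final value.
func lastWins(wal []write) string {
	v := ""
	for _, w := range wal {
		v = w.val
	}
	return v
}

func main() {
	// Each upsert's mutation is split across both groups, and the two
	// WALs may order upserts A and B differently.
	wal1 := []write{{"A", "blue"}, {"B", "green"}} // favoriteColor group: A before B
	wal2 := []write{{"B", "dog"}, {"A", "cat"}}    // favoriteAnimal group: B before A
	fmt.Println(lastWins(wal1), lastWins(wal2))    // green cat
}
```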
