After a successful upgrade to Dgraph v1.1, we rolled out an upsert function that was working locally on a small dataset, but it’s failing in production.
I’m getting this exception on almost all upserts:
java.lang.RuntimeException: java.util.concurrent.CompletionException: java.lang.RuntimeException: The doRequest encountered an execution exception:
at io.dgraph.AsyncTransaction.lambda$doRequest$2(AsyncTransaction.java:173) ~[functions.jar:?]
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582) ~[?:1.8.0_212]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) ~[?:1.8.0_212]
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) ~[?:1.8.0_212]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) ~[?:1.8.0_212]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ~[?:1.8.0_212]
Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: The doRequest encountered an execution exception:
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592) ~[?:1.8.0_212]
… 5 more
Caused by: java.lang.RuntimeException: The doRequest encountered an execution exception:
at io.dgraph.DgraphAsyncClient.lambda$runWithRetries$2(DgraphAsyncClient.java:212) ~[functions.jar:?]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_212]
… 5 more
Caused by: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: UNKNOWN: Uid: [834751] cannot be greater than lease: [0]
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_212]
at io.dgraph.DgraphAsyncClient.lambda$runWithRetries$2(DgraphAsyncClient.java:180) ~[functions.jar:?]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_212]
… 5 more
Caused by: io.grpc.StatusRuntimeException: UNKNOWN: Uid: [834751] cannot be greater than lease: [0]
at io.grpc.Status.asRuntimeException(Status.java:533) ~[functions.jar:?]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:442) ~[functions.jar:?]
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[functions.jar:?]
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[functions.jar:?]
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[functions.jar:?]
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700) ~[functions.jar:?]
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[functions.jar:?]
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[functions.jar:?]
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[functions.jar:?]
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:399) ~[functions.jar:?]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:507) ~[functions.jar:?]
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:66) ~[functions.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:627) ~[functions.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:515) ~[functions.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:686) ~[functions.jar:?]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:675) ~[functions.jar:?]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[functions.jar:?]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[functions.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
I read through some of the other discussion threads, and they all seem to be associated with bulk imports, not upserts or live mutations.
This error is usually raised when you try to add mutations with specific UIDs that Dgraph has not allocated yet. For example, this mutation would not work, because 0xbeef is a hard-coded UID:
<0xbeef> <name> "My name" .
but this one would work, because the blank node lets Dgraph assign the UID:
_:blankNode <name> "My name" .
So the issue seems specific to the contents of your upsert. I would make sure you are not trying to add mutations with specific UIDs that Dgraph does not know about.
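For comparison, here is a minimal sketch of an upsert with the dgraph4j client in which every UID comes from the query block rather than being hard-coded. The host name, predicate names, and lookup value are assumptions for illustration:

import com.google.protobuf.ByteString;
import io.dgraph.DgraphClient;
import io.dgraph.DgraphGrpc;
import io.dgraph.DgraphProto.Mutation;
import io.dgraph.DgraphProto.Request;
import io.dgraph.Transaction;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Connect to an Alpha (9080 is the default gRPC port; "alpha1" is an assumed host).
ManagedChannel channel = ManagedChannelBuilder.forAddress("alpha1", 9080).usePlaintext().build();
DgraphClient dgraphClient = new DgraphClient(DgraphGrpc.newStub(channel));

// The query block binds p to an existing node; uid(p) in the mutation reuses it,
// so no UID outside Zero's lease ever appears in the request.
String query = "query { p as var(func: eq(productId, 42)) }";
Mutation mu = Mutation.newBuilder()
    .setSetNquads(ByteString.copyFromUtf8("uid(p) <color> \"red\" ."))
    .build();
Request request = Request.newBuilder()
    .setQuery(query)
    .addMutations(mu)
    .setCommitNow(true)
    .build();

Transaction txn = dgraphClient.newTransaction();
txn.doRequest(request); // commits immediately because of setCommitNow(true)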
The only UIDs I’m using in the upsert are from querying Dgraph, so that can’t be the case unless there’s a serious bug in the bulk loader or something.
Here are the steps to reproduce the upsert (quoting my comments in my other post).
First, run this alter on the schema:
type Products {
products: [Product]
}
type Product {
productId: int
options: [Option]
}
type Option {
optionId: int
color: string
}
<collectionId>: int @index(int) .
<color>: string .
<optionId>: int @index(int) .
<options>: [uid] .
<productId>: int @index(int) .
<products>: [uid] .
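For completeness, this is roughly how we apply that alter through dgraph4j (a sketch; dgraphClient is an already-connected client and schemaString holds the block above):

import io.dgraph.DgraphProto.Operation;

// Push the type definitions and predicate schema above in a single alter.
Operation op = Operation.newBuilder()
    .setSchema(schemaString)
    .build();
dgraphClient.alter(op); // blocks until the schema change is applied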
What do you mean? I’m a bit confused whether you are running the upserts in bulk, or you bulk-loaded the data before running the upserts.
Well, if you hit this situation after doing a bulk load, it might mean you’re using a different Zero instance. You should use the same Zero instance that was used during the bulk load, because that instance holds the record of the allocated/mapped UIDs. If you use a Zero instance started from scratch, this error will happen because no UIDs have been allocated there.
You are confirming my theory. When you bulk-load, you use a Zero instance to allocate UIDs, and that information lives in that instance only. So you should not eliminate it.
Do the following: increase the UID lease via Zero’s HTTP API, e.g. /assign?what=uids&num=1000000000 (the value must fit in a uint64, so pick something comfortably above your highest UID), just in case.
Then export the data again, and run the bulk load again without deleting the Zero instance.
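If you want to script that lease bump instead of calling it by hand, a minimal sketch (the Zero host name and the lease size are assumptions; 6080 is Zero’s default HTTP port):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Ask Zero to extend the UID lease well beyond the highest UID in the data.
URL url = new URL("http://zero1:6080/assign?what=uids&num=1000000000");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
    System.out.println(in.readLine()); // Zero replies with the newly leased UID range
}
conn.disconnect();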
How would I avoid eliminating the Zero instance if we need to upgrade the cluster to v1.1?
Are we going to need to roll back the entire cluster and recover from our backup, re-perform the export, re-upgrade the nodes, and re-perform the import?
Or, are you saying that we should just try exporting and re-importing with our current cluster in its current state?
I still don’t understand how we would upgrade a cluster to v1.1 without replacing all of the Zero containers with ones running the upgraded version, given that we perform the export from an Alpha node and then run the import against the Zero leader.
From the old version you just export the data, nothing more (see the sketch of the export call after these steps). Then take the cluster running the old version down.
Do the bulk load with the new version.
When the bulk load finishes, don’t remove the Zero instance. Remove only the Alphas (you should not have had any Alphas running during the bulk load anyway) or Ratel (though there is no need to remove Ratel).
Now start the rest of the cluster against the Zero instance left over from the bulk load.
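For the export in the first step, you can hit an Alpha’s HTTP endpoint; a minimal sketch, assuming the default HTTP port 8080 and an assumed host name:

import java.net.HttpURLConnection;
import java.net.URL;

// GET /admin/export on any Alpha (v1.0/v1.1) asks the group to write an export
// to each Alpha's export directory; "alpha1" is an assumed host name.
URL url = new URL("http://alpha1:8080/admin/export");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
System.out.println("export response code: " + conn.getResponseCode());
conn.disconnect();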
Thanks. I think we might have accidentally run the bulk import before we upgraded the Zero nodes. We will try removing the data directories, re-deploying the Zero nodes, re-running the bulk loader, re-copying the data to the p directories, and re-starting the Alpha nodes.
When you start the standalone image, it logs a warning that you should not use it in production. You could, but you would need to edit the image and add scripts to make it work with the bulk loader, if that’s the case.