In production, Dgraph queries block and remain unresponsive until they time out. Why?

dgraph_logs.zip (1.8 KB)

Please check the attachment. This is urgent!

- parking to wait for  <0x00000006be96db48> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
        at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
        at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
        at io.dgraph.Transaction.lambda$queryWithVars$0(Transaction.java:50)
        at io.dgraph.Transaction$$Lambda$1309/154123469.get(Unknown Source)
        at io.dgraph.ExceptionUtil.withExceptionUnwrapped(ExceptionUtil.java:18)
        at io.dgraph.Transaction.queryWithVars(Transaction.java:49)
        at com.inveno.rrlike.graph.repository.ForwardUgcRepositoryWithDGraph.getUgcNodeResp(ForwardUgcRepositoryWithDGraph.java:67)

goroutine-91.txt (47.2 KB) goroutine-92.txt (35.9 KB) goroutine-93.txt (36.0 KB)

The heap-91.txt upload failed; I tried to fix it, but no luck.

Please share more details about your cluster and the steps you are taking, or provide reproducible code in a gist or repo.

Also, can you try best-effort queries (https://dgraph.io/docs/clients/raw-http/#running-best-effort-queries) and read-only queries (https://dgraph.io/docs/clients/raw-http/#running-read-only-queries)?

As you suggested, I switched to read-only queries and the query no longer blocks. But when I call txn.mutate(mutation);, the thread blocks again and remains unresponsive.

    private void execDGraphQL(UgcNodeEntity newUgcNode) {
        Transaction txn = dgraphClient.newTransaction();
        try {
            String json = JSON.toJSONString(newUgcNode);
            logger.info("execDGraphQL json:{}", json);
            // Run mutation
            DgraphProto.Mutation mutation =
                    DgraphProto.Mutation.newBuilder().setSetJson(ByteString.copyFromUtf8(json)).build();
            DgraphProto.Response res = txn.mutate(mutation);
            logger.info("DGraph create response: {}", res.getJson().toStringUtf8());
            txn.commit();
        } finally {
            txn.discard();
        }
    }

public class UgcNodeEntity {
    private String uid;
    private String type = "UGC";
    private Long ugcId;
    private Long ugcUid;
    private Long rootUgcId;
    private Integer isDeleted;
    private Integer isRoot;
    private Integer forwardCount;
    private Long createTime;
    private Long updateTime;
    private DGraphNode forwardFrom;
}
- parking to wait for  <0x00000006c46d6dd8> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
        at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
        at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
        at io.dgraph.Transaction.lambda$mutate$2(Transaction.java:97)
        at io.dgraph.Transaction$$Lambda$1296/1777856840.get(Unknown Source)
        at io.dgraph.ExceptionUtil.withExceptionUnwrapped(ExceptionUtil.java:18)
        at io.dgraph.Transaction.mutate(Transaction.java:97)
        at com.inveno.rrlike.graph.repository.ForwardUgcRepositoryWithDGraph.execDGraphQL(ForwardUgcRepositoryWithDGraph.java:418)
        at com.inveno.rrlike.graph.repository.ForwardUgcRepositoryWithDGraph.updateOrInsertUgcNode(ForwardUgcRepositoryWithDGraph.java:137)
        at com.inveno.rrlike.graph.repository.ForwardUgcRepositoryWithDGraph$$FastClassBySpringCGLIB$$141aa35f.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)

return next.newCall(method, callOptions.withDeadlineAfter(10, TimeUnit.SECONDS));
Does the timeout I set not take effect? Why does the thread stay blocked instead of being released when the deadline expires?
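For context, the stack traces above park inside CompletableFuture.join(), which waits indefinitely; a gRPC deadline only helps if it actually fails the call and completes the future exceptionally. A minimal JDK-only sketch (no Dgraph involved; the never-completed future stands in for a response that never arrives) shows the difference between an unbounded join() and a bounded get():

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class JoinVsGet {
    // Returns true when a bounded get() times out on a never-completed
    // future. Calling join() on the same future would park the thread
    // forever, which is exactly what the stack traces above show.
    static boolean timesOut() {
        CompletableFuture<String> pending = new CompletableFuture<>();
        try {
            pending.get(100, TimeUnit.MILLISECONDS);
            return false; // unreachable: nothing ever completes the future
        } catch (TimeoutException e) {
            return true;  // bounded wait gave control back to the caller
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("bounded get() timed out: " + timesOut());
    }
}
```

If the deadline interceptor were applied, the call would fail with DEADLINE_EXCEEDED, the future would complete exceptionally, and join() would throw instead of parking forever.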

    @Bean
    public DgraphClient dgraphClient() {
        String[] ips = graphServerIps.split(",");
        List<DgraphGrpc.DgraphStub> stubList = new ArrayList<>();
        for (String address : ips) {
            String[] addr = address.split(":");
            if (addr.length != 2)
                continue;
            String host = addr[0];
            Integer port = Integer.parseInt(addr[1]);
            
            ManagedChannel channel = ManagedChannelBuilder
                    .forAddress(host, port)
                    .usePlaintext().build();
           
            ClientInterceptor timeoutInterceptor = new ClientInterceptor() {
                @Override
                public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
                        MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
                    return next.newCall(method, callOptions.withDeadlineAfter(10, TimeUnit.SECONDS));
                }
            };
            // Note: withInterceptors returns a NEW stub; the result must be
            // reassigned, otherwise the deadline interceptor is silently dropped.
            DgraphGrpc.DgraphStub stub = DgraphGrpc.newStub(channel)
                    .withInterceptors(timeoutInterceptor);
            stubList.add(stub);
        }
        DgraphGrpc.DgraphStub[] stubs = stubList.toArray(new DgraphGrpc.DgraphStub[stubList.size()]);
        return new DgraphClient(stubs);
    }

It depends on what you are doing. The write will be blocked anyway and there’s nothing to do other than just wait. You can increase some configs in your cluster.

Share details of your config and machine stats.

In general writes are blocked due to a predicate move or some other big background task.

BTW, if you don’t wanna be blocked often you can try the ludicrous mode.
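As a sketch of that suggestion: ludicrous mode is enabled with a startup flag on the alpha process (trading transactional guarantees for write throughput). The flag name below is an assumption based on the v20.x releases; confirm it against `./dgraphv2011 alpha --help` for your binary:

```shell
# Assumption: --ludicrous_mode is the flag name in v20.x; verify with --help.
nohup ./dgraphv2011 alpha --ludicrous_mode \
    --whitelist 10.4.0.0:10.5.0.0 --port_offset 100 --lru_mb 42000 \
    --my 10.4.19.91:7180 --zero 10.4.19.91:5180 --v 3 > dgraph.log &
```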

BTW2, I don’t code Java, so I can’t help you analyze your code. I hope others can.

The cluster config:

nohup ./dgraphv2011 zero  --my 10.4.19.91:5180  --port_offset 100 --replicas 3 --v 3 > zero.log &
nohup ./dgraphv2011 alpha --whitelist 10.4.0.0:10.5.0.0 --port_offset 100 --lru_mb 42000 --my 10.4.19.91:7180 --zero 10.4.19.91:5180 --v 3 > dgraph.log &

nohup ./dgraphv2011 alpha --whitelist 10.4.0.0:10.5.0.0 --port_offset 100 --lru_mb 42000 --my 10.4.19.92:7180 --zero 10.4.19.91:5180 --v 3 > dgraph.log &
nohup ./dgraphv2011 alpha --whitelist 10.4.0.0:10.5.0.0 --port_offset 100 --lru_mb 42000 --my 10.4.30.26:7180 --zero 10.4.19.91:5180 --v 3 > dgraph.log &

Hi @llooper-dev,

It seems that while running a heavy load of mutations, Dgraph takes some time to send back the response for some of them, which causes the client to park its threads while waiting, since you are using the synchronous client.
You can avoid waiting for the response by using the asynchronous client: AsyncClient gives you an AsyncTransaction, and calling .mutate() on that transaction returns a CompletableFuture, which you can keep collecting client-side in a list.

You can run a separate thread pool on the client to process that list of CompletableFutures. The processing first checks whether a future has completed with .isDone(); if that returns true, call .join() to get the Response, otherwise skip it. That way your main thread is never blocked. You can configure the thread pool to wake up every second and sweep the futures. At the end, when all mutations have been submitted and a timeout has passed, mark the remaining futures with .completeExceptionally() so that calling .join() on them throws an error.
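The sweep-and-fail pattern above can be sketched with plain CompletableFutures. This is a stdlib-only stand-in, not dgraph4j code: in a real implementation the futures in the list would be the ones returned by AsyncTransaction.mutate():

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class FutureCollector {
    // Drains completed futures from the list and returns their results;
    // incomplete futures stay in the list for the next sweep.
    static List<String> drainCompleted(List<CompletableFuture<String>> pending) {
        List<String> results = new ArrayList<>();
        Iterator<CompletableFuture<String>> it = pending.iterator();
        while (it.hasNext()) {
            CompletableFuture<String> f = it.next();
            if (f.isDone()) {
                it.remove();
                try {
                    results.add(f.join()); // safe: isDone() returned true
                } catch (CompletionException e) {
                    results.add("error: " + e.getCause());
                }
            }
        }
        return results;
    }

    // After the overall deadline, fail whatever is left so that any
    // later join() on those futures throws instead of blocking.
    static void failRemaining(List<CompletableFuture<String>> pending) {
        for (CompletableFuture<String> f : pending) {
            f.completeExceptionally(new RuntimeException("mutation timed out"));
        }
    }
}
```

A scheduled executor calling drainCompleted() every second, followed by one failRemaining() at shutdown, gives you the behavior described above without ever parking the main thread.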

The ClientInterceptor not working in your case looks like a bug in the Java client to me; I will have to dig deeper to find a fix. I do suspect these lines, though: https://github.com/dgraph-io/dgraph4j/blob/master/src/main/java/io/dgraph/StreamObserverBridge.java#L36-L39
My guess is that the ClientInterceptor calls onCompleted() but that never marks the future as complete, so the thread remains blocked. I am not sure about that, so I need to dig deeper.