Since Dgraph 1.1.1 (and also with Dgraph 1.2.0rc1), several goroutines of my parallel tests regularly get stuck in txn.Query().
At the bottom of the stack, they are blocked in grpc/internal/transport.(*Stream).waitOnHeader().
I have no clue why it happens, and it is hard to give you a simple example to reproduce the lock.
What could cause that? How can I get debug info about it?
One more piece of information: some other goroutines are blocked on a short (4 s) time.Sleep().
In my GoLand IDE, if I pause the debugger, set a breakpoint just after one of these time.Sleep() calls, and then resume execution, the breakpoint is never reached!
How is that possible?
Thank you for the logs; I am looking into this issue. It is possible that alpha is stuck and not responding to queries. I will keep you posted on the progress.
Are you using the GraphQL endpoint by any chance?
I see an error log though that seems unrelated. We are looking into that in any case.
The logs cover about one minute. I can see that an alter request was issued, which would block normal reads. Are you still unable to query anything on the cluster?
That means, alpha is doing fine. Either something is off with the network, or there is an issue in your code. Happy to look at your code if it is possible for you to share it.
Maybe Dgraph is not involved, because when it happens EVERY goroutine is stuck.
If I set a debugger breakpoint at the next instruction of any goroutine, it is never reached.
I have one CPU at 100%.
I tried running the Dgraph alpha source code as a project in my IDE in debug mode, then pausing the program when the lock happens: the CPU activity stops. So it looks like the problem involves both my client AND Dgraph alpha.
Doing the same with Dgraph zero did not stop the CPU activity.
I also tried splitting this upsert into separate query, mutation, and delete-mutation operations, in a single transaction or in individual transactions, but without success.
New info: the lock is related to the recent_weight predicate, which is defined as:
recent_weight: float @index(float) .
And the lock appears when I query it with an orderdesc in a big query that includes:
word_0_0 as var(func: ge(recent_weight, 0), first: 8, orderdesc: recent_weight) @filter(((NOT uid("0x51a9","0x51ad","0x3580b8","0x358527")) AND NOT uid("0x51bc","0x51af")))
If I remove the orderdesc, the lock no longer occurs.
Is it possible for you to provide a sequence of steps to reproduce the issue, along with an example dataset if possible? I tried your query against a simple dataset and it worked fine for me. Happy to look into the issue further. Thanks for trying to figure it out.