I found Dgraph crashing/hanging with strings longer than 64971 characters. Error message: Error occurred while aborting transaction: rpc error: code = Canceled desc = context canceled.
Even after a restart, Dgraph is not usable anymore.
Are you using .NET? It looks like .NET can't handle string literals larger than 65535 bytes. In Go the limit would be your memory, which is about 2^64/2 characters.
Our setup used to work just fine with the Docker image v21.12.0. Strings with a length over 200,000 characters were no problem at all. But now we want to switch to v22.0.1, and string length seems to be an issue.
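To make the setup concrete, here is a minimal sketch of roughly what our client code does, using the Go client (dgo). The predicate name `description`, the Alpha address, and the 200,000-character value are placeholders, not our real schema; regexp matching in Dgraph is backed by the trigram tokenizer, so that is what the sketch declares:

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/dgraph-io/dgo/v210"
	"github.com/dgraph-io/dgo/v210/protos/api"
	"google.golang.org/grpc"
)

func main() {
	// Connect to a local Alpha (address is an assumption).
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
	ctx := context.Background()

	// Hypothetical predicate with the index combination we use.
	err = dg.Alter(ctx, &api.Operation{
		Schema: `description: string @index(term, exact, trigram) .`,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Mutate with a string well above the ~65k threshold mentioned above.
	longValue := strings.Repeat("a", 200000)
	txn := dg.NewTxn()
	defer txn.Discard(ctx)
	_, err = txn.Mutate(ctx, &api.Mutation{
		SetJson:   []byte(`{"description": "` + longValue + `"}`),
		CommitNow: true,
	})
	if err != nil {
		// This is where the "context canceled" abort error shows up for us.
		log.Fatal(err)
	}
}
```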
After some time there were some additional logs:
I1208 14:47:48.248839 31 draft.go:1592] Found 3 old transactions. Acting to abort them.
I1208 14:47:48.249546 31 draft.go:1553] TryAbort 3 txns with start ts. Error:
I1208 14:47:48.249568 31 draft.go:1569] TryAbort: No aborts found. Quitting.
I1208 14:47:48.249572 31 draft.go:1595] Done abortOldTransactions for 3 txns. Error:
Depending on the indexing you used, it will try to index the entire string.
Is that Docker? How many resources is Docker allowed to consume?
What disk are you using?
Hum, this can be like 200KB…
Maybe there was some improvement in version v21.12.0. The current version had to go back to the v21.03 base due to several unforeseen bugs, and we'll work through each PR to see which ones are worth merging back. So you'll need to wait, or maybe try to find out which PR made the improvement you need.
This may be linked to improved indexing. Maybe in types, but I don't remember anybody working on that. You'll have to wait; it's really hard to tell what it is.
Yup, that was a Docker container. There is no resource constraint configured. I don't know the exact disk specs, but it should be plenty fast.
But I did some testing, and indexing really seems to be the problem here. I tested with a string of length 200,000 containing just the letter a. The regexp index is working; the exact and term indexes are not. I use a combination of all three, as our use case requires both extensive and exact searching.
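A rough sketch of how that per-index test can be reproduced (reusing the imports, `dg`, and `ctx` from the sketch above; the predicate name and value length are again placeholders):

```go
// tryIndex re-creates the hypothetical description predicate with a single
// tokenizer and retries the same long mutation, returning the commit error.
func tryIndex(ctx context.Context, dg *dgo.Dgraph, tokenizer string, length int) error {
	if err := dg.Alter(ctx, &api.Operation{
		Schema: "description: string @index(" + tokenizer + ") .",
	}); err != nil {
		return err
	}
	txn := dg.NewTxn()
	defer txn.Discard(ctx)
	_, err := txn.Mutate(ctx, &api.Mutation{
		// A long value made only of the letter "a", as in the test described above.
		SetNquads: []byte(`_:x <description> "` + strings.Repeat("a", length) + `" .`),
		CommitNow: true,
	})
	return err
}
```

In my testing, calling this with "trigram" succeeds, while "exact" and "term" hang.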
As indexing seems to be the problem, is there anything on your roadmap regarding indexing performance?
Edit:
I did some testing with just the regexp index; strings of length 5,000,000 seem to be no problem (which should be enough, I guess). So I will switch to regexp only for now and run searches with predefined regex patterns. But it would be nice to be able to use the other indexes as well.
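For the regexp-only workaround, a search with a predefined pattern would look roughly like this (same assumed `description` predicate and imports as above; the pattern is just an example and needs at least three plain characters for the trigram index to be used):

```go
// queryByPattern runs a regexp search against the hypothetical description
// predicate using a predefined pattern.
func queryByPattern(ctx context.Context, dg *dgo.Dgraph) (*api.Response, error) {
	q := `{
	  matches(func: regexp(description, /aaa.*/)) {
	    uid
	  }
	}`
	return dg.NewReadOnlyTxn().Query(ctx, q)
}
```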
What indexing were you using? We can create some e2e tests for this, simulating what you did. But there are also known limitations with indexing for some cases; the docs mention this, I think. You can't use just any index for any case. Another example is that regex only works with 3+ characters, for performance reasons. But if you say exact is a problem, I'm going to ask the team to dig into this.
So initially the field was indexed with term, exact, and regexp. After some testing I found out that a single term index or a single exact index is also troublesome (and the combination of both, of course).
The following indexes lead to hang-ups when inserting strings that are too long: