Report a Dgraph Bug
We’re adding edges to a node, and the edge is defined with @reverse.
If the mutation contains more than ~700 N-Quads, the resulting state of the relationship is inconsistent.
For example, when repeatedly adding the same 700 relationships: navigating from parent to children (the direct edge) I count 697 edges (this number can change on each upsert execution, but is consistent across read queries), yet if I count the edges via the reverse relationship, I find all 700 as expected.
In other words, we end up with a reverse edge without the corresponding direct edge. More details below.
For now we have limited mutations to a maximum of 500 N-Quads and run multiple batches, which seems to give consistent results.
What version of Dgraph are you using?
Dgraph Version
Running in local Docker:
- dgraph/standalone:v21.03.2
- dgraph/ratel:v21.03.2

Golang client:
- github.com/dgraph-io/dgo/v200

Have you tried reproducing the issue with the latest release?
No; we had lots of other issues with Zion, so we gave up, but we are looking forward to the next version.
What is the hardware spec (RAM, OS)?
MacBook Pro 16″
2.4 GHz 8-Core Intel Core i9
32 GB 2667 MHz DDR4
Steps to reproduce the issue (command/config used to run Dgraph).
N.B. Names of entities and fields have been changed for the sake of the example.
I have the following schema:
type Collection {
  nid
  last_modified
  modified_by
  collection.books
}

type Book {
  nid
  name
}

nid: string @index(hash) @upsert .
collection.books: [uid] @count @reverse .
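We load this schema with an Alter call through dgo, roughly like this (simplified sketch; the connection address and the setupSchema helper name are just for illustration):

package main

import (
    "context"
    "log"

    "github.com/dgraph-io/dgo/v200"
    "github.com/dgraph-io/dgo/v200/protos/api"
    "google.golang.org/grpc"
)

// setupSchema connects to the local standalone container and applies the
// predicate schema shown above.
func setupSchema() *dgo.Dgraph {
    conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
    if err != nil {
        log.Fatal(err)
    }
    dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

    schema := `
        nid: string @index(hash) @upsert .
        collection.books: [uid] @count @reverse .
    `
    if err := dg.Alter(context.Background(), &api.Operation{Schema: schema}); err != nil {
        log.Fatal(err)
    }
    return dg
}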
Our golang code prepares a request to append to the list of book uids in the collection like this:
func (p UpdateCollection) ToUpsertRequest() api.Request {
    var fields strings.Builder
    fmt.Fprintf(&fields, "uid(target) <last_modified> %q .\n", p.LastModified)
    fmt.Fprintf(&fields, "uid(target) <modified_by> %q .\n", p.ModifiedBy)
    for _, v := range p.TargetBookUids {
        fmt.Fprintf(&fields, "uid(target) <collection.books> <%s> .\n", v) // adds one triple for each book
    }

    var mutations []*api.Mutation
    mutations = append(mutations, &api.Mutation{
        SetNquads: []byte(fields.String()),
    })

    request := api.Request{
        Query: `
            query getByNid($nid: string!) {
                target as target_query(func: eq(nid, $nid)) {
                    uid
                }
            }
        `,
        Vars: map[string]string{
            "$nid": p.Nid,
        },
        Mutations: mutations,
    }
    return request
}
The request is then sent via the dgo library inside a transaction, and the transaction is committed at the end if the request succeeds.
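Simplified, the calling side looks something like this (error handling trimmed; appendBooks is only an illustrative name, and dg is the client from the setup sketch above):

// appendBooks runs the upsert for one batch of book uids inside a transaction.
func appendBooks(ctx context.Context, dg *dgo.Dgraph, p UpdateCollection) error {
    req := p.ToUpsertRequest()

    txn := dg.NewTxn()
    defer txn.Discard(ctx) // no-op once the transaction has been committed

    // The query block resolves uid(target); the mutation appends the book edges.
    if _, err := txn.Do(ctx, &req); err != nil {
        return err
    }
    return txn.Commit(ctx)
}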
We call this method with a configurable number of book uids, but it seems that if we send more than ~700 uids, we create an inconsistent state in the nodes.
I noticed that if I send 700 items and then run a count on Ratel (best effort disabled) on the collection, I get fewer items: sometimes 698, sometimes 697, sometimes as low as ~650, with no apparent pattern. The exact same items are sent each time, yet the results vary.
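For reference, the count I check in Ratel can also be run from code with a read-only, non-best-effort transaction, roughly like this (sketch; the collection nid is hard-coded for the example):

// countBooks prints the number of collection.books edges on the collection node.
func countBooks(ctx context.Context, dg *dgo.Dgraph) error {
    txn := dg.NewReadOnlyTxn() // best effort disabled, as in Ratel

    const q = `{
        collection(func: eq(nid, "collection1")) {
            countBooks: count(collection.books)
        }
    }`
    resp, err := txn.Query(ctx, q)
    if err != nil {
        return err
    }
    fmt.Println(string(resp.Json))
    return nil
}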
Then I ran a query in the code to fetch the items attached to the collection and work out which ones were missing. I took one of the missing uids and ran this on Ratel:
{
  collection(func: eq(nid, "collection1")) {
    uid
    nid
    last_modified
    countBooks: count(collection.books)
    collection.books @filter(uid(0x5d1)) {
      uid
      nid
    }
  }

  missingBook(func: uid(0x5d1)) {
    uid
    nid
    ~collection.books @cascade {
      uid
      nid
    }
  }
}
where 0x5d1 is the uid of one of the missing books.
The result of the query is as follows:
"collection": [
{
"uid": "0xabd47a",
"nid": "collection1",
"last_modified": "2022-09-02T08:34:45+09:00",
"countBooks": 697
}
],
"missingBook": [
{
"uid": "0x5d1",
"nid": "HgkANjsRIQE",
"~collection.books": [
{
"uid": "0xabd47a",
"nid": "collection1"
}
]
}
]
So the collection has a lower count, as seen before (697 vs 700), and the node with uid 0x5d1 is not found when navigating the direct edge from the collection.
BUT when I fetch the missing book, I can find the collection by navigating the reverse edge ~collection.books.
This behavior is the same for all 3 books missing from the count, and it repeats every time some of the books go missing: after every update, all the reverse edges are present, but a few of the direct edges are missing.
This problem disappears if we reduce the batch size; by trial and error, we currently cap it at 500 elements.
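The batching workaround is nothing more than splitting the book uids into chunks before building the upsert requests, something along these lines (sketch; names are illustrative):

const maxBooksPerRequest = 500 // found by trial and error

// splitIntoBatches splits the book uids into chunks of at most maxBooksPerRequest,
// so that each upsert request stays below the problematic size.
func splitIntoBatches(uids []string) [][]string {
    var batches [][]string
    for len(uids) > maxBooksPerRequest {
        batches = append(batches, uids[:maxBooksPerRequest])
        uids = uids[maxBooksPerRequest:]
    }
    if len(uids) > 0 {
        batches = append(batches, uids)
    }
    return batches
}

Each batch is then sent as its own upsert request, which so far has given consistent results.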
Expected behaviour and actual result.
Expected behavior is all or nothing: if the request contains too many mutations, I would expect it to fail and roll back the transaction, with a message saying there are too many nodes to handle in a single request.
But if the request is accepted, I would always expect the direct and reverse edges to be consistent.