Hi,
Live Loader came up with a lot of aborts. What went wrong?
Hey @gumupaier, thanks for reaching out.
The aborts are due to transaction conflicts. Nothing to worry about; all of the data will be committed after some retries.
Please tell us the following things so we can better understand the issue.
@Naman
Thank you for your reply
version of Dgraph: v1.2.2
schema
name: string @index(exact, fulltext, trigram) @upsert .
node_id: string @index(exact) @upsert .
type Person {
node_id
name
}
type Company {
node_id
name
}
Now I have a new problem: when I use Live Loader to import the same data multiple times, how can I ensure that the data is unique? Can you provide relevant JSON sample data?
For the earlier query regarding aborts: that many aborts are fine and won’t be a big issue (fixing them would only bring a small optimization/performance gain).
Dgraph doesn’t take care of uniqueness on its own. You have to ensure it with your mutations. Feel free to ask further queries.
Hi @namanj, I see in the documentation that upsert can be used to ensure uniqueness. Our current usage scenario involves a lot of incremental data per day, about a few million records. If we use upsert for such a large amount of data, performance will be greatly reduced.
Can you give some suggestions for a scenario like this, where a lot of data is constantly updated?
Yeah, @upsert only ensures uniqueness for concurrent transactions. For example, let’s say you have a predicate “email” with @upsert. If you have two mutations running concurrently that add the same email, upsert would abort one of them. But if you run them one by one, it wouldn’t.
Ideally, any mutation on an @upsert predicate should first check whether the value already exists. If you do that check without @upsert, two concurrent mutations could both pass it. So you need the combination of @upsert and the check.
A built-in feature that did this for you would work in a similar fashion, so performance shouldn’t be affected much. The checking time can be reduced by using indexes.
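A minimal sketch of that check-plus-mutate pattern as an upsert block, assuming the node_id predicate from your schema above and a placeholder value node_id1 (the mutation only runs when no node with that node_id exists yet):

upsert {
  query {
    q(func: eq(node_id, "node_id1")) {
      v as uid
    }
  }
  mutation @if(eq(len(v), 0)) {
    set {
      _:new <node_id> "node_id1" .
      _:new <name> "company_name_1" .
      _:new <dgraph.type> "Company" .
    }
  }
}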
Hi @harshil_goel, can Live Loader be used in upsert form? If it is supported, could you help provide some JSON sample data? Thank you very much!
Sorry, the live loader cannot be used in the upsert form.
Hey @gumupaier,
This can be done by using the -x flag while running dgraph live. With this flag you provide a path to a directory where Dgraph stores all the uids it has already used, and it takes care of duplicates even across different JSON files. In your case, while creating the JSON file you can use your node_id as the uid. For example, suppose two of your files are as follows:
file1
<_:node_id1> <node_id> "node_id1" .
<_:node_id1> <name> "company_name_1" .
<_:node_id2> <node_id> "node_id2" .
<_:node_id2> <name> "company_name_2" .
file2
<_:node_id1> <node_id> "node_id1" .
<_:node_id1> <name> "company_name_1" .
<_:node_id3> <node_id> "node_id3" .
<_:node_id3> <name> "company_name_3" .
When you load these two files one by one using dgraph live, there’ll be only one node with node_id1.
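Since you asked for JSON sample data: the same idea in the live loader’s JSON format would look roughly like this (the file name and values are just placeholders), using your node_id as the blank-node uid so repeated loads with -x map to the same node:

file1.json
[
  {"uid": "_:node_id1", "node_id": "node_id1", "name": "company_name_1", "dgraph.type": "Company"},
  {"uid": "_:node_id2", "node_id": "node_id2", "name": "company_name_2", "dgraph.type": "Company"}
]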
P.S.: You have to use the -x flag every time you run dgraph live.
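For reference, the two loads would look roughly like this (the file names and the xidmap directory are just examples; point -x at the same directory every time):

dgraph live -f file1.rdf -x ./xidmap
dgraph live -f file2.rdf -x ./xidmap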
Thank you very much. This approach looks very good. I will test it and give you feedback later.
Hi @Neeraj @MichelDiz,
Trying this way, I ran into a new problem and found the following error during execution:
Error while processing data file "A100_unlisted_2.json.gz": During parsing chunk in processLoadFile: strconv.ParseInt: parsing "2e-05": invalid syntax
My data file format is as follows
It looks like the data in the file gets truncated when it is too big.
I hope you can help me check what is wrong.
This error is in your dataset. Some key has a value with the wrong type for an int. The log shows “2e-05”, which looks like scientific notation, not a valid integer, and Dgraph can’t convert scientific notation to an integer as far as I know.
Since the log does not say the name of the key, I recommend using grep, for example, to identify the line and find out which predicate this value belongs to. Then decide whether to change the value itself or, if you want to preserve the scientific notation, change the type of the predicate to string.
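For example, since the file is gzipped, something along these lines (the value is taken from your log) should locate the offending line:

zgrep -n '2e-05' A100_unlisted_2.json.gz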
Unfortunately, Dgraph is not yet able to ignore these small details. There is an issue about this being analyzed, but for now you need to fix it in your dataset.
And as I said in the other post, aborts are fine. If you want to get rid of them, you might need more resources in your cluster, or check your dataset.
Hi @MichelDiz,
It was a data error. I have located the problem. Thank you for your help