Hi,
Live Loader came up with a lot of aborts. What went wrong?
Hey @gumupaier, thanks for reaching out.
The aborts are due to transaction conflicts. Nothing to worry about; all of the data will be committed after some retries.
Please tell us the following things so we can better understand the issue.
@Naman
Thank you for your reply
version of Dgraph: v1.2.2
schema
name: string @index(exact, fulltext, trigram) @upsert .
node_id: string @index(exact) @upsert .
type Person {
node_id
name
}
type Company {
node_id
name
}
Now I have a new problem: when I use Live Loader to import the same data multiple times, how can I ensure that the data is unique? Can you provide relevant JSON sample data?
For the earlier query regarding aborts: that many aborts are fine and won’t be a big issue (fixing them would only bring a small optimization/performance gain).
Dgraph doesn’t take care of uniqueness on its own. You have to ensure it with your mutations. Feel free to ask further queries.
Hi @namanj, I see in the documentation that upsert can be used to ensure uniqueness. Our current usage scenario involves a lot of incremental data per day, about a few million records. If we use upsert for such a large amount of data, performance will be greatly reduced.
Can you give some suggestions for a scenario like this, where a lot of data is constantly updated?
Yeah, @upsert only ensures uniqueness for concurrent transactions. For example, let’s say you have a predicate “email” with @upsert. If you have two mutations running concurrently that add the same email, upsert would abort one of them. But if you run them one by one, it wouldn’t.
Ideally, any mutation on an @upsert predicate should first check whether the value already exists. If you do that check without @upsert, two concurrent mutations could both pass it. So you need the combination of @upsert and the check.
A built-in feature that did this for you would work in a similar fashion, so performance shouldn’t be affected much. The checking time can be reduced by using indexes.
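A minimal sketch of that check-plus-mutate pattern as an upsert block, assuming the node_id predicate from your schema above and a placeholder value node_id1 (the mutation only runs when no node with that node_id exists yet):

upsert {
  query {
    q(func: eq(node_id, "node_id1")) {
      v as uid
    }
  }
  mutation @if(eq(len(v), 0)) {
    set {
      _:new <node_id> "node_id1" .
      _:new <name> "company_name_1" .
      _:new <dgraph.type> "Company" .
    }
  }
}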
Hi @harshil_goel, can Live Loader be used in upsert form? If it is supported, could you help provide some JSON sample data? Thank you very much!
Sorry, the live loader cannot be used in the upsert form.
Hey @gumupaier,
This can be done by using the -x flag while running dgraph live. With this flag you provide a path to a directory where Dgraph stores all the uids it has already used, and it takes care of duplicates even across different JSON files. In your case, while creating the JSON file you can use your node_id as the uid. For example, suppose two of your files are as follows:
file1
<_:node_id1> <node_id> "node_id1" .
<_:node_id1> <name> "company_name_1" .
<_:node_id2> <node_id> "node_id2" .
<_:node_id2> <name> "company_name_2" .
file2
<_:node_id1> <node_id> "node_id1" .
<_:node_id1> <name> "company_name_1" .
<_:node_id3> <node_id> "node_id3" .
<_:node_id3> <name> "company_name_3" .
When you load these two files one by one using dgraph live, there’ll be only one node with node_id1.
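Since you asked for JSON sample data: the same idea in the live loader’s JSON format would look roughly like this (the file name and values are just placeholders), using your node_id as the blank-node uid so repeated loads with -x map to the same node:

file1.json
[
  {"uid": "_:node_id1", "node_id": "node_id1", "name": "company_name_1", "dgraph.type": "Company"},
  {"uid": "_:node_id2", "node_id": "node_id2", "name": "company_name_2", "dgraph.type": "Company"}
]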
P.S.: You have to use the -x flag every time you run dgraph live.
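For reference, the two loads would look roughly like this (the file names and the xidmap directory are just examples; point -x at the same directory every time):

dgraph live -f file1.rdf -x ./xidmap
dgraph live -f file2.rdf -x ./xidmap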
Thank you very much. This approach looks very good. I will test it and give you feedback later.
Hi @Neeraj @MichelDiz,
Trying this way, I ran into a new problem and found the following error during execution:
Error while processing data file "A100_unlisted_2.json.gz": During parsing chunk in processLoadFile: strconv.ParseInt: parsing "2e-05": invalid syntax
My data file format is as follows
It looks like the data in the file gets truncated when it is too big.
I hope you can help me check what is wrong.
This error is in your dataset. Some key has a value with the wrong type for an int. The log shows “2e-05”, which looks like scientific notation, not a valid integer, and Dgraph can’t convert scientific notation to an integer as far as I know.
Since the log does not say the name of the key, I recommend using grep, for example, to identify the line and find out which predicate this value belongs to. Then decide whether to change the value itself or, if you want to preserve the scientific notation, change the type of the predicate to string.
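For example, since the file is gzipped, something along these lines (the value is taken from your log) should locate the offending line:

zgrep -n '2e-05' A100_unlisted_2.json.gz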
Unfortunately, Dgraph is not yet able to ignore these small details. There is an issue about this being analyzed, but for now you need to fix it in your dataset.
And as I said in the other post, aborts are fine. If you want to get rid of them, you might need more resources in your cluster, or check your dataset.
Hi @MichelDiz,
It was a data error. I have located the problem. Thank you for your help