Live Loader came up with a lot of aborts

hi

Live Loader came up with a lot of aborts. What went wrong?

Hey @gumupaier, thanks for reaching out.
The aborts are due to conflicts between concurrent transactions. Nothing to worry about; all of the data will be committed after some retries.
Please tell us the following so we can better understand the issue:

  • the version of Dgraph you are using
  • the schema of your data (if possible)

@Naman
Thank you for your reply

version of Dgraph: v1.2.2
schema

name: string @index(exact, fulltext, trigram) @upsert .
node_id: string @index(exact) @upsert .
type Person {
    node_id
    name
}
type Company {
    node_id
    name
}

Now I have a new problem: when I use Live Loader to import the same data multiple times, how can I ensure that the data stays unique? Can you provide relevant JSON sample data?

For the earlier question regarding aborts: that many aborts are fine and won’t be a big issue (avoiding them would only be a small optimization/performance gain).

Dgraph doesn’t take care of uniqueness. You have to ensure that with your mutations. Feel free to ask further queries.

hi @namanj I see in the documentation that upsert can be used to ensure uniqueness. Our current usage scenario has a lot of incremental data per day, about a few million records. If we use upsert for such a large amount of data, the performance will be greatly reduced.

Can you give some suggestions for a scenario like this, where a lot of data is constantly updated?

Yeah, upsert only ensures uniqueness for concurrent transactions. For example, let’s say you have a predicate “email” with @upsert. If you have two mutations running concurrently that add the same email, upsert would abort one of them. But if you run them one by one, it wouldn’t.

Ideally, any mutation with upsert should first check whether the value already exists. If you did that check without upsert, both concurrent mutations would pass it. So you need a combination of upsert and the check.

A built-in feature that does this for you would work in a similar fashion, so performance shouldn’t be affected. The checking time can be reduced by using indexes.
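
For reference, here is a minimal sketch of that check-plus-upsert pattern as a DQL upsert block (the predicate email and the values are just illustrative and assume email is declared with @upsert in the schema):

upsert {
  query {
    # check whether a node with this email already exists
    q(func: eq(email, "alice@example.com")) {
      v as uid
    }
  }
  mutation @if(eq(len(v), 0)) {
    # only runs when the query above found nothing
    set {
      _:new <email> "alice@example.com" .
      _:new <name> "Alice" .
    }
  }
}

If two such transactions run concurrently with the same email, the @upsert directive makes them conflict, so one of them aborts and retries; that combination of check and directive is what keeps the value unique.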


hi @harshil_goel Can Live Loader be used in upsert form? If it is supported, can you help provide JSON sample data? Thank you very much!

Sorry, the live loader cannot be used in the upsert form.

Hey @gumupaier,
This can be done by using the -x flag with dgraph live. With it you provide a path to a directory where Dgraph stores all the uids it has already assigned, and it takes care of duplicates even across different data files. In your case, while creating the data files, you can use your node_id as the blank-node uid. For example, say two of your files are as follows:

file1

<_:node_id1> <node_id> "node_id1" .
<_:node_id1> <name> "company_name_1" .
<_:node_id2> <node_id> "node_id2" .
<_:node_id2> <name> "company_name_2" .

file2

<_:node_id1> <node_id> "node_id1" .
<_:node_id1> <name> "company_name_1" .
<_:node_id3> <node_id> "node_id3" .
<_:node_id3> <name> "company_name_3" .

When you load these two files one by one using dgraph live, there will be only one node with node_id1.
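
Since JSON sample data was asked for: as a rough sketch, the same data as file1 could look like this in Dgraph’s JSON mutation format (assuming the live loader is given a .json file; the uid field carries the same blank-node alias):

[
  {"uid": "_:node_id1", "node_id": "node_id1", "name": "company_name_1"},
  {"uid": "_:node_id2", "node_id": "node_id2", "name": "company_name_2"}
]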

P.S.: You have to use the -x flag every time you run dgraph live.
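
As a rough example of the invocation (file names and paths are hypothetical; only -f and -x are shown, other options are left at their defaults):

dgraph live -f file1.rdf.gz -x ./xidmap
dgraph live -f file2.rdf.gz -x ./xidmap

Pointing both runs at the same xidmap directory is what lets the second run reuse the uids assigned in the first.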


Thank you very much. This approach looks very good. I will test it and give you feedback later.


hi @Neeraj @MichelDiz
With this approach I encountered the following error during execution:

Error while processing data file "A100_unlisted_2.json.gz": During parsing chunk in processLoadFile: strconv.ParseInt: parsing "2e-05": invalid syntax

My data file format is as follows

It also looks like the data in the file gets truncated when it is too big.

I hope you can help me check what is wrong.

This error is in your dataset. Some key has a value of the wrong type for int. The log shows “2e-05”, which looks like scientific notation, not a valid integer. And Dgraph can’t convert scientific notation to an integer, as far as I know.

Since the log does not say what the name of the key is, I recommend using grep, for example, to identify the line and find out which predicate this value belongs to. Then evaluate whether to change the value itself or, if you want to preserve the scientific notation, change the type of the predicate to string.
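
For example, something like this would search the gzipped file named in the error message for the offending value (a sketch; adjust the pattern to your data):

zgrep -n '2e-05' A100_unlisted_2.json.gz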

Unfortunately, Dgraph is not yet able to ignore these small details. There is an issue about this being analyzed, but for now you need to fix it in your dataset.

And as I said in the other post, aborts are fine. If you want to get rid of them, you might need more resources in your cluster, or check your dataset.


hi @MichelDiz
It was a data error. I have located the problem. Thank you for your help!
