How to speed up using java client to upsert massive data

purist180 · January 14, 2020, 3:19am

I want to write about 1 million records using upsert,
but it takes about 10 minutes. How can I speed up this process?

Here's the code I wrote with the Java client.
I need to get the uid of each node by the <ex_id> I define for each unique node,
so I wrote an upsert like this.
I tried to speed up the upsert with small batches: each request carries 500 records.

            if (predicate_class.equals("relation")) {
                String object_ex_id = oneTraid.getString("object");

                String insideQuery =


                        "       var(func: eq(ex_id, \"" + subject_ex_id + "\")){\n" +
                                "          subject" + i + " as uid\n" +
                                "   }\n" +
                                "       var(func :eq(ex_id, \"" + object_ex_id + "\")){\n" +
                                "           object" + i + " as uid\n" +
                                "   }\n";
                allQuery.append(insideQuery);


                String muString =
                        "           uid(subject" + i + ") <" + predicate + "> uid(object" + i + ") " +
                                "(start_time=" + start_time + ", end_time=" + end_time + ") .\n" +
                                "           uid(subject" + i + ") <ex_id> \"" + subject_ex_id + "\" .\n" +
                                "           uid(object" + i + ") <ex_id> \"" + object_ex_id + "\" .\n";

                allMutate.append(muString);




            } else if (predicate_class.equals("property")) {
                String object = oneTraid.getString("object");

                String insideQuery =

                        "       var(func: eq(ex_id, \"" + subject_ex_id + "\")){\n" +
                                "          subject" + i + " as uid\n" +
                                "   }\n";
                allQuery.append(insideQuery);

                String muString =
                        "           uid(subject" + i + ") <" + predicate + "> \"" + object + "\" " +
                                "(start_time=" + start_time + ", end_time=" + end_time + ") .\n" +
                                "           uid(subject" + i + ") <ex_id> \"" + subject_ex_id + "\" .\n";

                allMutate.append(muString);
            }



            int batchSize = 500;
            if (i % batchSize == 0) {
                String batchQuery =
                        "   query{\n" +
                                allQuery +
                                "}\n";

                Mutation batchMu = Mutation.newBuilder()
                        .setSetNquads(ByteString.copyFromUtf8(allMutate.toString()))
                        .build();

                Transaction txn = dgraphClient.newTransaction();

                try {
                    // One request carries the batched query and mutation and commits immediately.
                    Request request = Request.newBuilder()
                            .setQuery(batchQuery)
                            .addMutations(batchMu)
                            .setCommitNow(true)
                            .build();

                    txn.doRequest(request);

                    System.out.println("batch: " + (i / batchSize + 1) + " of " + totalSize / batchSize + " upsert success!");

                } catch (Exception e) {
                    JSONObject result = new JSONObject();
                    result.put("status", "fail");

                    System.out.println("batch: " + (i / batchSize + 1) + " of " + totalSize / batchSize + " upsert FAIL!");
                    e.printStackTrace();

                    return result;
                } finally {
                    // Release the transaction on both the success and the failure path.
                    txn.discard();
                }

                allQuery = new StringBuilder();
                allMutate = new StringBuilder();

            }



        }

        JSONObject result = new JSONObject();
        result.put("status", "success");
        return result;
    }
}

purist180 · January 14, 2020, 3:29am

Will the asynchronous client solve this problem?

amanmangal · January 15, 2020, 5:53am

You could send requests to the Alphas in parallel, which should further improve throughput. It can sometimes slow down if there are a lot of conflicts, though; that depends on the schema.
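
A minimal sketch of the parallel approach, added for illustration and not taken from the original code. It splits the records into batches (including the final partial one) and sends them from a fixed thread pool, retrying a batch a few times if sending throws. `ParallelUpsert`, `chunk`, `runInParallel` and the `sendBatch` callback are hypothetical names. The callback is where you would open a transaction and send the upsert request, and it is assumed to signal a failed or aborted batch by throwing an unchecked exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelUpsert {

    /** Splits records into batches of at most batchSize, keeping the final partial batch. */
    static <T> List<List<T>> chunk(List<T> records, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    /**
     * Runs sendBatch (hypothetical callback) for every batch on a fixed thread pool.
     * A batch whose callback throws is retried up to maxAttempts times.
     * Returns the number of batches that never succeeded.
     */
    static <T> int runInParallel(List<List<T>> batches, int threads,
                                 int maxAttempts, Consumer<List<T>> sendBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Boolean>> results = new ArrayList<>();
        for (List<T> batch : batches) {
            Callable<Boolean> task = () -> {
                for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                    try {
                        sendBatch.accept(batch);
                        return true;
                    } catch (RuntimeException e) {
                        // Assumed to mean the batch was aborted (for example by a conflict); retry.
                    }
                }
                return false;
            };
            results.add(pool.submit(task));
        }
        pool.shutdown(); // already submitted tasks still run to completion

        int failed = 0;
        for (Future<Boolean> result : results) {
            try {
                if (!result.get()) {
                    failed++;
                }
            } catch (ExecutionException e) {
                failed++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                failed++;
            }
        }
        return failed;
    }
}
```

Something like `ParallelUpsert.runInParallel(ParallelUpsert.chunk(records, 500), 8, 3, batch -> sendUpsert(batch))` would drive it, where `sendUpsert` is your own method. The thread count and retry limit are guesses to tune; with many conflicting batches, fewer threads may end up faster.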

purist180 · January 16, 2020, 1:35am

dgraph live -x

Will dgraph live with xid do the same thing I do with the <ex_id> I define in the mutation?
I am not sure how xid works in dgraph live; the documentation has relatively little on how to use dgraph live with xid.
https://docs.dgraph.io/deploy/#fast-data-loading

Note Dgraph Live Loader can optionally write the xid->uid mapping to a directory specified using the -x flag, which can reused given that live loader completed successfully in the previous run.

This would create an xid file, but if I add some new data without using dgraph live, how can I change the xid file?

amanmangal · January 16, 2020, 2:44am

The xid folder is only used by the live loader, and only when it is given a path to it. You will have to manually update it if you make changes to Dgraph without using the live loader. You could instead just use an upsert block: query the UID from Dgraph itself and then perform a mutation using the returned UID.
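
A rough sketch of building such an upsert block for one node, following the same query and mutation shape as the code earlier in this thread. `UpsertBuilder` and its methods are hypothetical helper names, not part of any Dgraph client. The one addition is escaping of the values before they are placed inside quoted literals. This sketch escapes only backslashes, double quotes and newlines, and assumes predicate names need no escaping.

```java
final class UpsertBuilder {

    /** Escapes backslashes, double quotes and newlines for use inside a quoted literal. */
    static String escape(String value) {
        return value
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
    }

    /** Query block that binds the uid of the node with the given ex_id to a variable. */
    static String queryFor(String var, String exId) {
        return "  var(func: eq(ex_id, \"" + escape(exId) + "\")) {\n"
                + "    " + var + " as uid\n"
                + "  }\n";
    }

    /** N-Quads that set one property and the ex_id on the node bound to var. */
    static String propertyNquads(String var, String exId, String predicate, String value) {
        return "uid(" + var + ") <" + predicate + "> \"" + escape(value) + "\" .\n"
                + "uid(" + var + ") <ex_id> \"" + escape(exId) + "\" .\n";
    }

    /** Wraps the accumulated var blocks into a single query. */
    static String wrapQuery(CharSequence blocks) {
        return "query {\n" + blocks + "}\n";
    }
}
```

The query string would go to `setQuery` and the N-Quads to `setSetNquads`, as in the code above. Writing `<ex_id>` in the mutation, as the original code does, means a node that the query did not find is created with its ex_id set.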

purist180 · January 16, 2020, 4:42am

You will have to manually update it if you make changes to Dgraph without using live loader.

This is the xid file. How should I update it if I upsert some new data?

amanmangal · January 16, 2020, 8:05am

This is not recommended. But if you really want to deal with it yourself, we use Badger (dgraph-io/badger on GitHub, a fast key-value DB in Go) to maintain the mapping from xid → uid. You can import the xidmap code from the dgraph repository (https://github.com/dgraph-io/dgraph/blob/master/xidmap), which internally uses Badger, and write to this directory. You could look at our bulk loader code (mapper.go in the dgraph-io/dgraph repository) to figure out how we do it.
