How to speed up upserting massive data with the Java client

I want to write about 1 million records using upsert,
but it takes about 10 minutes. How can I speed up this process?

Here's the code I wrote with the Java client.
I need to get the uid of each node by the <ex_id> I define for every unique node,
so I wrote an upsert like this.
To speed up the upsert I send it in small batches: each request carries 500 records.

            if (predicate_class.equals("relation")) {
                String object_ex_id = oneTraid.getString("object");

                // Look up both endpoints by their external id.
                String insideQuery =
                        "  var(func: eq(ex_id, \"" + subject_ex_id + "\")) {\n" +
                        "    subject" + i + " as uid\n" +
                        "  }\n" +
                        "  var(func: eq(ex_id, \"" + object_ex_id + "\")) {\n" +
                        "    object" + i + " as uid\n" +
                        "  }\n";
                allQuery.append(insideQuery);

                // If a variable matched nothing, uid(...) creates a new node.
                // Note: if the same new ex_id appears twice in one batch, each
                // occurrence has its own variable and so gets its own new node.
                String muString =
                        "uid(subject" + i + ") <" + predicate + "> uid(object" + i + ") " +
                        "(start_time=" + start_time + ", end_time=" + end_time + ") .\n" +
                        "uid(subject" + i + ") <ex_id> \"" + subject_ex_id + "\" .\n" +
                        "uid(object" + i + ") <ex_id> \"" + object_ex_id + "\" .\n";
                allMutate.append(muString);

            } else if (predicate_class.equals("property")) {
                String object = oneTraid.getString("object");

                String insideQuery =
                        "  var(func: eq(ex_id, \"" + subject_ex_id + "\")) {\n" +
                        "    subject" + i + " as uid\n" +
                        "  }\n";
                allQuery.append(insideQuery);

                String muString =
                        "uid(subject" + i + ") <" + predicate + "> \"" + object + "\" " +
                        "(start_time=" + start_time + ", end_time=" + end_time + ") .\n" +
                        "uid(subject" + i + ") <ex_id> \"" + subject_ex_id + "\" .\n";
                allMutate.append(muString);
            }

            // Flush every batchSize triples and also flush the last, partial batch.
            // (Assumes i runs from 0 to totalSize - 1.)
            int batchSize = 500;
            if ((i + 1) % batchSize == 0 || i == totalSize - 1) {
                int batchNo = i / batchSize + 1;
                int batchCount = (totalSize + batchSize - 1) / batchSize;
                String batchQuery = "query {\n" + allQuery + "}\n";

                Mutation batchMu = Mutation.newBuilder()
                        .setSetNquads(ByteString.copyFromUtf8(allMutate.toString()))
                        .build();
                Request request = Request.newBuilder()
                        .setQuery(batchQuery)
                        .addMutations(batchMu)
                        .setCommitNow(true)
                        .build();

                Transaction txn = dgraphClient.newTransaction();
                try {
                    txn.doRequest(request);
                    System.out.println("batch: " + batchNo + " of " + batchCount + " upsert success!");
                } catch (Exception e) {
                    System.out.println("batch: " + batchNo + " of " + batchCount + " upsert FAIL!");
                    e.printStackTrace();

                    JSONObject result = new JSONObject();
                    result.put("status", "fail");
                    return result;
                } finally {
                    txn.close(); // release the transaction even when the request fails
                }

                allQuery = new StringBuilder();
                allMutate = new StringBuilder();
            }
        }

        JSONObject result = new JSONObject();
        result.put("status", "success");
        return result;
    }
}
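The upsert above creates one pair of var blocks per triple, so a node that appears in several triples of the same batch is looked up several times, and if it does not exist yet, each occurrence can end up as a separate new node. Below is a minimal sketch of an alternative batch builder, assuming the same `ex_id` predicate and facet string as in the post: it assigns one DQL variable per distinct ex_id and escapes string values. `UpsertBatch` and its methods are hypothetical names (not part of dgraph4j), and the code only builds strings; it does not talk to Dgraph.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of dgraph4j): builds the query and N-Quads for
// one upsert batch, using one DQL variable per distinct ex_id.
final class UpsertBatch {
    private final Map<String, String> varByExId = new LinkedHashMap<>(); // ex_id -> variable
    private final StringBuilder nq = new StringBuilder();

    // Escapes a value as a double-quoted DQL / N-Quads string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    private static String facetPart(String facets) {
        return (facets == null || facets.isEmpty()) ? "" : " (" + facets + ")";
    }

    // Returns "uid(vN)" for an ex_id. On first use it registers the variable and
    // writes <ex_id> once, so a node created by this batch can be found later.
    private String ref(String exId) {
        String name = varByExId.get(exId);
        if (name == null) {
            name = "v" + varByExId.size();
            varByExId.put(exId, name);
            nq.append("uid(").append(name).append(") <ex_id> ").append(quote(exId)).append(" .\n");
        }
        return "uid(" + name + ")";
    }

    void addRelation(String subjectExId, String predicate, String objectExId, String facets) {
        String s = ref(subjectExId); // resolve both refs before writing the edge line
        String o = ref(objectExId);
        nq.append(s).append(" <").append(predicate).append("> ").append(o)
          .append(facetPart(facets)).append(" .\n");
    }

    void addProperty(String subjectExId, String predicate, String value, String facets) {
        String s = ref(subjectExId);
        nq.append(s).append(" <").append(predicate).append("> ").append(quote(value))
          .append(facetPart(facets)).append(" .\n");
    }

    // One var block per distinct ex_id, wrapped in the upsert "query { }" block.
    String query() {
        StringBuilder q = new StringBuilder("query {\n");
        for (Map.Entry<String, String> e : varByExId.entrySet()) {
            q.append("  var(func: eq(ex_id, ").append(quote(e.getKey())).append(")) {\n")
             .append("    ").append(e.getValue()).append(" as uid\n")
             .append("  }\n");
        }
        return q.append("}\n").toString();
    }

    String nquads() { return nq.toString(); }

    int nodeCount() { return varByExId.size(); }

    public static void main(String[] args) {
        UpsertBatch b = new UpsertBatch();
        b.addRelation("alice", "knows", "bob", "start_time=2019, end_time=2020");
        b.addProperty("alice", "name", "Alice", "start_time=2019, end_time=2020");
        System.out.print(b.query());
        System.out.print(b.nquads());
    }
}
```

The output of `query()` and `nquads()` would go into `Request.newBuilder().setQuery(...)` and `Mutation.newBuilder().setSetNquads(ByteString.copyFromUtf8(...))` as in the code above, with a fresh `UpsertBatch` per 500 triples. Fewer var blocks per request should make each request somewhat cheaper, though the actual gain depends on how often ex_ids repeat in your data.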

Would the asynchronous client solve this problem?

You could send requests to the Alphas in parallel; that should further improve throughput. However, it can sometimes slow things down if there are a lot of transaction conflicts, which depends on your schema.
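A rough sketch of sending batches in parallel with a fixed thread pool from the Java standard library. `ParallelUpsert`, `BatchSender`, and the retry policy are illustrative assumptions, not Dgraph APIs; the actual Dgraph call (building the `Request`, `txn.doRequest(...)`, closing the transaction) is left to the `BatchSender` you plug in. Failed batches are retried because parallel upserts touching the same ex_id can abort with transaction conflicts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelUpsert {
    // Hypothetical hook for "send one batch as one upsert request". With the
    // Java client it would build the Request, call txn.doRequest(...) on a new
    // transaction, and close the transaction.
    interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    // Splits items into batches of batchSize and sends up to `threads` batches at
    // once. A failed batch (e.g. aborted by a conflict) is retried up to
    // maxRetries times with a growing pause. Returns the number of failed batches.
    static <T> int run(List<T> items, int batchSize, int threads, int maxRetries,
                       long backoffMillis, BatchSender<T> sender) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Boolean>> results = new ArrayList<>();
        try {
            for (int start = 0; start < items.size(); start += batchSize) {
                List<T> batch = new ArrayList<>(
                        items.subList(start, Math.min(start + batchSize, items.size())));
                results.add(pool.submit(() -> {
                    for (int attempt = 0; ; attempt++) {
                        try {
                            sender.send(batch);
                            return true;
                        } catch (Exception e) {
                            if (attempt >= maxRetries) {
                                return false;
                            }
                            Thread.sleep(backoffMillis * (attempt + 1)); // linear backoff
                        }
                    }
                }));
            }
        } finally {
            pool.shutdown(); // accept no new tasks; submitted batches still run
        }
        int failed = 0;
        for (Future<Boolean> f : results) {
            try {
                if (!f.get()) {
                    failed++;
                }
            } catch (ExecutionException e) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1_200; i++) {
            ids.add(i);
        }
        int failed = run(ids, 500, 4, 3, 100,
                batch -> System.out.println("sent " + batch.size() + " records"));
        System.out.println("failed batches: " + failed);
    }
}
```

Retrying a whole upsert batch should be safe in the common case, since an aborted commitNow request writes nothing and the re-run query then finds nodes created in the meantime. For Dgraph to detect those conflicts on `ex_id`, the predicate needs an index plus the `@upsert` directive in the schema (for example `ex_id: string @index(exact) @upsert .`). A small thread count is a reasonable starting point, raised while watching how often batches are retried.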

dgraph live -x

Will dgraph live with xid do the same as what I do with the <ex_id> I define in the mutation?
I am not sure how xid works in dgraph live; there are relatively few instructions in the documentation about using dgraph live with xid.
https://docs.dgraph.io/deploy/#fast-data-loading

Note: Dgraph Live Loader can optionally write the xid->uid mapping to a directory specified using the -x flag, which can be reused provided that the live loader completed successfully in the previous run.
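To illustrate the idea behind an xid -> uid map, here is a minimal client-side analogue in Java. It is not how the live loader works internally; it only shows the concept: known xids are written as `<uid>` directly (no lookup query), unknown ones as blank nodes, and the uids Dgraph assigns are recorded after the commit. `XidCache` and its methods are hypothetical. The map passed to `recordAssigned` is assumed to come from the response's uids map (in dgraph4j, `Response.getUidsMap()`), keyed by blank-node label without the `_:` prefix. Like the -x directory, this cache goes stale if anything else writes ex_ids to Dgraph.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side xid -> uid cache. Only correct if this process is
// the only writer of ex_id (or the cache was seeded from Dgraph first).
final class XidCache {
    private final Map<String, String> uidByXid = new HashMap<>();
    private final Map<String, String> pendingXidByLabel = new HashMap<>();

    // Hex-encodes the xid so any string becomes a safe blank-node label.
    static String label(String xid) {
        StringBuilder sb = new StringBuilder("x");
        for (byte b : xid.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Known xid: "<0x...>". Unknown xid: blank node "_:label", with its
    // <ex_id> N-Quad appended to nquads once per request.
    String ref(String xid, StringBuilder nquads) {
        String uid = uidByXid.get(xid);
        if (uid != null) {
            return "<" + uid + ">";
        }
        String label = label(xid);
        if (pendingXidByLabel.putIfAbsent(label, xid) == null) {
            nquads.append("_:").append(label).append(" <ex_id> ").append(quote(xid)).append(" .\n");
        }
        return "_:" + label;
    }

    // After a successful commit: remember the uids assigned to new blank nodes.
    void recordAssigned(Map<String, String> uidsByLabel) {
        for (Map.Entry<String, String> e : pendingXidByLabel.entrySet()) {
            String uid = uidsByLabel.get(e.getKey());
            if (uid != null) {
                uidByXid.put(e.getValue(), uid);
            }
        }
        pendingXidByLabel.clear();
    }

    // After a failed request nothing was written, so drop the pending labels.
    void discardPending() {
        pendingXidByLabel.clear();
    }

    public static void main(String[] args) {
        XidCache cache = new XidCache();
        StringBuilder nq = new StringBuilder();
        String s = cache.ref("alice", nq); // resolve refs before writing the edge
        String o = cache.ref("bob", nq);
        nq.append(s).append(" <knows> ").append(o).append(" .\n");
        System.out.print(nq);
    }
}
```

This cache is not thread-safe and assumes requests are sent one at a time; with parallel writers, two requests could both create a node for the same new xid.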

This would create an xid file, but if I add some new data without using dgraph live, how can I update the xid file?

The xid folder is only used by the live loader when you provide a path to it. You will have to update it manually if you make changes to Dgraph without using the live loader. Instead, you could just use an upsert block: query the UID from Dgraph itself and then perform a mutation using the received UID.
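As a two-step variant of "query the UID from Dgraph itself", the lookups for a whole batch can be folded into one read-only query, since `eq` accepts a list of values in recent Dgraph versions. A minimal sketch that only builds the query string; `ExIdLookup` and `lookupQuery` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.StringJoiner;

// Hypothetical helper: one read-only DQL query that resolves many ex_ids at
// once using eq() with a list of values.
final class ExIdLookup {
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    static String lookupQuery(Collection<String> exIds) {
        if (exIds.isEmpty()) {
            throw new IllegalArgumentException("no ex_ids to look up");
        }
        StringJoiner values = new StringJoiner(", ", "[", "]");
        for (String id : new LinkedHashSet<>(exIds)) { // send each ex_id only once
            values.add(quote(id));
        }
        return "{\n"
                + "  nodes(func: eq(ex_id, " + values + ")) {\n"
                + "    uid\n"
                + "    ex_id\n"
                + "  }\n"
                + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(lookupQuery(Arrays.asList("alice", "bob", "alice")));
    }
}
```

With dgraph4j the query could be run through `dgraphClient.newReadOnlyTransaction().query(q)`, and the `uid`/`ex_id` pairs parsed from `response.getJson().toStringUtf8()` with your JSON library. Unlike the upsert block, this leaves a gap between lookup and mutation in which another writer could create the same ex_id, so it only suits a single-writer load.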


This is the xid file. How should I update it if I upsert some new data?

This is not recommended. But if you really want to handle it yourself: we use badger (https://github.com/dgraph-io/badger) to maintain the xid -> uid mapping. You can import the xidmap package from the dgraph repository (https://github.com/dgraph-io/dgraph/blob/master/xidmap), which uses badger internally, and write to this directory. You could look at our bulk loader code to see how we do it: https://github.com/dgraph-io/dgraph/blob/master/dgraph/cmd/bulk/mapper.go#L243.