How to merge identical social accounts across multiple batches of data in Dgraph

In my use case, I receive multiple batches of social records through multiple channels. How should I merge the records that belong to the same social account across batches? The only approach I know is upsert, but it is too slow for large amounts of data.

@yeahvip Are you planning to use the live loader? If so, check out the --xidmap flag. This would work for initial and subsequent loads, but not if you’ve already got social account IDs in your graph.

One approach I’ve used in the past with success is to pull existing IDs from the graph in the batch loader code. When examining a record to add, check to see if the external ID already has a Dgraph ID. If so, associate the existing ID in the exported RDF or JSON, otherwise assign it a blank uid: _:<external id>
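A minimal sketch of that check in Python, assuming the existing external-ID-to-UID pairs have already been pulled from the graph into a dict. The function name `to_nquads`, the `xid` predicate, and the sample IDs are illustrative assumptions, not part of any Dgraph API, and the external IDs are assumed to be valid blank node labels.

```python
from typing import Dict, Iterable, List, Tuple


def quote(value: str) -> str:
    """Wrap a value as an RDF string literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_nquads(
    records: Iterable[Tuple[str, str, str]],
    known_uids: Dict[str, str],
    xid_predicate: str = "xid",  # hypothetical predicate holding the external ID
) -> List[str]:
    """Turn (external_id, predicate, value) records into RDF N-Quad lines.

    Records whose external ID already has a Dgraph UID reuse that UID.
    All others get a blank node named after the external ID, plus one
    triple that stores the external ID itself.
    """
    lines: List[str] = []
    announced = set()  # blank nodes whose external ID triple was already emitted
    for ext_id, predicate, value in records:
        uid = known_uids.get(ext_id)
        if uid is not None:
            subject = f"<{uid}>"
        else:
            subject = f"_:{ext_id}"
            if ext_id not in announced:
                lines.append(f"{subject} <{xid_predicate}> {quote(ext_id)} .")
                announced.add(ext_id)
        lines.append(f"{subject} <{predicate}> {quote(value)} .")
    return lines


if __name__ == "__main__":
    sample = [("tw_1", "name", "Alice"), ("tw_2", "name", "Bob")]
    for line in to_nquads(sample, {"tw_1": "0x2a"}):
        print(line)
```

How the `known_uids` dict gets filled (a query per batch, a cached export, and so on) is left open here, since that is the part that depends on your data volume.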

Hello. In the approach you describe, we would need to maintain the external ID dictionary at all times. Our tests show that the mechanism works by looking up the UID through the mapping between external ID and UID when the data is loaded. With a large volume of incoming data, this dictionary lookup for new records becomes the bottleneck. In addition, if the same data arrives in two batches, the blank node causes a new UID to be generated, so the external ID has to be queried again. Will future versions of Dgraph support custom UIDs instead of automatically generated ones?

Ah, that’s a different issue perhaps. Are you aware of the @id directive in Dgraph? https://dgraph.io/docs/graphql/schema/ids/#the-id-directive Maybe this is more to the point.

What you mentioned is a GraphQL feature. Our system is basically built on DQL. Is there an equivalent mechanism in DQL?

It doesn’t have the @id directive, but upsert operations are supported: https://dgraph.io/docs/mutations/upsert-block/
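For reference, a rough sketch of what such an upsert block looks like for a single account, built as a string in Python. The helper name `build_upsert`, the `xid` predicate, and the sample values are assumptions for illustration. The `eq` lookup also assumes the predicate is indexed in your schema.

```python
from typing import Dict


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_upsert(xid: str, fields: Dict[str, str], xid_predicate: str = "xid") -> str:
    """Build a DQL upsert block that finds a node by external ID and sets fields.

    If the query matches an existing node, uid(v) resolves to that node.
    If nothing matches, Dgraph creates a new node for uid(v).
    """
    set_lines = [f"uid(v) <{xid_predicate}> {quote(xid)} ."]
    for predicate, value in fields.items():
        set_lines.append(f"uid(v) <{predicate}> {quote(value)} .")
    body = "\n".join("      " + line for line in set_lines)
    return (
        "upsert {\n"
        "  query {\n"
        f"    q(func: eq({xid_predicate}, {quote(xid)})) {{\n"
        "      v as uid\n"
        "    }\n"
        "  }\n"
        "  mutation {\n"
        "    set {\n"
        f"{body}\n"
        "    }\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(build_upsert("tw_1", {"name": "Alice"}))
```

One block like this per record is the simple form. How many records you fold into one request is a tuning choice this sketch does not cover.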

The speed of upsert is unacceptable with millions of RDF triples, and when upsert is used, dgraph live and dgraph bulk can’t be used.

Right, so I think your best option is to always use the --xidmap flag in bulk/live loading.

Can I ask how you arrived at this opinion? How did you test it? I am assuming you tested it correctly before calling it unacceptable.

In my opinion it depends on the situation. Upserts run concurrently, each in its own goroutine. Depending on how you build your upsert query, it will probably be faster than Live Loader (though not Bulk Loader) for this job, because it runs concurrently from within the DB.

Not precisely. Live Loader has a flag called “upsertPredicate” (dgraph/run.go at f893f96f389218fe26bc638828d7fa57c61afec8 · dgraph-io/dgraph · GitHub), and it creates upserts based on the XID value.

I haven’t done a comparison yet, but mapping is certainly faster. If you use XIDs, you necessarily have to use upsert; if you don’t, I would recommend xidmap. The names can be confusing: xidmap (which maps blank nodes to UIDs) is one thing, and XIDs and external IDs are another. They are not necessarily the same thing.
For XIDs, see external-ids-upsert-block.
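
To make the two live loader options concrete, here is a small sketch that only assembles the command line for each mode. The helper name `live_loader_args`, the file names, the directory name, and the addresses are placeholders, and the flag spellings are worth checking against `dgraph live --help` for your version.

```python
from typing import List


def live_loader_args(
    rdf_file: str,
    schema_file: str,
    mode: str,
    xidmap_dir: str = "xidmap",      # placeholder directory for the blank node mapping
    upsert_predicate: str = "xid",   # placeholder predicate holding the external ID
    alpha: str = "localhost:9080",
    zero: str = "localhost:5080",
) -> List[str]:
    """Assemble a `dgraph live` command line for one of the two approaches.

    mode="xidmap": persist the blank node to UID mapping between loads.
    mode="upsert": let the loader create upserts based on an XID predicate.
    """
    args = [
        "dgraph", "live",
        "--files", rdf_file,
        "--schema", schema_file,
        "--alpha", alpha,
        "--zero", zero,
    ]
    if mode == "xidmap":
        args += ["--xidmap", xidmap_dir]
    elif mode == "upsert":
        args += ["--upsertPredicate", upsert_predicate]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return args


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(" ".join(live_loader_args("batch1.rdf", "schema.txt", "xidmap")))
    print(" ".join(live_loader_args("batch1.rdf", "schema.txt", "upsert")))
```

With the xidmap mode, the same directory would need to be reused for every batch so that a blank node seen in an earlier load resolves to the same UID.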