Preserve UIDs in bulk loader


#1

I want to assign UIDs myself when using the bulk loader, as shown below.

<1000000000001290> <serviceId> "1" .
or 
<0x38d7ea4c6850a> <serviceId> "1" .

I noticed today that the current releases of the bulk loader (v1.0.14 and also v1.0.15-rc9) don’t support this (Dgraph always assigns new UIDs), but the latest source seems to support preserving UIDs thanks to the changes related to the “new_uids” option.

So, I want to know which version will include the changes related to “new_uids”: v1.0.x, v1.1, or even later?

I think the current situation for assigning UIDs from outside Dgraph is a bit confusing. From what I found, normal mutations support both decimal and hex formats, the live loader supports only the hex format, and the bulk loader supports neither.
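
Because the live loader accepts only the hex form, one workaround is to normalize externally assigned IDs to hex before writing N-Quads. The following is a minimal Python sketch of that conversion. The helper names `to_hex_uid` and `nquad` are hypothetical and not part of any Dgraph tooling, and the range check assumes UIDs are non-zero unsigned 64-bit integers. It also confirms that the two example triples above refer to the same UID.

```python
def to_hex_uid(external_id: int) -> str:
    """Format an integer id as a hex uid literal such as 0x50a."""
    # Assumption: a uid must be non-zero and fit in an unsigned 64-bit integer.
    if not 0 < external_id < 2**64:
        raise ValueError("uid must be a non-zero unsigned 64-bit integer")
    return hex(external_id)


def nquad(external_id: int, predicate: str, value: str) -> str:
    """Build one N-Quad line with the uid written in hex (hypothetical helper)."""
    # Escape backslashes and double quotes so the string literal stays valid.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{to_hex_uid(external_id)}> <{predicate}> "{escaped}" .'


# The decimal id from the first example line becomes the hex id of the second.
print(nquad(1000000000001290, "serviceId", "1"))
# prints: <0x38d7ea4c6850a> <serviceId> "1" .
```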

Thank you in advance.


(Michel Conrado) #2

Well, “new_uids” has a different purpose. It is exactly the opposite of what you want.

Issue 3011: ignore uids on import #3045
It means “ignore any UIDs assigned in the dataset”.

However, the bulk loader will be able to insert UIDs that were either created manually or produced by an export. I believe this will come in version 1.1 of Dgraph.

The best way to do that is to ask Zero to create UIDs for you.

/assign?what=uids&num=100
This allocates num UIDs and returns a JSON map containing startId and endId, both inclusive. This ID range can be safely assigned externally to new nodes during data ingestion.

https://docs.dgraph.io/deploy/#more-about-dgraph-zero
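
As a rough illustration of the `/assign` call described above, here is a small Python sketch. It assumes Zero's HTTP endpoint is reachable at `localhost:6080` and that the response is a JSON map with `startId` and `endId` at the top level, possibly encoded as strings. The function names and the sample response body are made up for the example and are not real Zero output.

```python
import json
import urllib.request

# Assumption: Zero's HTTP endpoint listens on localhost:6080.
ZERO_HTTP = "http://localhost:6080"


def parse_uid_range(body: str) -> range:
    """Convert the JSON map returned by /assign into a Python range of uids."""
    data = json.loads(body)
    # Values may arrive as strings or numbers; int() handles both.
    start, end = int(data["startId"]), int(data["endId"])
    return range(start, end + 1)  # endId is inclusive, range() is exclusive


def lease_uids(num: int, zero: str = ZERO_HTTP) -> range:
    """Ask Zero to allocate `num` uids and return them as a range."""
    with urllib.request.urlopen(f"{zero}/assign?what=uids&num={num}") as resp:
        return parse_uid_range(resp.read().decode("utf-8"))


# Offline demonstration with a made-up response body.
sample = '{"startId": "10001", "endId": "10100"}'
uids = parse_uid_range(sample)
print(len(uids), hex(uids[0]), hex(uids[-1]))
```

With a running cluster, `lease_uids(100)` would perform the actual request; each value in the returned range can then be written into the dataset in hex form.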


#3

Thank you for your quick response and sorry for my late reply.

Well, “new_uids” has a different purpose. It is exactly the opposite of what you want.

Yes, I know. But as far as the bulk loader is concerned, the “new_uids” related changes are what actually make preserving UIDs possible. It’s a bit of a weird situation. :slight_smile:

import until version 1.1 will be released.

I expected so, because the “new_uids” option will break the current behavior of the bulk loader. But if the change added a “preserve_uids” option, it might be safe to release in a 1.0.x version. I’m very sorry :wink:

Issue 3011: ignore uids on import #3045
The best way to do that is to ask Zero to create UIDs for you.

Hmm, I don’t want to maintain a mapping between my external RDB’s IDs and Dgraph’s UIDs. My RDB uses a global ID that is unique across the entire database, so I want to use it as the Dgraph UID. Also, my data is so large (> 1G rows) that I must import it with the bulk loader.

/assign?what=uids&num=100
This allocates num UIDs and returns a JSON map containing startId and endId, both inclusive. This ID range can be safely assigned externally to new nodes during data ingestion.

I do that every time I install a new Dgraph version. But I believe there is no way to use the assigned UIDs with the bulk loader in the current 1.0.x versions of Dgraph.

So, for now, I’m considering building a custom version of Dgraph that contains a backport of the “new_uids” related changes and using it for the initial data import.

By the way, is there a planned date for the 1.1 release?

Thank you.


(Michel Conrado) #4

Dunno for sure, there are a lot of things to polish. But it isn’t too far off.


#5

I see. I’m looking forward to the 1.1 release because “new_uids” will be included, and I’m also very interested in the new type system.

Thank you for your kind replies and great product, Dgraph.

