How to set UIDs for all nodes in a dataset?

amipandatdoc · April 22, 2019, 12:51pm

Hello,
I have a dataset with three columns, and no edges show up when I run a query. I found that there was no uid type set for it in the schema, so I added it.
But what if I want to set UIDs for each subject and object in my .rdf dataset? I want to load it via the Live Loader.

pcharbonneau · April 22, 2019, 1:15pm

You might need to pre-process that dataset and include those UIDs for the edges in it before loading it with the Live Loader.
You can reserve a range of UIDs from Dgraph Zero (its /assign?what=uids&num=N endpoint) and then use those to create the edges in the processed dataset.

You will need to set the node UID and create the edge with that same UID.

You can just increase the UIDs sequentially for those nodes in your RDF file as you process each node.

I am assuming the nodes do not have a UID already.

Sorry if I misunderstood your question.

amipandatdoc · April 22, 2019, 1:22pm

> I am assuming the nodes do not have a UID already.

No, they do not have UIDs.
So after loading it with the Live Loader, I have to change the range of UIDs, right?

pcharbonneau · April 22, 2019, 1:49pm

Just to make sure I understand: you want to load a set of nodes and then add edges related to those nodes?
In order for a node to have an edge, you need to create that edge with the UID of the node.

What I usually do is reserve a range of UIDs and sequentially pre-process my data outside Dgraph.

You can then increment the UID sequentially in your pre-processing script and generate a new RDF dataset from your data.

Just an example: let's say you have a bunch of cities and their populations, and let's assume your range of reserved UIDs starts at 0x100.

You could create each city node like so:

<0x100> <name> "name_of_city" .   (your node)

<0x100> <population> "10000" .

and increase your city node UID by one, then rinse and repeat.
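A minimal Python sketch of that pre-processing step, assuming your reserved range starts at 0x100. The predicate names name and population are hypothetical placeholders; use whatever your schema defines. City names are assumed not to contain double quotes.

```python
# Sketch: emit RDF triples for city nodes using sequential, pre-reserved UIDs.
# Assumptions: the reserved range starts at 0x100; <name> and <population>
# are hypothetical predicate names; names contain no double quotes.

def city_triples(cities, start_uid=0x100):
    """Return one name triple and one population triple per city."""
    lines = []
    uid = start_uid
    for name, population in cities:
        subject = f"<{hex(uid)}>"  # e.g. <0x100>
        lines.append(f'{subject} <name> "{name}" .')
        lines.append(f'{subject} <population> "{population}" .')
        uid += 1  # the next city gets the next UID in the reserved range
    return lines


if __name__ == "__main__":
    for line in city_triples([("City A", 10000), ("City B", 25000)]):
        print(line)
```

Writing the printed lines to a file gives you the new RDF dataset; the UIDs should come from a range you actually reserved, as described above.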

If you already have the city nodes created, you would have to query for their UIDs, process the result, and add your new edges using those UIDs.
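For the case where the cities already exist, here is a rough Python sketch of the "process the result" step. It assumes you ran a query whose block is named cities and which returns uid and a hypothetical name predicate, and that you saved the JSON response (Dgraph wraps query results under a "data" key). The new predicate country is also hypothetical.

```python
import json

# Sketch: turn a saved query response into new triples on existing nodes.
# Assumptions: the query block is named "cities" and returns "uid" and a
# hypothetical "name" predicate; "country" is a hypothetical new predicate.

def edges_from_query(response_json, new_values):
    """new_values maps city name -> value for the new predicate."""
    nodes = json.loads(response_json)["data"]["cities"]
    uid_by_name = {node["name"]: node["uid"] for node in nodes}
    lines = []
    for name, value in new_values.items():
        uid = uid_by_name.get(name)
        if uid is None:
            continue  # no such city in Dgraph yet; create it separately
        lines.append(f'<{uid}> <country> "{value}" .')
    return lines
```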

Hope this makes sense.

amipandatdoc · April 22, 2019, 1:56pm

Yes.

How, though? I have 98k triples; how can I assign a UID to each subject and object?

pcharbonneau · April 22, 2019, 2:20pm

The only way to assign a UID yourself, as far as I know, is to include it in your RDF dataset before loading it.
I assume your RDF uses <_:identifier>, which means it lets Dgraph decide the UID for that subject (that is what the _ means).

If you want to set that UID yourself, you would need to process that file and create a new one from it, replacing each <_:identifier> with a sequentially incremented UID:

<0x100>

<0x101>

…

<0x17ED0>
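One way to script that replacement, sketched in Python: every distinct blank node label gets the next UID from an assumed reserved range starting at 0x100, and repeated labels reuse the same UID so their edges stay on one node. It assumes blank nodes appear as standalone tokens (<_:label> or _:label) followed by whitespace, and never inside string literals.

```python
import re

# A blank node token, written as <_:label> or _:label, at the start of a
# whitespace-separated token (assumption: never inside a quoted literal).
BLANK_NODE = re.compile(r"(?<!\S)(?:<_:([^>\s]+)>|_:(\S+))")


def assign_uids(rdf_text, start_uid=0x100):
    """Replace each blank node label with a sequential UID.

    Returns the rewritten text and the label -> UID mapping, which is worth
    saving so later edges can reference the same nodes.
    """
    uid_for_label = {}
    next_uid = start_uid

    def replace(match):
        nonlocal next_uid
        label = match.group(1) or match.group(2)
        if label not in uid_for_label:
            uid_for_label[label] = f"<{hex(next_uid)}>"
            next_uid += 1
        return uid_for_label[label]

    return BLANK_NODE.sub(replace, rdf_text), uid_for_label
```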

Unfortunately, there is no easy way to add a new edge to a specific node unless you know its UID.

Maybe if you showed us a few lines of your RDF, I could help better.

Hope this helps.

pcharbonneau · April 22, 2019, 2:32pm

Unfortunately, the only way would be some data manipulation: scripting or programming that transforms your original data.
It depends on your data, so there is no universal approach I can think of. Maybe I misunderstood what you were trying to accomplish.

Are you creating that initial RDF with the 98k triples yourself or is it some dataset that was provided to you?

AugustHell · April 22, 2019, 8:18pm

UIDs are assigned by Dgraph on import. It's not wise to set them yourself, as they follow an algorithm (I think for sharding purposes).
What you can do is use aliases, so-called blank nodes, as described in the documentation:
https://docs.dgraph.io/mutations/#blank-nodes-and-uid
That's what the underscore pcharbonneau mentioned is about.
So when inserting a new data item where you need a UID for an edge within the same import, you can use blank nodes as UIDs. They will be replaced by real UIDs on import.
It doesn't matter what kind of self-made IDs you use; they could be plain numbers, like
<_:1> <_:2> ...
or names, like <_:alpha> <_:beta>.
Take anything in your record that is unique to it.
As UIDs may change when exporting and importing data (e.g. when setting up another database with the same data), I prefer using my own unique IDs, so each of my records has one saved along with the rest of its data.
Example:
Import:

<_:14738201892> <recordid> "14738201892" .
<_:14738201892> <name> "Smith" .
<_:14738201892> <place> "Houston" .

The saved record after import:
<0x6bc818dc89e78754> <recordid> "14738201892" .
<0x6bc818dc89e78754> <name> "Smith" .
<0x6bc818dc89e78754> <place> "Houston" .
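The same idea as a small Python sketch: each record's own recordid doubles as its blank node label, so all of its triples end up on one node. It uses the plain _:label form from the Dgraph docs and assumes values contain no double quotes.

```python
# Sketch: build triples for one record, using its unique recordid as the
# blank node label. Assumption: values contain no double quotes.

def record_triples(record):
    node = f"_:{record['recordid']}"
    # Store the ID itself too, so it survives export/import with new UIDs.
    lines = [f'{node} <recordid> "{record["recordid"]}" .']
    for predicate, value in record.items():
        if predicate != "recordid":
            lines.append(f'{node} <{predicate}> "{value}" .')
    return lines
```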

amipandatdoc · April 23, 2019, 5:34am

It was provided online: it is the Linked Sensor Data dataset.

Sure! (I am uploading a subset of that large dataset.)

This is what I was provided: subset_original.csv (6.4 KB)

Then I manually converted it to RDF format: alpha1.rdf (4.2 KB)
subset_rdf.csv (6.5 KB)

I have tried both of these, but neither works: in both cases, no edges are created.

AugustHell · April 23, 2019, 9:06am

You might try:
<_:Observation_WindSpeed_4UT01_2004_8_9_16_25_00>
for the UIDs in your .rdf. That should work. Sorry, I can't test it at the moment.

pcharbonneau · April 23, 2019, 9:18am

Good news! In your case, you will not need to assign the UIDs yourself.

Basically, on an RDF row like this:

SUBJECT PREDICATE OBJECT

_:Observation_WindSpeed_4UT01_2004_8_9_16_25_00 _:weatherowlWindSpeedObservation .

What you are telling Dgraph is: assign a UID to this node yourself (that is what the underscore means), but while processing this document, refer to it as Observation_WindSpeed_4UT01_2004_8_9_16_25_00.

This reference is how you can attach edges to that same node later on.

So in your case you could simply do:

_:Observation_WindSpeed_4UT01_2004_8_9_16_25_00 <IsensorobservationowlobservedProperty> "weatherowlWindSpeed" .

_:Observation_WindSpeed_4UT01_2004_8_9_16_25_00 <Isensorobservationowlprocedure> "System4UT01" .

_:Observation_WindSpeed_4UT01_2004_8_9_16_25_00 <Isensorobservationowlresult> "MeasureDataWindSpeed4UT01200489162500" .

_:Observation_WindSpeed_4UT01_2004_8_9_16_25_00 <IsensorobservationowlsamplingTime> "Instant200489162500" .

This would create the following edges on the node whose UID Dgraph assigns to Observation_WindSpeed_4UT01_2004_8_9_16_25_00:

IsensorobservationowlobservedProperty with value "weatherowlWindSpeed"

Isensorobservationowlprocedure with value "System4UT01"

Isensorobservationowlresult with value "MeasureDataWindSpeed4UT01200489162500"

IsensorobservationowlsamplingTime with value "Instant200489162500"

Then the next rows, like

_:Observation_WindSpeed_4UT01_2004_8_9_10_20_00 "weatherowlWindSpeedObservation" .

_:Observation_WindSpeed_4UT01_2004_8_9_10_20_00 "MeasureDataWindSpeed4UT01200489102000" .

would get another UID assigned by Dgraph, which you can reference to create edges as above.
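Since the uploaded files are CSVs, here is a rough Python sketch of turning subject,predicate,object rows into RDF with blank nodes. The three-column layout is an assumption (the files aren't reproduced here), and every object is written as a string literal; objects that should be edges to other nodes would need _: labels instead. Rows sharing a subject label become edges on the same node.

```python
import csv
import io

# Sketch: convert "subject,predicate,object" CSV rows into RDF triples.
# Assumptions: exactly three columns per row; the subject is a label with
# no spaces; every object is a plain string literal.

def csv_to_rdf(csv_text):
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        subject, predicate, obj = (cell.strip() for cell in row)
        # Escape backslashes and double quotes inside the literal.
        literal = obj.replace("\\", "\\\\").replace('"', '\\"')
        # The same _:subject label on several rows maps to one node in a load.
        lines.append(f'_:{subject} <{predicate}> "{literal}" .')
    return lines
```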

Hope this helps.

pcharbonneau · April 23, 2019, 9:35am

Just another tip that might be useful, from what I could see of your data: since the nodes seem to represent different types of observations and you might want to query for specific nodes only, you can assign a kind of "type" to each, as noted in the documentation.

You can basically create an edge on each node that represents its type and assign it an empty string value. You could then add an index for that predicate in your schema and be able to query those specific nodes.

For example, you could create one such edge on each Dew Observation node and another on each Wind Observation node.
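A Python sketch of generating those type-marker edges plus matching schema lines. WindObservation and DewObservation are hypothetical predicate names, and recognizing a node's kind from a fragment of its label ("WindSpeed", "Dew") is an assumption about how the labels are named.

```python
# Sketch: add "type marker" edges with empty string values.
# Assumptions: WindObservation and DewObservation are hypothetical predicate
# names, and a node's kind can be recognized from a fragment of its label.
MARKERS = {"WindSpeed": "WindObservation", "Dew": "DewObservation"}

# Matching schema lines, so the marker predicates are indexed.
SCHEMA = "\n".join(f"{m}: string @index(exact) ." for m in MARKERS.values())


def type_marker_triples(labels):
    lines = []
    for label in labels:
        for fragment, marker in MARKERS.items():
            if fragment in label:
                lines.append(f'_:{label} <{marker}> "" .')
    return lines
```

Those nodes could then be selected with a root function such as has(WindObservation).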

Have fun with your data
