Using the bulk loader: where are the data's UIDs?

shanghai-Jerry · June 28, 2018, 3:25pm

I used the bulk loader to load data like this:
<02030fjfdbsh-dskjds> "jerry" .
…

When it finished, I got an output directory. Where can I find the UIDs assigned to this data?

Do you know more details about the bulk loader?
How is it implemented?

shanghai-Jerry · June 30, 2018, 6:01am

The data in the RDF file looks like: <_:uid35b3f12> "888002580"^^xs:int .


I used the bulk loader to generate the p directory in the output directory, then restarted the server using this p directory.
I want to know: will this data's UID be the same as in the RDF file, i.e. 0x35b3f12?
Is there any further documentation about the bulk loader?

BlankRain · July 4, 2018, 1:13am

I also want to know…

MichelDiz · July 4, 2018, 3:58am

Hey,

There is a syntax error there; it should look like this:

{
  set {
    <_:uid35b3f12> <test> "888002580"^^<xs:int> .
  }
}

and not <_:uid35b3f12> "888002580"^^xs:int . (the predicate is missing, and the datatype needs angle brackets).

Read more about the N-Quads format: RDF 1.1 N-Quads

In theory, there is no way to query something that has no predicate.
What are you trying to achieve with this? Where did you find a reference suggesting this was possible?

For an introduction to the bulk loader, see: Loading close to 1M edges/sec into Dgraph - Dgraph Blog

shanghai-Jerry · July 4, 2018, 7:18am

If I use this, will Dgraph use 0x35b3f12 as this node's UID or not?

I generated data like this, but when I used the bulk loader to insert it into Dgraph, the node's actual UID was not the one I expected from the RDF file. For example, we expected this node's UID to be 0x35b3f12, but it was something else.

MichelDiz · July 4, 2018, 2:41pm

No, Dgraph does not let you allocate UIDs yourself. In practice, <_:uid35b3f12> is just a badly written blank node: the text after _: is only a label, not a UID.

The current rule is that you should not use hand-created UIDs; only UIDs generated by Dgraph work. You could, however, write the following (but it may give an error if the UID was not generated by Dgraph):

<0x35b3f12> <test> "888002580"^^<xs:int> .

In this case, Dgraph will write to a node that already exists. If the node does not exist, it may return an error.
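To make the difference between a blank-node label and a literal UID concrete, here is a minimal Go sketch of a hypothetical assigner. The names uidAssigner, newUIDAssigner and resolve, and the simple sequential counter, are assumptions for illustration, not Dgraph's code: a blank-node label only names a node and receives a freshly allocated UID, while a literal such as <0x35b3f12> is parsed and used as-is.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// uidAssigner is a hypothetical, simplified model of how a loader could map
// node references to UIDs. It is NOT Dgraph's implementation.
type uidAssigner struct {
	next   uint64            // next UID to hand out (assumed sequential)
	blanks map[string]uint64 // blank-node label -> assigned UID
}

func newUIDAssigner(start uint64) *uidAssigner {
	return &uidAssigner{next: start, blanks: make(map[string]uint64)}
}

// resolve returns the UID for a subject/object reference.
// "_:label" is a blank node: the label is only a name, so it gets a fresh
// UID (the same one every time the same label is seen).
// Anything else must parse as a number, e.g. a literal "0x..." UID.
func (a *uidAssigner) resolve(ref string) (uint64, error) {
	// Accept both <_:label> and _:label, and <0x...> and 0x....
	ref = strings.TrimSuffix(strings.TrimPrefix(ref, "<"), ">")
	if strings.HasPrefix(ref, "_:") {
		if uid, ok := a.blanks[ref]; ok {
			return uid, nil
		}
		uid := a.next
		a.next++
		a.blanks[ref] = uid
		return uid, nil
	}
	// Base 0 infers hex from the "0x" prefix.
	return strconv.ParseUint(ref, 0, 64)
}

func main() {
	a := newUIDAssigner(1)
	for _, ref := range []string{"<_:uid35b3f12>", "_:uid35b3f12", "<0x35b3f12>"} {
		uid, err := a.resolve(ref)
		fmt.Printf("%-16s -> %#x (err: %v)\n", ref, uid, err)
	}
}
```

In this model both spellings of the blank node resolve to the same fresh UID (0x1, given a starting counter of 1), and only <0x35b3f12> keeps the value written in the file.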

shanghai-Jerry · July 5, 2018, 9:30am

Yes, I think so too. But when I export the database with the command $ curl localhost:8080/admin/export and open the exported RDF file, the content looks like: <_:uid35b3f12> <test> "888002580"^^<xs:int> .

Have you tried exporting the database and then importing the exported RDF file and schema?

MichelDiz · July 5, 2018, 3:23pm

The export uses this label as a naming convention. It does not mean that when you re-import this RDF the nodes will get the same UIDs; it's just a naming standard. But if you import that same RDF into a clean instance, the previously used UIDs should be retained. Just don't add anything before the import.
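Judging from the export line quoted in this thread, the label appears to be the old UID in hex behind a _:uid prefix (_:uid35b3f12 for 0x35b3f12); treat that format as an assumption. If you need the old UID, for example to pair it with the new UID after a re-import, a minimal Go sketch of recovering it from the label could look like this. The function oldUIDFromLabel is hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// oldUIDFromLabel extracts the UID encoded in an export label such as
// "<_:uid35b3f12>" or "_:uid35b3f12". The "_:uid<hex>" naming is assumed
// from the export output quoted in this thread. It is only a label: the
// re-imported node may get a different UID.
func oldUIDFromLabel(label string) (uint64, error) {
	const prefix = "_:uid"
	label = strings.TrimSuffix(strings.TrimPrefix(label, "<"), ">")
	if !strings.HasPrefix(label, prefix) {
		return 0, fmt.Errorf("not an exported uid label: %q", label)
	}
	// The part after the prefix is read as hexadecimal.
	return strconv.ParseUint(strings.TrimPrefix(label, prefix), 16, 64)
}

func main() {
	uid, err := oldUIDFromLabel("<_:uid35b3f12>")
	fmt.Printf("%#x %v\n", uid, err) // 0x35b3f12 <nil>
}
```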

Dgraph is able to read <_:uid35b3f12> as a valid blank node, but according to the W3C RDF standard the correct form is _:uid35b3f12, without angle brackets. Either way, there is no problem.

shanghai-Jerry · July 6, 2018, 2:03am

Could you tell me more about how previously used UIDs are retained?

More details about the bulk loader process would also be great.

Many thanks

MichelDiz · July 6, 2018, 3:17am

Well, the blog post I mentioned above is the only resource I can give you. If you want more, try looking for this information in the code: dgraph/dgraph/cmd/bulk/run.go at master · dgraph-io/dgraph · GitHub

There are comments and logic there that you can examine in the source.

MichelDiz · July 6, 2018, 5:07am

That was just a remark; I'm not sure it holds when importing an RDF file exported from Dgraph. But I do know that if you create a large (really giant) JSON object with all nodes in order, every time you mutate this JSON the nodes will get the same UIDs, because Dgraph generates UIDs in order.

I have created JSON objects with more than 7,000 nodes, and every time I mutated one into a new (clean) instance, those 7,000 nodes got the same UIDs. So I suppose that if the RDF import is in order, Dgraph will keep the same UIDs. If the import order is random, I believe my theory does not apply.
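The theory above can be illustrated with a toy model. It assumes that a clean instance hands out sequential UIDs from a fixed starting point, in the order labels are first seen. assignInOrder is a hypothetical helper, and this is not Dgraph's actual allocator. Same input order gives the same UIDs, and a different order does not:

```go
package main

import "fmt"

// assignInOrder models a clean instance that hands out sequential UIDs,
// starting at start, in the order blank-node labels are first seen.
// This is a simplified model of the behaviour described above, not
// Dgraph's real allocator.
func assignInOrder(labels []string, start uint64) map[string]uint64 {
	uids := make(map[string]uint64)
	next := start
	for _, l := range labels {
		if _, seen := uids[l]; !seen {
			uids[l] = next
			next++
		}
	}
	return uids
}

func main() {
	ordered := []string{"_:a", "_:b", "_:c"}
	shuffled := []string{"_:c", "_:a", "_:b"}

	run1 := assignInOrder(ordered, 1)
	run2 := assignInOrder(ordered, 1)  // same order, clean instance
	run3 := assignInOrder(shuffled, 1) // different order

	fmt.Println(run1["_:a"] == run2["_:a"]) // true
	fmt.Println(run1["_:a"] == run3["_:a"]) // false
}
```

In this model, loading anything before the import would also shift the starting point, which fits the advice not to add data to the instance first.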

shanghai-Jerry · July 9, 2018, 10:51am

I saw this code:

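// lexUidNode lexes a node written as a literal numeric UID (e.g. 0x35b3f12).
func lexUidNode(l *lex.Lexer, styp lex.ItemType, sfn lex.StateFn) lex.StateFn {
	// Consume the token up to the next whitespace character.
	l.AcceptUntil(isSpace)
	r := l.Peek()
	if r == lex.EOF {
		return l.Errorf("Unexpected end of uid subject")
	}

	// The token must parse as an unsigned 64-bit integer. Base 0 infers
	// the base from the prefix, so hex such as 0x35b3f12 is accepted.
	in := l.Input[l.Start:l.Pos]
	if _, err := strconv.ParseUint(in[:], 0, 64); err != nil {
		return l.Errorf("Unable to convert '%v' to UID", in[:])
	}

	// Emit the UID token only if it is followed by whitespace.
	if isSpace(r) {
		l.Emit(styp)
		return sfn
	}

	return l.Errorf("Invalid character '%c' found for UID node itemType: %v", r,
		styp)
}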

It seems to support loading data with existing UIDs from the RDF file,
but I don't know why it doesn't work.
https://github.com/dgraph-io/dgraph/blob/master/rdf/state.go
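The check in the quoted lexUidNode comes down to strconv.ParseUint(token, 0, 64). The small Go sketch below reproduces only that validation step, in a hypothetical helper isUIDToken. A numeric token such as 0x35b3f12 passes, but a label such as uid35b3f12 does not, which is consistent with the point earlier in the thread that _:uid35b3f12 is a blank-node label and not a UID:

```go
package main

import (
	"fmt"
	"strconv"
)

// isUIDToken reproduces only the validation step of lexUidNode: a token
// counts as a UID if strconv.ParseUint accepts it with base 0, which infers
// hex from a "0x" prefix.
func isUIDToken(tok string) bool {
	_, err := strconv.ParseUint(tok, 0, 64)
	return err == nil
}

func main() {
	for _, tok := range []string{"0x35b3f12", "56311570", "uid35b3f12", "_:uid35b3f12"} {
		fmt.Printf("%-14s UID? %v\n", tok, isUIDToken(tok))
	}
}
```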

MichelDiz · July 9, 2018, 3:48pm

That looks like partial support. I remember Pawan was working on it, but I don't know where that work ended up. If this piece of code allows bulk insertion with already-declared UIDs, the support is passive and limited by the UID allocation limits that exist in the instance. @mrjn mentioned in another comment (I believe it was on GitHub) that a sort of manual UID allocation could be created, but it could lead the instance into errors. It would not be safe.

Personally, rather than relying on UIDs staying fixed across exports, I would create a way of identifying the data itself, for example indexing users by email or some other attribute. The UID is more relevant to the "machine" itself than to the end user, and the export preserves all the characteristics of your business logic.
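The suggestion above can be sketched in Go: rebuild an external-key-to-UID index after each import instead of assuming UIDs stay fixed. The node type, the Email field, the buildEmailIndex helper and the sample data are all hypothetical; in practice the pairs would come from a query against the freshly imported instance.

```go
package main

import "fmt"

// node is a hypothetical record as it might come back from a query after
// an import: whatever UID was assigned, plus a business key such as email.
type node struct {
	UID   uint64
	Email string
}

// buildEmailIndex maps each business key to the UID assigned in this
// instance. Rebuilding it after every import means an external service
// does not depend on UIDs being identical across exports.
func buildEmailIndex(nodes []node) (map[string]uint64, error) {
	idx := make(map[string]uint64, len(nodes))
	for _, n := range nodes {
		// A key that points at two different UIDs means the key is not unique.
		if prev, dup := idx[n.Email]; dup && prev != n.UID {
			return nil, fmt.Errorf("email %q maps to both %#x and %#x", n.Email, prev, n.UID)
		}
		idx[n.Email] = n.UID
	}
	return idx, nil
}

func main() {
	afterImport := []node{
		{UID: 0x2711, Email: "jerry@example.com"},
		{UID: 0x2712, Email: "rain@example.com"},
	}
	idx, err := buildEmailIndex(afterImport)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%#x\n", idx["jerry@example.com"])
}
```

The design choice is that the business key is the stable identifier and the UID is looked up afresh, so a changed UID is picked up on the next rebuild instead of silently breaking the mapping.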

shanghai-Jerry · July 10, 2018, 1:38am

For me, I just want to query by UID, because I think it's faster than going through an index. So I want to map some external
IDs to UIDs; then I can look up the UID by its external ID, which is done in another service.
That's why I need to know the exact pairs of external ID and UID: if they change, everything breaks.

MichelDiz · July 10, 2018, 2:12am

In that case, you'll have to wait for backup support, or for what Pawan was working on (I talked to him about it a while ago, but I think it was related to backups). At the moment there are many things still to deal with, and most features are long-term.
