Uid cannot be greater than

First, I ran:

curl -H "Content-Type: application/json" "localhost:8080/mutate?commitNow=true" -XPOST --data-binary @organisation_0_0.json  | python -m json.tool | less

and got this error:

"Uid: [18444291013291836287] cannot be greater than lease: [10000000000000020000]"

Why? My data looks like this:

{"dgraph.type": "company", "name": "Kam_Air", "url": "http:\\/\\/dbpedia.org\\/resource\\/Kam_Air", "uid": "0xfff748f45d30837f"}

Why does this happen, and what else can I do?

Your dataset has "uid": "0xfff748f45d30837f". This tells Dgraph to look up an existing entity with this UID and apply the mutation to it. But this UID doesn't exist on your cluster (it is larger than the lease reported in the error), so Dgraph fails.
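The numbers in the error message can be checked directly. A minimal sketch in Python, where the variable names are illustrative only:

```python
# The UID from the dataset, written in hexadecimal.
uid = int("0xfff748f45d30837f", 16)

# The lease reported in the error message.
lease = 10000000000000020000

print(uid)          # 18444291013291836287, the value shown in the error
print(uid > lease)  # True: the UID is beyond the lease, as the error says
print(uid - lease)  # 8444291013291816287 UIDs beyond the current lease
```

So the hex UID in the data is exactly the decimal value in the error, and it sits far above what this cluster has leased so far.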

You have three options.

Option 1: Remove the UID from the dataset.

Option 2: Transform them into blank nodes by adding the prefix _: in front, e.g. "uid": "_:0xfff748f45d30837f". That way each one gets a new UID.

Note that blank nodes are scoped to a transaction. The same blank node will not resolve to the same UID in a later transaction, so you have to keep track of the mapping yourself.

Option 3: Lease a large number of UIDs from Zero via /assign?what=uids&num= (see the docs for details: https://dgraph.io/docs/deploy/dgraph-zero/). If this doesn't work on the first try, keep leasing more UIDs.
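Options 1 and 2 amount to a small preprocessing pass over the JSON before it is sent. A rough sketch, assuming the file is a flat list of node objects like the Kam_Air example (the function names are made up for illustration, and nested objects are not handled):

```python
import json


def strip_uids(nodes):
    """Option 1: drop the explicit "uid" key so Dgraph assigns new UIDs."""
    return [{k: v for k, v in node.items() if k != "uid"} for node in nodes]


def to_blank_nodes(nodes):
    """Option 2: turn "uid": "0x..." into the blank node "uid": "_:0x..."."""
    converted = []
    for node in nodes:
        node = dict(node)  # copy, so the input is left untouched
        uid = node.get("uid")
        if isinstance(uid, str) and not uid.startswith("_:"):
            node["uid"] = "_:" + uid
        converted.append(node)
    return converted


data = [{"dgraph.type": "company", "name": "Kam_Air",
         "uid": "0xfff748f45d30837f"}]

print(json.dumps(to_blank_nodes(data)))
print(json.dumps(strip_uids(data)))
```

Keep in mind that Option 1 throws away the identifier entirely, so it only fits data where nothing else refers to those nodes by UID.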


You mean I can't set the UID myself?
But the same code worked on my Mac; it only fails when I run it on the server…

Yep.

Your Mac and your server are two different environments. The cluster on the server certainly hasn't leased the UID 0xfff748f45d30837f.


UIDs are assigned centrally by Zero.
You can update data by UID, but the UID must already exist.


@BlankRain Are you sure I can't set it myself??

@MichelDiz
For example, my data includes:
people.json (nodes)
people_like_perople.json (edges)

Questions:

  1. Since the data is large (>100000000), should I use bulk to load it?

  2. people_like_perople.json references the nodes in people.json. What should I do to avoid duplicates?
    (If I load people.json first, I can't use bulk to load people_like_perople.json afterwards;
    if I load them together, I get duplicate nodes;
    and if I put them in one file, the file becomes far too big, which I'd rather avoid…)

  3. I don't know how to use bulk, even after reading the documentation…

  4. I can import data correctly over HTTP, but I want to evaluate Dgraph's performance, so I need to import data in the fastest way for the evaluation to be valid, right?
    Which of the following three methods imports data fastest?

curl -H "Content-Type: application/json" "localhost:8080/mutate?commitNow=true" -XPOST --data-binary @organisation_0_0.json  | python -m json.tool | less
    dgraph live -f organisation_0_0.json -s schema

    dgraph bulk -f organisation_0_0.json -s schema

Here is what I recommend to avoid duplicates.

  1. Start your cluster from scratch, loading as many of your datasets as possible with Bulkloader, and use the xidmap feature to save the blank node (the RDF identifier) mapping to a directory.
dgraph bulk -h | grep xidmap
      --xidmap string                  Directory to store xid to uid mapping
  2. Later on, use Liveloader to ingest large amounts of data that you can't handle yourself in the client. You still need those references, and may create new ones (I'm talking about the blank node mapping). Liveloader also has the xidmap flag.

To reuse the blank nodes mapped during your first ingestion with Bulkloader, always point the live load at the same path/folder where the Bulkloader stored the xid mapping.

dgraph live -h | grep xidmap
      --xidmap string                  Directory to store xid to uid mapping

So, what does this approach solve? Basically, any duplication of entities. You don't need to rely on UIDs or anything else. Once a blank node is mapped, as long as you always use the same blank nodes and the same xid mapping path, you are good to go.
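To see why a shared mapping prevents duplicates, here is a toy model of the idea. It is not Dgraph's implementation; the class, the counter standing in for Zero, and the blank node names are all invented for illustration:

```python
class ToyXidMap:
    """Toy stand-in for an xid-to-uid mapping; NOT Dgraph's implementation."""

    def __init__(self):
        self.mapping = {}   # blank node name -> assigned UID
        self.next_uid = 1   # hypothetical counter standing in for Zero

    def uid_for(self, xid):
        # Reuse the UID if this blank node was seen before, else assign one.
        if xid not in self.mapping:
            self.mapping[xid] = self.next_uid
            self.next_uid += 1
        return self.mapping[xid]


xidmap = ToyXidMap()

# First load (e.g. the node file): every blank node gets a UID.
first = [xidmap.uid_for(x) for x in ["_:alice", "_:bob"]]

# Later load (e.g. the edge file): known blank nodes resolve to the same
# UIDs, and only the unseen "_:carol" creates a new node.
second = [xidmap.uid_for(x) for x in ["_:alice", "_:bob", "_:carol"]]

print(first, second)
```

Without the shared mapping, the second load would start from an empty table and create fresh nodes for "_:alice" and "_:bob", which is exactly the duplication described in the question.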

The Bulkloader is the fastest, but you can only use it once, when starting the cluster from scratch.

Liveloader is your best friend: it handles the data better and avoids early OOM on machines with few resources.

Never, ever, use the HTTP API for big datasets. It will congest your cluster and may lead to OOM in the medium run.

You can use the HTTP API, but only with small batches, even if the events are continuous.
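If you do go through the HTTP API, "small batches" could look roughly like this. The endpoint is the one from the curl command in this thread; the batch size of 1000, the function names, and the {"set": [...]} request body are assumptions to adapt to your own data:

```python
import json
import urllib.request

# Endpoint taken from the curl command in this thread.
MUTATE_URL = "http://localhost:8080/mutate?commitNow=true"


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_batch(nodes, url=MUTATE_URL):
    """POST one batch as a JSON mutation (assumes the {"set": [...]} body)."""
    body = json.dumps({"set": nodes}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def load_in_batches(nodes, size=1000, send=post_batch):
    """Send `nodes` batch by batch; `send` can be swapped for a dry run."""
    return [send(batch) for batch in batches(nodes, size)]
```

Passing send=len gives a dry run that only reports the batch sizes, which is handy for checking the split before touching the cluster.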

@MichelDiz

  1. All my data is JSON, so I can use xidmap, right?
  2. In fact, I noticed the --xidmap flag at the beginning, but I understood it as saving the UID mapping of the imported data. I don't see how to use the saved mapping on the next import. Do you mean that both saving and using the mapping go through the --xidmap flag?

When saving:

dgraph live -f organisation_0_0.json -s schema --xidmap /myfolder/xidmap

When using:

dgraph live -f organisation_0_0.json -s schema | grep xidmap /myfolder/xidmap

right??

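Yes, the same --xidmap flag both saves and reuses the mapping. Pass it as a flag pointing at the saved directory; the grep in the examples above only filtered the help output.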

You have to re-use the saved files.

First:

dgraph bulk -f myBaseDataset.json -s base.schema --xidmap ./myfolder/xidmap

And then (from now on, always pass the same xidmap):

dgraph live -f newdata1.json --xidmap ./myfolder/xidmap

You don't need to provide a schema if the data has the same structure as the previous load. Only provide a schema if the data has new entities or you want to change the schema.

That’s really all.


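When importing with bulk, just use -f . and put all the data together.
You can first convert the JSON to RDF; if the same data always generates the same identifier, there won't be duplicates.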
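One possible way to do that conversion is to derive each blank node label from a stable key, so the same record always produces the same identifier no matter which file it appears in. A sketch only: the "name" key field, the "like" predicate, and the helper names are hypothetical, and it handles flat records with scalar values:

```python
import hashlib
import json


def blank_node(key):
    """Derive a stable blank node label from a record's identifying key."""
    return "_:n" + hashlib.sha1(key.encode("utf-8")).hexdigest()


def node_to_rdf(record, key_field="name"):
    """Emit one RDF line per scalar field of a flat JSON record."""
    subject = blank_node(str(record[key_field]))
    lines = []
    for predicate, value in record.items():
        if predicate == "uid":
            continue  # the blank node replaces any explicit UID
        # json.dumps yields a double-quoted, escaped string literal.
        lines.append(f"{subject} <{predicate}> {json.dumps(str(value))} .")
    return lines


def edge_to_rdf(source_key, predicate, target_key):
    """Emit an edge between two nodes identified by their keys."""
    return f"{blank_node(source_key)} <{predicate}> {blank_node(target_key)} ."


person = {"dgraph.type": "people", "name": "alice"}
print("\n".join(node_to_rdf(person)))
print(edge_to_rdf("alice", "like", "bob"))
```

Because the edge line and the node lines compute the label for "alice" the same way, the node file and the edge file end up pointing at one node instead of two.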
