Common Issues with Data Loading

  1. If you get the following error while loading data:
    FATA[0221] While storing posting list error=IO error: /Users/username/dgraph/p0/002562.sst: Too many open files package=posting
    then you can increase the maximum number of files that are allowed to be open at a time, which is a system property.
    For Linux, you can find instructions here

  2. For both steps in bulk loading, you can also limit the number of cores and the amount of memory the process uses.
    Run ./uidassigner --help and ./loader --help to see all the optional flags available.
    You might want to adjust the RAM using --stw_ram_mb and the number of CPUs using --numCPU, depending on how much RAM and how many cores your machine has.
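For the "Too many open files" error in item 1, here is a minimal Python sketch of checking and raising the per-process open-file limit before starting a loader as a child process. It uses the standard `resource` module (Unix only). The helper names and the value 10000 are assumptions for illustration, not Dgraph recommendations.

```python
import resource


def pick_soft_limit(soft, hard, wanted, infinity=resource.RLIM_INFINITY):
    """Choose a new soft limit: at least `wanted`, never above the hard limit."""
    if soft == infinity:
        return soft  # already unlimited, nothing to raise
    target = max(soft, wanted)
    if hard != infinity:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    return target


def raise_open_file_limit(wanted=10000):
    """Raise RLIMIT_NOFILE for this process and return the soft limit in effect.

    `wanted` is a hypothetical value; pick one that suits your data set.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = pick_soft_limit(soft, hard, wanted)
    if target != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

Child processes started afterwards (for example with `subprocess.run`) inherit the raised limit. Some systems, macOS among them, may reject values above a kernel-wide cap even when the hard limit reports as unlimited, so treat a `ValueError` from `setrlimit` as a sign that the system setting itself needs changing.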

I have some issues loading valid nquad RDF; ref: https://github.com/dgraph-io/dgraph/issues/78

Any guidance on why this is throwing an error is welcome.

Hi @fils,

So, I’m looking at the RDF data that you’re using, and see statements with semicolons.

<http://opencoredata.org/id/resource/iodp/cruise/v1/3> a glview:Cruise ;

We’re doing our own RDF lexing and parsing, and only support this format, as per the doc here:
https://www.w3.org/TR/n-quads/

statement	::=	subject predicate object graphLabel? '.'

Is it possible that you’re using some other format of RDF than the one we implemented support for?

Cheers,
Manish
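As a rough way to tell the two encodings apart before loading, the `statement` production quoted above can be approximated with a regular expression. This is a sketch under loose assumptions, not a full N-Quads parser and not Dgraph's own lexer: escapes, language tags and blank node labels are matched only approximately, and the name `looks_like_nquad` is made up for this example.

```python
import re

# Loose approximations of the N-Quads terminals.
IRI = r"<[^<>\s]*>"
BNODE = r"_:\S+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^' + IRI + r"|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?"

# statement ::= subject predicate object graphLabel? '.'
STATEMENT = re.compile(
    rf"^\s*(?:{IRI}|{BNODE})"                # subject
    rf"\s+{IRI}"                             # predicate
    rf"\s+(?:{IRI}|{BNODE}|{LITERAL})"       # object
    rf"(?:\s+(?:{IRI}|{BNODE}))?"            # optional graph label
    r"\s*\.\s*$"                             # terminating '.'
)


def looks_like_nquad(line: str) -> bool:
    """Return True if the line has the overall shape of one N-Quads statement."""
    return STATEMENT.match(line) is not None
```

A Turtle line such as `<http://opencoredata.org/id/resource/iodp/cruise/v1/3> a glview:Cruise ;` fails this check because `a` and the prefixed name are not full IRIs and the line ends in `;` rather than `.`.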

The link I posted is to a TTL formatted version of RDF…

You are using nquads which is fine. RDF is just a data model, nothing more. There are several encodings of that data model. I converted my turtle to nquads before doing my test using the popular rapper tool. I have not posted my nquad version though. I can if you like but it is valid nquads. Rapper is part of the mature Redland tools http://librdf.org/raptor/rapper.html

I think in the error that you received, it complained about not finding object.

FATA[0033] While handling rdf reader.                    error=No Object in NQuad package=uidassigner_main

which comes from here:

That means our RDF parser could find neither the object id nor the object value. It seems like this happens after a while of reading the data, so the culprit line is somewhere further down in the file.

If you post the data file, I can have a look. Or, you could try to identify this particular issue by manually scanning your data. It’s also possible that our parser has some bugs, but either way we’ll need to identify the culprit RDF statement.

I’ll post the nquads file in the morning somewhere where you can pull it from.

Here is the file I was trying to load.

Hey @fils,

Sorry for the delay. Friday got busy and the weekend is full of chores.

So, this is the culprit line:

<http://opencoredata.org/id/resource/iodp/cruise/v1/> <http://opencoredata.org/voc/janus/1/leg> "" .

The object is empty. This isn’t allowed by our parser. We could modify our logic to ignore and skip over such lines. Suggestions?

Cheers,
Manish

No worries on the delay. Obviously those are not triples I want. It’s interesting to me that both rapper and OpenLink Virtuoso took those triples. It’s an edge case in the RDF spec it seems. I’ll have to look it up and see if such a triple is truly “legal”.

It would be nice if dgraph spit out errors by referencing the offending triple and its location in the file. That way I would have found this fast. I will try and load up the graph now and see how it works.
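Until the loader reports locations itself, a small pre-scan can point at the offending lines. The sketch below only looks for the specific case found in this thread, a statement whose object is the empty literal `""`, and prints its line number. The regular expression is an approximation and the function name is hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches a statement whose object is an empty literal, optionally followed by
# a datatype or language tag and an optional graph label, then the final '.'.
EMPTY_LITERAL_OBJECT = re.compile(
    r'\s""(?:\^\^<[^<>\s]*>|@[A-Za-z0-9-]+)?'
    r'(?:\s+(?:<[^<>\s]*>|_:\S+))?'
    r'\s*\.\s*$'
)


def find_empty_objects(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each statement with an empty literal object."""
    for number, line in enumerate(lines, start=1):
        if EMPTY_LITERAL_OBJECT.search(line):
            yield number, line.rstrip("\n")
```

To use it on a file, open the file and iterate: `for n, text in find_empty_objects(open(path, encoding="utf-8")): print(n, text)`, where `path` is wherever your N-Quads file lives.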

I’m curious to experiment with graphql for accessing RDF vs SPARQL queries.

Just an update… I was able to load via uidassigner, merge and loader.

I’ll have to try and start playing with graphql now. Is there an instance of the web-based graphql query response interface served up via the docker instance? When I go to the served URL I get a 404. Is there a fully qualified URL I should try?

http://192.168.99.100/query is there fine, didn’t know if there was a UI as well…

Hi Fils, there is no UI yet for this purpose; you have to make a curl request to the server to get the response.

curl IP:PORT/query -X POST -d 'QUERY'

Also, looks like you are missing the port information in the URL you have specified.
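If curl is not convenient, the same POST can be sent from a script. This is a minimal sketch using only Python's standard library; the function names are made up for this example, and the host, port and query text are whatever applies to your own setup.

```python
from urllib.request import Request, urlopen


def build_query_request(host: str, port: int, query: str) -> Request:
    """Build the equivalent of: curl HOST:PORT/query -X POST -d 'QUERY'."""
    url = f"http://{host}:{port}/query"
    return Request(url, data=query.encode("utf-8"), method="POST")


def run_query(host: str, port: int, query: str, timeout: float = 10.0) -> str:
    """Send the query and return the response body as text."""
    request = build_query_request(host, port, query)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `run_query("192.168.99.100", 80, "QUERY")` would match the port mapping described in this thread, with `QUERY` replaced by the actual query text.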

sounds good… I can do curl… I mapped the port to my local mac’s port 80 (well, Kitematic did that for me to be honest)

I’ve only started to get into GraphQL. I find it more intuitive to go from GraphQL to JSON or JSON-like schema definitions than to RDF. It’s going to take me some time to wrap my head around this. Do you guys have any screencasts or the like that walk a person through exploring an RDF-based graph with GraphQL?

I’ll try and look at the Freebase RDF and your example GraphQL and see what I can learn from inspection there, but any documentation or hints are welcome.

Hey @fils,

Sorry, no screencasts yet. But you can look at the test queries here:
https://discuss.dgraph.io/t/list-of-test-queries/22

In general, a predicate can be a field in GraphQL. Right now, you have to start off by specifying an _xid_, which is the external id, i.e. the user-specified id for an entity. Once you get results, you can also specify a _uid_, which is the internally generated 64-bit integer id.

I’ve created a Trello task to specify the offending triple when returning an error. I think you should be able to subscribe to the card to keep track of the progress. If not, let me know.