Common Issues with Data Loading

  1. If you get an error like the following while loading data:
    FATA[0221] While storing posting list error=IO error: /Users/username/dgraph/p0/002562.sst: Too many open files package=posting
    then you can increase the maximum number of files that may be open at one time, which is a system property.
    For Linux, you can find instructions here

  2. For both steps of bulk loading, you can also limit the number of cores and the amount of memory the process uses.
    Run ./uidassigner --help and ./loader --help to see all the optional flags available.
    You might want to adjust the RAM using --stw_ram_mb and the number of CPUs using --numCPU, depending on how much RAM and how many cores your machine has.
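
For the "Too many open files" error in item 1, the per-process limit can be inspected and raised from code as well as from the shell. The following is a minimal sketch using Python's standard `resource` module (Unix only); the helper name `raise_nofile_limit` is made up for illustration. The new limit applies only to the calling process and the children it starts, so the loader would have to be launched from that same process to benefit.

```python
import resource


def raise_nofile_limit(target):
    """Raise the soft open-file limit toward `target`; return the soft limit in effect.

    The soft limit is never lowered and never pushed past the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # already at or above the requested value
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the OS refused the new value; keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit now:", raise_nofile_limit(4096))
```

The value 4096 is only a placeholder; a suitable number depends on how many files the loader keeps open on your data.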


I have some issues loading valid nquad RDF. Ref: Error loading RDF · Issue #78 · dgraph-io/dgraph · GitHub

Any guidance on why this is throwing an error is welcome.

Hi @fils,

So, I'm looking at the RDF data that you're using, and I see statements with semicolons.

<http://opencoredata.org/id/resource/iodp/cruise/v1/3> a glview:Cruise ;

We're doing our own RDF lexing and parsing, and only support this format, as per the doc here:
https://www.w3.org/TR/n-quads/

statement	::=	subject predicate object graphLabel? '.'

Is it possible that you're using some other format of RDF than the one we implemented support for?

Cheers,
Manish
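
A quick way to tell whether a file is in the one-statement-per-line shape quoted above, rather than Turtle, is to test each line against a rough pattern. This is only a sketch: the regular expression is a simplified approximation of the N-Quads grammar, not Dgraph's lexer, and the name `non_nquad_lines` is made up for illustration.

```python
import re

# One RDF term: <IRI>, a blank node, or a quoted literal with optional datatype or language tag.
_TERM = r'(?:<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
# statement ::= subject predicate object graphLabel? '.'
_STATEMENT = re.compile(r'^\s*' + _TERM + r'(?:\s+' + _TERM + r'){2,3}\s*\.\s*$')


def non_nquad_lines(lines):
    """Return (line_number, text) for lines that do not look like one N-Quads statement."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not _STATEMENT.match(stripped):
            bad.append((lineno, stripped))
    return bad
```

Turtle shorthand such as `a`, prefixed names like `glview:Cruise`, and a trailing `;` all fail this check, which is why the statement quoted earlier would be reported.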

The link I posted is to a TTL formatted version of RDF...

You are using nquads which is fine. RDF is just a data model, nothing more. There are several encodings of that data model. I converted my turtle to nquads before doing my test using the popular rapper tool. I have not posted my nquad version though. I can if you like but it is valid nquads. Rapper is part of the mature Redland tools Raptor RDF Parser Toolkit - Raptor RDF parser utility

I think in the error that you received, it complained about not finding object.

FATA[0033] While handling rdf reader.                    error=No Object in NQuad package=uidassigner_main

which comes from here:
https://github.com/dgraph-io/dgraph/blob/master/rdf/parse.go#L167

That means our RDF parser could find neither the object ID nor the object value. It seems like this happens after a while of reading the data, so the culprit line is somewhere further down.

If you post the data file, I can have a look. Or, you could try to identify this particular issue by manually scanning your data. It's also possible that our parser has some bugs, but we'll need to identify the culprit RDF statement.

I'll post the nquads file in the morning somewhere where you can pull it from.


Here is the file I was trying to load.

https://www.dropbox.com/s/vbw9it3o7ai17no/JRSO_cruises_gl.nq.gz?dl=0

Hey @fils,

Sorry for the delay. Friday got busy and the weekend is full of chores.

So, this is the culprit line:

<http://opencoredata.org/id/resource/iodp/cruise/v1/> <http://opencoredata.org/voc/janus/1/leg> "" .

The object is empty. This isn't allowed by our parser. We could modify our logic to ignore and skip over such lines. Suggestions?

Cheers,
Manish
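
Statements like the one above can be found before loading by scanning the file for empty literal objects and reporting their line numbers. This is a minimal sketch, assuming one statement per line; the pattern is a simplified approximation and `find_empty_objects` is a made-up helper name, not part of Dgraph.

```python
import re

# subject, predicate, then an empty literal "" (optionally typed or language-tagged),
# an optional graph label, and the closing '.'
_EMPTY_OBJECT = re.compile(
    r'^\s*(?:<[^>]*>|_:\S+)\s+<[^>]*>\s+""(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
    r'\s*(?:<[^>]*>|_:\S+)?\s*\.\s*$'
)


def find_empty_objects(lines):
    """Return (line_number, text) for every statement whose object is an empty literal."""
    return [
        (lineno, line.rstrip("\n"))
        for lineno, line in enumerate(lines, start=1)
        if _EMPTY_OBJECT.match(line)
    ]
```

The function accepts any iterable of lines, so it also works on a compressed file opened in text mode with `gzip.open(path, "rt")`.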

No worries on the delay. Obviously those are not triples I want. It's interesting to me that both rapper and OpenLink Virtuoso took those triples. It's an edge case in the RDF spec, it seems. I'll have to look up and see if such a triple is truly "legal".

It would be nice if dgraph spit out errors by referencing the offending triple and its location in the file. That way I would have found this fast. I will try and load up the graph now and see how it works.

I'm curious to experiment with graphql for accessing RDF vs SPARQL queries.

Just an update... I was able to load via uidassigner, merge and loader.

I'll have to try and start playing with graphql now. Is there an instance of the web based graphql query response interface served up via the docker instance? When I go to the served URL I get a 404. Is there a fully qualified URL I should try?

http://192.168.99.100/query is there fine, didn't know if there was a UI as well...

Hi Fils, there is no UI yet for this purpose, you have to do a curl request to the server to get the response.

curl IP:PORT/query -X POST -d 'QUERY'

Also, looks like you are missing the port information in the URL you have specified.
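
The same POST can be sent from a script instead of curl. A minimal sketch using only Python's standard library follows; the function names are made up for illustration, and the host and port are whatever your server is reachable on.

```python
import urllib.request


def build_query_request(host, port, query):
    """Build a POST request to http://host:port/query with the query text as the body."""
    url = "http://{}:{}/query".format(host, port)
    return urllib.request.Request(url, data=query.encode("utf-8"), method="POST")


def run_query(host, port, query, timeout=10):
    """Send the query and return the response body as text."""
    request = build_query_request(host, port, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the URL and body before contacting the server.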

sounds good... I can do curl... I mapped the port to my local mac's port 80 (well, Kitematic did that for me to be honest)

I've only started to get into Graphql. I find it more intuitive to go from graphql to JSON or JSON like schema definitions than to RDF. It's going to take me some time to wrap my head around this. Do you guys have any screencasts or the like that walk a person through exploring an RDF based graph with GraphQL?

I'll try and look at the Freebase RDF and your example GraphQL and see what I can learn from inspection there, but any documentation or hints are welcome.

Hey @fils,

Sorry, no screencasts yet. But, if you look at the test queries here:
http://discuss.dgraph.io/t/list-of-test-queries/22

In general, the predicate can be a field in GraphQL. Right now, you've to start off specifying an _xid_, which is the external id, i.e. user-specified id for entities. Once you get results, you can also specify _uid_, which is the internally generated 64-bit integer ids.

I've created a Trello task to specify the offending triple when returning an error. I think you should be able to subscribe to the card to keep track of the progress. If not, let me know.
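
To make the shape concrete, here is a small sketch that assembles such a query as text, starting from an _xid_ and listing predicates as fields. The exact syntax is an assumption: the root name `me` and the layout are modeled on the linked test queries and should be checked against them, and `build_xid_query` is a made-up helper.

```python
def build_xid_query(xid, predicates, root="me"):
    """Return query text that starts at an external id and asks for the given predicates.

    Assumed shape, to be verified against the test queries:
        { me(_xid_: <xid>) { <predicate> ... } }
    """
    fields = "\n".join("    " + predicate for predicate in predicates)
    return "{\n  %s(_xid_: %s) {\n%s\n  }\n}" % (root, xid, fields)


if __name__ == "__main__":
    # Illustrative Freebase-style id and predicate name.
    print(build_xid_query("m.06pj8", ["type.object.name.en"]))
```

The resulting string is what would go in place of QUERY in the curl command.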