Error while parsing RDF


(Aditya) #1

Hi,

I’m a relatively new user of Dgraph, and I’m trying to load the Freebase RDF triples into Dgraph for querying. I am using the standard freebase-rdf-latest.gz file provided by Google. However, when I run the dgraphloader command, I hit the following error:

2017/10/02 15:52:45 Error while parsing RDF: parsing time "-0410" as "2006": cannot parse "-0410" as "2006", on line:723854 <http://rdf.freebase.com/ns/g.120vz05y> <http://rdf.freebase.com/ns/people.deceased_person.date_of_death> "-0410"^^<http://www.w3.org/2001/XMLSchema#gYear>"

I realize that there is some sort of formatting problem in the .gz file, but is there a way to skip such triples and load all the others, rather than exiting when an error occurs?

Also, what can I as a user do to make sure that such errors don’t occur? I could edit the triple file, but that seems like a bad way to go about it, since the RDF file is a standard one.

Thanks,
Aditya


(Pawan Rawal) #2

Hey @AdityaAS

The dgraphloader batches mutations (1000 mutations per batch), so it would be hard to skip only the faulty lines when an error is returned, as we don’t know which mutation in the batch caused the error.

This particular value looks invalid. We call time.Parse with several different layouts to try to parse the time string, but this RDF value doesn’t match any of them. You’d have to modify it manually.
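If editing the dump by hand is impractical, one option is to pre-filter it before running dgraphloader. Below is a minimal sketch in Go, not part of Dgraph itself. It assumes the only failing values are gYear literals that Go's time.Parse rejects under the "2006" layout (the layout named in the error message above); other datatypes may need similar checks. The names gYearLiteral, parsableYear, keepLine and filterTriples are made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
	"time"
)

// gYearLiteral matches an object literal typed as xsd:gYear and
// captures its lexical value, e.g. "-0410" in the failing triple.
var gYearLiteral = regexp.MustCompile(
	`"([^"]*)"\^\^<http://www\.w3\.org/2001/XMLSchema#gYear>`)

// parsableYear reports whether time.Parse accepts s under the "2006"
// layout. Assumption: this mirrors the check that fails in the loader.
func parsableYear(s string) bool {
	_, err := time.Parse("2006", s)
	return err == nil
}

// keepLine returns false only for lines that carry a gYear literal
// which parsableYear rejects; every other line is kept unchanged.
func keepLine(line string) bool {
	m := gYearLiteral.FindStringSubmatch(line)
	return m == nil || parsableYear(m[1])
}

// filterTriples copies lines from r to w, dropping those rejected by
// keepLine, and returns the number of dropped lines.
func filterTriples(r io.Reader, w io.Writer) (int, error) {
	sc := bufio.NewScanner(r)
	// Raise the per-line limit; some lines in large dumps are long.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	bw := bufio.NewWriter(w)
	skipped := 0
	for sc.Scan() {
		line := sc.Text()
		if !keepLine(line) {
			skipped++
			continue
		}
		if _, err := fmt.Fprintln(bw, line); err != nil {
			return skipped, err
		}
	}
	if err := sc.Err(); err != nil {
		return skipped, err
	}
	return skipped, bw.Flush()
}

func main() {
	skipped, err := filterTriples(os.Stdin, os.Stdout)
	if err != nil {
		fmt.Fprintln(os.Stderr, "filter failed:", err)
		os.Exit(1)
	}
	fmt.Fprintf(os.Stderr, "skipped %d triples\n", skipped)
}
```

Saved as, say, filter.go (a hypothetical file name), it could be run as `zcat freebase-rdf-latest.gz | go run filter.go | gzip > filtered.rdf.gz`, and the filtered file passed to dgraphloader instead of the original. Note that this drops the affected triples entirely rather than repairing them.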

