Loading Wikidata into dgraph


#1

Has anybody tried loading the Wikidata RDF dumps into dgraph?

The files are free to download here

The dumps use Turtle (TTL), an RDF serialization format.

However, when I run dgraphloader on the dump, I hit the following error:

root@261358fbb2df:/dgraph# dgraphloader -r wikidata-20170508-all-BETA.ttl.gz

Dgraph version   : v0.7.6
Commit SHA-1     : 5f7eb75
Commit timestamp : 2017-05-01 14:19:52 +1000
Branch           : release/v0.7.6


Processing wikidata-20170508-all-BETA.ttl.gz
2017/05/10 19:27:46 main.go:135: Error while parsing RDF: Invalid input: @ at lexText, on line:1 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

For those curious, the first lines of the ttl file look like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology-beta#> .
@prefix wdata: <https://www.wikidata.org/wiki/Special:EntityData/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wdref: <http://www.wikidata.org/reference/> .
@prefix wdv: <http://www.wikidata.org/value/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix p: <http://www.wikidata.org/prop/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix psn: <http://www.wikidata.org/prop/statement/value-normalized/> .
@prefix pq: <http://www.wikidata.org/prop/qualifier/> .
@prefix pqv: <http://www.wikidata.org/prop/qualifier/value/> .
@prefix pqn: <http://www.wikidata.org/prop/qualifier/value-normalized/> .
@prefix pr: <http://www.wikidata.org/prop/reference/> .
@prefix prv: <http://www.wikidata.org/prop/reference/value/> .
@prefix prn: <http://www.wikidata.org/prop/reference/value-normalized/> .
@prefix wdno: <http://www.wikidata.org/prop/novalue/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

wikibase:Dump a schema:Dataset,
                owl:Ontology ;
        cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
        schema:softwareVersion "0.0.5" ;
        schema:dateModified "2017-05-08T23:00:01Z"^^xsd:dateTime ;
        owl:imports <http://wikiba.se/ontology-1.0.owl> .

wdata:Q22 a schema:Dataset ;
        schema:about wd:Q22 ;
        schema:version "480270117"^^xsd:integer ;
        schema:dateModified "2017-04-30T16:35:36Z"^^xsd:dateTime ;
        wikibase:sitelinks "223"^^xsd:integer ;
        wikibase:statements "89"^^xsd:integer ;
        wikibase:identifiers "22"^^xsd:integer .

wd:Q22 a wikibase:Item ;
        rdfs:label "Scotland"@en-gb ;
        skos:prefLabel "Scotland"@en-gb ;
        schema:name "Scotland"@en-gb ;
        rdfs:label "Scotland"@en ;
        skos:prefLabel "Scotland"@en ;
        schema:name "Scotland"@en ;
        rdfs:label "Écosse"@fr ;
        skos:prefLabel "Écosse"@fr ;
        schema:name "Écosse"@fr ;
        rdfs:label "Scozia"@it ;
        skos:prefLabel "Scozia"@it ;
        schema:name "Scozia"@it ;

(Manish R Jain) #2

Can you convert them to the RDF N-Quad format before loading? There should be online converters from TTL to N-Quads.


#3

The only problem is that the unzipped TTL file is larger than 50 GB.

It would suck if I did the whole conversion only to find out it still won’t work.

Has anybody had any success with this so far?


(Manish R Jain) #4

You should be able to find programs that read the gzipped file directly and write gzipped output, so you never have to unpack it.


#5

I will attempt that.

In the meantime, it does appear that loading TTL is supported in dgraph generally, for example:

So I wonder what is going wrong with this one?


(Manish R Jain) #6

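We don’t directly support TTL. We support the RDF N-Quad format. The two are similar but not interchangeable: N-Quads has no @prefix directives, which is why the loader rejects the first line of your file.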
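
To make the difference concrete: Turtle lets you abbreviate IRIs with @prefix declarations, while N-Quads requires every IRI to be written out in full on every line. The following is a minimal Python sketch of that expansion step only. It is not a Turtle parser and not part of dgraph; the helper names parse_prefixes, expand, and to_nquad are made up for illustration, and it ignores Turtle's `;` and `,` shorthand and the `a` keyword, all of which a real converter would also have to handle.

```python
import re

# Matches Turtle prefix declarations such as:
#   @prefix wd: <http://www.wikidata.org/entity/> .
PREFIX_RE = re.compile(r'@prefix\s+(\w*):\s*<([^>]*)>\s*\.')


def parse_prefixes(turtle_header):
    """Collect the prefix -> namespace IRI map from @prefix lines."""
    return {m.group(1): m.group(2) for m in PREFIX_RE.finditer(turtle_header)}


def expand(term, prefixes):
    """Expand a prefixed name (wd:Q22) into a full IRI (<http://...Q22>)."""
    if term.startswith('<'):
        return term  # already a full IRI
    if term.startswith('"'):
        # Literal: only a prefixed datatype (e.g. ^^xsd:integer) needs expanding.
        lexical, sep, datatype = term.rpartition('^^')
        if sep and not datatype.startswith('<'):
            return lexical + '^^' + expand(datatype, prefixes)
        return term
    prefix, _, local = term.partition(':')
    return '<' + prefixes[prefix] + local + '>'


def to_nquad(subject, predicate, obj, prefixes):
    """Render one triple as a single N-Quad style line (no graph label)."""
    return ' '.join(expand(t, prefixes) for t in (subject, predicate, obj)) + ' .'


header = """@prefix wd: <http://www.wikidata.org/entity/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
"""
prefixes = parse_prefixes(header)
print(to_nquad('wd:Q22', 'rdfs:label', '"Scotland"@en', prefixes))
# <http://www.wikidata.org/entity/Q22> <http://www.w3.org/2000/01/rdf-schema#label> "Scotland"@en .
```

The printed line is the kind of input the loader expects; the @prefix header and the multi-line `;` blocks from the Turtle file are not.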


(Justin Judd) #7

I have loaded a copy of Wikidata into my instance of dgraph. I looked at two different options. The first is to download RDF files that are already in the right N-Quad format; there are two places to get them:

  1. http://tools.wmflabs.org/wikidata-exports/rdf/exports.html
  2. https://dumps.wikimedia.org/wikidatawiki/entities/ (look for the files ending in .nt.gz or .nt.bz2)
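
Given the size of these files, it may be worth checking the start of a download before committing to a full load. Below is a rough Python sketch that streams a gzipped file and reports the first line that does not look like a line-based triple (for instance an @prefix directive). The function names and the 1000-line limit are arbitrary choices for illustration, and the check is a loose heuristic, not a validator.

```python
import gzip


def looks_like_ntriples(line):
    """Loose heuristic: subject is an IRI or blank node, and the line ends with '.'."""
    line = line.strip()
    if not line or line.startswith('#'):
        return True  # blank lines and comments are harmless
    return (line.startswith('<') or line.startswith('_:')) and line.endswith('.')


def check_dump(path, limit=1000):
    """Return (line number, text) of the first suspicious line among the
    first `limit` lines, or None if they all look fine."""
    # gzip.open in text mode decompresses lazily, so the file is never
    # unpacked to disk or held in memory as a whole.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for number, line in enumerate(f, start=1):
            if number > limit:
                break
            if not looks_like_ntriples(line):
                return number, line.rstrip('\n')
    return None
```

For the .nt.bz2 variant, bz2.open from the standard library accepts the same mode and encoding arguments. Running check_dump on a Turtle dump should flag line 1, because that line starts with @prefix.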

The option I ended up taking was the second: I downloaded the JSON dump from the second link above, parsed it, and used the Go client library to ingest the result.
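
For readers wondering what the JSON route might involve, here is a hedged Python sketch of the parsing half only. It is not the code used for the load described above. It assumes the dump is one large JSON array with one entity per line, and that each entity carries `id`, `labels`, and `claims` fields with the `mainsnak`/`datavalue` nesting shown in the code. It emits only labels and item-to-item statements, reusing the wd: and wdt: namespaces from the Turtle header; the function names are invented for illustration.

```python
import json

WD = 'http://www.wikidata.org/entity/'
WDT = 'http://www.wikidata.org/prop/direct/'
RDFS_LABEL = 'http://www.w3.org/2000/01/rdf-schema#label'


def escape(text):
    """Escape characters that may not appear raw inside a quoted literal."""
    return (text.replace('\\', '\\\\').replace('"', '\\"')
                .replace('\n', '\\n').replace('\r', '\\r'))


def entity_to_nquads(entity):
    """Yield N-Quad style lines for one entity's labels and item-valued claims."""
    subject = '<%s%s>' % (WD, entity['id'])
    for lang, label in entity.get('labels', {}).items():
        yield '%s <%s> "%s"@%s .' % (subject, RDFS_LABEL, escape(label['value']), lang)
    for prop, statements in entity.get('claims', {}).items():
        for statement in statements:
            datavalue = statement.get('mainsnak', {}).get('datavalue', {})
            if datavalue.get('type') != 'wikibase-entityid':
                continue  # strings, dates, quantities etc. are skipped in this sketch
            value = datavalue['value']
            target = value.get('id')
            if target is None and value.get('entity-type') == 'item':
                # Assumption: entries without "id" still carry "numeric-id".
                target = 'Q%d' % value['numeric-id']
            if target:
                yield '%s <%s%s> <%s%s> .' % (subject, WDT, prop, WD, target)


def dump_lines_to_nquads(lines):
    """Convert an iterable of dump lines, skipping the enclosing [ and ]."""
    for line in lines:
        line = line.strip().rstrip(',')
        if line in ('', '[', ']'):
            continue
        yield from entity_to_nquads(json.loads(line))
```

Because dump_lines_to_nquads takes any iterable of lines, it can be fed from gzip.open(path, 'rt', encoding='utf-8') to stream a compressed dump without unpacking it. The resulting lines could then be written to a file for dgraphloader or sent through a client library.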


#8

@zb1 I have used the following converter: http://www.easyrdf.org/converter


#9

How did you convert the JSON output to RDF?

