How about supporting a more compact data format than RDF and JSON?

BlankRain · December 11, 2019, 4:43am

I have some data in an RDBMS and want to export it to Dgraph.
I need to create RDF files first, then load them with the live or bulk loader.
The source data I export from the RDBMS may be only 10 MB, while the RDF I generate from it could be 100 MB.
Here is an example:
Person table in CSV:

id,name,age
1,Jack,12
2,Tom,22
3,Bob,32
4,Alice,33
5,Brain,44

and the corresponding RDF could be:

_:jack <id> "1" .
_:jack <name> "Jack" .
_:jack <age> "12" .

_:tom <id> "2" .
_:tom <name> "Tom" .
_:tom <age> "22" .

_:bob <id> "3" .
_:bob <name> "Bob" .
_:bob <age> "32" .

_:xid5 <id> "4" .
_:xid5 <name> "Alice" .
_:xid5 <age> "33" .

The RDF needs more bytes than the source CSV.
How about supporting a new format based on CSV?
Here is my idea:
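One way to put numbers on this is a small Python sketch (standard library only) that turns the Person CSV into N-Quads, one triple per column, and prints the byte counts. The _:p<id> blank-node naming and the helper names are hypothetical, chosen only for illustration. Because N-Quads are very repetitive, the sketch also reports the gzipped size; Dgraph's loaders can read gzipped .rdf.gz files.

```python
import csv
import gzip
import io

# The Person table from the example above.
PERSON_CSV = """id,name,age
1,Jack,12
2,Tom,22
3,Bob,32
4,Alice,33
5,Brain,44
"""


def escape_literal(value: str) -> str:
    """Escape a value for use inside an N-Quad string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_nquads(text: str, key_column: str = "id") -> str:
    """Emit one N-Quad per (row, column) pair.

    Hypothetical naming scheme: the blank node for a row is _:p<key>,
    e.g. _:p1 for the row whose id is 1.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"_:p{row[key_column]}"
        for predicate, value in row.items():
            lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rdf = csv_to_nquads(PERSON_CSV)
    print(rdf, end="")
    raw = rdf.encode("utf-8")
    print(f"CSV: {len(PERSON_CSV.encode('utf-8'))} bytes")
    print(f"RDF: {len(raw)} bytes")
    print(f"RDF, gzipped: {len(gzip.compress(raw))} bytes")
```

A five-row sample says little about compression ratios; the comparison is only meaningful on a realistic export.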

Person: id,name,age  # header of a node section
jack,1,Jack,12       # body: one row per node
tom,2,Tom,22
bob,3,Bob,32
xid5,4,Alice,33
uid_0x2c,5,Brain,44   # a blank line marks the end of the body

friend: id,Person,Person  # header of an edge section, similar to the node header
a,jack,tom   # body: one row per edge
b,tom,bob
c,bob,xid5
d,xid5,uid_0x2c   # a blank line marks the end of the body

The RDF below carries the same information:

# new node
_:jack <id> "1" .
_:jack <name> "Jack" .
_:jack <age> "12" .

# new node
_:tom <id> "2" .
_:tom <name> "Tom" .
_:tom <age> "22" .

# new node
_:bob <id> "3" .
_:bob <name> "Bob" .
_:bob <age> "32" .

# new node
_:xid5 <id> "4" .
_:xid5 <name> "Alice" .
_:xid5 <age> "33" .

# update by uid
<0x2c> <id> "5" .
<0x2c> <name> "Brain" .
<0x2c> <age> "44" .

# friend
_:jack <friend> _:tom (id="a") .
_:tom <friend> _:bob (id="b") .
_:bob <friend> _:xid5 (id="c") .
_:xid5 <friend> <0x2c> (id="d") .

This costs more bytes and is not human-readable.
So, how about supporting a more compact and readable data format in Dgraph?

MichelDiz · December 11, 2019, 3:42pm

I don’t know. Adding more formats means more code to maintain, and Dgraph’s codebase is already huge. In general, we tend to wait for the community to create such tools (if the community creates one and it is really good, we will recommend it). JSON support came because JSON is in virtually universal use. But formats like CSV require maintaining code beyond our scope. Also, CSV isn’t graph-friendly: it is simple for tables, not for graphs.
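Following that suggestion, a converter for the proposed format could live outside Dgraph as a preprocessing step before the live or bulk loader. Below is a minimal Python sketch, not an existing Dgraph tool, under these assumptions: a section header is "Name: columns"; a blank line ends a section; "#" always starts a comment; a section whose last two columns name node types declared earlier is an edge section, with any preceding columns emitted as quoted string facets; a "uid_" prefix refers to an existing node by uid. All function names are hypothetical.

```python
import csv

# The example from the post, in the proposed compact format.
COMPACT = """Person: id,name,age  # node header
jack,1,Jack,12  # node rows: label (or uid_<uid>), then one value per column
tom,2,Tom,22
bob,3,Bob,32
xid5,4,Alice,33
uid_0x2c,5,Brain,44

friend: id,Person,Person  # edge header: facet columns, then two endpoint types
a,jack,tom
b,tom,bob
c,bob,xid5
d,xid5,uid_0x2c
"""


def strip_comment(line: str) -> str:
    # Assumption: '#' always starts a comment, so values cannot contain '#'.
    return line.split("#", 1)[0].strip()


def node_ref(token: str) -> str:
    # "uid_0x2c" points at an existing node by uid; anything else is a blank node.
    if token.startswith("uid_"):
        return f"<{token[len('uid_'):]}>"
    return f"_:{token}"


def literal(value: str) -> str:
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def compact_to_nquads(text: str) -> list[str]:
    node_types: set[str] = set()
    out: list[str] = []
    header = None  # (section name, column names, is_edge_section)
    for raw in text.splitlines():
        if not raw.strip():
            header = None  # a blank line ends the current section
            continue
        line = strip_comment(raw)
        if not line:
            continue  # comment-only line
        if header is None:
            name, _, cols = line.partition(":")
            name = name.strip()
            columns = [c.strip() for c in cols.split(",")]
            # Assumption: if the last two columns name node types declared
            # earlier, this is an edge section; otherwise a node section.
            is_edge = len(columns) >= 2 and all(c in node_types for c in columns[-2:])
            if not is_edge:
                node_types.add(name)
            header = (name, columns, is_edge)
            continue
        name, columns, is_edge = header
        fields = [f.strip() for f in next(csv.reader([line]))]
        if is_edge:
            # Leading fields are facets (e.g. id=a); the last two are endpoints.
            *facet_values, src, dst = fields
            facets = ", ".join(
                f"{k}={literal(v)}" for k, v in zip(columns[:-2], facet_values)
            )
            suffix = f" ({facets})" if facets else ""
            out.append(f"{node_ref(src)} <{name}> {node_ref(dst)}{suffix} .")
        else:
            # First field is the node label; the rest follow the header columns.
            subject, *values = fields
            for col, value in zip(columns, values):
                out.append(f"{node_ref(subject)} <{col}> {literal(value)} .")
    return out


if __name__ == "__main__":
    print("\n".join(compact_to_nquads(COMPACT)))
```

Because the output is plain N-Quads, it can be written to a file (optionally gzipped) and loaded the usual way, without any change to Dgraph itself.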

Related topics

Topic	Replies	Views	Activity
[Feature request] Support data import from CSV file (Dgraph; status:accepted, kind:feature, area:import-export)	4	1149	August 24, 2022
How to generate rdf.gz file (Dgraph)	1	670	September 5, 2019
Dgraph data backup (Dgraph)	1	326	April 26, 2020
CSV to RDF N-QUAD conversion for DGraph (Users)	7	838	May 9, 2019
CSV importer or CSV converter to RDF - How to? (Users)	3	2163	April 16, 2018