I have some data in an RDBMS and want to export it to Dgraph.
To do that, I need to generate RDF first, then load it via the live or bulk command.
The source data I export from the RDBMS may be only 10 MB, while the RDF I generate from it could be 100 MB.
Here is an example:
Person table in CSV:
id,name,age
1,Jack,12
2,Tom,22
3,Bob,32
4,Alice,33
5,Brain,44
And the RDF could be:
_:jack <id> "1" .
_:jack <name> "Jack" .
_:jack <age> "12" .
_:tom <id> "2" .
_:tom <name> "Tom" .
_:tom <age> "22" .
_:bob <id> "3" .
_:bob <name> "Bob" .
_:bob <age> "32" .
_:xid5 <id> "4" .
_:xid5 <name> "Alice" .
_:xid5 <age> "33" .
The RDF needs more bytes than the source CSV.
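To put a rough number on that overhead, here is a minimal Python sketch that turns the Person CSV above into RDF lines and compares the byte counts. The `_:row<id>` blank-node naming and the `csv_to_rdf` helper are my own assumptions for illustration, not anything Dgraph provides, and values are not escaped.

```python
import csv
import io

# The Person table from the example above.
CSV_TEXT = """id,name,age
1,Jack,12
2,Tom,22
3,Bob,32
4,Alice,33
5,Brain,44
"""


def csv_to_rdf(text):
    """Emit one RDF line per CSV cell, using one blank node per row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        # Hypothetical naming scheme: blank node derived from the id column.
        subject = f"_:row{row['id']}"
        for predicate, value in row.items():
            # Values are written as plain string literals without escaping.
            lines.append(f'{subject} <{predicate}> "{value}" .')
    return "\n".join(lines) + "\n"


rdf_text = csv_to_rdf(CSV_TEXT)
csv_bytes = len(CSV_TEXT.encode("utf-8"))
rdf_bytes = len(rdf_text.encode("utf-8"))

# Print both sizes and the blow-up factor.
print(csv_bytes, rdf_bytes, round(rdf_bytes / csv_bytes, 1))
```

The subject and predicate are repeated for every cell, so the ratio grows with the length of the column names and node labels.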
How about supporting a new format based on CSV?
Here is my idea.
Person: id,name,age # this part is the header for nodes
jack,1,Jack,12 # this part is the body (data)
tom,2,Tom,22
bob,3,Bob,32
xid5,4,Alice,33
uid_0x2c,5,Brain,44 # leave a blank line to end the body
friend: id,Person,Person # this part is the header for edges, similar to the node header
a,jack,tom # this part is the body (data)
b,tom,bob
c,bob,xid5
d,xid5,uid_0x2c # leave a blank line to end the body
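To make the idea more concrete, here is a rough Python sketch of how such a file could be expanded into RDF lines. Everything in it is an assumption about how the format might work, not an existing Dgraph feature: the `compact_to_rdf` and `term` names are hypothetical, a `uid_` prefix is taken to mean an existing uid, a section counts as an edge section when its last two header columns name node sections declared earlier, and facet values are written as quoted strings. Values containing `,` or `#` are not handled.

```python
SAMPLE = """\
Person: id,name,age
jack,1,Jack,12
tom,2,Tom,22
bob,3,Bob,32
xid5,4,Alice,33
uid_0x2c,5,Brain,44

friend: id,Person,Person
a,jack,tom
b,tom,bob
c,bob,xid5
d,xid5,uid_0x2c
"""


def term(label):
    """Assumed convention: 'uid_0x2c' is an existing uid, anything else a blank node."""
    if label.startswith("uid_"):
        return f"<{label[len('uid_'):]}>"
    return f"_:{label}"


def compact_to_rdf(text):
    """Expand the proposed compact format into a list of RDF lines."""
    out = []
    node_types = set()  # names of node sections seen so far
    header = None       # (section name, columns, is_edge) of the current section
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            header = None  # a blank line ends the current body
            continue
        if header is None:
            name, _, cols = line.partition(":")
            name = name.strip()
            cols = [c.strip() for c in cols.split(",")]
            # Edge section: the last two columns name known node sections.
            is_edge = len(cols) >= 2 and all(c in node_types for c in cols[-2:])
            if not is_edge:
                node_types.add(name)
            header = (name, cols, is_edge)
            continue
        name, cols, is_edge = header
        cells = [c.strip() for c in line.split(",")]
        if is_edge:
            # Leading cells are facets; the last two are source and target.
            *facet_vals, src, dst = cells
            facets = ", ".join(f'{k}="{v}"' for k, v in zip(cols[:-2], facet_vals))
            suffix = f" ({facets})" if facets else ""
            out.append(f"{term(src)} <{name}> {term(dst)}{suffix} .")
        else:
            # First cell is the node label; the rest follow the header columns.
            subject = term(cells[0])
            for pred, value in zip(cols, cells[1:]):
                out.append(f'{subject} <{pred}> "{value}" .')
    return out


rdf = compact_to_rdf(SAMPLE)
print("\n".join(rdf))
```

Each node row becomes one line per column and each edge row becomes a single line, so the compact file stays close to the size of the original CSV export.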
The RDF below carries the same information:
# new node
_:jack <id> "1" .
_:jack <name> "Jack" .
_:jack <age> "12" .
# new node
_:tom <id> "2" .
_:tom <name> "Tom" .
_:tom <age> "22" .
# new node
_:bob <id> "3" .
_:bob <name> "Bob" .
_:bob <age> "32" .
# new node
_:xid5 <id> "4" .
_:xid5 <name> "Alice" .
_:xid5 <age> "33" .
# update by uid
<0x2c> <id> "5" .
<0x2c> <name> "Brain" .
<0x2c> <age> "44" .
# friend
_:jack <friend> _:tom (id="a") .
_:tom <friend> _:bob (id="b") .
_:bob <friend> _:xid5 (id="c") .
_:xid5 <friend> <0x2c> (id="d") .
The RDF form costs more bytes and is harder for a human to read.
So, how about supporting a more compact and readable data format in Dgraph?