Hi everybody!
First of all, I like what you guys are doing in Dgraph! I’ve been following the progress for a year and I’m absolutely impressed.
I have a product built on a relational database with a huge number of links between records, and I'm now researching whether graph databases could help in some cases.
To find out, I exported a data sample and tried to bulk load it into Neo4j and Dgraph.
About the data
I selected one type of entity:
type Individual {
id Int
name String
surname String
patronymic String
dt DateTime
}
- I did not declare this type in the Dgraph schema.
And three types of relations. Each relation type has four properties (facets in Dgraph terms).
I exported 150M Individuals and 50M relations of each type.
The schema in Dgraph was:
<id>: int @index(int) .
<name>: string @index(trigram) .
<surname>: string @index(trigram) .
<patronymic>: string @index(trigram) .
<dt>: datetime .
<rel_1>: [uid] @reverse .
<rel_2>: [uid] .
<rel_3>: [uid] @reverse .
About the server
I used a server with 16 cores, 64 GB of RAM, and SSD disks.
Dgraph upload results
I used the bulk loader as described here: https://blog.dgraph.io/post/bulkloader/
RDF examples:
_:individual.{{.ID}} <id> "{{.Hid}}"^^<xs:int> .
_:individual.{{.ID}} <dt> "{{.Dt}}"^^<xs:dateTime> .
_:individual.{{.ID}} <name> "{{.Name}}"^^<xs:string> .
_:individual.{{.ID}} <surname> "{{.Surname}}"^^<xs:string> .
_:individual.{{.ID}} <patronymic> "{{.Patronymic}}"^^<xs:string> .
and for each type of relation:
_:individual.{{.ID1}} <rel_1> _:individual.{{.ID2}} (prop1="{{.Prop1}}", prop2={{.Prop2}}, prop3={{.Prop3}}) .
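For anyone who wants to reproduce the export: the placeholders above are in Go `text/template` syntax, so a minimal sketch of rendering the node triples could look like the following. The `Individual` struct, the `renderIndividual` helper, and the sample values are hypothetical and only illustrate the shape of the output; this is not the actual export code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Individual holds the fields referenced by the RDF templates in this post.
// The struct itself is an assumption made for illustration.
type Individual struct {
	ID         int64
	Hid        int64
	Dt         string // RFC 3339 timestamp, e.g. "1990-01-02T00:00:00Z"
	Name       string
	Surname    string
	Patronymic string
}

// individualRDF is the node template from the post, one triple per line.
const individualRDF = `_:individual.{{.ID}} <id> "{{.Hid}}"^^<xs:int> .
_:individual.{{.ID}} <dt> "{{.Dt}}"^^<xs:dateTime> .
_:individual.{{.ID}} <name> "{{.Name}}"^^<xs:string> .
_:individual.{{.ID}} <surname> "{{.Surname}}"^^<xs:string> .
_:individual.{{.ID}} <patronymic> "{{.Patronymic}}"^^<xs:string> .
`

var individualTmpl = template.Must(template.New("individual").Parse(individualRDF))

// renderIndividual expands the template for one record.
// Values are inserted verbatim: quotes or backslashes inside a name
// would have to be escaped before loading real data.
func renderIndividual(ind Individual) (string, error) {
	var sb strings.Builder
	if err := individualTmpl.Execute(&sb, ind); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Made-up sample record.
	out, err := renderIndividual(Individual{
		ID: 1, Hid: 42, Dt: "1990-01-02T00:00:00Z",
		Name: "Ivan", Surname: "Petrov", Patronymic: "Ivanovich",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Each record produces five triples, which is relevant when counting the edges the bulk loader reports.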
The bulk loader finished in 2.5 hours and loaded 900M edges. There were many out-of-memory (OOM) failures along the way, but after some tuning of the bulk loader parameters the process completed successfully.
Neo4j upload results
I used the neo4j-admin bulk import from CSV files. There were four CSV files: one for the nodes and one for each of the three relation types.
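To give an idea of the node file layout, here is a rough sketch that writes a nodes CSV with the header conventions of the neo4j-admin import tool (`:ID`, `:LABEL`, typed columns such as `dt:datetime`). The exact column set, the `Individual` label, the `nodesCSV` helper, and the sample row are assumptions for illustration, not a copy of the real files.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// nodesCSV renders a nodes file for the import tool.
// The header is an assumed layout: "id:ID" marks the node identifier,
// "dt:datetime" declares a typed property, ":LABEL" carries the node label.
func nodesCSV(rows [][]string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	header := []string{"id:ID", "name", "surname", "patronymic", "dt:datetime", ":LABEL"}
	if err := w.Write(header); err != nil {
		return "", err
	}
	// WriteAll writes every row and flushes the writer.
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Made-up sample row.
	out, err := nodesCSV([][]string{
		{"1", "Ivan", "Petrov", "Ivanovich", "1990-01-02T00:00:00Z", "Individual"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

A relationship file would follow the same pattern with `:START_ID`, `:END_ID` and `:TYPE` columns plus the four relation properties.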
Neo4j loaded this amount of data in 6 minutes! In the Neo4j case, name, surname, patronymic and dt were not edges, as in Dgraph, but properties of the node. So the actual counts were: 150M nodes, 150M relations, 1.2B properties.
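As a sanity check, the totals can be reconstructed from the sample sizes. The sketch below assumes that Dgraph counts each of the five scalar predicates and each relation as one edge (facets not counted separately), and that in Neo4j the id serves as the node identifier rather than as a property. Both are my assumptions, but they reproduce the reported figures.

```go
package main

import "fmt"

const (
	individuals   int64 = 150_000_000
	relTypes      int64 = 3
	relsPerType   int64 = 50_000_000
	scalarPreds   int64 = 5 // id, name, surname, patronymic, dt
	nodeProps     int64 = 4 // name, surname, patronymic, dt
	propsPerRel   int64 = 4 // four properties (facets) per relation
	relationsAll  int64 = relTypes * relsPerType
)

// dgraphEdges counts one edge per scalar value plus one per relation.
func dgraphEdges() int64 {
	return individuals*scalarPreds + relationsAll
}

// neo4jProperties counts node properties plus relation properties.
func neo4jProperties() int64 {
	return individuals*nodeProps + relationsAll*propsPerRel
}

func main() {
	fmt.Println("Dgraph edges:    ", dgraphEdges())     // 750M + 150M
	fmt.Println("Neo4j relations: ", relationsAll)      // 3 x 50M
	fmt.Println("Neo4j properties:", neo4jProperties()) // 600M + 600M
}
```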
Query performance results
I used an analytic query to test and compare performance.
Something like:
- find all nodes matching a name and surname filter
- recursively find all related nodes (in both directions), filtering on edge properties (facets)
- return them
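To make the intent of the query concrete, the three steps above can be sketched as a plain in-memory traversal. This is neither DQL nor Cypher, just the logic: the `Person` and `Edge` types, the `relatedNodes` helper, the exact-match name filter, and the `keep` predicate are all hypothetical stand-ins for the real filters.

```go
package main

import (
	"fmt"
	"sort"
)

// Person and Edge are simplified stand-ins for the real data.
type Person struct{ Name, Surname string }

type Edge struct {
	From, To int
	Facets   map[string]string // edge properties
}

// relatedNodes returns the IDs of every node reachable from the seed nodes,
// following edges in both directions and skipping edges rejected by keep.
func relatedNodes(people map[int]Person, edges []Edge, name, surname string, keep func(Edge) bool) []int {
	// Build an undirected adjacency list from edges that pass the facet filter.
	adj := make(map[int][]int)
	for _, e := range edges {
		if keep(e) {
			adj[e.From] = append(adj[e.From], e.To)
			adj[e.To] = append(adj[e.To], e.From)
		}
	}
	// Step 1: seed nodes matching the name and surname filter.
	visited := make(map[int]bool)
	var queue []int
	for id, p := range people {
		if p.Name == name && p.Surname == surname {
			visited[id] = true
			queue = append(queue, id)
		}
	}
	// Step 2: breadth-first expansion over the filtered edges.
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range adj[cur] {
			if !visited[next] {
				visited[next] = true
				queue = append(queue, next)
			}
		}
	}
	// Step 3: return the result in a stable order.
	result := make([]int, 0, len(visited))
	for id := range visited {
		result = append(result, id)
	}
	sort.Ints(result)
	return result
}

func main() {
	// Made-up sample data.
	people := map[int]Person{1: {"Ivan", "Petrov"}, 2: {"Anna", "Sidorova"}, 3: {"Oleg", "Ivanov"}}
	edges := []Edge{{From: 1, To: 2, Facets: map[string]string{"prop1": "a"}}}
	keep := func(e Edge) bool { return e.Facets["prop1"] == "a" }
	fmt.Println(relatedNodes(people, edges, "Ivan", "Petrov", keep))
}
```

The point of the sketch is that the traversal must follow each relation both forwards and backwards, so the database needs a way to walk every relation type in reverse.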
The Dgraph result was 16 seconds (with indexes).
The Neo4j result was 88 seconds (without indexes).
The Neo4j result was under 1 second after the proper indexes were built.
Questions
I think I did something really wrong or do not understand how Dgraph works. I do not believe these are the real results.
Could you help me explain these results or point me in the right direction?