Support for Spatial 2D, 3D geometries in Dgraph

@mrjn : I have the following questions to make a decision

  1. I wanted to check whether Dgraph has extensive support for storing and retrieving geospatial data. Does it support 2D and 3D geometries, and coordinate reference systems (CRS), as Neo4j does (see "Spatial values" in the Neo4j Cypher Manual)?

  2. If so, how does its performance compare with Neo4j for spatial data storage and retrieval?

  3. Does Dgraph support spatial indexing using S2 geometry?

Do let us know.

  1. Geographic information (lat, long) is supported, and specifically I see index handling for the MultiPolygon, Polygon, and Point types. See the geo tutorial.
  2. Dgraph has a very different architecture from Neo4j, so it is probably hard to compare apples to apples here. But the data is indexed with a geo index, so that will be as fast as many of the other indices available.
  3. An S2 index implementation is what is used to index the geometry data. (code)
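
To make the geo index a bit more concrete, here is a minimal sketch in Python that only builds the strings and JSON you would send to Dgraph; it does not talk to a server. The predicate names `location` and `name`, the sample place, and the helper functions are my own assumptions for illustration. The schema line and the `near(predicate, [long, lat], distance_in_meters)` form follow the DQL geo functions as I understand them, so check them against the docs for your version.

```python
import json

# Assumed schema line: a hypothetical predicate "location" of type geo,
# with the geo index enabled so that geo functions can be used on it.
SCHEMA = "location: geo @index(geo) ."


def geo_point(lon, lat):
    """Return a GeoJSON Point. GeoJSON orders coordinates as [longitude, latitude]."""
    return {"type": "Point", "coordinates": [lon, lat]}


def near_query(predicate, lon, lat, meters):
    """Build a DQL query string that filters with near() on a geo predicate."""
    return (
        "{\n"
        f"  places(func: near({predicate}, [{lon}, {lat}], {meters})) {{\n"
        "    uid\n"
        f"    {predicate}\n"
        "  }\n"
        "}"
    )


# Hypothetical JSON mutation body with one node carrying a geo value.
mutation = {"set": [{"name": "Golden Gate Park",
                     "location": geo_point(-122.4862, 37.7694)}]}

print(SCHEMA)
print(json.dumps(mutation))
print(near_query("location", -122.469829, 37.771935, 1000))
```

Note the coordinate order: the question above says (lat, long), but GeoJSON values and the `near` argument are written longitude first.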

Thanks for the quick response.
@iluminae @mrjn
Question 1:
We have a requirement to store a connected, immutable graph of historical data.
Which model do you suggest for a good balance between query and ingest performance?

  1. Can the historical attribute values be stored as nodes, with historical dates as the edges/relationships?
  2. Is there any other model you would recommend?

Question 2:
What sizing configuration on Dgraph Cloud would handle data with about 3 billion nodes and 3.5 billion relationships?

Total nodes: 3.2 billion (each node has about 3-4 attributes; one special node type has about 200 attributes, one of which would be of type ‘geo’)
Relationships: 3-3.5 billion edges (1-2 attributes each)

Please let us know your thoughts.

I won't know how to design your schema really, but let me give you some advice on how Dgraph internally manages storage, which may help you draw conclusions about ingestion and performance:

  1. Dgraph stores data by tablet, which is synonymous with a predicate. (an ‘attribute’ key as you have written above - sometimes getting the jargon to all match up is half the battle)
  2. Therefore, Dgraph does not store anything per ‘node’ or ‘edge’. A node is just a unique ID that some triples share as a subject.
  3. So, if you have 1M predicates ‘on a node’ vs. 3 predicates ‘on a node’, Dgraph does not care, and will be equally performant at query time (specifically on querying X things ‘on a node’ in either pattern).
  4. Conversely, if you have a huge graph with billions of values, and it only has 5 different predicates (attribute keys, if you will) in total, that will give you terrible performance, since the storage is by predicate.
  5. As an extension of the above, indices will also be huge, in proportion to the predicate being indexed.
  6. A huge database with billions of (key, value)s should be well balanced across a good number of predicates. What is a good number? That may take some work to find out.
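
As a rough illustration of the points above, here is a toy model in Python that treats every predicate as one tablet and counts triples per tablet. This is not how Dgraph's storage is implemented; the function names, the `p0`, `p1`, ... predicate names, and the generated data are all made up for the example. It only shows why the same number of triples spread over few predicates produces much larger tablets.

```python
from collections import Counter


def tablet_sizes(triples):
    """Count triples per predicate. In this toy model, one predicate = one tablet."""
    return Counter(pred for _subj, pred, _obj in triples)


def largest_share(sizes):
    """Fraction of all triples that sit in the single largest tablet."""
    total = sum(sizes.values())
    return max(sizes.values()) / total if total else 0.0


# 1000 (subject, predicate, object) triples spread over 2 predicates
narrow = [(f"0x{i:x}", f"p{i % 2}", i) for i in range(1000)]
# the same number of triples spread over 50 predicates
wide = [(f"0x{i:x}", f"p{i % 50}", i) for i in range(1000)]

for label, triples in (("narrow", narrow), ("wide", wide)):
    sizes = tablet_sizes(triples)
    print(label, "tablets:", len(sizes), "largest share:", largest_share(sizes))
```

Running the same count over a sample of your real data, grouped by predicate, would give a first feel for how evenly your schema spreads its values.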

I highly suggest you read the whitepaper before designing a database of this magnitude. It is certainly possible to do (I have ~4Bn triples in my current production Dgraph), but you should not go in without understanding exactly how Dgraph performs operations, so that you can best design your database.

Good luck!

Thanks. I shall go through it.