Geospatial data and queries

I’ve created a brainstorming document for geospatial queries:

If you are interested, please take a look and suggest what we can implement in the first version.


Data Types

Supported GIS features are the 7 OpenGIS simple features:

  • Point
  • MultiPoint
  • LineString
  • MultiLineString
  • Polygon
  • MultiPolygon
  • GeometryCollection

Types of supported columns

Geometry:

  • Features are in a 2-D Euclidean space. All calculations are done in 2-D.

Geography:

  • Features are lat-long positions on Earth. Calculations are done using an ellipsoidal (round-earth) model.
  • Polygons have a left-hand orientation (the interior lies to the left when walking along the boundary).
  • Default model for Earth: WGS 84, http://spatialreference.org/ref/epsg/wgs-84/ (https://en.wikipedia.org/wiki/World_Geodetic_System)
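To make the difference between the two column types concrete, here is a minimal sketch comparing a planar distance with a round-earth distance. It assumes a spherical Earth and the haversine formula, which is a simplification of the ellipsoidal WGS 84 model mentioned above; the function names are made up for illustration.

```python
import math

# Mean Earth radius in metres. Assumption: a sphere, not the WGS 84 ellipsoid.
EARTH_RADIUS_M = 6_371_008.8


def planar_distance(a, b):
    """Euclidean distance between two (x, y) points, as a geometry column would compute it."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def haversine_distance_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# One degree of longitude, once at the equator and once at 60 degrees north.
equator = ((0.0, 0.0), (1.0, 0.0))
north = ((0.0, 60.0), (1.0, 60.0))

planar_equator = planar_distance(*equator)
planar_north = planar_distance(*north)
round_equator = haversine_distance_m(*equator)
round_north = haversine_distance_m(*north)
```

The planar calculation treats both pairs as one unit apart, while the round-earth calculation gives roughly half the distance at 60 degrees north. That gap is why the geography type needs its own calculations.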

Format

Well Known Text (WKT) & Well Known Binary (WKB)
http://docs.opengeospatial.org/is/12-063r5/12-063r5.html

GeoJSON

https://tools.ietf.org/html/rfc7946

Input/output format can be WKT or GeoJSON.
Storage format should be WKB (as expected by the RocksDB spatial index).

Go implementation of all 3 formats: https://github.com/twpayne/go-geom
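As a rough illustration of the GeoJSON-in, WKB-stored flow, here is a sketch that handles a single 2-D Point using only the Python standard library. It is not how go-geom does it; the function names are hypothetical, and real code would need to cover all seven feature types.

```python
import json
import struct

WKB_POINT = 1          # WKB geometry type code for Point
WKB_LITTLE_ENDIAN = 1  # WKB byte-order flag for little-endian


def geojson_point_to_wkb(text):
    """Encode a GeoJSON Point as a 21-byte little-endian WKB Point."""
    obj = json.loads(text)
    if obj.get("type") != "Point":
        raise ValueError("only Point is handled in this sketch")
    x, y = obj["coordinates"][:2]
    # "<" little-endian, B byte-order flag, I uint32 type code, dd two float64 ordinates
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_point_to_geojson(data):
    """Decode a WKB Point back into a GeoJSON string."""
    prefix = "<" if data[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError("only Point is handled in this sketch")
    return json.dumps({"type": "Point", "coordinates": [x, y]})


wkb = geojson_point_to_wkb('{"type": "Point", "coordinates": [-122.18, 37.45]}')
round_trip = json.loads(wkb_point_to_geojson(wkb))
```

GeoJSON coordinates are ordered longitude first, then latitude, and the sketch keeps that order as the WKB x and y.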

Query Operators:

Possible query operators:

  • within (mongo, sql, postgis, mysql): whether one geometry is completely within another. E.g. find me all restaurants whose location is within ‘polygon representing border of Menlo Park’.
  • intersect (mongo, sql, postgis, mysql): whether two geometries intersect each other. E.g. find me all US states whose borders intersect ‘polygon for Route 66’.
  • disjoint (sql, postgis, mysql): whether two geometries do not intersect.
  • distance (sql, postgis, mysql): returns the minimum distance between two geometries.
    Useful for nearest neighbor queries (MongoDB uses near and does not expose the distance).
    E.g. find me the gas stations closest to my location, sorted ascending by distance.
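To show what within and a distance-sorted query boil down to, here is a small planar sketch. It assumes simple polygons without holes, uses ray casting for the containment test and Euclidean distance for the sort, and the function names are hypothetical; a geography column would swap in round-earth calculations.

```python
import math


def point_within_polygon(point, ring):
    """Ray casting: count how many polygon edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # The edge straddles the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def nearest(origin, places):
    """Return (name, (x, y)) places sorted ascending by planar distance from origin."""
    return sorted(
        places,
        key=lambda p: math.hypot(p[1][0] - origin[0], p[1][1] - origin[1]),
    )


city = [(0, 0), (4, 0), (4, 4), (0, 4)]  # square standing in for a city border
stations = [("far", (9, 9)), ("near", (1, 1)), ("mid", (3, 3))]
ordered = [name for name, _ in nearest((0, 0), stations)]
```

Points that lie exactly on the boundary are not handled consistently by this test, so a real implementation would have to define that case explicitly.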

Storage

RocksDB Spatial indexing:
http://rocksdb.org/blog/2039/spatial-indexing-in-rocksdb/
Supports indexing bounding boxes. Index can be used to query all intersecting bounding boxes at a “level”. Supports multiple “levels” in the same index.
Note: This is not an actual R-tree index, but rather a single level index. Multiple levels are managed by the client by specifying the “level” for each feature.
Calculations are done in Euclidean space, so we will have to apply a geographic calculation on top of that.
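The behaviour described above, bounding boxes stored per “level” and queried by intersection, can be sketched in memory as follows. This is only a stand-in to show the idea: it is not the RocksDB API, the class and method names are hypothetical, and a linear scan replaces the real index lookup.

```python
from collections import defaultdict


def bboxes_intersect(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class LevelBBoxIndex:
    """Toy single-level-per-key index: the client picks the level for every feature."""

    def __init__(self):
        self._levels = defaultdict(list)

    def insert(self, level, feature_id, bbox):
        self._levels[level].append((bbox, feature_id))

    def query(self, level, bbox):
        """Return ids of all features at `level` whose bounding box intersects `bbox`."""
        return [fid for b, fid in self._levels.get(level, []) if bboxes_intersect(b, bbox)]


index = LevelBBoxIndex()
index.insert("city", "menlo_park", (0, 0, 4, 4))
index.insert("city", "palo_alto", (5, 0, 9, 4))
index.insert("state", "california", (-10, -10, 20, 20))

city_hits = index.query("city", (3, 3, 6, 6))
state_hits = index.query("state", (3, 3, 6, 6))
```

The ids returned are only candidates. An exact test on the actual features, using geographic rather than Euclidean calculations, would still have to run on them afterwards.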

Alternative: build our own index as described in https://msdn.microsoft.com/en-us/library/bb895265.aspx

References

After spending a bunch of time poking through this and trying to figure out the simplest solution, I think this is what we should do for a first version (we can always add more features later on):

Storage: We use the RocksDB spatial index for storing the spatial data in WKB format. In a later release, we can consider adding our own indexes.
Data type: We only support the geography type for now. (It is more work than supporting only geometry, but it is the better fit if we want to store actual map data.)
Data format: I/O data format is GeoJSON. Storage is WKB.
Query operators: We’ll start with two operators, within and near, for the two most likely queries, e.g. find me all Indian restaurants in SF, and find the closest Indian restaurants to my location.

I’ll put together a design doc for this.


Sounds like fair logic. Can you also explain a bit about what you compared against? For example, why GeoJSON, and what are the other possible options? That’d give me a bit more understanding of the choices.

There are two text formats for specifying geo data: GeoJSON and Well Known Text (WKT). I chose GeoJSON because it’s easier to read and matches our JSON output format. MongoDB also uses GeoJSON.

However, since RDF allows you to specify the type of the value, we can accept both formats. That will make it easier to import existing datasets which are already in WKT without having to convert them to GeoJSON.
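Here is a sketch of how accepting both formats could look once the value’s type is known. The type tags "geojson" and "wkt" are placeholders I made up for whatever RDF type names we end up with, and only 2-D points are handled.

```python
import json
import re

# Matches e.g. "POINT (-122.18 37.45)"; only plain decimal ordinates are accepted.
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_point(value, value_type):
    """Return (x, y) from a GeoJSON or WKT point, chosen by the hypothetical type tag."""
    if value_type == "geojson":
        obj = json.loads(value)
        if obj.get("type") != "Point":
            raise ValueError("only Point is handled in this sketch")
        x, y = obj["coordinates"][:2]
        return (float(x), float(y))
    if value_type == "wkt":
        match = _WKT_POINT.match(value)
        if match is None:
            raise ValueError("not a 2-D WKT POINT")
        return (float(match.group(1)), float(match.group(2)))
    raise ValueError("unknown value type: " + value_type)


from_wkt = parse_point("POINT (-122.18 37.45)", "wkt")
from_geojson = parse_point('{"type": "Point", "coordinates": [-122.18, 37.45]}', "geojson")
```

Both inputs end up as the same internal value, so everything after parsing (WKB storage, indexing, queries) does not need to know which text format was used.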
