Hello Dgraph Community,
First of all, thank you for bringing Dgraph to the masses.
I am currently looking for a distributed database that natively supports as many of the features we need as possible. We currently use MongoDB 3.6 Community Edition (the latest). I am satisfied with its developer productivity, feature set (except for transactions and multi-collection update support), and ease of cluster deployment, but its read and write latency degrades badly when traffic goes up suddenly.
Must-have features:
- Distributed clock sync and transactions when updating multiple collections/tables/nodes
- Declarative query language to push as much computation as possible to the database, since it is more aware of data locality and constraints
a. complex JOIN idioms (something similar to the $lookup and $graphLookup aggregation stages in MongoDB 3.6)
b. composition of Select/Insert/Update/Delete into a single query (like OpenCypher)
c. Oracle/Postgres SQL-like parallel execution semantics/hints when needed
d. ability to define custom functions and aggregate functions (to avoid frequent round trips to the application server for such computations)
- Optional schema validation at the DB level
- DB driver for JS and Clojure (or Java)
a. Async support
b. Connection Pool
c. Cluster awareness (should not require HA proxy to load balance)
d. maintain a transaction across multiple calls to the DB
e. TLS 1.2 or later support
- Indexing
a. Regular (B-tree) index and composite index
b. Unique index
c. Text search with score and weight ordering
d. Geo Index
- Custom Table/collection partitioning based on value of attribute or function
- Configurable replication factor, or a default of 3 if not configurable
- Either a good admin UI for efficient cluster management or reasonably good command-line tools to do the same
- No significant loss of throughput (read/write) while a node is joining or leaving the cluster
- DB user authentication and role-based authorization
- Should support all features on a single server for local development and testing environments
- Either hot backup or multi-DC support
- Basic data encryption support
Good-to-have features:
- Materialized View (implicitly triggered computed table)
- Auto-balancing of sharded table chunks in the cluster by learning query patterns over a period of time to achieve data locality (or any other mechanism to speed up queries in a distributed fashion)
- Indexing
a. Partial indexes - index over subset of table/collection data
b. Functional indexes - index over subset of table/collection data categorized by function output value
- Cloud hosting support (AWS/Azure/GCP)
We are also evaluating Neo4j and CitusDB/CockroachDB (with a GraphQL layer in the application server), as they nearly match our requirements. So far CitusDB/CockroachDB are winning the race.
I have very limited knowledge of Dgraph, which is why I am seeking help/advice on whether it would be a good idea to consider Dgraph for these requirements, or whether I am asking too much.
Can it also replace a big data stack?
That is, can it be a complete alternative to Apache Spark + Cassandra/HBase for hundreds (400+) of TBs of data on HDD storage, with average read/write response times of a few hundred (up to 250) milliseconds? (This requirement is for a different project.)
Thanks,
Piyush Katariya