Migration (from MongoDB) help/advice needed

piyush · December 24, 2017, 8:11am

Hello Dgraph Community,

First of all, thank you for bringing Dgraph to the masses.

I am currently looking for a distributed database that natively supports as many of the features we need as possible. We currently use the latest MongoDB 3.6 Community Edition. I am satisfied with its developer productivity, feature set (except for transactions and multi-collection updates), and ease of cluster deployment, but its read and write latency suffers badly when traffic goes up suddenly.

Must-have features:

1. Distributed clock sync and transactions when updating multiple collections/tables/nodes
2. Declarative query language to push maximum computation to the database, as it is more aware of data locality and constraints
   a. Complex JOIN idioms (similar to the $lookup and $graphLookup aggregation stages in MongoDB 3.6)
   b. Composition of Select/Insert/Update/Delete into a single query (like OpenCypher)
   c. Oracle/Postgres SQL-like parallel execution semantics/hints when needed
   d. Ability to define custom functions and aggregate functions (to avoid frequent round trips to the application server for such computations)
3. Optional schema validation at the DB level
4. DB driver for JS and Clojure (or Java)
   a. Async support
   b. Connection pool
   c. Cluster awareness (should not require HAProxy to load balance)
   d. Maintain a transaction across multiple calls to the DB
   e. TLS 1.2 or later support
5. Indexing
   a. Regular (B-tree) index and composite index
   b. Unique index
   c. Text search with score and weight ordering
   d. Geo index
6. Custom table/collection partitioning based on the value of an attribute or function
7. Configurable replication factor, or a default of 3 if not configurable
8. Either a good admin UI for efficient cluster management or fairly nice command-line tools to do the same
9. No significant loss of throughput (read/write) while a node is joining or leaving the cluster
10. DB user authentication and role-based authorization
11. Should support all features on a single server for local development and testing environments
12. Either hot backup or multi-DC support
13. Basic data encryption support

Good to have features:

1. Materialized views (implicitly triggered computed tables)
2. Auto-balancing of sharded table chunks in the cluster by learning query patterns over a period of time to achieve data locality (or any other mechanism to speed up queries in a distributed fashion)
3. Indexing
   a. Partial indexes - index over a subset of table/collection data
   b. Functional indexes - index over a subset of table/collection data categorized by function output value
4. Cloud hosting support (AWS/Azure/GCP)

We are also evaluating Neo4j and CitusDB/CockroachDB (with a GraphQL layer in the application server), as they nearly match our requirements. So far CitusDB/CockroachDB are winning the race.
I have very limited knowledge of Dgraph, which is why I am asking whether it would be a good idea to consider Dgraph for these requirements, or whether I am asking too much.

Can it replace a big data stack?

A complete alternative to Apache Spark + Cassandra/HBase for hundreds (400+) of TB of data on HDD storage, with average read/write response times of a few hundred (up to 250) milliseconds. (This requirement is for a different project.)

Thanks,
Piyush Katariya

mrjn · December 28, 2017, 10:40pm

Hey Piyush,

I’d suggest playing with Dgraph to see if it fits what you’re looking for. You can have a look at the roadmap to see what features we have:

github.com/dgraph-io/dgraph: Product Roadmap (issue opened 30 Nov 2015 by manishrjain, closed 14 Jan 2019, label: roadmap)

- [x] Low Latency
- [x] High Throughput
- [x] RDF Parsing
- [x] ~[Rocks DB](http…://rocksdb.org/)~ Badger DB for persistence
- [x] ~Commit Logs~ (Replaced by RAFT logs)
- [x] Query Language - [GraphQL](http://facebook.github.io/graphql/)-like
- [x] Query
- [x] Root Arguments
- [x] Fields
- [x] Response in JSON
- [x] Field Arguments
- [x] Field Alias
- [x] Mutations #23 [v0.2]
- [x] Fragments #8
- [x] Variables
- [x] Type System
- [x] Scalar Types
- [x] ~Object Types~
- [x] Mutation validation for scalar types
- [x] String matching / Name search
- [x] Sort by attribute
- [x] Limit number of results #9
- [x] Filter
- [x] anyof
- [x] allof
- [x] eq (equal)
- [x] inequality (>=, <=, >, <)
- [x] Aggregate Functions
- [x] count
- [x] sum
- [x] max
- [x] min
- [x] Geospatial Queries
- [x] Nearby
- [x] Within
- [x] Contains
- [x] Intersects
- [x] Official Clients
- [x] Javascript
- [x] Go [v0.3]
- [x] Java
- [x] Distributed Transactions
- [x] Distributed #14 [v0.2]
- [x] Distributed Loader [v0.2]
- [x] Distributed Server [v0.2]
- [x] Clustering
- [x] Node discovery and membership via Dgraph Zero
- [x] High Availability
- [x] Raft
- [x] Automatic Data Replication
- [x] Automatic Failover for reads
- [x] Read linearizability
- [x] Resilience
- [x] Shard moves to handle server failure
- [x] Export
- [ ] Backup

---

### After v1.0 / Proprietary Plugins

- [ ] Multi-homing support
- [ ] Cypher Support
- [ ] Access Control Lists
- [ ] Query Graphical User Interface
- [ ] User authentication
- [ ] Cluster Management
- [ ] SPARQL [maybe]
- [ ] Tinkerpop Support [maybe]
- [x] Distributed transactions [maybe]

Giving a general reply about everything Dgraph can or cannot do is a tough ask. But once you try it out, I'm happy to answer any specific questions.

Cheers,
Manish

piyush · January 2, 2018, 6:33am

I am not asking general questions; I am asking very specific and basic ones.
I am ready to give more details if you think otherwise.
These questions are just to help me get started with Dgraph.
I have evaluated several NoSQL/NewSQL databases in the past, and it takes a lot of time to actually play with each one and come to a conclusion.
Even if someone could simply answer in the format Yes / No / Partial support, that would be enough for me.

mrjn · January 3, 2018, 5:21am

Please find my replies inline. Hope this helps.

Must-have features:

Distributed clock sync and transaction when updating multiple collections/tables/nodes

We have distributed transactions.

Declarative query language to push maximum computation to database as it is more aware of data locality and constraint.

a. complex JOIN idioms (something similar to the $lookup and $graphLookup aggregation stages in MongoDB 3.6)

Yes

b. composition of Select/Insert/Update/Delete into a single query (like OpenCypher)

Yes

c. Oracle/Postgres SQL like parallel execution semantics/hints when needed

Yes, each query is broken into sub-queries and executed concurrently.

d. ability to define custom functions and aggregate functions (to avoid the frequent round trips to application server for such computations back and forth)

Currently, there are no custom aggregators.

Optional schema validation at db level
DB driver for JS and Clojure(or Java)

Yes, we have a Java and JS driver.

a. Async support - Go has goroutines, but I think the Java client makes blocking calls.

b. Connection Pool

c. Cluster awareness (should not require HA proxy to load balance) - Yes

d. maintain transaction across multiple calls to DB - Yes

e. TLS 1.2 or later support - Yes
Indexing

a. Regular (Btree) index and composite index - Yes, we have indices.

b. Unique index - Not currently.

c. Text search with score and weight ordering - Text search yes, but no scoring yet.

d. Geo Index - Yes

Custom Table/collection partitioning based on value of attribute or function - Data is distributed, yes.

Configurable replication factor or default as 3 if not configurable - Yes

Either Good admin UI for efficient cluster management or Fairly nice command line tools to do the same - Under construction.

No significant loss of throughput (read/write) while new node is joining in or leaving from cluster - Yes

DB users authentication and roles authorization level security - Planned

Should support all features on Single server for local development and testing environment - Yes
Either Hot back up or Multi DC support - Backup planned, multi-DC support should work, more testing required.

Basic data encryption support - Not yet.

Good to have features:

Materialized View (implicitly triggered computed table)
Auto balancing of sharded table chunks in cluster by learning query patterns over period of time to achieve data locality (or any other mechanism to speed up queries in distributed fashion) - Shard rebalancing, currently using a basic data size heuristic. Can be made smarter over time.

Indexing

a. Partial indexes - index over subset of table/collection data - No partial indices.

b. functional indexes - index over subset of table/collection data categorized by function output value - No partial indices yet.

Cloud hosting support (AWS/Azure/GCP) - Planned for Q1 2018.
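
To make the "basic data size heuristic" for shard rebalancing mentioned above more concrete, here is a minimal sketch of what a size-based rebalancer could look like. It is not Dgraph's actual implementation: the function `pick_shard_move`, the group names, and the shard names are all hypothetical, and the rule (move one shard from the largest group to the smallest so that the size gap shrinks the most) is an assumption chosen for illustration.

```python
from typing import Dict, Optional, Tuple


def pick_shard_move(
    groups: Dict[str, Dict[str, int]]
) -> Optional[Tuple[str, str, str]]:
    """Pick one shard to move using only data sizes (hypothetical heuristic).

    groups maps a group name to {shard name: size in bytes}.
    Returns (shard, source_group, destination_group), or None when no
    single move would make the cluster more balanced.
    """
    totals = {g: sum(shards.values()) for g, shards in groups.items()}
    if len(totals) < 2:
        return None

    src = max(totals, key=totals.get)  # most loaded group
    dst = min(totals, key=totals.get)  # least loaded group
    gap = totals[src] - totals[dst]

    # Moving a shard of size s turns the gap into |gap - 2*s|,
    # so only shards with 0 < s < gap improve the balance.
    candidates = [(size, name) for name, size in groups[src].items()
                  if 0 < size < gap]
    if not candidates:
        return None

    # Prefer the shard that leaves the smallest remaining gap.
    size, name = min(candidates, key=lambda c: abs(gap - 2 * c[0]))
    return name, src, dst


if __name__ == "__main__":
    cluster = {
        "group1": {"name": 60, "friend": 30, "age": 10},
        "group2": {"loc": 20},
    }
    print(pick_shard_move(cluster))
```

A smarter version, as suggested in the reply, could weigh query patterns as well as raw size; this sketch deliberately looks at size only.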

piyush · January 3, 2018, 8:49am

Awesome. Thanks for your response.

I see that you guys are also going to support OpenCypher; that would be super cool.
I'm not trying to wage a war over declarative query languages, but I find OpenCypher (and Datalog too) to be more powerful and succinct than GraphQL. Eagerly waiting for this to happen.

