Migration (from MongoDB) help/advice needed


(Piyush Katariya) #1

Hello Dgraph Community,

First of all, thank you for bringing Dgraph to the masses.

I am currently looking for a distributed database that natively supports as many of the features we need as possible. We currently use MongoDB 3.6 Community Edition (the latest). I am satisfied with its developer productivity, feature set (except for transactions and multi-collection updates), and ease of cluster deployment, but its read and write latency suffers badly when traffic spikes suddenly.

Must-have features:

  1. Distributed clock sync and transaction when updating multiple collections/tables/nodes
  2. Declarative query language to push maximum computation to the database, as it is more aware of data locality and constraints.
    a. Complex JOIN idioms (something similar to the aggregation stages $lookup and $graphLookup in MongoDB 3.6)
    b. Composition of Select/Insert/Update/Delete into a single query (like OpenCypher)
    c. Oracle/PostgreSQL-like parallel execution semantics/hints when needed
    d. Ability to define custom functions and aggregate functions (to avoid frequent round trips to the application server for such computations)
  3. Optional schema validation at db level
  4. DB driver for JS and Clojure (or Java)
    a. Async support
    b. Connection pool
    c. Cluster awareness (should not require HAProxy to load balance)
    d. Maintain a transaction across multiple calls to the DB
    e. TLS 1.2 or later support
  5. Indexing
    a. Regular (Btree) index and composite index
    b. Unique index
    c. Text search with score and weight ordering
    d. Geo Index
  6. Custom table/collection partitioning based on the value of an attribute or function
  7. Configurable replication factor, or a default of 3 if not configurable
  8. Either a good admin UI for efficient cluster management or reasonably good command-line tools to do the same
  9. No significant loss of throughput (read/write) while a node is joining or leaving the cluster
  10. DB user authentication and role-based authorization
  11. Should support all features on a single server for local development and testing environments
  12. Either hot backup or multi-DC support
  13. Basic data encryption support

Good to have features:

  1. Materialized View (implicitly triggered computed table)
  2. Auto-balancing of sharded table chunks in the cluster by learning query patterns over a period of time to achieve data locality (or any other mechanism to speed up queries in a distributed fashion)
  3. Indexing
    a. Partial indexes - index over a subset of table/collection data
    b. Functional indexes - index over a subset of table/collection data categorized by function output value
  4. Cloud hosting support (AWS/Azure/GCP)

Also, we are evaluating Neo4j and CitusDB/CockroachDB (with a GraphQL layer in the application server), as they nearly match our requirements. So far CitusDB/CockroachDB are winning the race.
I have very limited knowledge of Dgraph, which is why I am seeking help/advice on whether it is a good idea to consider Dgraph for these requirements, or whether I am asking too much 😉

Can it replace a big data stack?

That is, can it be a complete alternative to Apache Spark + Cassandra/HBase for hundreds (400+) of TBs of data on HDD storage, with average read/write response times of a few hundred milliseconds (up to 250 ms)? (This requirement is for a different project.)

Thanks,
Piyush Katariya


(Manish R Jain) #2

Hey Piyush,

I’d suggest playing with Dgraph to see if it fits what you’re looking for. You can have a look at the roadmap to see what features we have:

Giving a general reply about everything Dgraph can or cannot do is a tough ask. But once you try it out, I'm happy to answer any specific questions.

Cheers,
Manish


(Piyush Katariya) #3

I am not asking general questions. I am asking very specific and basic questions.
I am ready to give more details if you think otherwise.
These questions are just for me to get started with Dgraph.
I have evaluated several NoSQL/NewSQL DBs in the past, and it takes a lot of time to actually play with each one and come to a conclusion.
Even if someone can simply answer in the format of Yes, No, or Partial support, that would be enough for me.


(Manish R Jain) #5

Please find my replies inlined. Hope this helps.


Must-have features:

Distributed clock sync and transaction when updating multiple collections/tables/nodes

We have distributed transactions.

Declarative query language to push maximum computation to database as it is more aware of data locality and constraint.

a. Complex JOIN idioms (something similar to the aggregation stages $lookup and $graphLookup in MongoDB 3.6)

Yes

b. composition of Select/Insert / update /Delete into single query (like OpenCypher)

Yes

c. Oracle/Postgres SQL like parallel execution semantics/hints when needed

Yes, each query is broken into sub-queries and executed concurrently.
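
To picture what "broken into sub-queries and executed concurrently" can mean, here is a minimal sketch. It is not Dgraph's implementation: the in-memory `SHARDS` table and the `fetch_predicate` and `run_query` helpers are hypothetical names invented for illustration, and the sketch assumes one sub-query per requested predicate.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "predicate shards": predicate name -> {uid: value}.
# This is an illustrative assumption, not how Dgraph stores data.
SHARDS = {
    "name": {1: "Alice", 2: "Bob"},
    "age": {1: 30, 2: 25},
    "city": {1: "Pune"},
}


def fetch_predicate(predicate, uids):
    """Sub-query: look up a single predicate for the requested uids."""
    shard = SHARDS.get(predicate, {})
    return predicate, {uid: shard[uid] for uid in uids if uid in shard}


def run_query(uids, predicates):
    """Fan out one sub-query per predicate, then merge the results per uid."""
    result = {uid: {} for uid in uids}
    with ThreadPoolExecutor(max_workers=max(1, len(predicates))) as pool:
        futures = [pool.submit(fetch_predicate, p, uids) for p in predicates]
        for future in futures:
            predicate, values = future.result()
            for uid, value in values.items():
                result[uid][predicate] = value
    return result
```

Calling `run_query([1, 2], ["name", "age", "city"])` issues three independent lookups and merges them into one record per uid. A uid with no value for a predicate simply omits that key.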

d. ability to define custom functions and aggregate functions (to avoid the frequent round trips to application server for such computations back and forth)

Currently, there are no custom aggregators.

Optional schema validation at db level

DB driver for JS and Clojure(or Java)

Yes, we have a Java and JS driver.

a. Async support - Go has goroutines, but I think the Java client makes blocking calls.

b. Connection Pool

c. Cluster awareness (should not require HA proxy to load balance) - Yes

d. maintain transaction across multiple calls to DB - Yes

e. TLS 1.2 or later support - Yes

Indexing

a. Regular (Btree) index and composite index - Yes, we have indices.

b. Unique index - Not currently.

c. Text search with score and weight ordering - Text search yes, but no scoring yet.

d. Geo Index - Yes

Custom Table/collection partitioning based on value of attribute or function - Data is distributed, yes.

Configurable replication factor or default as 3 if not configurable - Yes

Either Good admin UI for efficient cluster management or Fairly nice command line tools to do the same - Under construction.

No significant loss of throughput (read/write) while new node is joining in or leaving from cluster - Yes

DB users authentication and roles authorization level security - Planned

Should support all features on Single server for local development and testing environment - Yes

Either Hot back up or Multi DC support - Backup is planned; multi-DC support should work, but more testing is required.

Basic data encryption support - Not yet.

Good to have features:

Materialized View (implicitly triggered computed table)

Auto balancing of sharded table chunks in cluster by learning query patterns over period of time to achieve data locality (or any other mechanism to speed up queries in distributed fashion) - Shard rebalancing exists, currently using a basic data size heuristic. It can be made smarter over time.
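
As a rough illustration of what a size-based rebalancing heuristic might look like, here is a minimal sketch. The thread does not describe the actual heuristic, so everything here is an assumption: the `pick_move` function, the group and shard names, and the rule of moving the shard that best evens out the largest and smallest groups are all hypothetical.

```python
def pick_move(groups):
    """Suggest one shard move that narrows the size gap between groups.

    groups maps a group name to {shard_name: size_in_bytes}.
    Returns (shard, source_group, destination_group), or None when no
    single move would reduce the imbalance. Purely illustrative.
    """
    if len(groups) < 2:
        return None
    totals = {name: sum(shards.values()) for name, shards in groups.items()}
    src = max(totals, key=totals.get)  # most loaded group
    dst = min(totals, key=totals.get)  # least loaded group
    gap = totals[src] - totals[dst]
    # Moving a shard of size s changes the gap to |gap - 2s|,
    # so only shards strictly smaller than the gap can help.
    candidates = {s: size for s, size in groups[src].items() if 0 < size < gap}
    if not candidates:
        return None
    # Prefer the shard that leaves the two groups closest in size.
    shard = min(candidates, key=lambda s: abs(gap - 2 * candidates[s]))
    return shard, src, dst
```

For example, with `{"g1": {"name": 100, "age": 40, "city": 10}, "g2": {"friend": 30}}` the sketch suggests moving `age` from `g1` to `g2`, because that leaves the two groups closest in total size. A smarter scheme, such as the query-pattern-aware balancing asked about above, would need more inputs than data size alone.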

Indexing

a. Partial indexes - index over subset of table/collection data - No partial indices.

b. functional indexes - index over subset of table/collection data categorized by function output value - No partial indices yet.

Cloud hosting support (AWS/Azure/GCP) - Planned for Q1 2018.


(Piyush Katariya) #6

Awesome. Thanks for your response.

I see that you guys are also going to support OpenCypher. That would be super cool.
I'm not trying to wage a war over declarative query languages, but I find OpenCypher (and Datalog too) to be more powerful and succinct than GraphQL. Eagerly waiting for this to happen.

