Can we replicate between two clusters, one in AWS Oregon and another in GCP Frankfurt?

greenz1 · September 20, 2018, 1:54pm

We have a few servers and we are looking into transferring some workload to a Graph database. Any guides on replicating data between two clusters at different geo locations?

I found the docs for cluster setup at a single location.

dmai · September 20, 2018, 5:10pm

Dgraph Servers within a single group will by default be replicas of each other. They will still be within the same Dgraph cluster, so the replicas should be close enough for consistent replication.

Oregon and Frankfurt are far enough apart that Dgraph would not run correctly with servers clustered across them.

greenz1 · September 21, 2018, 5:31am

I was under the impression that Dgraph’s clustering and ACID transactions behave like Apache Ignite or Google’s Spanner.

Can you please elaborate on what might cause problems?

greenz1 · September 21, 2018, 8:26am

@dmai I think I should clear this up.

There will be 2 separate clusters, one in AWS Oregon and another in GCP Frankfurt.

We just need to replicate 2 collections in both of the clusters. Is this possible?

Edit: I have edited the question to reflect the same.

dmai · September 21, 2018, 7:15pm

Replication between two independent Dgraph clusters isn’t a feature yet, but it’s something others have asked about as well.

We’re combating the physical world here. The ping time between the US west coast (Los Angeles) and Frankfurt, Germany is 148ms (according to WonderNetwork ping times). We expect Dgraph to be fast, and >148ms for network calls within a cluster is not good enough.

You can try it out and see what happens. You can also try having a replica in-between to decrease the time between servers.
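To make the latency concern concrete, here is a rough sketch of how round-trip time feeds into write latency, assuming the replicas in a group agree on writes through a majority-quorum protocol such as Raft. The function name and the model are illustrative assumptions, not Dgraph code; it counts only the network wait and ignores disk syncs, timestamp assignment, and retries.

```python
def quorum_commit_latency_ms(follower_rtts_ms):
    """Estimate the network part of one replicated write, in milliseconds.

    follower_rtts_ms: round-trip times from the leader to each follower.
    Simplifying assumption: a write commits once a majority of replicas
    (leader included) has acknowledged it, so the leader waits for the
    fastest (majority - 1) followers.
    """
    replicas = len(follower_rtts_ms) + 1      # followers plus the leader
    majority = replicas // 2 + 1
    acks_needed = majority - 1                # the leader votes for itself
    if acks_needed == 0:
        return 0.0                            # single node, no network wait
    return float(sorted(follower_rtts_ms)[acks_needed - 1])


# Three replicas, both followers across the ocean from the leader:
far = quorum_commit_latency_ms([148, 148])    # 148.0

# Three replicas, one follower next to the leader and one far away:
mixed = quorum_commit_latency_ms([1, 148])    # 1.0
```

Under this model a far-away replica only stays off the write path while a nearby majority is healthy. If the nearby follower fails, or leadership moves to the distant site, every write pays the full cross-ocean round trip.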

greenz1 · September 22, 2018, 8:02am

Just a little clarification to ensure that we are on the same page.

A cluster is a group of connected server machines (VM or bare metal) or containers which, when queried, behave as if the user/application(s) were querying a single DB instance. A cluster can consist of one or more nodes. A node is an instance of the DB running on a server machine/container. A cluster replicates data between nodes.

Replication is the process of sharing data between multiple nodes and/or clusters so that the data is redundant, which makes it easier to build scalable systems with high availability.

IMO intra-cluster replication should be very fast (which I think works well in Dgraph). Also, it would be naive to have a single cluster with nodes separated by more than ~30ms.
Inter-cluster replication will always have a lag. Well, we can’t move faster than light XDXD.

Edit: We will have AWS Canada Central and GCP London servers up when the office opens on Monday. The ping data is as follows:
Oregon - Montreal = ~70ms
Montreal - London = ~80ms
London - Frankfurt = ~15ms
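As a quick check of these numbers against the ~30ms intra-cluster rule of thumb mentioned in this post, the sketch below lists which hops fit the budget and adds up the whole chain. The variable and function names are made up for illustration; the latencies are the approximate values quoted above.

```python
# Approximate ping times from the post, in milliseconds.
HOPS_MS = [
    ("Oregon", "Montreal", 70),
    ("Montreal", "London", 80),
    ("London", "Frankfurt", 15),
]

# Rule-of-thumb limit for nodes of one cluster, taken from the post.
INTRA_CLUSTER_BUDGET_MS = 30


def hops_within_budget(hops, budget_ms):
    """Return the (site_a, site_b) pairs whose ping fits the budget."""
    return [(a, b) for a, b, ms in hops if ms <= budget_ms]


def chained_latency_ms(hops):
    """Total ping time when traffic is relayed through every hop in order."""
    return sum(ms for _, _, ms in hops)


close_enough = hops_within_budget(HOPS_MS, INTRA_CLUSTER_BUDGET_MS)
end_to_end = chained_latency_ms(HOPS_MS)
# close_enough == [("London", "Frankfurt")]
# end_to_end == 165
```

By this measure only London and Frankfurt are close enough to share a cluster, and relaying through the intermediate sites still leaves roughly 165ms between Oregon and Frankfurt.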

mrjn · October 11, 2018, 6:01am

I’d be very interested in this, and we can work with you if you want to run a geo-replicated cluster like this. DM me. @greenz1
