Can we replicate between two clusters, one in AWS Oregon and another in GCP Frankfurt?


(Chintan Mishra) #1

We have a few servers and are looking into moving some of our workload to a graph database. Are there any guides on replicating data between two clusters in different geographic locations?

I found the docs for setting up a cluster in a single location.


(Daniel Mai) #2

Dgraph Servers within a single group will by default be replicas of each other. They will still be within the same Dgraph cluster, so the replicas should be close enough for consistent replication.

Oregon and Frankfurt are far enough apart that a Dgraph cluster with servers in both would not run correctly.


(Chintan Mishra) #3

I was under the impression that Dgraph’s clustering and ACID transactions behave like those of Apache Ignite or Google’s Spanner.

Can you please elaborate on what might cause problems?


(Chintan Mishra) #4

@dmai I think I should clear this up.

There will be two separate clusters, one in AWS Oregon and another in GCP Frankfurt.

We just need to replicate two collections across both clusters. Is this possible?

Edit: I have edited the question to reflect the same.


(Daniel Mai) #5

Replication between two independent Dgraph clusters isn’t a feature yet, but it’s something others have asked about as well.

We’re combating the physical world here. The ping time between the US west coast (Los Angeles) and Frankfurt, Germany is 148 ms (according to WonderNetwork ping times). We expect Dgraph to be fast, and >148 ms for network calls within a cluster is not good enough.
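To put that number in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each write has to wait for at least one full network round trip between replicas before it is acknowledged, and that writes are issued one after another. The function name and the `round_trips_per_op` parameter are made up for this example and are not part of Dgraph.

```python
def max_sequential_ops_per_second(rtt_ms: float, round_trips_per_op: int = 1) -> float:
    """Upper bound on strictly sequential operations per second when each
    operation must wait for `round_trips_per_op` network round trips.

    Assumption for illustration only: network latency is the sole cost.
    """
    if rtt_ms <= 0 or round_trips_per_op <= 0:
        raise ValueError("rtt_ms and round_trips_per_op must be positive")
    return 1000.0 / (rtt_ms * round_trips_per_op)


# 148 ms is the Los Angeles - Frankfurt ping time quoted above.
rate = max_sequential_ops_per_second(148)
print(f"{rate:.1f} sequential writes per second at best")  # about 6.8
```

Concurrent requests can overlap their waits, so total throughput may be higher, but the latency of any single write would still be bounded below by the round-trip time.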

You can try it out and see what happens. You can also try placing a replica in between to reduce the latency between adjacent servers.


(Chintan Mishra) #6

Just a little clarification to ensure that we are on the same page.

A cluster is a group of connected server machines (VMs or bare metal) or containers which, when queried, behave as if the user or application were querying a single DB instance. A cluster can consist of one or more nodes. A node is an instance of the DB running on a server machine or container. A cluster replicates data between its nodes.

Replication is the process of sharing data between multiple nodes and/or clusters so that redundant copies exist, which makes it easier to build scalable systems with high availability.

IMO intra-cluster replication should be very fast (which I think works well in Dgraph). Also, it would be naive to have a single cluster with nodes more than ~30 ms apart.
Inter-cluster replication will always have a lag. Well, we can’t move faster than light XDXD.

Edit: We will have AWS Canada Central and GCP London servers up when the office opens on Monday. The ping data is as follows:
Oregon - Montreal = ~70ms
Montreal - London = ~80ms
London - Frankfurt = ~15ms
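
As a quick sanity check on these numbers, the small sketch below adds up the hops and flags the ones above the ~30 ms figure mentioned earlier. The names `HOPS_MS` and `INTRA_CLUSTER_LIMIT_MS` are just for this example, and the 30 ms limit is a rule of thumb from this post, not a documented Dgraph threshold.

```python
# Approximate ping times between adjacent sites, in milliseconds (from the list above).
HOPS_MS = [
    ("Oregon", "Montreal", 70),
    ("Montreal", "London", 80),
    ("London", "Frankfurt", 15),
]

# Rule-of-thumb limit for nodes within a single cluster (an assumption, see above).
INTRA_CLUSTER_LIMIT_MS = 30


def total_path_ms(hops):
    """Sum the latency of every hop along the chain."""
    return sum(ms for _, _, ms in hops)


def hops_over_limit(hops, limit_ms):
    """Return the (source, destination) pairs whose latency exceeds limit_ms."""
    return [(src, dst) for src, dst, ms in hops if ms > limit_ms]


total = total_path_ms(HOPS_MS)
slow = hops_over_limit(HOPS_MS, INTRA_CLUSTER_LIMIT_MS)

print(f"Oregon to Frankfurt via all hops: ~{total} ms")  # ~165 ms
for src, dst in slow:
    print(f"{src} - {dst} is above {INTRA_CLUSTER_LIMIT_MS} ms")
```

Only London - Frankfurt is under the limit, so intermediate sites shorten each hop but the two long hops would still be too slow for a single cluster by that measure.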


(Manish R Jain) #8

I’d be very interested in this, and we can work with you if you want to run a geo-replicated cluster like this. DM me. @greenz1