Initial Question: Hi
This is more of a query. I understand that a client can be created by passing in as many servers as are part of the cluster, but this generates a single client holding connections to the n servers configured.
In a traditional database system, a connection pool keeps n connections already open, and the client hides those details and hands out a free connection. Does a similar philosophy apply to Dgraph?
If yes, does Dgraph recommend a way to implement it?
If no, does that mean there is always exactly one connection open between the client and the database? Isn't that a problem?
Response From Manish:
In Dgraph, you form the connections and pass a working connection (or multiple) to the client library. Dgraph itself doesn’t do any connection pooling or anything else.
When you give it a bunch of connections, it would choose one at random and send the query using that.
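For illustration, here is a minimal sketch of passing multiple connections to the Java client, as described above. The hostnames and ports are placeholders, and the exact DgraphClient constructor has varied across dgraph4j releases (older versions take blocking stubs, newer ones async stubs), so check the README for the version you use.

```java
import io.dgraph.DgraphClient;
import io.dgraph.DgraphGrpc;
import io.grpc.ManagedChannelBuilder;

import java.util.Arrays;
import java.util.List;

public class MultiServerClient {
    public static void main(String[] args) {
        // One gRPC channel (and one stub) per Dgraph server in the cluster.
        // Hostnames and ports are placeholders for this sketch.
        List<String> servers = Arrays.asList("server1:9080", "server2:9080", "server3:9080");

        DgraphGrpc.DgraphBlockingStub[] stubs = servers.stream()
                .map(target -> ManagedChannelBuilder.forTarget(target)
                        .usePlaintext() // no TLS in this sketch
                        .build())
                .map(DgraphGrpc::newBlockingStub)
                .toArray(DgraphGrpc.DgraphBlockingStub[]::new);

        // The client picks one of the supplied stubs at random for each request.
        // NOTE: assumed varargs constructor; the signature differs between dgraph4j versions.
        DgraphClient client = new DgraphClient(stubs);
    }
}
```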
Further Question:
Thanks for the response. Does that mean that if there are 3 servers forming a Dgraph cluster at a given time, I have to mention all of those servers'/nodes' IPs in my code configuration in order to create 3 DgraphGrpc.DgraphBlockingStub objects? Also, does that mean the client will open only 3 connections to the Dgraph cluster?
Also, if I want to add or remove Dgraph nodes in the cluster in the future, do I have to change my application configuration every time to accommodate a newly added/deleted node, or does Dgraph handle it on its own? If it does, how is this done internally? Can you please point me to the correct documentation?
Response:
We can run as many Dgraph servers as we want. We can manually set the --idx flag, or we can leave it empty and Zero will auto-assign an id to the server. This id gets persisted in the write-ahead log, so be careful not to delete it.
The new servers will automatically detect each other by communicating with Dgraph Zero and establish connections to each other.
Typically, Zero would first attempt to replicate a group, by assigning a new Dgraph server to run the same group as assigned to another. Once the group has been replicated as per the --replicas flag, Zero would create a new group.
Over time, the data would be evenly split across all the groups, so it's important to ensure that the number of Dgraph servers is a multiple of the replication setting. For example, if we set --replicas=3 in Zero, then running three Dgraph servers gives no sharding but 3x replication, while running six Dgraph servers shards the data into two groups, each with 3x replication.
Sorry for the late reply. I checked with the Dgraph folks over their Slack channel, and they say that since the client internally uses gRPC as its connection interface, pooling of connection objects is not required; gRPC manages connections well on its own.
I haven't yet tested how much of a bulk request load the Java client can take, but I do see some problems.
1.) We had a 3-server cluster, with 1 node for Zero and one for Ratel. We configured just 1 server in our Java application. The connection worked fine, but when we shut down the configured server, the Java client lost its interaction with the cluster. Ideally, the Java client should be made aware of the cluster, or the connected server should somehow tell the Java client that the 2 other servers are also part of the cluster, and let the Java client share the request load among all of them.
Any clues? @shanghai-Jerry, @BlankRain
@piyushGoyal2 This might be happening because you are configuring just one server with the Java client, so it can interact with only that server and not with the other servers in the cluster. If you want the client to interact with all the servers in the cluster, configure all three servers in the Java client, or use a load balancer. Try that and let me know if it works for you.
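To make the load-balancer suggestion concrete: one option is to point the client at a single stable address (a DNS name or an L4 load balancer in front of the Dgraph servers) so that adding or removing servers does not require an application config change. A minimal sketch, assuming a placeholder DNS name dgraph-servers.internal that resolves to all server addresses, and using grpc-java's round_robin load-balancing policy:

```java
import io.dgraph.DgraphClient;
import io.dgraph.DgraphGrpc;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class LoadBalancedClient {
    public static void main(String[] args) {
        // Single stable target in front of the cluster (placeholder name).
        // If the DNS name resolves to every Dgraph server, gRPC's round_robin
        // policy spreads calls across the resolved addresses, so adding or
        // removing servers only requires a DNS/load-balancer update.
        ManagedChannel channel = ManagedChannelBuilder
                .forTarget("dgraph-servers.internal:9080")
                .defaultLoadBalancingPolicy("round_robin")
                .usePlaintext() // no TLS in this sketch
                .build();

        DgraphGrpc.DgraphBlockingStub stub = DgraphGrpc.newBlockingStub(channel);

        // NOTE: assumed single-stub constructor; check your dgraph4j version's README.
        DgraphClient client = new DgraphClient(stub);
    }
}
```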
Thanks @karan28aug. But the point of a distributed database is that servers can be added or removed. Imagine a day when I have to remove an existing server and add a new one. Does that mean changing the application config every time?