Can I use multiple paths for data storage?

korjavin · January 27, 2021, 9:49pm

On my gcloud node I have multiple dirs like
/mnt/disk1
/mnt/disk2
…
etc

Can I use them together (as a linear concatenation) for storing Dgraph data?

I know about mdadm RAID-0 and the like, but that doesn’t seem to be an option for me.

MichelDiz · January 27, 2021, 9:59pm

No, Dgraph doesn’t manage filesystems. What you could do, and it works really well, is start each Alpha on its own disk.

korjavin · January 27, 2021, 10:00pm

I’m thinking about how I can handle this in k8s.

Are there some extra args to set the path for each Alpha using something like the replica number?

MichelDiz · January 27, 2021, 10:03pm

Volume management in K8s is a complex topic. But you can configure a persistent volume.

korjavin · January 27, 2021, 10:05pm

I am trying to benefit from local SSD disks.

Cloud managed disks seem to be too slow.

MichelDiz · January 27, 2021, 10:07pm

Do you mean that you are trying to use a truly local disk, or one that is local to the cloud machine? SSDs on cloud should be easy to attach.

korjavin · January 27, 2021, 10:10pm

I attached a few SSD disks to the k8s node.

In Google Cloud terms they are “local-ssd”.

Now I’m trying to set up the installation so that the Dgraph pods can use them.

hostPath seems to be an option.

But those disks are limited to 375 GB each, so I need to attach several of them, which is why I have them as
/mnt/disk1
/mnt/disk2
etc…

MichelDiz · January 27, 2021, 10:14pm

You can have two groups in your cluster, with the Alphas of each group sharing one disk path.

e.g.

Alpha Group 0 - /mnt/disk1

dgraph (...) -p /dgraph/alpha0
dgraph (...) -p /dgraph/alpha1
dgraph (...) -p /dgraph/alpha2

Alpha Group 1 - /mnt/disk2

dgraph (...) -p /dgraph/alpha3
dgraph (...) -p /dgraph/alpha4
dgraph (...) -p /dgraph/alpha5

Dgraph will balance the predicates between these groups.

korjavin · January 27, 2021, 10:18pm

Yeah, I got this idea.

That’s why I asked how I can set those paths in my Helm chart.

It seems I need to calculate the path based on the StatefulSet replica number or something like that.
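
One way to picture the "path from replica number" idea: StatefulSet pods are named `<statefulset-name>-<ordinal>`, so the ordinal can be parsed from the pod name and mapped to a disk. The Python sketch below is only an illustration under assumptions: the pod name `dgraph-alpha-N`, the three-Alphas-per-disk layout (mirroring the example above), and the `alphaN` subdirectories are all hypothetical, and none of this is an option of the Dgraph Helm chart.

```python
import re

# Hypothetical mount points, taken from the directories named in this thread.
DISKS = ["/mnt/disk1", "/mnt/disk2"]
# Assumption: three Alphas share each disk, as in the group example above.
ALPHAS_PER_DISK = 3


def pod_ordinal(pod_name: str) -> int:
    """Return the trailing StatefulSet ordinal, e.g. 'dgraph-alpha-4' -> 4."""
    match = re.search(r"-(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"no StatefulSet ordinal in pod name: {pod_name!r}")
    return int(match.group(1))


def data_dir(pod_name: str, disks=DISKS, per_disk=ALPHAS_PER_DISK) -> str:
    """Map a pod to a disk in blocks of `per_disk` and give it its own subdirectory."""
    ordinal = pod_ordinal(pod_name)
    disk_index = ordinal // per_disk
    if disk_index >= len(disks):
        raise ValueError(f"pod {pod_name!r} has no disk assigned")
    return f"{disks[disk_index]}/alpha{ordinal}"


if __name__ == "__main__":
    for i in range(6):
        name = f"dgraph-alpha-{i}"
        print(name, "->", data_dir(name))
```

The resulting string is what would be handed to the Alpha's `-p` flag, for example from a small wrapper around the container's start command.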

korjavin · January 27, 2021, 10:18pm

And do you have any advice about the “zero” node in such a configuration?

How many? One for every two alphas?

MichelDiz · January 27, 2021, 10:22pm

A single Zero is fine. You could have multiple Zeros, but I would recommend that you isolate them, because if one Zero fails, the Alpha tries to reach the others. BTW, there’s no correlation between the size of the Zero group and the number of Alphas. It doesn’t matter if you have several Alphas and a single Zero. What you get from having multiple Zeros is availability.

@joaquin could help there.

korjavin · January 27, 2021, 10:26pm

Another reason why I wanted exactly two Alpha nodes is data safety and replication.

If I have 2 GKE nodes, 2 Alpha nodes, and replica=1, I assume that I can lose 1 GKE node, restart it from scratch, and still have my data.

Am I right?
(I can make a picture if needed)

If I have, let’s say, 6 Alphas on 2 GKE nodes, I have no idea whether all my data is replicated between the GKE nodes or not.
If one is lost, I lose my data.

By “GKE node” I mean a node of my k8s cluster in Google Cloud.

Something can go wrong with one of them, but I assume it will never happen to two of them at the same time. (A risk that I can take.)

MichelDiz · January 27, 2021, 10:40pm

Do you mean Alpha Groups? Because a replicated group needs at least 3 Alphas. A group can have 1, 3, or any odd number N of Alphas.

Wrong, replica=1 will create single-Alpha groups in the cluster. If replica is set to 1 and you have 6 Alphas, you get 6 groups. There’s no replication there, just 6 shards. As far as I know.

From scratch? If so, you should export your data before starting something from scratch.

In a replica 3 configuration, if you lose one Alpha, you still have 2 Alphas that can repopulate the data. But you only lose data if you manually delete the Dgraph folders, e.g. you delete a volume or use an ephemeral volume.
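
To make the relation between the replica setting and the number of groups concrete, here is a small Python model. It is a simplification rather than Dgraph's actual assignment code: it assumes Alphas are placed into groups in the order they join, each group filling up to the replica count before the next one starts. The function name and the group labels are made up for this illustration.

```python
def assign_groups(num_alphas: int, replicas: int) -> dict[int, list[int]]:
    """Simplified model: fill each group with `replicas` Alphas, in join order."""
    if num_alphas < 0 or replicas < 1:
        raise ValueError("num_alphas must be >= 0 and replicas >= 1")
    groups: dict[int, list[int]] = {}
    for alpha in range(num_alphas):
        group_id = alpha // replicas + 1  # labels 1, 2, 3, ... are arbitrary
        groups.setdefault(group_id, []).append(alpha)
    return groups


if __name__ == "__main__":
    # replica=1 with 6 Alphas: six groups of one Alpha each, so six shards and no copies.
    print(assign_groups(6, 1))
    # replica=3 with 6 Alphas: two groups, each holding three copies of its shard.
    print(assign_groups(6, 3))
```

In this model, replica=1 never produces a second copy of anything, which is why losing one Alpha's disk in that setup loses that shard.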

korjavin · January 27, 2021, 10:53pm

I tried to achieve something like this

I assume that at any time I can lose “GKE node 1” or “GKE node 2” but not both of them together.

I have to build an architecture of Dgraph nodes such that, at any time, with two GKE nodes or three, I’d have all my data present. That’s what I tried to achieve via replica=1.

I am pretty new to Dgraph, and I may well be understanding all of this the wrong way.

I am okay with having 3 GKE nodes for this. Just imagine that I added GKE Node 3 to my picture.

But I don’t understand what configuration of Alpha/Zero nodes I should have so that my cluster keeps working at any moment as long as I have a node present.

(I understand that after I lose one of the GKE nodes there must be some rebalancing or something else, and performance will degrade.)

Thank you for your answers. I feel like you have already saved me from some data loss.

korjavin · January 27, 2021, 10:58pm

Final thing could be like this:

But if I start 9 Alpha pods (one per disk) instead of 3, will I be able to have all my replicas evenly distributed over the GKE nodes?

(please tell me if picture with 9 pods will help)

MichelDiz · January 27, 2021, 11:13pm

I think the HA config is good for you and well documented https://dgraph.io/docs/deploy/ports-usage/#high-availability-ha-cluster-configuration

Your diagram is confusing. The shard is a “group”. The replicas are part of the same group. When you say “Alpha 1”, it looks like it should be “Alpha Group 1”, which is 3 replicas of the same shard in a replica 3 configuration. In your diagram you say that there are 3 shards in Alpha Group 1, which makes no sense. A group is a set of replicas.

But I see that you want to split the 2 groups between the GKE nodes. That is fine. If you lose GKE1, your data will be safe anyway. But you need consensus. You need to bring back the lost Alphas so that the group has a majority again.

Note that you can’t use replica 2.

Replicas are 100%. Each replica holds a full copy of its group’s data, so there’s no distribution of replicas. What is distributed is the predicates, into shards (groups).

Also, any Alpha will send a request to the right group for its predicate, whether it is a mutation or a query. It means that you can hit any Alpha.
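
The remarks about consensus and replica 2 follow from majority arithmetic. Dgraph groups use Raft, where a group stays available only while more than half of its members are up. The sketch below is plain arithmetic, not Dgraph code, and the function names are invented for the illustration.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that is more than half of the group."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return group_size // 2 + 1


def tolerated_failures(group_size: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return group_size - majority(group_size)


if __name__ == "__main__":
    for size in (1, 2, 3, 5):
        print(f"replicas={size}: majority={majority(size)}, "
              f"can lose {tolerated_failures(size)}")
```

A group of 2 needs both members for a majority, so it tolerates no more failures than a group of 1. That is the arithmetic behind sticking to odd replica counts.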

korjavin · January 27, 2021, 11:25pm

It seems that I used the wrong terms.

I had pods in mind (Docker containers running Alpha). By shard I mean a piece of data, and by replica a copy of a shard.

I read in the HA cluster configuration doc:

For example, if you set --replicas=3 for a Zero node, and then run three Alpha nodes for no sharding, but 3x replication.

That looks similar.

But what I want is to have a full copy of my data on every GKE node (each of which holds many Alpha pods).

So, say I have 3 GKE nodes with 3 disks on each.

Then I should start 9 Alpha pods and set replica=3. Is that correct?

But do I have a guarantee (or how can I achieve one) that the data will be spread over all my GKE nodes?

I am afraid of a situation where all three replicas of some unique data end up on “GKE Node 1”, for instance.

Please excuse me, I don’t understand what this means:

Replicas are 100%.

Are you saying there is no copying of the same data? Isn’t that what a replica is?

MichelDiz · January 27, 2021, 11:42pm

This is what I mean

I forgot about the Zero group. But you can add one Zero node on each GKE node, and give every Alpha the addresses of all the Zeros in your cluster.

korjavin · January 27, 2021, 11:45pm

So all three “NAME” have the same data? If so, that’s what I’m looking for.

The problem is, I have totally no idea how to deploy this configuration from my Helm chart.

How can I guarantee that I will not have a GKE node with “Name”, “Name” , “Name” ?

@MichelDiz thank you again for your effort. It helps me a lot.

MichelDiz · January 27, 2021, 11:49pm

Yes, they are part of the same group, which is “replica 3”. Alpha group N holds a set of predicates. You don’t control which predicates will be moved. It is based on disk usage. But you can try to force it. And there is a feature request to make it deterministic.

Distributing the Alpha Groups across your GKE nodes makes sense for availability. And you won’t lose any data in a disaster.

Let’s wait for @joaquin, he is the expert there.

If you spread the groups it won’t happen. A predicate can’t be in two groups at the same time.
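
Whether a given layout really avoids the “Name”, “Name”, “Name” on one node situation can be checked with a few lines of Python. This is a hypothetical helper, not part of Dgraph or Kubernetes: it takes a hand-written map from group to the GKE nodes hosting its replicas, and tests whether every group keeps a majority after any single node is lost. The group and node names are placeholders.

```python
from collections import Counter


def survives_single_node_loss(placement: dict[str, list[str]]) -> bool:
    """True if every group keeps a majority of replicas when any one node is lost."""
    for nodes in placement.values():
        if not nodes:
            return False  # a group with no replicas has nothing to survive with
        needed = len(nodes) // 2 + 1
        # Worst case for this group: losing the node that hosts most of its replicas.
        worst_loss = max(Counter(nodes).values())
        if len(nodes) - worst_loss < needed:
            return False
    return True


if __name__ == "__main__":
    spread = {
        "group1": ["gke1", "gke2", "gke3"],
        "group2": ["gke1", "gke2", "gke3"],
        "group3": ["gke1", "gke2", "gke3"],
    }
    stacked = {
        "group1": ["gke1", "gke1", "gke1"],
        "group2": ["gke2", "gke2", "gke2"],
        "group3": ["gke3", "gke3", "gke3"],
    }
    print("spread :", survives_single_node_loss(spread))
    print("stacked:", survives_single_node_loss(stacked))
```

Actually enforcing the spread is a scheduling matter (for example Kubernetes pod anti-affinity or topology spread constraints), which this check does not cover.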
