What is zero and what is server?

I’m trying to deploy dgraph on a docker swarm, for now with docker service create

What is zero and what is server?

You write:

Dgraph Zero: Run three Zero instances, assigning a unique ID to each via the --idx flag,

but a few lines above you’re using --idx for the server!?

docker run -it -p 7081:7081 -p 8081:8081 -p 9081:9081 -v ~/data:/dgraph dgraph/dgraph:latest dgraph server -port_offset 1 --memory_mb=<typically half the RAM> --zero=HOSTIPADDR:7080 --my=HOSTIPADDR:7081 --idx <unique-id>

why so complicated?

What is zero and why are you writing stuff that contradicts itself?
Is zero the storage layer?
And server the translator?

Zero is the coordinator for the cluster, and Dgraph server is the data server, storing the data.

Note that you only need to run three Zeros if you really care about HA. Otherwise, just one Zero instance is easier to run and sufficient for most users.

--idx is a unique ID for that instance. Zero instances need to be passed the --idx flag, but you can skip passing that flag to Dgraph Server.

I can understand why this must be confusing, which is why it’s better to stick to the simpler setup without any replication, at least in the beginning.
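
For example, the simplest setup is just one Zero and one Server on a single host (a rough sketch based on the docker run line quoted above; HOSTIPADDR, the host directories and the 2048 MB value are placeholders to adjust):

docker run -it -p 7080:7080 -p 8080:8080 -v ~/data/zero:/dgraph dgraph/dgraph:latest dgraph zero --my=HOSTIPADDR:7080 --idx=1

docker run -it -p 7081:7081 -p 8081:8081 -p 9081:9081 -v ~/data/server:/dgraph dgraph/dgraph:latest dgraph server --port_offset 1 --memory_mb=2048 --zero=HOSTIPADDR:7080 --my=HOSTIPADDR:7081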


Thanks for the quick reply, Manish.

OK, here’s what I’ve got so far:

version: "3"
services:
  dgz1:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 2G
    ports:
      - "7080:7080"
      - "8080:8080"
    volumes:
      - "dgdata1:/dgraph"
    networks:
      - default
    command: dgraph zero --bindall=true --my=dgz1:7080 --idx=1
volumes:
  dgdata1:
    driver: local
networks:
  default:
    driver: overlay

And this seems to be working; let’s see if I can get a cluster up and running.
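
(I check it by deploying the stack and tailing the service logs, roughly like this; it assumes the file above is saved as docker-compose.yml and uses dgraph as the stack name:)

docker stack deploy -c docker-compose.yml dgraph
docker service logs -f dgraph_dgz1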

Do zero and server access the same /dgraph directory?

No. Every instance has its own unique directory. They are not shared among the servers.

@pawan: Can we build a config file to orchestrate Dgraph, which users can base their configs on?


To run three replicas for server, set --replicas=3
Do I set this on all Zero instances?

And which Zero do I point my server at?

version: "3"
services:
  dgz1:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 2G
    volumes:
      - "dgdata1:/dgraph"
    networks:
      - default
    command: dgraph zero --bindall=true --my=dgz1:7080 --idx=1 --replicas=3
  dgz2:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 2G
    volumes:
      - "dgdata2:/dgraph"
    networks:
      - default
    command: dgraph zero --bindall=true --my=dgz2:7080 --idx=2 --replicas=3
  dgz3:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 2G
    volumes:
      - "dgdata3:/dgraph"
    networks:
      - default
    command: dgraph zero --bindall=true --my=dgz3:7080 --idx=3 --replicas=3
  dgs:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 2G
    ports:
      - "7080:7080"
      - "8080:8080"
      - "9080:9080"
    networks:
      - default
    command: dgraph server --bindall=true --my=dgs:7080 --zero=dgz1:7080 --memory_mb=2048 --idx=4
volumes:
  dgdata1:
    driver: local
  dgdata2:
    driver: local
  dgdata3:
    driver: local
networks:
  default:
    driver: overlay

but this doesn’t work…

dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 pool.go:104: == CONNECT ==> Setting dgz1:7080
dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 worker.go:99: Worker listening at address: [::]:7080
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 gRPC server started.  Listening on port 9080
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 HTTP server started.  Listening on port 8080
dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 gRPC server started.  Listening on port 9080
dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 HTTP server started.  Listening on port 8080
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 groups.go:93: Current Raft Id: 4
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 pool.go:104: == CONNECT ==> Setting dgz1:7080
dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 groups.go:113: Connected to group zero. Connection state: member:<id:4 addr:"dgs:7080" > state:<counter:31 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"dgs:7080" leader:true last_update:1510958161 > > members:<key:2 value:<id:2 group_id:1 addr:"dgs:7080" > > members:<key:3 value:<id:3 group_id:1 addr:"dgs:7080" > > tablets:<key:"_predicate_" value:<group_id:1 predicate:"_predicate_" > > > > groups:<key:2 value:<members:<key:4 value:<id:4 group_id:2 addr:"dgs:7080" leader:true last_update:1510958192 > > members:<key:5 value:<id:5 group_id:2 addr:"dgs:7080" > > members:<key:6 value:<id:6 group_id:2 addr:"dgs:7080" > > > > groups:<key:3 value:<members:<key:7 value:<id:7 group_id:3 addr:"dgs:7080" leader:true last_update:1510958217 > > members:<key:8 value:<id:8 group_id:3 addr:"dgs:7080" > > members:<key:9 value:<id:9 group_id:3 addr:"dgs:7080" > > > > groups:<key:4 value:<members:<key:10 value:<id:10 group_id:4 addr:"dgs:7080" leader:true last_update:1510958356 > > > > zeros:<key:1 value:<id:1 addr:"dgz1:7080" leader:true > > maxRaftId:10 > 
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 gRPC server started.  Listening on port 9080
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 gRPC server started.  Listening on port 9080
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 HTTP server started.  Listening on port 8080
dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 draft.go:138: Node ID: 4 with GroupID: 2
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 worker.go:99: Worker listening at address: [::]:7080
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 groups.go:113: Connected to group zero. Connection state: member:<id:4 addr:"dgs:7080" > state:<counter:31 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"dgs:7080" leader:true last_update:1510958161 > > members:<key:2 value:<id:2 group_id:1 addr:"dgs:7080" > > members:<key:3 value:<id:3 group_id:1 addr:"dgs:7080" > > tablets:<key:"_predicate_" value:<group_id:1 predicate:"_predicate_" > > > > groups:<key:2 value:<members:<key:4 value:<id:4 group_id:2 addr:"dgs:7080" leader:true last_update:1510958192 > > members:<key:5 value:<id:5 group_id:2 addr:"dgs:7080" > > members:<key:6 value:<id:6 group_id:2 addr:"dgs:7080" > > > > groups:<key:3 value:<members:<key:7 value:<id:7 group_id:3 addr:"dgs:7080" leader:true last_update:1510958217 > > members:<key:8 value:<id:8 group_id:3 addr:"dgs:7080" > > members:<key:9 value:<id:9 group_id:3 addr:"dgs:7080" > > > > groups:<key:4 value:<members:<key:10 value:<id:10 group_id:4 addr:"dgs:7080" leader:true last_update:1510958356 > > > > zeros:<key:1 value:<id:1 addr:"dgz1:7080" leader:true > > maxRaftId:10 > 
dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 node.go:230: Group 2 found 0 entries
dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 draft.go:680: New Node for group: 2
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 draft.go:138: Node ID: 4 with GroupID: 2
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 node.go:230: Group 2 found 0 entries
dgraph_dgs.1.zhpmx8h6jfxv@n1.selftls.com    | 2017/11/17 22:46:15 Unable to reach leader or any other server in group 2
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:93: Current Raft Id: 4
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 pool.go:104: == CONNECT ==> Setting dgz1:7080
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 worker.go:99: Worker listening at address: [::]:7080
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 HTTP server started.  Listening on port 8080
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 groups.go:93: Current Raft Id: 4
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 pool.go:104: == CONNECT ==> Setting dgz1:7080
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 worker.go:99: Worker listening at address: [::]:7080
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yn26vw86008m@n2.selftls.com    | 2017/11/17 22:51:49 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 draft.go:680: New Node for group: 2
dgraph_dgs.1.yvrmqgbkmoxc@n1.selftls.com    | 2017/11/17 22:46:34 Unable to reach leader or any other server in group 2
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_dgs.1.yf5z0kl7dein@n3.selftls.com    | 2017/11/17 22:49:35 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure

FYI: this works for me when I add "--peer dgz1:7080" to both dgz2 and dgz3, and then start the server (dgs) in another stack (after the Zero cluster has stabilized). So:

docker stack deploy -c docker-compose.zero.yml --prune dgraph
docker stack deploy -c docker-compose.server.yml dgraph

@miko can you please post your YAML configs?
And what do you mean by “works”?
Did you check the docker service logs of the particular services?

I’m currently doing it manually

#!/usr/bin/env bash
docker network create --driver overlay --scope swarm dgraph

docker volume create dgza
docker volume create dgzb
docker volume create dgzc

docker service create --name dgza --restart-condition on-failure --replicas 1 --network dgraph --mount source=dgza,target=/dgraph dgraph/dgraph:latest dgraph zero --bindall true --my dgza:7080 --idx 1 --replicas 3 --peer dgzb:7080
docker service create --name dgzb --restart-condition on-failure --replicas 1 --network dgraph --mount source=dgzb,target=/dgraph dgraph/dgraph:latest dgraph zero --bindall true --my dgzb:7080 --idx 2 --replicas 3 --peer dgzc:7080
docker service create --name dgzc --restart-condition on-failure --replicas 1 --network dgraph --mount source=dgzc,target=/dgraph dgraph/dgraph:latest dgraph zero --bindall true --my dgzc:7080 --idx 3 --replicas 3 --peer dgza:7080

docker volume create dgsa
docker service create --name dgsa --publish 7080:7080 --publish 8080:8080 --publish 9080:9080 --restart-condition on-failure --replicas 1 --network dgraph --mount source=dgsa,target=/dgraph dgraph/dgraph:latest dgraph server --memory_mb 2048 --zero dgza:7080

This doesn’t work for me.
Maybe it’s my network, or DNS resolution from the static Go binaries; I’m not sure how Docker Swarm DNS works (do they write to /etc/hosts? do they provide their own DNS service?), and I’m also not sure where I can see the Dockerfile that is used to create the image.
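
(From what I can tell, Swarm provides its own embedded DNS, reachable at 127.0.0.11 inside each container, which resolves service names on the overlay network rather than writing to /etc/hosts. A quick way to test resolution; the busybox image and the dgza name are just examples:)

# throwaway service on the same overlay network; run the exec on whichever node the task lands on
docker service create --name dnscheck --network dgraph busybox sleep 3600
docker exec -it $(docker ps -q -f name=dnscheck) nslookup dgza
docker service rm dnscheck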

dgza log (only running 1 instance, removed --peer)
docker service create --name dgza --restart-condition on-failure --replicas 1 --network dgraph --mount source=dgza,target=/dgraph dgraph/dgraph:latest dgraph zero --bindall true --my dgza:7080

dgza.1.03pff4srjfsu@n2.selftls.com    | Setting up listener at: 0.0.0.0:7080
dgza.1.03pff4srjfsu@n2.selftls.com    | Setting up listener at: 0.0.0.0:8080
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:27 node.go:230: Group 0 found 0 entries
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:27 raft.go:567: INFO: 1 became follower at term 0
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:27 raft.go:315: INFO: newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:27 raft.go:567: INFO: 1 became follower at term 1
dgza.1.03pff4srjfsu@n2.selftls.com    | Running Dgraph zero...
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:31 raft.go:749: INFO: 1 is starting a new election at term 1
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:31 raft.go:580: INFO: 1 became candidate at term 2
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:31 raft.go:664: INFO: 1 received MsgVoteResp from 1 at term 2
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:31 raft.go:621: INFO: 1 became leader at term 2
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:19:31 node.go:301: INFO: raft.node: 1 elected leader 1 at term 2
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:23:13 zero.go:256: Got connection request: addr:"localhost:7080" 
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:23:13 pool.go:104: == CONNECT ==> Setting localhost:7080
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:23:13 zero.go:352: Connected
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:23:17 oracle.go:353: Error while fetching minTs from group 1, err: rpc error: code = Unimplemented desc = unknown service protos.Worker
dgza.1.03pff4srjfsu@n2.selftls.com    | 2017/11/18 10:23:27 oracle.go:353: Error while fetching minTs from group 1, err: rpc error: code = Unimplemented desc = unknown service protos.Worker

docker service create --name dgsa --publish 7080:7080 --publish 8080:8080 --publish 9080:9080 --restart-condition on-failure --replicas 1 --network dgraph --mount source=dgsa,target=/dgraph dgraph/dgraph:latest dgraph server --memory_mb 2048 --zero dgza:7080

[root@n1 ~]# docker service logs dgsa 
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 gRPC server started.  Listening on port 9080
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 HTTP server started.  Listening on port 8080
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 groups.go:93: Current Raft Id: 0
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 pool.go:104: == CONNECT ==> Setting dgza:7080
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 worker.go:99: Worker listening at address: [::]:7080
dgsa.1.ryti50mx4710@n1.selftls.com    | 2017/11/18 10:22:13 groups.go:93: Current Raft Id: 0
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 groups.go:113: Connected to group zero. Connection state: member:<id:1 group_id:1 addr:"localhost:7080" > state:<counter:5 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"localhost:7080" > > > > zeros:<key:1 value:<id:1 addr:"dgza:7080" leader:true > > maxRaftId:1 > 
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 draft.go:138: Node ID: 1 with GroupID: 1
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 node.go:230: Group 1 found 0 entries
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 draft.go:680: New Node for group: 1
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 raft.go:567: INFO: 1 became follower at term 0
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 raft.go:315: INFO: newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 raft.go:567: INFO: 1 became follower at term 1
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 groups.go:283: Asking if I can serve tablet for: _predicate_
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 raft.go:749: INFO: 1 is starting a new election at term 1
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 raft.go:580: INFO: 1 became candidate at term 2
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 raft.go:664: INFO: 1 received MsgVoteResp from 1 at term 2
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 raft.go:621: INFO: 1 became leader at term 2
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 node.go:301: INFO: raft.node: 1 elected leader 1 at term 2
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:23:13 mutation.go:147: Done schema update predicate:"_predicate_" value_type:STRING list:true 
dgsa.1.guwaixejci4f@n2.selftls.com    | 2017/11/18 10:28:13 groups.go:283: Asking if I can serve tablet for: _dummy_
dgsa.1.ryti50mx4710@n1.selftls.com    | 2017/11/18 10:22:13 pool.go:104: == CONNECT ==> Setting dgza:7080
dgsa.1.ryti50mx4710@n1.selftls.com    | 2017/11/18 10:22:13 gRPC server started.  Listening on port 9080
dgsa.1.ryti50mx4710@n1.selftls.com    | 2017/11/18 10:22:13 HTTP server started.  Listening on port 8080
dgsa.1.ryti50mx4710@n1.selftls.com    | 2017/11/18 10:22:13 worker.go:99: Worker listening at address: [::]:7080
dgsa.1.ryti50mx4710@n1.selftls.com    | 2017/11/18 10:22:13 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgsa.1.ryti50mx4710@n1.selftls.com    | ... (the same "Error while connecting with group zero" line repeated 46 more times) ...
dgsa.1.qw02pwz0kwaj@n3.selftls.com    | 2017/11/18 10:23:01 gRPC server started.  Listening on port 9080
dgsa.1.pl7vhglf6hnr@n3.selftls.com    | 2017/11/18 10:22:39 gRPC server started.  Listening on port 9080
dgsa.1.pl7vhglf6hnr@n3.selftls.com    | 2017/11/18 10:22:39 HTTP server started.  Listening on port 8080
dgsa.1.ryti50mx4710@n1.selftls.com    | 2017/11/18 10:22:13 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgsa.1.qw02pwz0kwaj@n3.selftls.com    | 2017/11/18 10:23:01 HTTP server started.  Listening on port 8080
dgsa.1.qw02pwz0kwaj@n3.selftls.com    | 2017/11/18 10:23:01 groups.go:93: Current Raft Id: 0
dgsa.1.pl7vhglf6hnr@n3.selftls.com    | 2017/11/18 10:22:39 groups.go:93: Current Raft Id: 0
dgsa.1.pl7vhglf6hnr@n3.selftls.com    | 2017/11/18 10:22:39 pool.go:104: == CONNECT ==> Setting dgza:7080
dgsa.1.qw02pwz0kwaj@n3.selftls.com    | 2017/11/18 10:23:01 pool.go:104: == CONNECT ==> Setting dgza:7080
dgsa.1.qw02pwz0kwaj@n3.selftls.com    | 2017/11/18 10:23:01 worker.go:99: Worker listening at address: [::]:7080
dgsa.1.pl7vhglf6hnr@n3.selftls.com    | 2017/11/18 10:22:39 worker.go:99: Worker listening at address: [::]:7080
dgsa.1.pl7vhglf6hnr@n3.selftls.com    | 2017/11/18 10:22:39 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgsa.1.qw02pwz0kwaj@n3.selftls.com    | 2017/11/18 10:23:01 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgsa.1.qw02pwz0kwaj@n3.selftls.com    | 2017/11/18 10:23:01 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgsa.1.ryti50mx4710@n1.selftls.com    | 2017/11/18 10:22:13 groups.go:108: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure

Well, I give up.
I don’t know how to run this in anything other than a dev/test context.

In a way Dgraph is like Kubernetes: too good to be true and hard to set up.
I don’t know why there aren’t any --dashboard_root= or --data_root= options.

You can’t run it without Docker, and with Docker it only runs in a “server runs in the Dgraph Zero instance” setup, which doesn’t make sense at all.
You can’t write a service definition that takes the same binary.

You wrote: “Every instance has its own unique directory.”
Yes, but you should’ve been more explicit.
Every instance has its own unique directory as far as zero is concerned.
But every server shares the data directory with the zero instance.
And if this isn’t true then I don’t know what to believe anymore.

What good is open source software if there is no manual to tell you how to run it in a production context?
I’m now past a 14-day Kubernetes nightmare, only to slip into another nightmare with Dgraph.
I’ll stay with MongoDB; at least I know how to run it, and I don’t need to bend over backwards to run a replica set.
Sorry, but I’m just really frustrated right now.

@dalu

There are 2 tiers:

Dgraph Server - these nodes store your actual data and process queries

Dgraph Zero - these nodes don’t store any data; they manage the Dgraph Servers, assign them data to store so that the replication factor you set is met, and change that assignment to rebalance data as it grows.


Process:

  1. Start first Dgraph Zero node
  2. Start more Dgraph Zero nodes, point them to the first one so that they form a cluster
  3. Start Dgraph Server nodes, point them to any of the Dgraph Zero nodes so that they know what to do

Are you using Kubernetes? It’s a great platform; what went wrong there? From your configs here it looks like you’re using Docker Swarm. Issues that I see:

  1. Dgraph Zero nodes #2 and #3 need to point to DZ node #1 (via --peer) so that they can form the cluster; otherwise each DZ node starts on its own without any knowledge of the others (see the sketch after this list).

  2. Run more than one Dgraph Server node; you should start at least 3 to match the replication factor you set.

  3. Only the Dgraph Server nodes need data volumes because they store the actual data, not Dgraph Zero. Although you don’t really need persistent volumes at all, since both DZ and DS nodes will run in clusters with replicated data.
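
Putting #1 and #2 together, the command lines in your compose file would look roughly like this (just a sketch; it assumes you split the single dgs service into three, dgs1/dgs2/dgs3, and the flag values mirror the ones you already use):

dgraph zero --bindall=true --my=dgz1:7080 --idx=1 --replicas=3
dgraph zero --bindall=true --my=dgz2:7080 --idx=2 --replicas=3 --peer dgz1:7080
dgraph zero --bindall=true --my=dgz3:7080 --idx=3 --replicas=3 --peer dgz1:7080
dgraph server --bindall=true --my=dgs1:7080 --zero=dgz1:7080 --memory_mb=2048
dgraph server --bindall=true --my=dgs2:7081 --zero=dgz1:7080 --memory_mb=2048 --port_offset=1
dgraph server --bindall=true --my=dgs3:7082 --zero=dgz1:7080 --memory_mb=2048 --port_offset=2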


The documentation is still rough and the naming of Dgraph Zero and Server could be better, but the system does work. Hope that helps.


Hello @dalu, here is my setup:
docker-compose.zero.yml:

version: "3.4"
networks:
  default:
    driver: overlay
services:
  dgz1:
    image: dgraph/dgraph:v0.9.1
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 1G
    networks:
      - default
    command: dgraph zero --bindall=true --my=dgz1:7080 --idx=1 --replicas=2 
  dgz2:
    image: dgraph/dgraph:v0.9.1
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 1G
    networks:
      - default
    command: dgraph zero --bindall=true --my=dgz2:7080 --idx=2 --replicas=2 --peer dgz1:7080
  dgz3:
    image: dgraph/dgraph:v0.9.1
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 1G
    networks:
      - default
    command: dgraph zero --bindall=true --my=dgz3:7080 --idx=3 --replicas=2 --peer dgz1:7080

docker-compose.server.yml:

version: "3.4"
networks:
  default:
    driver: overlay
services:
  dgs1:
    image: dgraph/dgraph:v0.9.1
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 1G
    ports:
      - "8080:8080"
      - "9080:9080"
    networks:
      - default
    command: dgraph server --bindall=true --my=dgs1:7080 --zero=dgz1:7080 --memory_mb=1024 
  dgs2:
    image: dgraph/dgraph:v0.9.1
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 1G
    ports:
      - "8081:8081"
      - "9081:9081"
    networks:
      - default
    command: dgraph server --bindall=true --my=dgs2:7081 --zero=dgz1:7080 --memory_mb=1024 --port_offset=1

Start it with:

docker stack deploy -c docker-compose.zero.yml dg
docker stack deploy -c docker-compose.server.yml dg

Then run queries to mutate and index the data (first two curl commands):

https://docs.dgraph.io/get-started/#step-3-run-queries

Then connect to localhost:8080 or localhost:8081 (both work for me) to query the data (from the third curl command). “Works for me” means I can see the resulting graphs on both ports.

For a production setup I would use joyent/containerpilot (https://github.com/joyent/containerpilot, “a service for autodiscovery and configuration of applications running in containers”) together with consul-template to generate the config files.


Hey @dalu

Sorry that you had to face issues. I see you also have a GitHub issue about this. I will add documentation for using Docker Swarm and Kubernetes with Dgraph this week.

Thanks @ManiGandham @miko for your help here.


No, I wanted to. But setting it up is too complicated. That’s what I meant about being past a 14-day Kubernetes nightmare.
I tried kubeadm but got locked out of my cluster.
I wanted to go with a manual 3-node master setup, but the documentation is riddled with explosives.
And I don’t want to use GCE or AWS because they’re too expensive.
And Rancher (as the simple way to do it) sometimes works, sometimes doesn’t, and only has one master node.

Anyhow, thank you all for the feedback.

And I think Docker Swarm isn’t quite there yet.
I used an IPv6 --advertise-addr, which led to containers not being able to communicate with each other.
Then I switched to IPv4 and many problems went away.
When I used an encrypted network (docker network create --opt encrypted netname), only 2 out of 3 hosts were able to communicate.
And on top of all this, it was done on a Hetzner cloud test, i.e. they’re testing their OpenStack cloud infra and were giving away 3x 4 GB RAM / 40 GB disk instances for free until 30.11.2017. So I used this to play with a Kubernetes setup and with Docker Swarm and see what works and what doesn’t.

Also, I adjusted my config so that node z1 wouldn’t use --peer, and the other nodes would use it as their peer.
The first time I ran the docker stack deploy it worked fine.
But then I tore it down, stopped and removed all containers on all nodes, and ran it again; however, this time there were network errors again, as far as I remember between the server and the z1 node.

@miko thanks. I didn’t know you could deploy 2 stacks in the same namespace.
I’ll try this.

And sorry for the late reply; I was moving to another country.

Yes I did, thank you @pawan.

You’re really nice guys and I hope you’ll be successful with your project. It is really something of a saviour in the world of Go “databases”. ORMs are in a sad state, and Mongo ToMany/ToOne etc. means doing it all by hand. Dgraph as the backend store would speed up development with the flexibility of SQL (dependency-free), only with a simpler query language. And as a bonus, one can benefit from free RDF collections. It’s a win-win.


EUREKA
It works :)

dgz.yml

version: "3.4"
volumes:
  dgdz1:
    driver: local
  dgdz2:
    driver: local
  dgdz3:
    driver: local
networks:
  default:
    driver: overlay
    driver_opts:
      encrypted: ""
services:
  dgz1:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 1G
    networks:
      - default
    volumes:
      - "dgdz1:/dgraph"
    command: dgraph zero --bindall=true --my=dgz1:7080 --idx=1 --replicas=3 
  dgz2:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 1G
    networks:
      - default
    volumes:
      - "dgdz2:/dgraph"
    command: dgraph zero --bindall=true --my=dgz2:7080 --idx=2 --replicas=3 --peer dgz1:7080
  dgz3:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 1G
    networks:
      - default
    volumes:
      - "dgdz3:/dgraph"
    command: dgraph zero --bindall=true --my=dgz3:7080 --idx=3 --replicas=3 --peer dgz1:7080

dgs.yml

version: "3.4"
volumes:
  dgds1:
    driver: local
  dgds2:
    driver: local
  dgds3:
    driver: local
networks:
  default:
    driver: overlay
    driver_opts:
      encrypted: ""
services:
  dgs1:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 2G
    ports:
      - "8080:8080"
      - "9080:9080"
    networks:
      - default
    volumes:
      - "dgds1:/dgraph"
    command: dgraph server --bindall=true --my=dgs1:7080 --zero=dgz1:7080 --memory_mb=2048
  dgs2:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 2G
    ports:
      - "8081:8081"
      - "9081:9081"
    networks:
      - default
    volumes:
      - "dgds2:/dgraph"
    command: dgraph server --bindall=true --my=dgs2:7081 --zero=dgz2:7080 --memory_mb=2048 --port_offset=1
  dgs3:
    image: dgraph/dgraph:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 2G
    ports:
      - "8082:8082"
      - "9082:9082"
    networks:
      - default
    volumes:
      - "dgds3:/dgraph"
    command: dgraph server --bindall=true --my=dgs3:7082 --zero=dgz3:7080 --memory_mb=2048 --port_offset=2
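
I deploy them as two stacks into the same namespace, in order (the stack name dgraph is just an example; give the Zero cluster a moment to settle before deploying the servers):

docker stack deploy -c dgz.yml dgraph
docker stack deploy -c dgs.yml dgraph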

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.