Know when alpha server is ready to receive requests


(Francesco Magliocca) #1

I am using dgraph’s go client, but I am experiencing a strange behavior:
here is the Go code:

import "github.com/dgraph-io/dgo"
import "github.com/dgraph-io/dgo/protos/api"
import "google.golang.org/grpc"
import "context"
import "fmt"

func main() {
    conn, err1 := grpc.Dial("server:9080", grpc.WithInsecure())
    if err1 != nil {
        panic(err1.Error())
    }

    defer conn.Close()

    client := dgo.NewDgraphClient(api.NewDgraphClient(conn))

    err2 := client.Alter(context.Background(), &api.Operation { DropAll: true})
    if err2 != nil {
        panic(err2.Error())
    }

    newSchema := &api.Operation {
        Schema: `email: string @index(exact) .
                 pass: password .
`,
    }
    err3 := client.Alter(context.Background(), newSchema)
    if err3 != nil {
        panic(err3.Error())
    }
}

when I do the first client.Alter, err2 is not nil, and the error is:

panic: rpc error: code = Unknown desc = Please retry again, server is not ready to accept requests

I use this docker-compose.yml to bring up all the services:

version: "3.2"
services:
    webserver:
        build: . # use the Dockerfile in current directory
        depends_on:
            - server
            - zero

    # Dgraph services
    zero:
        image: "dgraph/dgraph:latest"
        volumes:
            - type: volume
              source: dgraph
              target: /dgraph
              volume:
                  nocopy: true

        ports:
            - "5080:5080" # gRPC port
            - "6080:6080"

        restart: on-failure
        command: dgraph zero --my=zero:5080

    server:
        image: "dgraph/dgraph:latest"
        volumes:
            - type: volume
              source: dgraph
              target: /dgraph
              volume:
                  nocopy: true

        ports:
            - "8080:8080"
            - "9080:9080" # gRPC port

        restart: on-failure
        command: dgraph alpha --my=server:8080 --lru_mb=2048 --zero=zero:5080
    # Probably the UI is not needed
    ratel:
        image: "dgraph/dgraph:latest"
        volumes:
            - type: volume
              source: dgraph
              target: /dgraph
              volume:
                  nocopy: true

        ports:
            - "8000:8000"
        command: dgraph-ratel

volumes:
    dgraph:

and here is the logs I see when running the various services:

zero_1       | I0520 10:59:35.236286       1 init.go:88]
zero_1       |
zero_1       | Dgraph version   : v1.0.14
zero_1       | Commit SHA-1     : 26cb2f94
zero_1       | Commit timestamp : 2019-04-12 13:21:56 -0700
zero_1       | Branch           : HEAD
zero_1       | Go version       : go1.11.5
zero_1       |
zero_1       | For Dgraph official documentation, visit https://docs.dgraph.io.
zero_1       | For discussions about Dgraph     , visit https://discuss.dgraph.io.
zero_1       | To say hi to the community       , visit https://dgraph.slack.com.
zero_1       |
zero_1       | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
zero_1       | Copyright 2015-2018 Dgraph Labs, Inc.
zero_1       |
zero_1       |
zero_1       | I0520 10:59:35.236579       1 run.go:98] Setting up grpc listener at: 0.0.0.0:5080
zero_1       | I0520 10:59:35.236675       1 run.go:98] Setting up http listener at: 0.0.0.0:6080
zero_1       | badger 2019/05/20 10:59:35 INFO: All 0 tables opened in 0s
zero_1       | I0520 10:59:35.246741       1 node.go:153] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc0004c2e70 Applied:0 MaxSizePerMsg:1048576 MaxCommittedSizePerReady:0 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1fbc410 DisableProposalForwarding:false}
zero_1       | I0520 10:59:35.247420       1 node.go:297] Group 0 found 1 entries
ratel_1      | 2019/05/20 10:59:35 Listening on port 8000...
zero_1       | I0520 10:59:35.247497       1 node.go:85] 1 became follower at term 0
webserver_1  | panic: rpc error: code = Unknown desc = Please retry again, server is not ready to accept requests
webserver_1  |
webserver_1  | goroutine 1 [running]:
webserver_1  | main.main()
webserver_1  | 	/app/main.go:71 +0x344
zero_1       | I0520 10:59:35.247585       1 node.go:85] newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
zero_1       | I0520 10:59:35.247604       1 node.go:85] 1 became follower at term 1
zero_1       | I0520 10:59:35.247839       1 run.go:284] Running Dgraph Zero...
zero_1       | E0520 10:59:35.248036       1 raft.go:499] While proposing CID: Not Zero leader. Aborting proposal: cid:"5a182804-e9a7-468d-a816-e2c29b2ee2b1" . Retrying...
server_1     | I0520 10:59:35.335857       1 init.go:88]
server_1     |
server_1     | Dgraph version   : v1.0.14
server_1     | Commit SHA-1     : 26cb2f94
server_1     | Commit timestamp : 2019-04-12 13:21:56 -0700
server_1     | Branch           : HEAD
server_1     | Go version       : go1.11.5
server_1     |
server_1     | For Dgraph official documentation, visit https://docs.dgraph.io.
server_1     | For discussions about Dgraph     , visit https://discuss.dgraph.io.
server_1     | To say hi to the community       , visit https://dgraph.slack.com.
server_1     |
server_1     | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
server_1     | Copyright 2015-2018 Dgraph Labs, Inc.
server_1     |
server_1     |
server_1     | I0520 10:59:35.336173       1 server.go:117] Setting Badger table load option: mmap
server_1     | I0520 10:59:35.336179       1 server.go:129] Setting Badger value log load option: mmap
server_1     | I0520 10:59:35.336183       1 server.go:157] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x1fbc410}
zero_1       | I0520 10:59:35.252205       1 node.go:175] Setting conf state to nodes:1
zero_1       | I0520 10:59:35.252240       1 raft.go:669] Done applying conf change at 0x1
zero_1       | I0520 10:59:35.504551       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1       | I0520 10:59:35.504668       1 node.go:85] 1 no leader at term 1; dropping index reading msg
server_1     | I0520 10:59:35.350407       1 node.go:85] All 0 tables opened in 0s
server_1     | I0520 10:59:35.352292       1 server.go:117] Setting Badger table load option: mmap
server_1     | I0520 10:59:35.352303       1 server.go:129] Setting Badger value log load option: mmap
server_1     | I0520 10:59:35.352308       1 server.go:171] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x1fbc410}
server_1     | I0520 10:59:35.396182       1 node.go:85] All 0 tables opened in 0s
server_1     | I0520 10:59:35.402994       1 groups.go:95] Current Raft Id: 0x0
server_1     | I0520 10:59:35.403077       1 run.go:387] gRPC server started.  Listening on port 9080
server_1     | I0520 10:59:35.403087       1 run.go:388] HTTP server started.  Listening on port 8080
server_1     | I0520 10:59:35.403101       1 worker.go:81] Worker listening at address: [::]:7080
server_1     | I0520 10:59:35.503154       1 pool.go:138] CONNECTED to zero:5080
server_1     | I0520 10:59:35.800166       1 server.go:275] Received ALTER op: drop_all:true
amt-core_webserver_1 exited with code 2
zero_1       | W0520 10:59:37.504886       1 node.go:636] [0x1] Read index context timed out
zero_1       | I0520 10:59:37.504980       1 node.go:85] 1 no leader at term 1; dropping index reading msg
zero_1       | I0520 10:59:38.048069       1 node.go:85] 1 is starting a new election at term 1
zero_1       | I0520 10:59:38.048127       1 node.go:85] 1 became pre-candidate at term 1
zero_1       | I0520 10:59:38.048150       1 node.go:85] 1 received MsgPreVoteResp from 1 at term 1
zero_1       | I0520 10:59:38.048185       1 node.go:85] 1 became candidate at term 2
zero_1       | I0520 10:59:38.048197       1 node.go:85] 1 received MsgVoteResp from 1 at term 2
zero_1       | I0520 10:59:38.048236       1 node.go:85] 1 became leader at term 2
zero_1       | I0520 10:59:38.048261       1 node.go:85] raft.node: 1 elected leader 1 at term 2
zero_1       | I0520 10:59:38.048343       1 raft.go:641] I've become the leader, updating leases.
zero_1       | I0520 10:59:38.048376       1 assign.go:42] Updated Lease id: 1. Txn Ts: 1
zero_1       | E0520 10:59:38.248648       1 raft.go:499] While proposing CID: Not Zero leader. Aborting proposal: cid:"41f809ba-6710-4f1d-b77d-7552f043e305" . Retrying...
zero_1       | W0520 10:59:39.505164       1 node.go:636] [0x1] Read index context timed out
zero_1       | I0520 10:59:39.505478       1 zero.go:414] Connected: cluster_info_only:true
zero_1       | I0520 10:59:39.506790       1 zero.go:396] Got connection request: addr:"server:8080"
zero_1       | I0520 10:59:39.507271       1 pool.go:138] CONNECTED to server:8080
zero_1       | I0520 10:59:39.510759       1 zero.go:523] Connected: id:1 group_id:1 addr:"server:8080"
zero_1       | W0520 10:59:39.510854       1 pool.go:226] Connection lost with server:8080. Error: rpc error: code = Unavailable desc = transport is closing
server_1     | I0520 10:59:39.511403       1 groups.go:118] Connected to group zero. Assigned group: 1
server_1     | I0520 10:59:39.511595       1 groups.go:120] Raft Id after connection to Zero: 0x1
server_1     | I0520 10:59:39.515653       1 draft.go:74] Node ID: 0x1 with GroupID: 1
server_1     | I0520 10:59:39.515947       1 node.go:153] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00acd4060 Applied:0 MaxSizePerMsg:1048576 MaxCommittedSizePerReady:0 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1fbc410 DisableProposalForwarding:false}
server_1     | I0520 10:59:39.516306       1 node.go:297] Group 1 found 1 entries
server_1     | I0520 10:59:39.516362       1 draft.go:1143] New Node for group: 1
server_1     | I0520 10:59:39.516599       1 node.go:85] 1 became follower at term 0
server_1     | I0520 10:59:39.516768       1 node.go:85] newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
server_1     | I0520 10:59:39.516795       1 node.go:85] 1 became follower at term 1
server_1     | I0520 10:59:39.516945       1 groups.go:138] Server is ready
server_1     | I0520 10:59:39.517046       1 groups.go:672] Got address of a Zero leader: zero:5080
server_1     | I0520 10:59:39.517387       1 groups.go:685] Starting a new membership stream receive from zero:5080.
server_1     | I0520 10:59:39.518674       1 groups.go:702] Received first state update from Zero: counter:3 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"server:8080" > > > > zeros:<key:1 value:<id:1 addr:"zero:5080" leader:true > > maxRaftId:1
server_1     | I0520 10:59:39.519249       1 node.go:175] Setting conf state to nodes:1
server_1     | I0520 10:59:39.519650       1 node.go:85] 1 is starting a new election at term 1
server_1     | I0520 10:59:39.519688       1 node.go:85] 1 became pre-candidate at term 1
server_1     | I0520 10:59:39.519710       1 node.go:85] 1 received MsgPreVoteResp from 1 at term 1
server_1     | I0520 10:59:39.519753       1 node.go:85] 1 became candidate at term 2
server_1     | I0520 10:59:39.519782       1 node.go:85] 1 received MsgVoteResp from 1 at term 2
server_1     | I0520 10:59:39.519821       1 node.go:85] 1 became leader at term 2
server_1     | I0520 10:59:39.519893       1 node.go:85] raft.node: 1 elected leader 1 at term 2
server_1     | I0520 10:59:39.524348       1 groups.go:394] Serving tablet for: _predicate_
server_1     | I0520 10:59:39.536104       1 mutation.go:145] Done schema update predicate:"_predicate_" value_type:STRING list:true
server_1     | I0520 10:59:39.543816       1 groups.go:394] Serving tablet for: dgraph.xid
server_1     | I0520 10:59:39.546304       1 index.go:686] Deleting index for attr dgraph.xid and tokenizers []
server_1     | I0520 10:59:39.546347       1 index.go:704] Rebuilding index for attr dgraph.xid and tokenizers [exact]
server_1     | I0520 10:59:39.546456       1 mutation.go:145] Done schema update predicate:"dgraph.xid" value_type:STRING directive:INDEX tokenizer:"exact"
server_1     | I0520 10:59:39.551929       1 groups.go:394] Serving tablet for: dgraph.password
server_1     | I0520 10:59:39.554232       1 mutation.go:145] Done schema update predicate:"dgraph.password" value_type:PASSWORD
server_1     | I0520 10:59:39.559336       1 groups.go:394] Serving tablet for: dgraph.user.group
server_1     | I0520 10:59:39.566390       1 index.go:861] Deleting reverse index for dgraph.user.group
server_1     | I0520 10:59:39.566468       1 index.go:871] Rebuilding reverse index for dgraph.user.group
server_1     | I0520 10:59:39.566729       1 mutation.go:145] Done schema update predicate:"dgraph.user.group" value_type:UID directive:REVERSE
server_1     | I0520 10:59:39.574442       1 groups.go:394] Serving tablet for: dgraph.group.acl
server_1     | I0520 10:59:39.576787       1 mutation.go:145] Done schema update predicate:"dgraph.group.acl" value_type:STRING
server_1     | I0520 10:59:40.517608       1 groups.go:747] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
server_1     | I0520 10:59:40.517654       1 groups.go:756] Got Zero leader: zero:5080
^CGracefully stopping... (press Ctrl+C again to force)

What am I doing wrong? How can I know when the alpha server is ready to take requests?


(Aman Mangal) #2

You can try a retry loop until the error contains the string server is not ready to accept requests. That may work.


(Francesco Magliocca) #3

Oh, and should I be doing this for each database operation or query?
Or after the first successful operation, I can be sure the server will always be ready to accept requests?


(Aman Mangal) #4

I think that’s not required.


(Francesco Magliocca) #5

Thank you very much for your help. I have found out about the /health HTTP endpoint where I can check whether the server is ready to receive requests.
Currently I’m doing:

    log.Println("Polling for health")
    res, err := http.Get("http://server:8080/health")

    // Retry until the server is ready to accept responses
    for err != nil || res.StatusCode != http.StatusOK {
        // Wait 1 second and retry
        time.Sleep(1 * time.Second)
        log.Println("Polling for health")
        res, err = http.Get("http://server:8080/health")
    }

And after I get a 200 as response, all requests succeed.
Is there a way to do this same call using grpc?


(Michel Conrado) #6

This command is wrong. The right port for Alpha “my” flag is 7080.


(Francesco Magliocca) #7

Thanks, this removed the errors regarding connections lost between zero and alpha!
Right now the webserver still needs to check for server health, though.
Is there still something wrong, going on?
And if not, is there a way to check for the server to be ready using the go client api (and not through http requests)?


(Michel Conrado) #8

dunno, are still having issues?

No sure, just checking the Dgo client tho.

You can check the state of your cluste by ${zeroAddr}:6080/state
https://docs.dgraph.io/deploy/#more-about-dgraph-zero


(Francesco Magliocca) #9

Yes, I still get when sending the first request:

rpc error: code = Unknown desc = Please retry again, server is not ready to accept requests

But I suppose this is normal, I just need to check for the state of the server?


(Michel Conrado) #10

After the changes you made, is recommended start from scratch.

are you using Dgraph massively (means loads in sequence)? Dgraph isn’t too busy? What are your stats?


(Francesco Magliocca) #11

I restarted all the containers from scratch (and also removed the volumes used).
I am not using Dgraph massively, I merely start all the containers (including my webapp) and I just wipe the old schemas and create a new schema (just two edges) and load 3 values, it’s a test I am doing to learn about Dgraph. But the thing is that the alpha server takes quite a bit of time before it is ready to receive requests.


(Michel Conrado) #12

That’s not normal at all. If it’s from scratch should be fast to be ready. Something else is happening.


(Francesco Magliocca) #13

@MichelDiz It takes approximately 4 seconds for the alpha server to be ready:

I0601 07:02:19.463544    5596 server.go:117] Setting Badger table load option: mmap
I0601 07:02:19.463559    5596 server.go:129] Setting Badger value log load option: mmap
I0601 07:02:19.463565    5596 server.go:157] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x1fbc410}
I0601 07:02:19.482563    5596 node.go:85] All 0 tables opened in 0s
I0601 07:02:19.488398    5596 server.go:117] Setting Badger table load option: mmap
I0601 07:02:19.488414    5596 server.go:129] Setting Badger value log load option: mmap
I0601 07:02:19.488420    5596 server.go:171] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x1fbc410}
I0601 07:02:19.505592    5596 node.go:85] All 0 tables opened in 0s
I0601 07:02:19.512442    5596 groups.go:95] Current Raft Id: 0x0
I0601 07:02:19.512702    5596 run.go:387] gRPC server started.  Listening on port 9080
I0601 07:02:19.512741    5596 run.go:388] HTTP server started.  Listening on port 8080
I0601 07:02:19.512744    5596 worker.go:81] Worker listening at address: [::]:7080
I0601 07:02:19.612790    5596 pool.go:138] CONNECTED to localhost:5080
I0601 07:02:23.625295    5596 groups.go:118] Connected to group zero. Assigned group: 1
I0601 07:02:23.625340    5596 groups.go:120] Raft Id after connection to Zero: 0x1
I0601 07:02:23.629359    5596 draft.go:74] Node ID: 0x1 with GroupID: 1
I0601 07:02:23.629526    5596 node.go:153] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc000278300 Applied:0 MaxSizePerMsg:1048576 MaxCommittedSizePerReady:0 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1fbc410 DisableProposalForwarding:false}
I0601 07:02:23.629708    5596 node.go:297] Group 1 found 1 entries
I0601 07:02:23.629734    5596 draft.go:1143] New Node for group: 1
I0601 07:02:23.629836    5596 node.go:85] 1 became follower at term 0
I0601 07:02:23.629912    5596 node.go:85] newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I0601 07:02:23.629932    5596 node.go:85] 1 became follower at term 1
I0601 07:02:23.629988    5596 groups.go:138] Server is ready

I can’t find anything not working properly


(Michel Conrado) #14

what is the size of your data?


(Francesco Magliocca) #15

Sorry @MichelDiz for the late reply, I start a fresh dgraph instance, without any data loaded in (I make sure I delete all the p, w, zw directories before a new restart).
The data I want to load is REALLY small, it’s a simple example: two nodes with two edges each (a username (string) and a password edge)