Odd behaviour when running zero and alpha of 1.0.14 on the same localhost with --port-offset


(Pat Quill) #1

Hello!

I’m trying to isolate a possible export, bulk loader bug that I came across yesterday. Thought I’d start up a single host 1.0.14 zero and alpha, create the possibly problematic predicate corp.when: datetime @index(hour), mutate a few times, export and then use the bulk loader. If this raises the bug, it is easily reported and recreated.

@dgraph1:/opt/dgraph/1.0.14# tree
.
├── bin
│   ├── dgraph
│   └── dgraph-ratel
└── data
    ├── postings
    ├── wal
    └── zero_wal

5 directories, 2 files

I start a zero:

sudo -u dgraph /opt/dgraph/1.0.14/bin/dgraph zero --my localhost:5080 --wal /opt/dgraph/1.0.14/data/zero_wal --port_offset 3

and it returns:

I0425 10:42:05.814868   11477 init.go:88]

Dgraph version   : v1.0.14
Commit SHA-1     : 26cb2f94
Commit timestamp : 2019-04-12 13:21:56 -0700
Branch           : HEAD
Go version       : go1.11.5

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.


I0425 10:42:05.815151   11477 run.go:98] Setting up grpc listener at: 0.0.0.0:5083
I0425 10:42:05.815298   11477 run.go:98] Setting up http listener at: 0.0.0.0:6083
badger 2019/04/25 10:42:05 INFO: All 0 tables opened in 0s
I0425 10:42:05.829116   11477 node.go:153] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00015f5f0 Applied:0 MaxSizePerMsg:1048576 MaxCommittedSizePerReady:0 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1fbc410 DisableProposalForwarding:false}
I0425 10:42:05.830061   11477 node.go:297] Group 0 found 1 entries
I0425 10:42:05.830193   11477 node.go:85] 1 became follower at term 0
I0425 10:42:05.830272   11477 node.go:85] newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I0425 10:42:05.830300   11477 node.go:85] 1 became follower at term 1
I0425 10:42:05.830519   11477 run.go:284] Running Dgraph Zero...
E0425 10:42:05.830996   11477 raft.go:499] While proposing CID: Not Zero leader. Aborting proposal: cid:"45b10fe8-7adf-4887-ba08-25cbdc4b8e4c" . Retrying...
I0425 10:42:05.831984   11477 node.go:175] Setting conf state to nodes:1
I0425 10:42:05.832054   11477 raft.go:669] Done applying conf change at 0x1
I0425 10:42:06.830924   11477 node.go:85] 1 no leader at term 1; dropping index reading msg
W0425 10:42:08.831043   11477 node.go:636] [0x1] Read index context timed out
I0425 10:42:08.831110   11477 node.go:85] 1 no leader at term 1; dropping index reading msg
E0425 10:42:08.831905   11477 raft.go:499] While proposing CID: Not Zero leader. Aborting proposal: cid:"14f5ab5d-f0ec-4f88-b1fa-a927d66ddceb" . Retrying...
I0425 10:42:09.090760   11477 node.go:85] 1 is starting a new election at term 1
I0425 10:42:09.090928   11477 node.go:85] 1 became pre-candidate at term 1
I0425 10:42:09.090993   11477 node.go:85] 1 received MsgPreVoteResp from 1 at term 1
I0425 10:42:09.091080   11477 node.go:85] 1 became candidate at term 2
I0425 10:42:09.091155   11477 node.go:85] 1 received MsgVoteResp from 1 at term 2
I0425 10:42:09.091243   11477 node.go:85] 1 became leader at term 2
I0425 10:42:09.091303   11477 node.go:85] raft.node: 1 elected leader 1 at term 2
I0425 10:42:09.091392   11477 raft.go:641] I've become the leader, updating leases.
I0425 10:42:09.091451   11477 assign.go:42] Updated Lease id: 1. Txn Ts: 1
W0425 10:42:10.831225   11477 node.go:636] [0x1] Read index context timed out
I0425 10:42:11.833357   11477 raft.go:493] CID set for cluster: 110ef575-6404-4110-8b0d-727ac8ff3936

Then, I start an alpha:

sudo -u dgraph /opt/dgraph/1.0.14/bin/dgraph alpha --zero localhost:5083 --my localhost:9000 --postings /opt/dgraph/1.0.14/data/postings --wal /opt/dgraph/1.0.14/data/wal --port_offset 3 --lru_mb 4096

and it returns:

I0425 10:43:56.715356   11499 init.go:88]

Dgraph version   : v1.0.14
Commit SHA-1     : 26cb2f94
Commit timestamp : 2019-04-12 13:21:56 -0700
Branch           : HEAD
Go version       : go1.11.5

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.


I0425 10:43:56.715655   11499 server.go:117] Setting Badger table load option: mmap
I0425 10:43:56.715681   11499 server.go:129] Setting Badger value log load option: mmap
I0425 10:43:56.715686   11499 server.go:157] Opening write-ahead log BadgerDB with options: {Dir:/opt/dgraph/1.0.14/data/w ValueDir:/opt/dgraph/1.0.14/data/w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x1fbc410}
I0425 10:43:56.721775   11499 node.go:85] All 0 tables opened in 0s
I0425 10:43:56.723656   11499 server.go:117] Setting Badger table load option: mmap
I0425 10:43:56.723680   11499 server.go:129] Setting Badger value log load option: mmap
I0425 10:43:56.723689   11499 server.go:171] Opening postings BadgerDB with options: {Dir:/opt/dgraph/1.0.14/data/p ValueDir:/opt/dgraph/1.0.14/data/p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x1fbc410}
I0425 10:43:56.727910   11499 node.go:85] All 0 tables opened in 0s
I0425 10:43:56.729422   11499 groups.go:95] Current Raft Id: 0x0
I0425 10:43:56.729605   11499 worker.go:81] Worker listening at address: [::]:7083
I0425 10:43:56.729652   11499 run.go:387] gRPC server started.  Listening on port 9083
I0425 10:43:56.729664   11499 run.go:388] HTTP server started.  Listening on port 8083
I0425 10:43:56.829820   11499 pool.go:138] CONNECTED to localhost:5083
I0425 10:43:56.833140   11499 pool.go:138] CONNECTED to localhost:5080
W0425 10:43:56.833609   11499 pool.go:226] Connection lost with localhost:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"

After the alpha starts the zero produces the following pair of messages, over and over again until it is terminated:

I0425 10:44:16.768298   11477 zero.go:396] Got connection request: cluster_info_only:true
I0425 10:44:16.768641   11477 zero.go:414] Connected: cluster_info_only:true

If I try to alter the database:

curl -X POST localhost:8083/alter -d 'odd.when: datetime @index(hour)'

I get:

{"errors":[{"code":"Error","message":"Please retry again, server is not ready to accept requests"}]

@dgraph1:/opt/dgraph/1.0.14# tree
.
├── bin
│   ├── dgraph
│   └── dgraph-ratel
└── data
    ├── postings
    │   ├── 000000.vlog
    │   └── MANIFEST
    ├── wal
    │   ├── 000000.vlog
    │   └── MANIFEST
    └── zero_wal
        ├── 000000.vlog
        ├── 000002.sst
        └── MANIFEST

Some questions:

  1. Is there something wrong with the invocations?
  2. Why is alpha trying to connect to localhost:5080 when it was started with --zero localhost:5083
  3. Why is zero producing the same two messages, ceaselessly?

Thanks!


(Michel Conrado) #2

The flag --my localhost:5083 need change as you add offset. “my” flag in this case is about Zero.

You did right here to --zero but missed with --my localhost:9000 in this case is “:8083”


(Pat Quill) #3

OK, just tried,

sudo -u dgraph /opt/dgraph/1.0.14/bin/dgraph zero --my localhost:5083 --wal /opt/dgraph/1.0.14/data/zero_wal --port_offset 3

and

sudo -u dgraph /opt/dgraph/1.0.14/bin/dgraph alpha --zero localhost:5083 --my localhost:8083 --postings /opt/dgraph/1.0.14/data/postings --wal /opt/dgraph/1.0.14/data/wal --port_offset 3 --lru_mb 4096

Still encountering odd behaviour.

Output from alpha:

I0425 19:30:19.538117   12592 server.go:117] Setting Badger table load option: mmap
I0425 19:30:19.538140   12592 server.go:129] Setting Badger value log load option: mmap
I0425 19:30:19.538146   12592 server.go:157] Opening write-ahead log BadgerDB with options: {Dir:/opt/dgraph/1.0.14/data/wal ValueDir:/opt/dgraph/1.0.14/data/wal SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x1fbc410}
I0425 19:30:19.542117   12592 node.go:85] All 0 tables opened in 0s
I0425 19:30:19.543002   12592 node.go:85] Replaying file id: 0 at offset: 0
I0425 19:30:19.543043   12592 node.go:85] Replay took: 17.892µs
I0425 19:30:19.543166   12592 server.go:117] Setting Badger table load option: mmap
I0425 19:30:19.543186   12592 server.go:129] Setting Badger value log load option: mmap
I0425 19:30:19.543194   12592 server.go:171] Opening postings BadgerDB with options: {Dir:/opt/dgraph/1.0.14/data/postings ValueDir:/opt/dgraph/1.0.14/data/postings SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x1fbc410}
I0425 19:30:19.545975   12592 node.go:85] All 0 tables opened in 0s
I0425 19:30:19.546714   12592 node.go:85] Replaying file id: 0 at offset: 0
I0425 19:30:19.546745   12592 node.go:85] Replay took: 12.672µs
I0425 19:30:19.547081   12592 run.go:387] gRPC server started.  Listening on port 9083
I0425 19:30:19.547101   12592 run.go:388] HTTP server started.  Listening on port 8083
I0425 19:30:19.547133   12592 groups.go:95] Current Raft Id: 0x0
I0425 19:30:19.547195   12592 worker.go:81] Worker listening at address: [::]:7083
I0425 19:30:19.647402   12592 pool.go:138] CONNECTED to localhost:5083
I0425 19:30:19.649987   12592 pool.go:138] CONNECTED to localhost:5080
W0425 19:30:19.650399   12592 pool.go:226] Connection lost with localhost:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"

Still trying to connect to localhost:5080

Output from zero is the same ceaseless repetition of 2 messages:

I0425 19:32:02.220146   12572 zero.go:414] Connected: cluster_info_only:true
I0425 19:32:02.320940   12572 zero.go:396] Got connection request: cluster_info_only:true

Docs say,

Port Offset To make it easier for user to setup the cluster, Dgraph defaults the ports used by Dgraph nodes and let user to provide an offset (through command option --port_offset ) to define actual ports used by the node. Offset can also be used when starting multiple zero nodes in a HA setup.

dgraph alpha help and dgraph zero help are less clear re the purpose of the --port-offset and --my flags and the interaction between the flags.

Thanks!


(Michel Conrado) #4

Have you started from scratch? just in case.


(Pat Quill) #5

(Updated first response with more info.)

Yes, I started from scratch.

Oh, by scratch you mean deleting the contents of z, p and w :slight_smile:
Yeah, that did the trick.

Thank you very much.


(Michel Conrado) #6

Yep, glad it worked! You need to clear the Paths because they keep past persistent settings.