Unable to reach leader in group 1 - dir structures help

I’m trying to understand the dgraph directory structure.

What is the dgraph “T” directory?

Thanks,
Ryan

Hi Ryan, t is short for test or testing. It contains the source of t.go which, when compiled, is the driver for running integration tests. There should be a readme in there as well.

I think he may be referring to the t directory that Dgraph creates when you create a cluster.
I’ve mentioned a few times that t.go should be renamed - that name is too short and not human-friendly.
If it were “testing/test.go” it would avoid any questions and be obvious to anyone.

About the t directory in a Dgraph cluster: it is a temporary directory, used for internal things.

Background… We had a dead Alpha… it was sharing a base directory structure under version1:

/var/data1/dgraph/version1/t
/var/data1/dgraph/version1/w1
/var/data1/dgraph/version1/zw
/var/data1/dgraph/version1/w
/var/data1/dgraph/version1/p
/var/data1/dgraph/version1/p1

We configured the two Alphas to use different p and w directories, but we hadn’t realized a t directory was there too. We were prepared to wipe the p and w directories, but were surprised by the t and wanted to make sure it didn’t have anything to do with keeping state.

Ultimately, we reconfigured the two Dgraph instances running on that server so they no longer share a base directory.

What are the temporary buffers? Do they have anything to do with ingest? Our concern was that we had data buffered up for ingest.

Directory to store temporary buffers. (default "t")

https://dgraph.io/docs/deploy/cli-command-reference/

It contains temporary indexing data and other Ristretto-related files, so it is related to caching. I think the bulk and live loaders use it too, but for other reasons.

You should not delete that, though. I think my earlier statement that it could be deleted was wrong.
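
If the concern is two Alphas sharing one base directory, each Alpha can be given its own temporary-buffer directory via the --tmp flag (the “Directory to store temporary buffers” option quoted above, default "t"), alongside separate p and w paths. A minimal sketch, with illustrative paths rather than your actual layout:

# Illustrative only: give this Alpha its own tmp, postings, and WAL directories
dgraph alpha --tmp /var/data1/dgraph/version1/t1 -p /var/data1/dgraph/version1/p1 -w /var/data1/dgraph/version1/w1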

We’re watching an alpha re-create itself …

log.go:34] Table created: 24 at level: 7 for stream: 3. Size: 123 MiB
admin.go:857] namespace: 0. Error reading GraphQL schema: Please retry again, server is not ready to accept requests.

Why would it not be able to read the GraphQL schema?

We have 4 Groups. Group 1 does not have a leader.
We have 3 Zeros. One Zero is dead, but we have a leader.

Thanks,
Ryan

Eventually it will.

Which server is it trying to read it from?

This is a warning that the cluster is not ready. It is not about the GraphQL schema per se. The GraphQL schema is probably in group 1 of the Alphas.

Dgraph doesn’t read from a single Alpha. It reads from a group. The Alpha is just a medium; the query will find the location of the resource internally, in a distributed way.

The way to know where the GraphQL schema is stored is to check in Ratel which group holds the predicate dgraph.graphql.schema.

Open the Cluster Management tab in Ratel and look for it.
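
If you prefer the command line over Ratel, a rough equivalent is to ask Zero which group’s tablets include that predicate. This assumes Zero’s default HTTP port 6080 and that jq is installed; the exact JSON layout can vary between Dgraph versions:

# Print the group whose tablet map contains the GraphQL schema predicate
curl -s localhost:6080/state | jq '.groups | to_entries[] | select(.value.tablets["dgraph.graphql.schema"] != null) | .key'

The key it prints is the group currently serving the GraphQL schema predicate.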

Ah ha!

On Group #1, I see:

dgraph.ddrop.op
dgraph.graphql.p_query
dgraph.graphql.schema
dgraph.graphql.xid

Albeit, they are size 0.0B

Group #1 has 2 dead Alphas. The Alpha that is left did not elect itself leader.

The size shown in Ratel isn’t very precise, though.

Is there a way to manually set the leader?

In Group #1, 2 of our 3 Alphas failed. The remaining “good” Alpha never elected itself the leader. We’ve brought the other 2 Alphas back up with empty p, w, and t directories.

The message we see on the two Alphas that are “new” is:

Error while calling hasPeer: Unable to reach leader in group 1.  Retrying...

The messages we see on the Alpha that’s been up the whole time, but is not the leader, are:

is starting a new election at term 833
became pre-candidate at term 833
[logterm: 833, index 93495359] sent MsgPreVote request to 1 at term 833
[logterm: 833, index 93495359] sent MsgPreVote request to 2 at term 833
Unable to send message to peer: 0x1.  Error: Do not have address of peer 0x1
Unable to send message to peer: 0x2.  Error: Do not have address of peer 0x2

No. You need to solve the issue with those unhealthy Alphas.

Raft has an address recorded in it. If you somehow change the address of the Alpha, it will think it is a new node. The Raft ID also has to be the same.

Try to start from scratch.

For the Zeros, we have:

--raft="idx=1"

For the Alphas we don’t set the idx; what we do set is:

--raft="snapshot-after-entries=300000; snapshot-after-duration=0; group=1"

Should we be manually setting the RAFT idx for the Alphas?

Thanks

The Alpha has Raft configs:

 --raft string                Raft options
  group=; Provides an optional Raft Group ID that this Alpha would indicate to Zero to join.
  idx=; Provides an optional Raft ID that this Alpha would use to join Raft groups.
  learner=false; Make this Alpha a "learner" node. In learner mode, this Alpha will not participate in Raft elections. This can be used to achieve a read-only replica.
  pending-proposals=256; Number of pending mutation proposals. Useful for rate limiting.
  snapshot-after-duration=30m; Frequency at which we should create a new raft snapshots. Set to 0 to disable duration based snapshot.
  snapshot-after-entries=10000; Create a new Raft snapshot after N number of Raft entries. The lower this number, the more frequent snapshot creation will be. Snapshots are created only if both snapshot-after-duration and snapshot-after-entries threshold are crossed.
 
(default "learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m; pending-proposals=256; idx=; group=;")

That is not good practice, but you can do it. It will just give you more work to do.

I’d check the IDs from the logs and keep an eye on them, in case they changed for some reason.

BTW, you cannot reuse a Raft ID from a dead Alpha/node.
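
One way to keep an eye on those IDs without digging through logs is Zero’s /state endpoint, which lists each group’s members with the Raft ID and address Zero has recorded for them. This assumes the default Zero HTTP port 6080 and jq; field names may differ slightly between versions:

# Show the recorded Raft ID, group, address, and leader flag for every member
curl -s localhost:6080/state | jq '.groups[].members[]? | {id, groupId, addr, leader}'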

We had previously called:

  • removeNode?id=1&group=1
  • removeNode?id=2&group=1

in hopes that the functioning 3rd node would elect itself leader. This didn’t work; no leader was elected.
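
For reference, those calls were made against the Zero leader’s HTTP endpoint; the host and port shown here are the defaults, not necessarily what we used:

curl "localhost:6080/removeNode?id=1&group=1"
curl "localhost:6080/removeNode?id=2&group=1"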

Following this suggestion, we pointed the two new Alphas at the old corrupted w, p, and t directories. They started up and failed with a duplicate Raft ID issue.

We resolved the duplicate Raft ID issue by setting the Raft idx to unique IDs. Now they started up, but failed with a DirectedEdge error (the same error we’ve seen from the very beginning).

We then cleared out the w (write-ahead log) folder. Now the new Alphas start, but won’t elect a leader, saying:

Error while calling hasPeer: Unable to reach leader in group 1.  Retrying...

After having called removeNode, is there a way to undo that?

I’ve removed the w directory on all 3 failing Alphas… They all still have their full p (postings) directory…

All 3 now say:

Error while calling hasPeer: Unable to reach leader in group 1.  Retrying...

How do you force a vote or declare one as the leader?

I stood up a brand new group - Group 5 - with completely fresh folders. It has the same error:

Error while calling hasPeer: Unable to reach leader in group 1.  Retrying...

We’ve traced this back to an initial error…

oracle.go:215] ProcessDelta Committed: 185959349 -> 185959350
log.go:30] writeRequests called. Writing to value log
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
...
Sending fatal event
...
2023/04/04 00:48:00 proto: DirectedEdge: illegal tag 0 (wire type 0)

We believe this is coming from the w dir (aka the write-ahead log). Is there a way to skip bad/corrupted data?

No.

Not possible. That’s part of the Raft algorithm. As far as I know, we can’t control that.

Try this.

Badger

BEFORE ANY ATTEMPT. PLEASE BACKUP YOUR FILES.

Download the Badger binary and put it in /usr/local/bin/badger.

Commands to use

badger info --dir ./p

This will check the integrity of the data.

Check the Abnormalities part of the output:

Abnormalities:
2 extra files.
0 missing files.
0 empty files.
0 truncated manifests.

What matters is missing and truncated files. This checks for corrupted files.
If there are any, you can only count on luck.

You can also flatten your DB before streaming it. This can help with streaming. (OPTIONAL)

badger flatten --dir ./p

Stream your data to a new DB

mkdir p_backup
mv p p_backup

badger stream --dir ./p_backup --out ./p

This will copy your data to a new place.

Now you can delete the w, zw, and t directories (the original p was already moved away by the mv).

rm -fr t w zw

  1. After that, start a Zero and an Alpha. You should do the steps above for each shard/group.
  2. Stop them all (otherwise all nodes will report “nodeCount”: 0).
  3. Start the cluster again.
    You should have all nodes intact.
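
A rough sketch of that sequence for a single group, with made-up hostnames, ports, and paths - treat it as a checklist rather than exact commands for your cluster:

# Per group: back up, rebuild p with badger stream, drop the old Raft/temp state.
cp -r p p_safety_backup          # untouched copy, just in case
mv p p_backup
badger stream --dir ./p_backup --out ./p
rm -fr t w zw
# Bring up Zero and this group's Alpha once so they re-register with the cluster.
dgraph zero --my zero1:5080 --wal ./zw &
dgraph alpha --my alpha1:7080 --zero zero1:5080 -p ./p -w ./w &
# Wait until both report healthy, then stop them.
# Repeat for each group, then start the whole cluster again.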

Why would I do this?

With a completely new DB, you avoid file conflicts, leftover configs, etc. It’s like starting from scratch, but with the same data as before.