Error caused by a leader switch during live loading

Hi all,
I would like to know whether the following error really causes any data loss:
“draft.go:467] Lastcommit 10591 > current 10575. This would cause some commits to be lost.”

I found the above error in a Dgraph Alpha’s log while live-loading the “A bigger dataset” tutorial data on a 3-node Dgraph cluster built on GCP.
https://tour.dgraph.io/moredata/1/

I would be glad if you could help me.

Which version are you using?

The version is 1.0.10.

The following are the commands used to run each process.

  • Zero @ node1
    dgraph zero --idx=1 --replicas=3 --my=10.146.0.2:5080 --bindall
  • Alpha @ node1
    dgraph alpha --idx=1001 --my=10.146.0.2:7080 --lru_mb=3072 --badger.vlog=disk
  • Others are similar to node1; see the node2 sketch below.
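For reference, node2 would look something like this (a sketch; the internal IP 10.146.0.3 is an assumption, and --peer joins the second Zero to the first):

  • Zero @ node2
    dgraph zero --idx=2 --replicas=3 --my=10.146.0.3:5080 --peer 10.146.0.2:5080 --bindall
  • Alpha @ node2
    dgraph alpha --idx=1002 --my=10.146.0.3:7080 --lru_mb=3072 --badger.vlog=disk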

To reproduce, I needed to run the following operations several times:

curl -X POST http://127.0.0.1:8080/alter -d'{"drop_all": true}'
curl -X POST http://127.0.0.1:8080/alter -d'
director.film: uid @reverse .
genre: uid @reverse .
initial_release_date: dateTime @index(year) .
name: string @index(term) @lang .
'
dgraph live -r dgraph/1million.rdf.gz --zero 10.146.0.2:5080 -c 1 -b 2000

Thank you

What do you mean by “switching Leader in live loading”?

How many Alpha instances do you have? Why “--idx=1001”? Do you have more than 1001 Alphas?

Please don’t use “-b 2000” while you’re using “--badger.vlog=disk”. I suspect you have HDD storage, so you get less performance, and increasing the batch size may cause issues in that situation. Leave it at the default, or try SSDs or NVMe.

Can you share your specs?

Thank you for the reply.

What do you mean by “switching Leader in live loading”?

I found the error after a new leader had been elected during live loading.
I think it is triggered by high load.
Here is the log around the error; I modified it a little, combining the logs and adding the node name to each line.

How many Alpha instances do you have? Why “--idx=1001”? Do you have more than 1001 Alphas?

I have 3 Alphas.

  • node1: Zero idx:1, Alpha idx:1001
  • node2: Zero idx:2, Alpha idx:1002
  • node3: Zero idx:3, Alpha idx:1003

Please don’t use “-b 2000” while you’re using “--badger.vlog=disk”. I suspect you have HDD storage, so you get less performance, and increasing the batch size may cause issues in that situation. Leave it at the default, or try SSDs or NVMe.

The reason I used -b 2000 was to see how Dgraph behaves under high load.
However, I will use --badger.vlog=mmap and leave -b at its default in normal operation.
I believed --badger.vlog=disk would give me more safety, because the vlog is the WAL, and in an RDBMS like PostgreSQL the WAL must be flushed to storage.

Can you share your specs?
On GCP: n1-standard-2 (2 vCPUs, 7.5 GB RAM), with a standard persistent disk (which should be HDD, not SSD).


In my opinion (this is a personal comment): if you are going to use HDDs, you will necessarily need to increase the amount of memory, and consequently the lru_mb cache. HDDs are very slow: the fastest of them, at 15k RPM, reach about 400 IOPS, while the most basic SSD does 5K IOPS and an NVMe drive does around 120K IOPS, up to 10 million read IOPS. In theory, DDR4 RAM can give you 1.7 million write IOPS. What SSDs, NVMe, and RAM have in common is low latency and fast access.

See? More memory compensates for physical-storage bottlenecks.

Dgraph is a DB designed to take full advantage of SSDs or NVMe. If you use HDDs you have to compensate for this, and compensate a lot, because in this scenario you are tripling Dgraph’s work. With less memory and a greater workload you will have problems, as with any DB.

Even PostgreSQL gets better with SSDs; see the chart.


On load-testing Dgraph: I think you would be better off creating a test with clients, like this guy did.

This is the best way to test Dgraph. The live loader needs some adjustments to keep up with recent changes in Dgraph, so I do not recommend using it for that purpose, or increasing its default values.
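For example, here is a minimal sketch of such a client test in Go using the dgo client (the gRPC endpoint 127.0.0.1:9080 and the writer/iteration counts are assumptions; adjust them for your cluster):

package main

import (
    "context"
    "fmt"
    "log"
    "sync"

    "github.com/dgraph-io/dgo"
    "github.com/dgraph-io/dgo/protos/api"
    "google.golang.org/grpc"
)

func main() {
    // Connect to one Alpha's gRPC endpoint (assumed address).
    conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure())
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

    var wg sync.WaitGroup
    for i := 0; i < 4; i++ { // number of concurrent writers is arbitrary
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                // One small mutation per transaction, committed immediately.
                nq := fmt.Sprintf(`_:n <name> "client-%d-op-%d" .`, id, j)
                _, err := dg.NewTxn().Mutate(context.Background(), &api.Mutation{
                    SetNquads: []byte(nq),
                    CommitNow: true,
                })
                if err != nil {
                    log.Printf("writer %d: %v", id, err)
                }
            }
        }(i)
    }
    wg.Wait()
}

Running several writers like this puts the same kind of concurrent commit pressure on the cluster as the live loader, but fully under your control.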