Transaction is too old Error


(Steven Ayers) #1

Under load testing, I have noticed that I get “Transaction is too old” as an error. It looks to be a gRPC error.

The first two lines below are the last two of many such errors. Under even higher load, my Dgraph instance crashed after running out of memory.

I’m running a single-host Dgraph server on an rd5.large AWS instance, over a single gRPC connection.

Does anyone know exactly why this happens?

E0417 22:12:32.844160    3550 draft.go:391] Applying proposal. Error: Transaction is too old. Proposal: "mutations:<group_id:1 start_ts:2904174 edges:<entity:88622 attr:\"links\" value_id:86341 > edges:<entity:88622 attr:\"_predicate_\" value:\"links\" > > key:\"01-8954377150175161614\" index:720584 ".
E0417 22:12:32.844526    3550 draft.go:391] Applying proposal. Error: Transaction is too old. Proposal: "mutations:<group_id:1 start_ts:2887269 edges:<entity:88309 attr:\"links\" value_id:86338 > edges:<entity:88309 attr:\"_predicate_\" value:\"links\" > > key:\"01-4900552890111969553\" index:720585 ".
I0417 22:12:33.128614    3550 node.go:85] Rolling up Created batch of size: 1.7 MB in 24.643143ms.
I0417 22:12:33.129058    3550 node.go:85] Rolling up Sent 199947 keys
I0417 22:12:33.138098    3550 draft.go:838] Rolled up 199947 keys. Done
I0417 22:12:33.138119    3550 draft.go:353] List rollup at Ts 2911602: OK.
I0417 22:12:34.154705    3550 draft.go:1042] Skipping snapshot at index: 697222. Insufficient discard entries: 0. MinPendingStartTs: 2824568
I0417 22:12:34.155571    3550 draft.go:931] Found 7 old transactions. Acting to abort them.
I0417 22:12:34.155670    3550 draft.go:934] abortOldTransactions for 7 txns. Error: No connection exists
I0417 22:13:04.153890    3550 draft.go:1042] Skipping snapshot at index: 697222. Insufficient discard entries: 0. MinPendingStartTs: 2824568
I0417 22:13:04.154613    3550 draft.go:931] Found 60 old transactions. Acting to abort them.
I0417 22:13:04.154633    3550 draft.go:934] abortOldTransactions for 60 txns. Error: No connection exists
I0417 22:13:34.155942    3550 draft.go:1042] Skipping snapshot at index: 697222. Insufficient discard entries: 0. MinPendingStartTs: 2824568
I0417 22:13:34.156661    3550 draft.go:931] Found 420 old transactions. Acting to abort them.
I0417 22:13:34.156684    3550 draft.go:934] abortOldTransactions for 420 txns. Error: No connection exists
I0417 22:14:05.393809    3550 draft.go:1042] Skipping snapshot at index: 697222. Insufficient discard entries: 0. MinPendingStartTs: 2824568
I0417 22:14:05.395582    3550 draft.go:931] Found 2723 old transactions. Acting to abort them.
I0417 22:14:05.395608    3550 draft.go:934] abortOldTransactions for 2723 txns. Error: No connection exists
I0417 22:14:34.156695    3550 draft.go:1042] Skipping snapshot at index: 697222. Insufficient discard entries: 0. MinPendingStartTs: 2824568
I0417 22:14:34.157500    3550 draft.go:931] Found 2764 old transactions. Acting to abort them.
I0417 22:14:34.157523    3550 draft.go:934] abortOldTransactions for 2764 txns. Error: No connection exists
I0417 22:15:04.244868    3550 draft.go:1042] Skipping snapshot at index: 697222. Insufficient discard entries: 0. MinPendingStartTs: 2824568
I0417 22:15:04.245938    3550 draft.go:931] Found 2764 old transactions. Acting to abort them.
I0417 22:15:04.245962    3550 draft.go:934] abortOldTransactions for 2764 txns. Error: No connection exists
I0417 22:15:34.160981    3550 draft.go:1042] Skipping snapshot at index: 697222. Insufficient discard entries: 0. MinPendingStartTs: 2824568
I0417 22:15:34.161753    3550 draft.go:931] Found 2764 old transactions. Acting to abort them.
I0417 22:15:34.161777    3550 draft.go:934] abortOldTransactions for 2764 txns. Error: No connection exists

(Michel Conrado) #2

“Found 2764 old transactions. Acting to abort them.” That’s normal.

Are these Alpha logs? Can you share the YAML or commands used to create the cluster?

“Error: No connection exists” - a connection with what, I wonder? Is Zero dead on your cluster? You said you have a “single server”; does this mean a single Dgraph Alpha is running?

Please also share stats, configs, and which version you are running.

Cheers.


(Steven Ayers) #3

It runs fine for several minutes, then this happens every time, so I don’t think Zero is dead. Yes, a single Alpha server. I also tried opening 3 connections to the same Alpha on the client side, and that just made the issue appear sooner.
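
The count in the “Found N old transactions” log lines keeps growing under load, so one thing that may be worth trying in a test like this is capping how many transactions the client keeps open at once. It is an assumption, not something confirmed in this thread, that unbounded client concurrency contributes to the problem. The following is a minimal sketch of that idea in Go, using a buffered channel as a semaphore. The names runBounded and work are hypothetical; work stands in for whatever opens, mutates, and commits one transaction.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBounded calls work(i) for every i in [0, n) while keeping at most
// limit calls in flight. It returns the highest concurrency it observed
// and the first error returned by any call.
func runBounded(n, limit int, work func(i int) error) (int64, error) {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // one slot per in-flight call
	var (
		wg       sync.WaitGroup
		inFlight int64
		peak     int64
		once     sync.Once
		firstErr error
	)
	for i := 0; i < n; i++ {
		sem <- struct{}{} // blocks while limit calls are already running
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call ends

			cur := atomic.AddInt64(&inFlight, 1)
			// Raise the recorded peak if this call pushed concurrency higher.
			for {
				p := atomic.LoadInt64(&peak)
				if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
					break
				}
			}
			if err := work(i); err != nil {
				once.Do(func() { firstErr = err })
			}
			atomic.AddInt64(&inFlight, -1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak), firstErr
}

func main() {
	peak, err := runBounded(100, 8, func(i int) error {
		// Placeholder for: open a transaction, mutate, commit.
		return nil
	})
	fmt.Println("peak in-flight:", peak, "err:", err)
}
```

The limit of 8 is arbitrary. In a real load test it would be tuned while watching the Alpha logs and memory usage.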

Installation process:

curl https://get.dgraph.io -sSf | bash

dgraph zero --my=<internal_ip>:5080 (in one tab)

dgraph alpha --lru_mb=10240 --my=<internal_ip>:7080 --zero=localhost:5080 (another tab)

dgraph-ratel -port 8008 (another tab)

Version:

$ dgraph version

Dgraph version   : v1.0.14
Commit SHA-1     : 26cb2f94
Commit timestamp : 2019-04-12 13:21:56 -0700
Branch           : HEAD
Go version       : go1.11.5

Once I recreated the AWS instance, I stopped getting this error. I recreated it both on the same instance size, rd5.large, and on an rd5.xlarge (4 cores, 32 GB RAM, 500 GB gp2 SSD, 1500 IOPS).

Before every test, I’m using:

// DeleteAll drops all data and the schema via a DropAll operation.
func (store *DbStore) DeleteAll() error {
	return store.Alter(context.Background(), &api.Operation{DropAll: true})
}

Perhaps if you run this too many times, it does something to the database?

It’s also possible that my earlier tests, which I ran before I had made sure my code was completely safe, did something to the system. I think it may have been memory related, and caused issues with the Zero server?


(Michel Conrado) #4

Why not “localhost”? On Zero you’re using <internal_ip>:5080, but on the Alpha you’re using --zero=localhost:5080. This difference can create issues. Set them all to localhost.
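
A mismatch like this can be caught before starting the cluster. The following is a minimal sketch in Go that compares the address given to Zero’s --my flag with the one given to Alpha’s --zero flag. The function name sameEndpoint is hypothetical, and the comparison is purely textual (an assumption made for simplicity), so “localhost” and “127.0.0.1” count as different.

```go
package main

import (
	"fmt"
	"net"
)

// sameEndpoint reports whether two host:port strings name the same host
// and port textually. It does not resolve DNS, so "localhost" and
// "127.0.0.1" are treated as different endpoints.
func sameEndpoint(a, b string) (bool, error) {
	hostA, portA, err := net.SplitHostPort(a)
	if err != nil {
		return false, fmt.Errorf("parse %q: %w", a, err)
	}
	hostB, portB, err := net.SplitHostPort(b)
	if err != nil {
		return false, fmt.Errorf("parse %q: %w", b, err)
	}
	return hostA == hostB && portA == portB, nil
}

func main() {
	// 10.0.0.5 is a made-up stand-in for <internal_ip>.
	zeroMy := "10.0.0.5:5080"     // value passed to: dgraph zero --my=...
	alphaZero := "localhost:5080" // value passed to: dgraph alpha --zero=...

	ok, err := sameEndpoint(zeroMy, alphaZero)
	if err != nil {
		fmt.Println("bad address:", err)
		return
	}
	if !ok {
		fmt.Println("Zero --my and Alpha --zero differ; consider using the same address in both")
	}
}
```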

I don’t think so. I have never seen it before (and neither have other users), but if it keeps happening, you should file an issue on GitHub so the team can investigate it.

You have enough memory for normal usage. How big is your dataset?

I only mentioned Zero because I saw “No connection exists” and you said you had only one Dgraph instance, so I assumed it could be Zero. Zero doesn’t have memory issues, because it doesn’t have many tasks to do. You’d never see Zero above 1 GB of memory. It is just a cluster balancer.

If this is a test-only server, I would recommend that instead of using Drop_All you run “rm -rf ./p ./w ./zw” to delete the data folders and start from scratch.
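
For a test harness written in Go, the same cleanup can be done without shelling out. The following is a minimal sketch that assumes Zero and Alpha have already been stopped. The function name resetDataDirs is hypothetical, and the caller passes in the directory names, since they depend on how Zero and Alpha were started.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resetDataDirs removes the named subdirectories of base.
// Names must be plain directory names: empty names, "." and "..", and
// anything containing a path separator are rejected, so that a typo
// cannot delete something outside base.
func resetDataDirs(base string, names ...string) error {
	for _, n := range names {
		if n == "" || n == "." || n == ".." || strings.ContainsAny(n, `/\`) {
			return fmt.Errorf("refusing to remove %q", n)
		}
	}
	for _, n := range names {
		// os.RemoveAll returns nil if the path does not exist.
		if err := os.RemoveAll(filepath.Join(base, n)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Demo against a throwaway directory, not a real Dgraph install.
	base, err := os.MkdirTemp("", "dgraph-reset-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(base)

	for _, n := range []string{"p", "w"} {
		if err := os.MkdirAll(filepath.Join(base, n, "sub"), 0755); err != nil {
			panic(err)
		}
	}
	fmt.Println("reset error:", resetDataDirs(base, "p", "w"))
}
```

The name check is a deliberate safety choice: a recursive delete in a test helper is easy to point at the wrong place.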


(Steven Ayers) #5

Why not “localhost”? On Zero you’re using <internal_ip>:5080, but on the Alpha you’re using --zero=localhost:5080. This difference can create issues. Set them all to localhost.

Ah, I see. The documentation shows --my=IPADDR:5080, which I read as “my IPADDR”, i.e. “set this to your IP”. That made me think you need to use the internal IP rather than loopback for some reason. Under what circumstances would this need to be anything else?

You have enough memory for normal usage. How big is your dataset?

Tiny; in tests it’s always under 1 GB. But it’s writing that 1 GB in a matter of minutes, with one transaction per node (I will iterate on this and see if it can be batched, but that’s hard given the nature of the data).
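
Where batching is possible, the grouping step itself is simple. The following is a minimal sketch in Go that splits a list of pending nodes into fixed-size batches, so that each batch could be written in one transaction instead of one transaction per node. The function name chunk and the string node IDs are hypothetical placeholders, and whether the real data can be grouped this way is an open question here.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
// The batches share the backing array of items; they are not copies.
func chunk(items []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var batches [][]string
	for size < len(items) {
		// The three-index slice caps capacity so appending to one batch
		// cannot overwrite the next.
		batches = append(batches, items[:size:size])
		items = items[size:]
	}
	if len(items) > 0 {
		batches = append(batches, items)
	}
	return batches
}

func main() {
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	for i, batch := range chunk(nodes, 2) {
		// Placeholder: build one mutation from batch and commit it
		// in a single transaction.
		fmt.Printf("batch %d: %d nodes\n", i, len(batch))
	}
}
```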

If this is a test-only server, I would recommend that instead of using Drop_All you run “rm -rf ./p ./w ./zw” to delete the data folders and start from scratch.

It could be worth putting this in the docs, near where it says to use Drop_All to start from scratch.


(Michel Conrado) #6

Dgraph is generally run in containers (you’re setting it up on bare metal). IPADDR is usually the address of the container, or a name that an internal DNS (such as Docker’s) will resolve. If you use different addresses that refer to the same host, problems can happen, though not always. Either way, it is important to keep the configuration consistent across your cluster.

Well, this dataset is very small. Nothing abnormal should be happening here.

My tip is for exceptional cases only.


(Steven Ayers) #7

Cool, thanks for the info :+1: