Unable to write to value log file: file too large


#1
E0528 11:42:54.996051   30623 writer.go:51] TxnWriter got error during callback: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                    
E0528 11:42:54.997983   30623 node.go:81] writeRequests: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                                            
E0528 11:42:54.998039   30623 writer.go:51] TxnWriter got error during callback: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                    
E0528 11:42:55.003109   30623 node.go:81] writeRequests: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                                            
E0528 11:42:55.003228   30623 writer.go:51] TxnWriter got error during callback: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                    
E0528 11:42:55.004311   30623 node.go:81] writeRequests: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                                            
E0528 11:42:55.004431   30623 writer.go:51] TxnWriter got error during callback: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                    
E0528 11:42:55.006598   30623 node.go:81] writeRequests: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                                            
E0528 11:42:55.006633   30623 writer.go:51] TxnWriter got error during callback: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large                                    
E0528 11:42:55.014127   30623 draft.go:391] Applying proposal. Error: Error while flushing to disk: Unable to write to value log file: "p/003577.vlog": write p/003577.vlog: file too large. Proposal: "key:
\"01-2454504743433075034\" delta:<txns:<start_ts:15435186 commit_ts:15435208 > txns:<start_ts:15435187 commit_ts:15435209 > txns:<start_ts:15435190 commit_ts:15435211 > txns:<start_ts:15435204 commit_ts:1
5435221 > txns:<start_ts:15435202 commit_ts:15435222 > txns:<start_ts:15435207 commit_ts:15435223 > txns:<start_ts:15435205 commit_ts:15435225 > max_assigned:15435227 group_checksums:<key:1 value:52145169
08677245399 > > index:13261097 ".                                                                                                                                                                           
fatal error: runtime: out of memory

Alpha is crashing with this error.

Dgraph version: 1.0.14
Machine Config: 16 GB RAM, Swap memory enabled. I am using this command to start alpha
dgraph alpha --lru_mb 2048 --zero localhost:5080 --badger.vlog=disk

@MichelDiz please help.


(Sascha Andres) #2

@quillen could it be that the partition where the log file should be written to is full?


#3

No, there is 50% more space available.
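For what it’s worth, on Linux a write failing with “file too large” is EFBIG, which is returned when a write would push a file past the process’s file-size limit (RLIMIT_FSIZE) or past what the filesystem supports; a full disk reports ENOSPC (“no space left on device”) instead. A quick sanity check (run from alpha’s data directory; the paths here are placeholders, not from the thread):

```shell
# "file too large" is EFBIG: the write would exceed the per-process
# file-size limit (RLIMIT_FSIZE) or the filesystem's max file size.
# A full disk would instead report ENOSPC ("no space left on device").
ulimit -f                 # "unlimited" in most default setups

# Free space and free inodes on the partition holding alpha's p/ dir
# (run from the actual data directory):
df -h .
df -i .
```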


(Michel Conrado) #4

Hmm, that’s odd. What OS, storage, and file system are you using?


#5

Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 7.9G 0 7.9G 0% /dev
tmpfs tmpfs 1.6G 6.0M 1.6G 1% /run
/dev/sda ext4 315G 138G 162G 46% /
tmpfs tmpfs 7.9G 0 7.9G 0% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/1000

I am using Ubuntu 18.10.


(Michel Conrado) #6

It’s definitely not the FS; ext4 can handle terabytes of data with 4K blocks.
Do you have some kind of swap on?

Also, what are you doing to trigger that error? Step by step, please. I think this should be filed as an issue to be investigated.


#7

Yes, SWAP is enabled.

Filename Type Size Used Priority
/dev/sdb partition 524284 223716 -2

We only have a single-node setup (16 GB RAM and 320 GB disk) and we are writing data into it using 500 Dgraph connections at a time. Earlier, alpha was getting killed by the OOM killer very frequently, so I reduced lru_mb to 2048 and used the --badger.vlog=disk param. Even then alpha got killed by OOM. After that, I tried to restart alpha and encountered this error.
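As a side note, it can help to distinguish a Go runtime allocation failure (like the `fatal error: runtime: out of memory` in the log above) from the kernel OOM killer terminating the process; the kernel case leaves a trace in the kernel log. A quick way to check (command availability and permissions vary by distro):

```shell
# Search the kernel ring buffer for OOM-killer activity
# (may require sudo on some systems):
dmesg | grep -i -E 'out of memory|oom[-_ ]kill'

# Equivalent on systemd-based systems such as Ubuntu 18.10:
journalctl -k | grep -i oom
```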


(Michel Conrado) #8

Memory usage is much better in master (I don’t recommend using master for now, though); that work will probably be released in v1.1.

These days lru_mb doesn’t matter that much. It’s just a cache option, and for now it’s disabled by default until we finish our own cache system.

You didn’t say what type of storage you have. You’re using --badger.vlog=disk, and this param isn’t that good on an HDD, because HDDs have low IOPS.

So, disable it and test again. Also decrease the load a bit to give it more room to breathe. Or add more machines (real machines), each with a node, and load-balance across them, e.g., create 3 Alpha nodes and give each of them about 167 of the 500 connections you mentioned.
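For reference, that multi-alpha layout might look roughly like this. This is only a sketch: the hostnames and ports are placeholders, and the flags are from the v1.0.x CLI, so check `dgraph alpha --help` on your version before using them.

```shell
# Machine 1: zero + first alpha
dgraph zero --my=host1:5080
dgraph alpha --my=host1:7080 --zero=host1:5080 --lru_mb=2048

# Machines 2 and 3: additional alphas joining the same zero
dgraph alpha --my=host2:7080 --zero=host1:5080 --lru_mb=2048
dgraph alpha --my=host3:7080 --zero=host1:5080 --lru_mb=2048

# Clients then spread their ~500 connections across the three
# alphas (roughly 167 each), e.g. behind a TCP load balancer.
```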


(Michel Conrado) #9

BTW, I see #3349 is in the latest RC, v1.0.15-rc9. You can try that one.

Link to binaries: Linux , Darwin , Windows .

I think these binaries are v1.0.15-rc7, but #3349 has been there since RC2.


#10

Thank you @MichelDiz. I’ll try these out.