Database becomes unreliable after inserts

I am consistently having problems with a Dgraph v1.0.8 database becoming unresponsive to inserts. After a certain number of inserts it responds with “Unable to find log file. Please retry”.

One guess is that it has to do with predicate moves. The database seems to have become unresponsive after this move, as indicated by the log files:

dgraph-server-0:

2018/09/14 20:47:19 predicate_move.go:250: Move predicate request for pred: [at], src: [1], dst: [2]
2018/09/14 20:47:19 predicate_move.go:260: Trying to abort pending mutations. Loop: 0
2018/09/14 20:47:19 draft.go:749: TryAbort 1 txns with start ts. Error: <nil>
2018/09/14 20:47:19 mutation.go:592: tryAbortTransactions for 1 txns. Error: <nil>
2018/09/14 20:47:19 predicate_move.go:260: Trying to abort pending mutations. Loop: 1
2018/09/14 20:47:20 stream_lists.go:231: Sending predicate: [at] Time elapsed: 01s, bytes sent: 0 B, speed: 0 B/sec
2018/09/14 20:47:21 stream_lists.go:231: Sending predicate: [at] Time elapsed: 02s, bytes sent: 0 B, speed: 0 B/sec
2018/09/14 20:47:22 stream_lists.go:213: Sent batch of size: 3.6 MB in 104.956259ms.
2018/09/14 20:47:22 stream_lists.go:231: Sending predicate: [at] Time elapsed: 03s, bytes sent: 3.6 MB, speed: 1.2 MB/sec
2018/09/14 20:47:22 stream_lists.go:213: Sent batch of size: 3.6 MB in 29.212523ms.
2018/09/14 20:47:22 stream_lists.go:213: Sent batch of size: 3.6 MB in 17.296626ms.
2018/09/14 20:47:23 stream_lists.go:231: Sending predicate: [at] Time elapsed: 04s, bytes sent: 11 MB, speed: 2.7 MB/sec
2018/09/14 20:47:24 stream_lists.go:231: Sending predicate: [at] Time elapsed: 05s, bytes sent: 11 MB, speed: 2.1 MB/sec
2018/09/14 20:47:25 stream_lists.go:213: Sent batch of size: 3.5 MB in 14.266439ms.
2018/09/14 20:47:25 stream_lists.go:231: Sending predicate: [at] Time elapsed: 06s, bytes sent: 14 MB, speed: 2.4 MB/sec
2018/09/14 20:47:25 stream_lists.go:213: Sent batch of size: 3.6 MB in 50.220192ms.
2018/09/14 20:47:25 stream_lists.go:213: Sent batch of size: 7.1 MB in 146.593537ms.
2018/09/14 20:47:25 stream_lists.go:213: Sent batch of size: 3.6 MB in 10.86529ms.
2018/09/14 20:47:25 stream_lists.go:213: Sent batch of size: 3.6 MB in 32.121575ms.
2018/09/14 20:47:25 stream_lists.go:213: Sent batch of size: 3.8 MB in 27.184004ms.
2018/09/14 20:47:25 stream_lists.go:213: Sent batch of size: 3.6 MB in 11.736636ms.
2018/09/14 20:47:25 stream_lists.go:213: Sent batch of size: 3.6 MB in 11.039617ms.
2018/09/14 20:47:26 stream_lists.go:213: Sent batch of size: 3.6 MB in 21.895807ms.
2018/09/14 20:47:26 stream_lists.go:213: Sent batch of size: 3.6 MB in 32.556499ms.
2018/09/14 20:47:26 stream_lists.go:231: Sending predicate: [at] Time elapsed: 07s, bytes sent: 50 MB, speed: 7.2 MB/sec
2018/09/14 20:47:27 stream_lists.go:231: Sending predicate: [at] Time elapsed: 08s, bytes sent: 50 MB, speed: 6.3 MB/sec
2018/09/14 20:47:28 stream_lists.go:231: Sending predicate: [at] Time elapsed: 09s, bytes sent: 50 MB, speed: 5.6 MB/sec
2018/09/14 20:47:28 stream_lists.go:213: Sent batch of size: 3.6 MB in 16.959125ms.
2018/09/14 20:47:28 stream_lists.go:213: Sent batch of size: 3.6 MB in 16.027798ms.
2018/09/14 20:47:29 stream_lists.go:213: Sent batch of size: 3.6 MB in 921.687162ms.
2018/09/14 20:47:29 stream_lists.go:231: Sending predicate: [at] Time elapsed: 10s, bytes sent: 61 MB, speed: 6.1 MB/sec
2018/09/14 20:47:29 stream_lists.go:213: Sent batch of size: 14 MB in 244.486384ms.
2018/09/14 20:47:29 stream_lists.go:246: Sending predicate: [at] Sent 885911 (+1 maybe for schema) keys

dgraph-server-1:

2018/09/14 20:47:19 predicate_move.go:143: Receiving predicate. Batching and proposing key values
2018/09/14 20:47:22 predicate_move.go:163: Predicate being received: at
2018/09/14 20:47:22 index.go:927: Dropping predicate: [at]
2018/09/14 20:47:23 schema.go:63: Deleting schema for predicate: [at]
2018/09/14 20:47:26 predicate_move.go:41: Writing 387180 keys
2018/09/14 20:47:31 predicate_move.go:41: Writing 387180 keys
2018/09/14 20:47:34 predicate_move.go:41: Writing 498732 keys
2018/09/14 20:47:37 schema.go:252: Setting schema for attr at: datetime, tokenizer: [hour], directive: INDEX, count: false
2018/09/14 20:47:37 predicate_move.go:228: Proposed 885912 keys. Error: <nil>

dgraph-server-2 does not appear to be involved.

dgraph-zero-0:

2018/09/14 20:47:19 tablet.go:179: 

Groups sorted by size: [{gid:2 size:171439882} {gid:3 size:392882199} {gid:1 size:647917206}]

2018/09/14 20:47:19 tablet.go:184: size_diff 476477324
2018/09/14 20:47:19 tablet.go:73: Going to move predicate: [at], size: [76 MB] from group 1 to 2
2018/09/14 20:47:19 oracle.go:91: Purged below ts:3798, len(o.commits):0, len(o.rowCommit):12000
2018/09/14 20:47:37 tablet.go:109: Predicate move done for: [at] from group 1 to 2
2018/09/14 20:47:49 oracle.go:91: Purged below ts:3806, len(o.commits):0, len(o.rowCommit):12000

And here’s another fun log file from zero:

2018/09/14 20:55:19 tablet.go:73: Going to move predicate: [receiver_balance], size: [92 MB] from group 1 to 2
2018/09/14 20:55:27 tablet.go:221: Got error during move: While calling MovePredicate: rpc error: code = Unknown desc = Unable to find log file. Please retry
2018/09/14 20:55:27 tablet.go:60: Error while trying to move predicate receiver_balance from 1 to 2: While calling MovePredicate: rpc error: code = Unknown desc = Unable to find log file. Please retry

I have retried this set of inserts several times with the same result. It seems to try to move a predicate, fails to do so, and then nothing will insert past a certain point.

…and now it is doing this:

2018/09/14 21:43:31 stream_lists.go:231: Sending predicate: [receiver_balance] Time elapsed: 12s, bytes sent: 95 MB, speed: 8.0 MB/sec
2018/09/14 21:43:32 stream_lists.go:231: Sending predicate: [receiver_balance] Time elapsed: 13s, bytes sent: 95 MB, speed: 7.3 MB/sec
2018/09/14 21:43:33 stream_lists.go:231: Sending predicate: [receiver_balance] Time elapsed: 14s, bytes sent: 95 MB, speed: 6.8 MB/sec
...
2018/09/14 21:49:02 stream_lists.go:231: Sending predicate: [receiver_balance] Time elapsed: 05m43s, bytes sent: 95 MB, speed: 278 kB/sec
2018/09/14 21:49:03 stream_lists.go:231: Sending predicate: [receiver_balance] Time elapsed: 05m44s, bytes sent: 95 MB, speed: 277 kB/sec
2018/09/14 21:49:04 stream_lists.go:231: Sending predicate: [receiver_balance] Time elapsed: 05m45s, bytes sent: 95 MB, speed: 277 kB/sec
2018/09/14 21:49:05 stream_lists.go:231: Sending predicate: [receiver_balance] Time elapsed: 05m46s, bytes sent: 95 MB, speed: 276 kB/sec

Given the decreasing kB/sec and the increasing elapsed time, I presume something is stuck.

Just curious: can you try running the servers with something other than --badger.tables=mmap and see what happens? I'm not looking at performance here, just whether things also lock up.
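For concreteness, a rough sketch of what that invocation might look like (the --lru_mb and --zero values below are placeholders for whatever you already run with; as far as I know, --badger.tables accepts ram, mmap and disk in v1.0.x):

# Hypothetical example: load SSTables from disk instead of mmap-ing them.
dgraph server --badger.tables=disk --badger.vlog=disk \
  --lru_mb=2048 --zero=dgraph-zero-0:5080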

It’s hard to know what the problem is here, but if predicate moves are the problem, you can turn them off by setting --rebalance_interval in Zero to something arbitrarily large, like 10000h.
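For example, something along these lines on the Zero node (the --my address is a placeholder; --rebalance_interval is the only flag that matters here):

# Hypothetical example: push the rebalance interval out far enough that
# predicate (tablet) moves effectively never happen.
dgraph zero --my=dgraph-zero-0:5080 --rebalance_interval=10000h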

Also, you could run just one group by setting --replication=3 and running 3 Alphas.
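A sketch of that layout, assuming the v1.0.x flag spellings (the Zero flag I know of is --replicas; the addresses and --lru_mb value are placeholders):

# Zero with a replication factor of 3: with exactly 3 servers, all of them
# land in a single group, so there are no cross-group predicate moves at all.
dgraph zero --my=dgraph-zero-0:5080 --replicas=3

# Each of the three servers points at the same Zero and joins that single group.
dgraph server --my=dgraph-server-0:7080 --zero=dgraph-zero-0:5080 --lru_mb=2048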

I also encountered this problem. Have you solved it?

Unfortunately, I haven’t, so I have had to abandon my plans to use Dgraph for the moment. There is too little information in the logs to diagnose this any further. I am doing simple inserts of a reasonable amount of data, and I expect things to “just work” at this scale.

I like Dgraph … I hope this gets sorted soon.
