Dgraph crashed during live loading with dgraph live, and the db won't start

Dgraph version : v1.0.11
Commit SHA-1 : b2a09c5b
Commit timestamp : 2018-12-17 09:50:56 -0800
Branch : HEAD
Go version : go1.11.1

We have close to 500 million RDFs and are trying to load them into Dgraph using dgraph live.
The server crashed a few times along the way, and each time I had to restart it and resume loading. For this reason I am also using the -x switch to store the xid-to-uid mappings.

Now I see a new error in the logs continuously:

dgraph[3971]: E0220 07:02:39.913535 3971 storage.go:540] i=940703<first=940704, ErrSnapOutOfDate
dgraph[3971]: W0220 07:02:39.913541 3971 draft.go:333] Error while calling CreateSnapshot: requested index is older than the existing snapshot. Retrying…

And the db is not loading at all. I am not able to query the already indexed nodes either.

It looks like the indexes have crashed. Is there a way to roll back to a previous safe state and continue the indexing?

Any help would be appreciated.

Thanks,
Eshwar

Do all the machines have the same date and time? Check whether their clocks are set by an NTP server…

I strongly recommend using Bulk Load, since you mentioned on Slack that you’re using Live Load. Bulk Load is much safer and faster, and it needs fewer resources in this kind of situation.
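
For reference, a Bulk Load run on a single-server setup like this one might look roughly like the sketch below. The file names `data.rdf.gz` and `data.schema` and the Zero address are placeholders, not taken from this thread, and `run` is only a hypothetical dry-run helper that prints the command unless `RUN=1` is set. Bulk Load is meant for populating a new cluster, so this assumes starting over with Zero running and no Alpha up yet.

```shell
# Placeholder inputs (not from this thread); substitute your own.
RDF_FILE="${RDF_FILE:-data.rdf.gz}"
SCHEMA_FILE="${SCHEMA_FILE:-data.schema}"
ZERO_ADDR="${ZERO_ADDR:-localhost:5080}"

# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Bulk Load needs Zero running and no Alpha started yet.
# Single Alpha group here, so one reduce shard is enough.
run dgraph bulk -r "$RDF_FILE" -s "$SCHEMA_FILE" \
  --map_shards=1 --reduce_shards=1 --zero="$ZERO_ADDR"
```

As far as I recall from the docs, the loader writes its result under `out/0/p`, and that directory then has to become the Alpha's `-p` directory before the Alpha is started.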

Cheers.

Hi Micheal,

Thanks for the reply. I went through the documentation and used Live Load because we already had a setup running.

Thanks,
Eshwar

Hi Smantha,

Right now we have a single server running both the zero and data services.

Thanks,
Eshwar

I’m unable to query the database anymore. It looks like all the indexes have crashed.

I tried modifying the schema to add indexing. Below are the error logs:

dgraph[5011]: runtime.throw(0x1463913, 0x16)
dgraph[5011]: #011/usr/local/go/src/runtime/panic.go:608 +0x72
dgraph[5011]: runtime.sysMap(0xc388000000, 0x4000000, 0x1f96318)
dgraph[5011]: #011/usr/local/go/src/runtime/mem_linux.go:156 +0xc7
dgraph[5011]: runtime.(*mheap).sysAlloc(0x1f7c680, 0x4000000, 0x0, 0x0)
dgraph[5011]: #011/usr/local/go/src/runtime/malloc.go:619 +0x1c7
dgraph[5011]: runtime.(*mheap).grow(0x1f7c680, 0x1, 0x0)
dgraph[5011]: #011/usr/local/go/src/runtime/mheap.go:920 +0x42
dgraph[5011]: runtime.(*mheap).allocSpanLocked(0x1f7c680, 0x1, 0x1f96328, 0xc0ef182938)
dgraph[5011]: #011/usr/local/go/src/runtime/mheap.go:848 +0x337
dgraph[5011]: runtime.(*mheap).alloc_m(0x1f7c680, 0x1, 0xa, 0x0)
dgraph[5011]: #011/usr/local/go/src/runtime/mheap.go:692 +0x119
dgraph[5011]: runtime.(*mheap).alloc.func1()
dgraph[5011]: #011/usr/local/go/src/runtime/mheap.go:759 +0x4c
dgraph[5011]: runtime.(*mheap).alloc(0x1f7c680, 0x1, 0xc00e01000a, 0x7f2d5d34de30)
dgraph[5011]: #011/usr/local/go/src/runtime/mheap.go:758 +0x8a

The errors in the log are all memory-related. Do you have enough RAM to handle the volume you’re loading?

The system has 16 GB of RAM, and I’m using the commands below to run the dgraph services.

dgraph alpha --lru_mb 5120 -p /data/dgraph/p -w /data/dgraph/w
dgraph zero --wal /data/dgraph/zw
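
As a rough sanity check on `--lru_mb`: the guideline I remember from the Dgraph docs is to start at about one third of the machine's RAM. The helper below is a hypothetical sketch of that arithmetic, not an official tool. It shows that 5120 was already close to the guideline on 16 GB, which suggests the setting acts as a hint rather than a hard cap on the process's memory.

```shell
# Suggest an --lru_mb value as one third of total RAM, given in MB.
# Assumption: the "one third of RAM" starting point from the docs.
suggest_lru_mb() {
  total_mb=$1
  echo $(( total_mb / 3 ))
}

suggest_lru_mb 16384   # 16 GB machine, prints 5461
suggest_lru_mb 32768   # 32 GB machine, prints 10922
```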

I increased the RAM to 32 GB and started indexing (altered the schema in Ratel). I see the log below even after restarting the dgraph service, and it has been stuck there for a long time.

dgraph[3755]: I0221 08:00:17.710171 3755 index.go:33] Deleting index for title
dgraph[3755]: I0221 08:00:17.831886 3755 index.go:38] Rebuilding index for title

I can’t query the other indexes either while this update is happening. Is this expected behaviour?

We don’t see much I/O activity either (checked with iotop).
How long will the indexing take?
Is there a way to cancel the index creation once triggered?

Thanks,
Eshwar

When an index is being built, all the Alphas wait for the index rebuild to finish before serving any further requests.

Given the size of data here, the index should be created first before loading any data.
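
One way to do that is to keep the indexed predicates in a schema file and hand it to the loader together with the data. A minimal sketch follows. The file name is a placeholder, and the `term` tokenizer on `title` is an assumption, since the thread does not say which index type is in use.

```shell
# Placeholder schema file name; the term tokenizer on title is an assumption.
SCHEMA_FILE="${SCHEMA_FILE:-data.schema}"

# Declare the indexed predicate before any data is loaded.
cat > "$SCHEMA_FILE" <<'EOF'
title: string @index(term) .
EOF

# Then pass the schema to the loader together with the data, for example:
#   dgraph live -r data.rdf.gz -s "$SCHEMA_FILE"
cat "$SCHEMA_FILE"
```

With the index declared up front, index entries are written as the data arrives, so there is no separate full rebuild at the end.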

Hi Daniel,

Thanks for the response. Yes, I had created the schema well before inserting data into Dgraph. But then the indexes crashed during the live load.

Is there a way to cancel the index build that is currently in progress?

Thanks,
Eshwar

There’s no way to cancel a running indexing task. If you need it, feel free to file an issue for it.

Hi Manish,

I have created an issue in the GitHub repo:
https://github.com/dgraph-io/dgraph/issues/3061

Thanks,
Eshwar