Problem when change datatype and index of predicate with huge data

diggy · June 18, 2019, 3:48pm

Moved from GitHub dgraph/3572

If you suspect this could be a bug, follow the template.

What version of Dgraph are you using?
v1.0.14
Have you tried reproducing the issue with latest release?
Yes
What is the hardware spec (RAM, OS)?
Ubuntu 18.04 (Linux), 31 GiB RAM
Steps to reproduce the issue (command/config used to run Dgraph).

Go to the dgraph-ui page.
Change the predicate’s type from string to float and add an index.

The Dgraph server froze for several minutes, so I restarted it:

systemctl restart dgraph dgraph-ui.service dgraph-zero.service

Go back to the UI page (port 8000) and run a query. It never responds and fails with the error: context deadline exceeded (see image).

Detail:

I have a Dgraph database with a large amount of data (over 10 million nodes). After I changed the price predicate from string to float (over 5 million nodes currently have price data) and indexed it, the Dgraph server froze and I can’t query or mutate. The error is the context deadline exceeded message quoted above.

It seems to hang forever and can’t execute any query.
I have restarted the Dgraph service, but the problem persists.
What is happening? Is this a bug in Dgraph when running with a large amount of data?
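
For readers reproducing this outside the UI, the schema change described in the steps can be expressed as a single alter request. The following is a minimal sketch, assuming an alpha that serves HTTP on localhost:8080 and accepts a schema line on /alter; the helper names are hypothetical, and the request is only built, not sent.

```python
import urllib.request


def schema_line(predicate: str, scalar_type: str, tokenizer: str) -> str:
    """Return a Dgraph schema line such as 'price: float @index(float) .'"""
    return f"{predicate}: {scalar_type} @index({tokenizer}) ."


def build_alter_request(base_url: str, line: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the alpha's /alter endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/alter",
        data=line.encode("utf-8"),
        method="POST",
    )


if __name__ == "__main__":
    # Assumption: the alpha listens on localhost:8080.
    req = build_alter_request(
        "http://localhost:8080", schema_line("price", "float", "float")
    )
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # Sending it would trigger the index rebuild discussed in this thread:
    # urllib.request.urlopen(req, timeout=10)
```

On a predicate with millions of values, sending this request is the step that starts the long rebuild, so it is worth trying on a copy of the data first.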

diggy · June 27, 2019, 8:28pm

martinmr commented :

This is not a bug. Unfortunately, in order to keep all our consistency guarantees, Dgraph blocks while rebuilding an index. So adding an index to a predicate with lots of data might result in Dgraph being unavailable for a while. Restarting the server doesn’t help because the index rebuild process starts right after restart. It’s better to just let it complete.

Two strategies to reduce the impact:

Think through your schema and desired queries thoroughly before you start adding data. This minimizes the need to change the indices/schema.
If the need arises, perform the change at a time of low traffic (e.g. at night).

Unfortunately this is the current state of affairs. Once version 1.1 is released we can start looking into improving this process.
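
Since the advice is to let the rebuild finish rather than restart, a client can simply wait with exponential backoff until the server answers again. This is a minimal sketch, not an official tool: wait_until_ready and http_probe are hypothetical helpers, and the /health URL in the usage comment is an assumption (it may not reflect index rebuild state, so a probe that runs a real query could be a better choice).

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if a GET on url answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe with exponential backoff until it succeeds or time runs out."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if probe():
            return True
        if clock() + delay > deadline:
            return False  # the next wait would overshoot the deadline
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


# Usage (assumed URL, not executed here):
# wait_until_ready(lambda: http_probe("http://localhost:8080/health"), timeout_s=3600)
```

The sleep and clock parameters are injectable only so the loop can be exercised without real waiting.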

diggy · August 8, 2019, 5:28pm

d4l3k commented :

This is a really unfortunate state to be in. There should be some way to cancel building the index, since it can be hideously slow. I left it running for 10 hours without it finishing, and there doesn’t seem to be any way to view the progress.

From a recent blog post “One of the cornerstones of Dgraph is that it allows a flexible schema, which can be modified in a live system, without any downtime. This involves changing data types and adding or removing indices with a single ALTER command to match the needs of an application developer.” https://blog.dgraph.io/post/indexing-in-dgraph/

This post states that there is no downtime, which according to this issue is not the case. Ideally this would happen in the background, like how CockroachDB does it, and the index would only be used once it’s fully built. CockroachDB also provides info on the status of the index build: Online Schema Changes | CockroachDB Docs
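
To make the idea concrete, here is a toy sketch of a background index build: reads fall back to a full scan until the index is complete, and progress is observable along the way. It is purely illustrative and is not how Dgraph or CockroachDB implement it; the OnlineIndex class is hypothetical and ignores writes that arrive during the build.

```python
from collections import defaultdict


class OnlineIndex:
    """Toy store that keeps serving reads while an index is built in batches."""

    def __init__(self, rows):
        self.rows = dict(rows)            # uid -> price
        self._building = defaultdict(set)  # partial index, not yet visible
        self._pending = list(self.rows)    # uids still to be indexed
        self.index = None                  # published only once complete

    def progress(self) -> float:
        """Fraction of rows indexed so far, between 0.0 and 1.0."""
        total = len(self.rows)
        return 1.0 if total == 0 else 1 - len(self._pending) / total

    def build_step(self, batch_size: int) -> None:
        """Index one batch; publish the index when nothing is pending."""
        batch = self._pending[:batch_size]
        self._pending = self._pending[batch_size:]
        for uid in batch:
            self._building[self.rows[uid]].add(uid)
        if not self._pending:
            self.index = dict(self._building)

    def find(self, price):
        """Use the index if it is published, otherwise scan every row."""
        if self.index is not None:
            return sorted(self.index.get(price, ()))
        return sorted(uid for uid, p in self.rows.items() if p == price)
```

Queries return the same answers before and after the build; only their cost changes, which is the property being asked for here.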

diggy · August 8, 2019, 6:54pm

manishrjain commented :

Yes. We should be fixing this problem: allow queries to use the index once it’s built, and also allow indices to be built using disk instead of entirely in memory. Feel free to file an issue to track that. We’ll get to it with priority after the v1.1 release. CC: @campoy .

For the time being, if you use best effort queries, you’d still be able to get the queries to work.
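
As a rough illustration of a best-effort query over raw HTTP, the sketch below builds such a request. It assumes the v1.1-era HTTP interface, where /query accepts the ro=true and be=true parameters and the application/graphql+- content type; on v1.0.x the interface differs, so treat the URL, header, and the hypothetical helper name as assumptions to verify against your version.

```python
import urllib.parse
import urllib.request


def build_best_effort_query(base_url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a read-only, best-effort query request."""
    # Assumption: ro/be query parameters as in the v1.1-era HTTP API.
    params = urllib.parse.urlencode({"ro": "true", "be": "true"})
    return urllib.request.Request(
        url=f"{base_url}/query?{params}",
        data=query.encode("utf-8"),
        # Assumption: content type used by v1.1-era releases.
        headers={"Content-Type": "application/graphql+-"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_best_effort_query(
        "http://localhost:8080",
        "{ q(func: has(price), first: 5) { uid price } }",
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req, timeout=10) would send it.
```

Best-effort reads may return slightly stale data, and they do nothing for mutations.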

diggy · August 8, 2019, 6:55pm

manishrjain commented :

Opening this issue as a placeholder. Close this issue once the other issue is filed.

diggy · August 8, 2019, 6:58pm

d4l3k commented :

For anyone else running into this issue, here’s the hacky solution to cancelling an index build.

Applied these changes to v1.0.16: noindex · d4l3k/dgraph@6a11b74 · GitHub

I launched alpha with these changes, removed the index from the schema, and then launched the stable build again.

diggy · August 8, 2019, 7:00pm

d4l3k commented :

“For the time being, if you use best effort queries, you’d still be able to get the queries to work.”

Reading still works with this but it was blocking me from adding new data.

diggy · September 30, 2019, 3:32am

Willem520 commented :

Hi, I ran into the same problem (#4097). It looks like a serious issue, and it is blocking my program.
