Storage of indexes and schema updates

cscetbon · December 15, 2022, 12:48am

Can you explain how indexes are managed and stored in Dgraph? Same for schema updates: do they change how data is accessed, how data is stored, or both? What impact does a schema update have on a running production database? How do we rebuild an index, if that’s possible?

Thank you

matthewmcneely · December 15, 2022, 7:45pm

Have a look at the Dgraph paper for details on how indexes are managed in Dgraph: dgraph/dgraph.pdf at main · dgraph-io/dgraph · GitHub. That paper should address your other questions too, with the exception of index rebuilding.

For an answer to that, have a look at Manual index rebuild

MichelDiz · December 15, 2022, 11:34pm

Indexes are managed automatically based on the given schema and do not require any explicit configuration or maintenance. Not sure what you want.

You can alter the schema via Alter endpoint or method.

Not sure what you mean.

Dgraph stores data in a native graph format using BadgerDB as the underlying key-value (KV) store. In Dgraph, the smallest unit of data is a posting list, and posting lists are grouped by predicate. This means that data is stored in the database as a set of posting lists, each of which belongs to a predicate in the schema. These posting lists are stored in BadgerDB and attached to a shard called a “tablet”. A tablet is the sum of a predicate’s postings and indices. This allows Dgraph to provide fast and efficient access to data, while still maintaining the integrity and structure of the graph.
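To make the "postings + indices in a KV store" idea concrete, here is a minimal toy sketch. It assumes plain string keys and an index token equal to the whole value; Dgraph's actual key encoding and tokenizers in Badger are different, and the names `store`, `dataKey`, and `indexKey` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a toy stand-in for the key-value layer. Keys are plain strings
// here; this is NOT Dgraph's real key encoding.
type store map[string][]string

// dataKey addresses the posting list for one (predicate, subject) pair.
func dataKey(pred, uid string) string { return "data|" + pred + "|" + uid }

// indexKey addresses the posting list of subjects that produced a token.
func indexKey(pred, token string) string { return "index|" + pred + "|" + token }

// setValue writes a value and its index entry under the same predicate.
// The toy does not clean up old index entries when a value is overwritten.
func (s store) setValue(pred, uid, value string) {
	s[dataKey(pred, uid)] = []string{value}
	k := indexKey(pred, value)
	s[k] = append(s[k], uid)
	sort.Strings(s[k])
}

// lookup answers "which subjects have this value?" from the index alone.
func (s store) lookup(pred, value string) []string {
	return s[indexKey(pred, value)]
}

func main() {
	s := store{}
	s.setValue("name", "0x1", "alice")
	s.setValue("name", "0x2", "alice")
	fmt.Println(s.lookup("name", "alice"))
}
```

In this model, every key for a predicate, data and index alike, shares the predicate name, which is roughly what lets a predicate's postings and indices live together in one shard.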

In general, schema updates in Dgraph can be performed without any downtime or impact on a running production database, especially if the database has a high availability (HA) setup. However, changes to the schema can affect the performance of queries, depending on whether new indexes are added or existing indexes are removed or modified. Adding new indexes can improve query performance, while removing existing indexes or changing the type of an indexed field can degrade query performance.
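As a rough illustration of what such a schema update looks like on the wire, the sketch below builds (but does not send) an HTTP request to an Alpha's `/alter` endpoint. The predicate `name`, the two schema fragments, and the address `localhost:8080` are assumptions made for the example, not details from this thread.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Hypothetical schema fragments: the same predicate with and without an index.
const (
	withIndex    = "name: string @index(term) ."
	withoutIndex = "name: string ."
)

// newAlterRequest builds a POST to <alphaURL>/alter carrying the schema as
// the request body. It does not contact any server.
func newAlterRequest(alphaURL, schema string) (*http.Request, error) {
	target := strings.TrimRight(alphaURL, "/") + "/alter"
	return http.NewRequest(http.MethodPost, target, strings.NewReader(schema))
}

func main() {
	req, err := newAlterRequest("http://localhost:8080", withIndex)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To apply it, pass req to http.DefaultClient.Do and check the response.
}
```

Sending `withoutIndex` afterwards would be the "remove an index" case discussed above; whether to do that during off hours depends on how heavily queries rely on that index.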

In the past, it was possible to rebuild an index in Dgraph by performing a schema update that specified the same index for a given predicate. However, this functionality is no longer available in Dgraph, and it is not currently possible to rebuild an index. In fact, any change to the schema used to trigger a rebuild of all indexes, which consumed a lot of resources, and that is why this functionality was disabled.

If you would like to see this functionality added back to Dgraph, we recommend opening an issue on our GitHub repository to track the request. If the issue does not already exist, you can create a new one. By opening an issue, you can help us track the request and prioritize it for future development.

cscetbon · December 16, 2022, 4:42am

Sorry, I should have told you that I know Badger and how Dgraph stores posting lists on top of it. What I’m more interested in is knowing what types of schema changes can be made and what happens when we make them. For indexes, I can understand that removing an option like @search will remove the underlying index, for instance. I suppose that’s done in the background and that it prevents Dgraph from using the index anymore even if the data is still there, but what is the impact on the database? Does it slow it down? Does it schedule the deletion for later? Is it on us to know that the DB will be impacted for the duration of the removal? It probably depends on the table size, etc.

Same question if, for instance, we try to change a type: can it be done? What happens if I try to convert an int to a float?

Thanks

MichelDiz · December 16, 2022, 5:40am

You can convert an integer to a float. You can also convert a float to an integer, and the data will not be lost or deleted. However, you will not be able to retrieve the values. Once you use float values, you have to keep them as floats.

You can convert a string to an integer or float and vice versa, as long as you do not change the format or add special characters.
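The "no special characters" condition can be illustrated with Go's standard `strconv` parsers. This is only a sketch of the general idea, under the assumption that a stored string is re-read as a number; it is not Dgraph's conversion code, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// asInt reports whether a stored string can be read back as an integer.
func asInt(s string) (int64, bool) {
	v, err := strconv.ParseInt(s, 10, 64)
	return v, err == nil
}

// asFloat reports whether a stored string can be read back as a float.
func asFloat(s string) (float64, bool) {
	v, err := strconv.ParseFloat(s, 64)
	return v, err == nil
}

func main() {
	for _, s := range []string{"42", "2.5", "42 apples"} {
		_, okInt := asInt(s)
		_, okFloat := asFloat(s)
		fmt.Printf("%q int=%t float=%t\n", s, okInt, okFloat)
	}
}
```

A value like "42" reads back as either type, "2.5" only as a float, and anything with extra characters as neither.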

Some data types, such as unique identifier (UID), geographic location (geo), boolean values, passwords, and datetime values, cannot be converted to other data types. It is also generally not possible to convert a list back to a single value.

You can only “play” with indexes (tokenizers) and directives. Directives are simply functions that are applied on the fly.

It is best to choose a data type that is appropriate for the values being stored and to avoid unnecessary conversions.

I cannot give you detailed information on what happens because I do not have complete knowledge of the code. I only know what is abstract or stated in Manish’s paper.

There is no “@search” in DQL, only in GraphQL, although it does the same thing as “@index”. It is good to keep them separate. As far as I understand, you were referring to Dgraph and not GraphQL.
Yes, index changes are made in the background.

If you change or remove an index, the index will be removed or unavailable during the change. The data will still be there, but the index table will be removed and will not be available for querying. I’m not sure what specific impact you are referring to. As I mentioned before, removing an index can have a direct impact on the performance of queries that rely heavily on indexing. It may cause these queries to run slower or even fail if the necessary index is not available (e.g., regex).
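The "fail if the necessary index is not available" case can be pictured with a toy check. The sketch assumes the rule that a regular-expression match needs a trigram index on the predicate; the types and function names are hypothetical and only model the behaviour, not Dgraph's query planner.

```go
package main

import (
	"errors"
	"fmt"
)

// schema maps a predicate name to the set of tokenizers indexed for it.
type schema map[string]map[string]bool

var errNoIndex = errors.New("required index is missing")

// checkRegexp mimics the rule that a regexp match needs a trigram index.
func (s schema) checkRegexp(pred string) error {
	if !s[pred]["trigram"] {
		return fmt.Errorf("regexp on %q: %w", pred, errNoIndex)
	}
	return nil
}

func main() {
	s := schema{"name": {"trigram": true, "term": true}}
	fmt.Println(s.checkRegexp("name")) // index present, no error

	delete(s["name"], "trigram") // simulate dropping the trigram index
	fmt.Println(s.checkRegexp("name")) // now rejected
}
```

The stored values are untouched by the `delete`; only the ability to answer that kind of query goes away, and only for that one predicate.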

As far as I know, it is immediate.

The question is: why bother with this? Are you planning to make changes all the time? I have some ideas about having multiple versions of the schema, as if they were “branches”. I think it would be useful for someone who intends to make continual changes to DB indexes.

BTW, only the single predicate is impacted. If you use best effort, I believe it will not influence the query.

Yes. But there is no formula for how much you are able to measure.

Yes, but be careful. As I said, you will not be able to do this with every type.

It’s allowed. The opposite can sometimes be bad: it may generate errors due to float size.

cscetbon · December 16, 2022, 8:00am

Do you mean that values are not updated in the database? Or just that once they are converted to float they can’t be converted to something else?

I said Dgraph, not DQL, so I mean interacting with Dgraph, and in my case exclusively through GraphQL.

I’m trying to understand whether the database I’m working with could be impacted, and whether I should do it during off hours instead of rush hours.

Can you elaborate on this? I don’t get it.

Thanks

MichelDiz · December 16, 2022, 3:35pm

Sorry, but I think it’s obvious, right? Some types cannot be converted to other types. You cannot convert 2.89289111 to int: what would it be, 2 or 3? See? It can be converted to a string, but that conversion makes no sense. What would be a plausible reason to change the scalar type? At first this question made no sense to me, and I struggled to explain in detail. But in reality, this idea of changing types all the time doesn’t make sense. Is it a recurring practice in other DBs? I can’t see the advantage in that.
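The "2 or 3?" ambiguity is easy to see in plain Go, where the two candidate answers correspond to truncation and rounding. This sketch only shows the ambiguity; it makes no claim about which rule, if any, Dgraph applies.

```go
package main

import (
	"fmt"
	"math"
)

// truncate drops the fractional part, as a plain Go conversion does.
func truncate(f float64) int64 { return int64(f) }

// round picks the nearest integer instead.
func round(f float64) int64 { return int64(math.Round(f)) }

func main() {
	f := 2.89289111
	fmt.Println(truncate(f), round(f)) // prints: 2 3
}
```

Either way the fractional part is gone, so reading the value back as a float later cannot recover 2.89289111.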

The value you send to the DB will be kept as it came. Types are a representation of form. Data is stored in bytes, and we use types such as int, float, and string to present this data. The values are not written as float or int as far as I know (I could be a bit wrong). So some conversion is still possible.

You can convert back and forth indefinitely. What matters is not introducing characters that are foreign to int or float. However, this practice is unsafe and not recommended.

It makes our job a little more difficult. See, the data is stored in the context of Dgraph. That is, DQL/RDF. And not something related to GraphQL. GraphQL is just an API language. Dgraph’s native language is DQL.

Another point is that you opened the issue in users/dgraph instead of user/graphql/. This tells us that you are interested in Dgraph i.e. DQL/RDF. There is no mention of GraphQL in your question. The indexing rebuild processes and the other things you were interested in are within the scope of Dgraph and not GraphQL.

If you do what you’ve always done with other DBs, you’ll be safe.

This is DQL related. Something fundamental to how data is written in DQL. Since your interest is GraphQL, no drill down is required.

Cheers.

cscetbon · December 16, 2022, 6:27pm

You asked if I meant DQL or Dgraph. I said Dgraph, and that I interact with it using GraphQL. I can’t see how that invalidates my questions about Dgraph and puts the focus exclusively on GraphQL…

With other databases I know what they do behind the scenes, which is why I’m asking these questions. You can point me to the documentation if you want, but I think it’s important to know how it works under the hood.

I’m still interested in understanding your answer. Can you please elaborate on this ^?

MichelDiz · December 16, 2022, 7:09pm

It’s not about validation, just about context. Without context I can’t give precise answers. If you start a thread without context, I’ll have to guess what your objective is or pepper you with questions in return.

You can ask as many questions as needed, but help us help you. Context is always welcome.

Unfortunately, the documentation doesn’t go that far. That kind of content would be for anyone who wants to contribute to the core code. You will not be able to deeply understand how the DB works without learning BadgerDB and reading the paper and the code.

Within the scope of Dgraph, the predicate is the smallest atomic unit of the DB. The DB is not organized around entities, but around shards of everything it contains. So when you change a predicate, the change affects the tablet (not to be confused with “table”) of that predicate. It doesn’t change the whole Type (in the case of GraphQL), just the predicates you changed.
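One way to reason about "only the predicates you changed" is to diff the old and new schema per predicate. The sketch below assumes a simplified one-line-per-predicate schema and a toy parser; it is not Dgraph's schema grammar, and `parse` and `changed` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parse reads lines of the form "pred: definition ." into a map.
// It is a toy parser, not Dgraph's schema grammar.
func parse(schema string) map[string]string {
	out := map[string]string{}
	for _, line := range strings.Split(schema, "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), ":", 2)
		if len(parts) != 2 {
			continue // skip blank or malformed lines
		}
		out[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return out
}

// changed lists the predicates that were added, removed, or redefined.
func changed(oldSchema, newSchema string) []string {
	before, after := parse(oldSchema), parse(newSchema)
	var names []string
	for name, def := range after {
		if before[name] != def {
			names = append(names, name)
		}
	}
	for name := range before {
		if _, ok := after[name]; !ok {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	oldSchema := "name: string @index(term) .\nage: int ."
	newSchema := "name: string @index(term, trigram) .\nage: int ."
	fmt.Println(changed(oldSchema, newSchema))
}
```

Here only `name` is reported, which matches the idea that `age` and its tablet are left alone by this update.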

From the docs: You can optionally set the bestEffort boolean to true. This may yield improved latencies in read-bound workloads where linearizable reads are not strictly needed.

https://dgraph.io/docs/clients/go/#read-only-transactions
https://dgraph.io/docs/enterprise-features/learner-nodes/#best-effort-queries
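For readers not using the Go client, a best-effort query can also be sketched over raw HTTP. The code below only builds the request. It assumes the `/query` endpoint accepts a `be=true` query parameter and an `application/dql` content type; both should be checked against the documentation for the release in use, and the address and query text are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newQueryRequest builds a POST to <alphaURL>/query. When bestEffort is set,
// it adds be=true (assumed parameter name) to the query string.
// It does not contact any server.
func newQueryRequest(alphaURL, dql string, bestEffort bool) (*http.Request, error) {
	u, err := url.Parse(alphaURL)
	if err != nil {
		return nil, err
	}
	u.Path = "/query"
	if bestEffort {
		q := u.Query()
		q.Set("be", "true")
		u.RawQuery = q.Encode()
	}
	req, err := http.NewRequest(http.MethodPost, u.String(), strings.NewReader(dql))
	if err != nil {
		return nil, err
	}
	// Assumed content type; it may differ between releases.
	req.Header.Set("Content-Type", "application/dql")
	return req, nil
}

func main() {
	req, err := newQueryRequest("http://localhost:8080", "{ q(func: has(name)) { name } }", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

The trade-off is the one quoted above: potentially lower latency in exchange for reads that are not guaranteed to be linearizable.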

cscetbon · December 16, 2022, 7:25pm

Okay thanks, I should be able to take it from there.

–
HC
