Specifying indexes in dgraph

kostub · September 13, 2016, 1:41am

How are we specifying indexes in dgraph? I don’t see any mention of indexing in the roadmap. But clearly string matching will require indexes. And so do geo queries.

Are we automatically creating indexes or having the user specify them using some schema? Or do we expose some API for the users to create the index manually? I see some code that was added recently to support indexes but don’t see any documentation on the wiki about this.

For reference, here is how neo4j does indexing:

mrjn · September 13, 2016, 7:36am

We will let the user specify the predicate to index and the tokenizer to use. I think we’re halfway there: we can already specify the predicate (but not the tokenizer so far). FYI: @jchiu

jchiu · September 13, 2016, 8:00am

That’s right. I will try to get the tokenizer in this week.

Currently, the index takes in a JSON config. I do foresee reading this from some centralized config in the future, but I didn’t want to be blocked by this at this point.

We haven’t updated the wiki because we haven’t exposed this functionality yet.

kostub · September 13, 2016, 11:42am

@jchiu do you have some document describing what you propose to implement? That’ll help me understand and incorporate it in the geo query design.

jchiu · September 13, 2016, 12:38pm

Do you refer to the schema design?

Currently, this is what it looks like:

{
	"config": [{
		"attribute": "type.object.name.en"
	}]
}

The next step is to add a tokenizer.

{
	"config": [{
		"attribute": "type.object.name.en",
		"tokenizer": "term"
	}]
}

It is possible that we don’t use JSON in the future, but I am more concerned about what is being specified. For now, it is really just a list of (attribute, tokenizer).
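For illustration, here is a minimal Go sketch of reading such a list of (attribute, tokenizer) pairs. Only the JSON keys `config`, `attribute`, and `tokenizer` come from the snippets above; the type names `IndexSpec` and `IndexConfig` and the function `ParseIndexConfig` are hypothetical and are not taken from index.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexSpec is one (attribute, tokenizer) pair.
// The Go names are hypothetical; only the JSON keys come from the thread.
type IndexSpec struct {
	Attribute string `json:"attribute"`
	Tokenizer string `json:"tokenizer,omitempty"`
}

// IndexConfig mirrors the top-level {"config": [...]} object.
type IndexConfig struct {
	Config []IndexSpec `json:"config"`
}

// ParseIndexConfig decodes the JSON and rejects entries without an attribute.
func ParseIndexConfig(data []byte) (*IndexConfig, error) {
	var c IndexConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("parsing index config: %w", err)
	}
	for i, s := range c.Config {
		if s.Attribute == "" {
			return nil, fmt.Errorf("config[%d]: missing attribute", i)
		}
	}
	return &c, nil
}

func main() {
	raw := []byte(`{"config": [{"attribute": "type.object.name.en", "tokenizer": "term"}]}`)
	c, err := ParseIndexConfig(raw)
	if err != nil {
		panic(err)
	}
	for _, s := range c.Config {
		fmt.Printf("index %s with tokenizer %q\n", s.Attribute, s.Tokenizer)
	}
}
```

Because `tokenizer` is optional in the struct, the first config shape (attribute only) parses with the same code.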

kostub · September 13, 2016, 12:39pm

Does this apply the index on every single node with this attribute?

jchiu · September 13, 2016, 12:41pm

The node holding the predicate being indexed will do the indexing.

The implementation is essentially:

If I get a mutation on a predicate value, I add an additional inverted mutation. So that makes sense only if it happens on the same node holding the predicate. Hope this answers the question?

Addendum: every node will check whether the mutation applies to a predicate that is being indexed. I expect every node will have a copy of the index file.
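A rough in-memory sketch of that idea in Go, under loud assumptions: the `Mutation` and `Indexer` types are hypothetical, the "term" tokenizer is approximated by lowercasing and splitting on whitespace, and the inverted entries live in a map rather than in the real posting lists.

```go
package main

import (
	"fmt"
	"strings"
)

// Mutation is a simplified (subject UID, predicate, value) edge. Hypothetical type.
type Mutation struct {
	UID       uint64
	Attribute string
	Value     []byte
}

// Indexer tracks which predicates are indexed and the inverted entries.
type Indexer struct {
	indexed  map[string]bool
	postings map[string][]uint64 // key: attribute + "|" + term
}

// NewIndexer builds an indexer for the given predicates.
func NewIndexer(attrs ...string) *Indexer {
	ix := &Indexer{indexed: map[string]bool{}, postings: map[string][]uint64{}}
	for _, a := range attrs {
		ix.indexed[a] = true
	}
	return ix
}

// Apply is called for every mutation. It adds the "inverted mutation"
// (term -> UID) only when the predicate is being indexed.
func (ix *Indexer) Apply(m Mutation) {
	if !ix.indexed[m.Attribute] {
		return
	}
	// An empty or nil value yields no terms, so nothing is added.
	for _, term := range strings.Fields(strings.ToLower(string(m.Value))) {
		key := m.Attribute + "|" + term
		ix.postings[key] = append(ix.postings[key], m.UID)
	}
}

// Lookup returns the UIDs whose value for attr contains term.
func (ix *Indexer) Lookup(attr, term string) []uint64 {
	return ix.postings[attr+"|"+strings.ToLower(term)]
}

func main() {
	ix := NewIndexer("type.object.name.en")
	ix.Apply(Mutation{UID: 1, Attribute: "type.object.name.en", Value: []byte("Steven Spielberg")})
	ix.Apply(Mutation{UID: 2, Attribute: "some.other.predicate", Value: []byte("Steven")})
	fmt.Println(ix.Lookup("type.object.name.en", "steven"))
}
```

The check at the top of `Apply` is the per-mutation test against the set of indexed predicates; it depends only on the predicate name.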

kostub · September 13, 2016, 1:19pm

I think you interpreted “node” differently from what I meant. By “node”, I mean a node in the graph, not a server.

If I interpret this correctly, every single mutation will check if there is an index on that particular attribute (predicate), irrespective of the type of subject.

I’d like to add a type to the config to indicate whether this is a spatial index (for geo data) or an inverted index (in your case). An unspecified type could just mean inverted index.

jchiu · September 13, 2016, 11:49pm

Haha… Yes, you are right about “every single mutation will check if …”. The index will ignore the mutation if its valueBytes is nil, which means we are just changing UIDs/subjects, not values.

Yes, it makes sense to add a type. The default could be “string/text”. And we can add more like “spatial”, “numeric”, “datetime” in future.

If you like you can add to index.go. Thanks.

mrjn · September 14, 2016, 12:54am

Instead of adding a type to indexing config, couldn’t you add a tokenizer for spatial indexing? What purpose is the type going to achieve, that the predicate name couldn’t?

kostub · September 14, 2016, 1:53am

There is no tokenization of the data. The entire geometry would need to be added to the index with a bounding box. The implementation of the index is completely different from what @jchiu has. Yes, we could switch on the value of the tokenizer field and use a different index implementation; however, in that case tokenizer is just the wrong name for it, and type seems more appropriate. In the future, we may have different types of spatial index implementations optimized for different kinds of data/queries.

Predicate name does not tell us much, it just tells us if we need to index a particular mutation or not. It does not tell us how to index it.
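To make the distinction concrete, here is a hedged Go sketch in which the type selects the index implementation. Everything in it is hypothetical: the `Indexer` interface, `indexerFor`, and especially `spatialIndexer`, which is a placeholder that stores the geometry as one opaque key instead of computing a bounding box.

```go
package main

import (
	"fmt"
	"strings"
)

// Indexer decides how a value is indexed. Hypothetical interface.
type Indexer interface {
	// Keys returns the index keys under which a value should be stored.
	Keys(value []byte) []string
}

// termIndexer stands in for the inverted index: one key per lowercased term.
type termIndexer struct{}

func (termIndexer) Keys(v []byte) []string {
	return strings.Fields(strings.ToLower(string(v)))
}

// spatialIndexer is a placeholder only. A real spatial index would derive
// keys from the geometry's bounding box; here the whole value is one key.
type spatialIndexer struct{}

func (spatialIndexer) Keys(v []byte) []string {
	return []string{"geo:" + string(v)}
}

// indexerFor picks the implementation from the config's type.
// An empty type falls back to the inverted (string/text) index.
func indexerFor(typ string) (Indexer, error) {
	switch typ {
	case "", "string/text":
		return termIndexer{}, nil
	case "spatial":
		return spatialIndexer{}, nil
	default:
		return nil, fmt.Errorf("unknown index type %q", typ)
	}
}

func main() {
	ix, err := indexerFor("")
	if err != nil {
		panic(err)
	}
	fmt.Println(ix.Keys([]byte("Steven Spielberg")))
}
```

In this sketch the predicate name answers whether a mutation is indexed, while the type answers which `Indexer` handles it.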

mrjn · September 14, 2016, 1:54am

Then it sounds like it should belong in the schema file, which is where we store the data types attached to a predicate.
