Specifying indexes in dgraph

How are we specifying indexes in dgraph? I don’t see any mention of indexing in the roadmap, but string matching will clearly require indexes, and so will geo queries.

Are we automatically creating indexes, or having the user specify them using some schema? Or do we expose some API for users to create the index manually? I see some code that was added recently to support indexes, but I don’t see any documentation on the wiki about this.

For reference, here is how Neo4j does indexing:

We will let the user specify the predicate to index and the tokenizer to use. I think we’re halfway there: we can specify the predicate, but not yet the tokenizer. FYI: @jchiu

That’s right. I will try to get the tokenizer in this week.

Currently, the index takes a JSON config. I do foresee reading this from some centralized config in the future, but I didn’t want to be blocked by that at this point.

We haven’t updated the wiki because we haven’t exposed this functionality yet.

@jchiu do you have some document describing what you propose to implement? That’ll help me understand and incorporate it in the geo query design.

Are you referring to the schema design?

Currently, it looks like this:

	"config": [{
		"attribute": "type.object.name.en"
	}]

The next step is to add a tokenizer.

	"config": [{
		"attribute": "type.object.name.en",
		"tokenizer": "term"
	}]

It is possible that we won’t use JSON in the future, but I am more concerned about what is being specified. For now, it is really just a list of (attribute, tokenizer) pairs.
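For illustration, a list of (attribute, tokenizer) pairs maps naturally onto a Go struct decoded with the standard `encoding/json` package. This is only a sketch: `IndexConfig`, `indexFile` and `parseIndexConfig` are hypothetical names, and the top-level object wrapping `"config"` is an assumption, since the snippet above only shows the `"config"` key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexConfig is a hypothetical Go mirror of one config entry:
// a single (attribute, tokenizer) pair.
type IndexConfig struct {
	Attribute string `json:"attribute"`
	Tokenizer string `json:"tokenizer"`
}

// indexFile is a hypothetical wrapper for the top-level "config" key.
type indexFile struct {
	Config []IndexConfig `json:"config"`
}

// parseIndexConfig decodes the JSON config and returns a map from attribute
// (predicate) to tokenizer name, so a mutation can be checked with one lookup.
func parseIndexConfig(data []byte) (map[string]string, error) {
	var f indexFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	indexed := make(map[string]string, len(f.Config))
	for _, c := range f.Config {
		indexed[c.Attribute] = c.Tokenizer
	}
	return indexed, nil
}

func main() {
	data := []byte(`{"config": [{"attribute": "type.object.name.en", "tokenizer": "term"}]}`)
	indexed, err := parseIndexConfig(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(indexed["type.object.name.en"]) // term
}
```

Keying the map by attribute means the per-mutation check is just a map lookup.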

Does this apply the index to every single node with this attribute?

The node holding the predicate being indexed will do the indexing.

The implementation is essentially:

When I get a mutation on a predicate value, I add an additional inverted mutation, so this only makes sense if it happens on the same node that holds the predicate. I hope this answers the question.
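The "additional inverted mutation" idea can be sketched roughly as follows. This is a minimal illustration, not the code in index.go: `Mutation`, `InvertedMutation`, `invertedMutations` and the whitespace-splitting `termTokenize` are hypothetical names. It also skips mutations that carry no value bytes, since those only change UIDs/subjects, not values.

```go
package main

import (
	"fmt"
	"strings"
)

// Mutation is a hypothetical, simplified edge: Subject -Attr-> Value.
// Value is nil when the edge points at another UID rather than a value.
type Mutation struct {
	Subject uint64
	Attr    string
	Value   []byte
}

// InvertedMutation records that Token appears in Attr of Subject.
type InvertedMutation struct {
	Attr    string
	Token   string
	Subject uint64
}

// termTokenize is a stand-in "term" tokenizer: lowercase, split on whitespace.
func termTokenize(b []byte) []string {
	return strings.Fields(strings.ToLower(string(b)))
}

// invertedMutations returns the extra index mutations for m. It returns nil
// if the predicate is not indexed or if m has no value bytes.
func invertedMutations(m Mutation, indexed map[string]bool) []InvertedMutation {
	if !indexed[m.Attr] || m.Value == nil {
		return nil
	}
	var out []InvertedMutation
	for _, tok := range termTokenize(m.Value) {
		out = append(out, InvertedMutation{Attr: m.Attr, Token: tok, Subject: m.Subject})
	}
	return out
}

func main() {
	indexed := map[string]bool{"type.object.name.en": true}
	m := Mutation{Subject: 42, Attr: "type.object.name.en", Value: []byte("Steven Spielberg")}
	for _, im := range invertedMutations(m, indexed) {
		fmt.Printf("%s/%s -> %d\n", im.Attr, im.Token, im.Subject)
	}
}
```

Because the inverted entries are derived directly from the incoming mutation, generating them where that predicate's data lives avoids shipping the value elsewhere.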

Addendum: every node will check whether the mutation applies to a predicate that is being indexed. I expect every node to have a copy of the index file.

I think you interpreted what I meant by “node” differently from me 🙂. By “node”, I mean a node in the graph, not a server.

If I interpret this correctly, every single mutation will check if there is an index on that particular attribute (predicate), irrespective of the type of subject.

I’d like to add a type to the config to indicate whether this is a spatial index (for geo data) or an inverted index (in your case). An unspecified type could just mean inverted index.

Haha… Yes, you are right about “every single mutation will check if …”. The index will ignore the mutation if its valueBytes is nil, which means we are only changing UIDs/subjects, not values.

Yes, it makes sense to add a type. The default could be “string/text”, and we can add more, like “spatial”, “numeric” and “datetime”, in the future.

If you like, you can add it to index.go. Thanks.


Instead of adding a type to the indexing config, couldn’t you add a tokenizer for spatial indexing? What purpose would the type serve that the predicate name couldn’t?

There is no tokenization of the data. The entire geometry would need to be added to the index with a bounding box, and the implementation of the index is completely different from what @jchiu has. Yes, we could switch on the value of the tokenizer field and use a different index implementation, but in that case “tokenizer” is simply the wrong name for it; “type” seems more appropriate. In the future, we may have different spatial index implementations optimized for different kinds of data and queries.
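To make the distinction concrete, here is a hedged sketch of switching on a type field to pick between two index implementations: an inverted index that tokenizes the value, and a spatial index that stores one bounding box for the whole geometry without tokenizing anything. `Index`, `InvertedIndex`, `SpatialIndex`, `newIndex` and the `"x,y x,y ..."` point encoding are all assumptions for illustration; a real geo index would use a proper geometry encoding and a spatial data structure rather than a plain map.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Index is a hypothetical interface: each index type decides for itself
// how a (subject, value) pair is stored.
type Index interface {
	Add(subject uint64, value []byte) error
}

// InvertedIndex maps each token of the value to the subjects containing it.
type InvertedIndex struct{ postings map[string][]uint64 }

func (ix *InvertedIndex) Add(subject uint64, value []byte) error {
	for _, tok := range strings.Fields(strings.ToLower(string(value))) {
		ix.postings[tok] = append(ix.postings[tok], subject)
	}
	return nil
}

// BBox is an axis-aligned bounding box.
type BBox struct{ MinX, MinY, MaxX, MaxY float64 }

// SpatialIndex keeps one bounding box per subject for the whole geometry.
type SpatialIndex struct{ boxes map[uint64]BBox }

// Add assumes a made-up text encoding "x,y x,y ..." for the geometry's points.
func (ix *SpatialIndex) Add(subject uint64, value []byte) error {
	pts := strings.Fields(string(value))
	if len(pts) == 0 {
		return fmt.Errorf("empty geometry")
	}
	b := BBox{math.Inf(1), math.Inf(1), math.Inf(-1), math.Inf(-1)}
	for _, p := range pts {
		xy := strings.Split(p, ",")
		if len(xy) != 2 {
			return fmt.Errorf("bad point %q", p)
		}
		x, err := strconv.ParseFloat(xy[0], 64)
		if err != nil {
			return err
		}
		y, err := strconv.ParseFloat(xy[1], 64)
		if err != nil {
			return err
		}
		b.MinX, b.MaxX = math.Min(b.MinX, x), math.Max(b.MaxX, x)
		b.MinY, b.MaxY = math.Min(b.MinY, y), math.Max(b.MaxY, y)
	}
	ix.boxes[subject] = b
	return nil
}

// newIndex picks the implementation from the config's type field; an empty
// type falls back to the inverted index.
func newIndex(typ string) (Index, error) {
	switch typ {
	case "", "string/text":
		return &InvertedIndex{postings: map[string][]uint64{}}, nil
	case "spatial":
		return &SpatialIndex{boxes: map[uint64]BBox{}}, nil
	default:
		return nil, fmt.Errorf("unknown index type %q", typ)
	}
}

func main() {
	ix, err := newIndex("spatial")
	if err != nil {
		panic(err)
	}
	if err := ix.Add(7, []byte("0,0 4,1 2,3")); err != nil {
		panic(err)
	}
	fmt.Println(ix.(*SpatialIndex).boxes[7]) // {0 0 4 3}
}
```

Keeping this choice in a type field, rather than overloading the tokenizer field, leaves room for several spatial index implementations side by side.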

Predicate name does not tell us much, it just tells us if we need to index a particular mutation or not. It does not tell us how to index it.

Then it sounds like this belongs in the schema file, which is where we store the data types attached to a predicate?
