Is it possible to join/merge two indexes? Merging different indexes of the same kind for general search, but also keeping the single indexes

Is it possible to join two indexes of the same type and then search through the combined index? If not, I'll make a feature request for it, since it looks like an easy feature to implement: you just create a new index under the hood. Sure, you duplicate values, and more sophisticated algorithms could probably speed that up, but that 'shallow' solution is already more than enough.

//

To better explain what I mean by joint indexes:
E.g. you have two different predicates, hotel.rating and restaurant.rating.
Schema:
hotel.rating: int @index(int) .
restaurant.rating: int @index(int) .

Now you want to find all businesses with a rating above 4 stars, and you want 25 businesses listed. You have to do ugly things: get 12 restaurants and 13 hotels and mix them in the result. Things get uglier if you want the top 10 businesses. You would fetch 5 restaurants and 5 hotels, sort them again, and deliver the top 10. But what if the top 10 are actually all hotels? To solve that, you have to fetch not 5 of each but 10 of each: you have to over-fetch and then sort. This is not cool.
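
To make the over-fetching concrete, here is a minimal Python sketch of the client-side merge described above. The two lists stand in for the results of one indexed query per predicate; the function name and the sample data are made up for illustration and have nothing to do with any Dgraph client API.

```python
import heapq

def top_k_merged(hotels, restaurants, k):
    """Return the k highest-rated businesses across both lists.

    Each input is a list of (name, rating) tuples, standing in for the
    result of one indexed query per predicate.
    """
    # Over-fetch: take the top k from EACH predicate, because in the
    # worst case all k winners come from a single predicate.
    top_hotels = heapq.nlargest(k, hotels, key=lambda b: b[1])
    top_restaurants = heapq.nlargest(k, restaurants, key=lambda b: b[1])
    # Merge the two candidate lists and sort again on the client.
    return heapq.nlargest(k, top_hotels + top_restaurants, key=lambda b: b[1])

# Hypothetical data: every hotel outranks every restaurant.
hotels = [("hotel-%d" % i, 5) for i in range(10)]
restaurants = [("restaurant-%d" % i, 4) for i in range(10)]
top = top_k_merged(hotels, restaurants, 10)
```

With this data all 10 winners are hotels, so fetching only 5 of each would have returned a wrong top 10.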

So what I basically want is for the hotel.rating and restaurant.rating predicates to be duplicated into one single business.rating,
so that my schema looks like this:

hotel.rating: int @index(int) .
restaurant.rating: int @index(int) .
business.rating: int @index(int) . # just a merge of the two above

This is a very simple thing.
I know I could solve it by always adding a second predicate, business.rating, to restaurant nodes, which would just be a duplicate of restaurant.rating, so whenever I mutate or compute a new rating, I change both of them. This is a good solution. I could also change only the restaurant.rating predicate and let a post-mutation hook update business.rating for me; this would be a good solution too.
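
As a rough sketch of that manual workaround, the helper below builds the RDF N-Quads for a single mutation that writes the rating to both the specific predicate and the shared business.rating predicate. The helper name is hypothetical, and sending the N-Quads to Dgraph (in one transaction) is left out.

```python
def rating_nquads(uid, kind, rating):
    """Build RDF N-Quads that set both the specific and the shared rating.

    `kind` is "hotel" or "restaurant"; `business.rating` is the duplicated
    predicate from the schema above.
    """
    if kind not in ("hotel", "restaurant"):
        raise ValueError("unknown kind: %r" % kind)
    literal = '"%d"^^<xs:int>' % rating
    return "\n".join([
        "<%s> <%s.rating> %s ." % (uid, kind, literal),
        "<%s> <business.rating> %s ." % (uid, literal),
    ])

print(rating_nquads("0x1", "restaurant", 4))
# <0x1> <restaurant.rating> "4"^^<xs:int> .
# <0x1> <business.rating> "4"^^<xs:int> .
```

Keeping both lines in the same mutation is what keeps the two predicates from drifting apart.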

(BTW: just removing hotel.rating and restaurant.rating and using ONLY business.rating is NOT a solution, because if a user queried for the best 10 restaurants, Dgraph couldn't use the index anymore, which would slow things down.)

Having that handled automatically would be really cool and neat, and it should be no problem to implement in Dgraph since it is such an easy feature.

If this feature does not exist yet, it would be great to have: it shouldn't be hard to implement, and it would make things more convenient.

Disclaimer: I know none of this matters if a user doesn't want the top 10 restaurants but the top 10 restaurants within a 5 km radius. Then, to improve speed, you would first query for restaurants within 5 km and then filter them by rating. The whole restaurant/hotel thing is just an example, because it makes it easy to understand exactly what I mean, since my English isn't great.

BTW2: this feature would also be extremely useful for merged fulltext search. E.g. you have a social media site about cats and dogs. You would want a separate index for dogs and for cats. If a user searches cats with fulltext search, e.g. “small cute cats eating”, you don't want results with dogs. If you had ONLY a general search for both cats and dogs, you would need to filter the result nodes down to cats, which increases latency. So you want 3 fulltext indexes: one for cats, one for dogs, and one for both.
Yes, I know this trades server resources (disk to store the index + CPU to update it) for a quicker search. But that is not a problem if I want to give my users the best possible experience no matter what the $$ cost is.
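
The three-index idea can be illustrated with a toy inverted index in Python. This is only a sketch of the concept: it matches exact lowercase words and requires all of them, whereas a real fulltext index (Dgraph's included) does tokenizing and stemming. All names and sample posts are invented.

```python
from collections import defaultdict

def build_index(posts):
    """Map each lowercase word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in text.lower().split():
            index[word].add(post_id)
    return index

def search(index, query):
    """Return ids of posts containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

cat_posts = {"c1": "small cute cats eating", "c2": "grumpy cats sleeping"}
dog_posts = {"d1": "small cute dogs eating"}

# Three indexes: one per animal, plus one over everything.
cat_index = build_index(cat_posts)
dog_index = build_index(dog_posts)
general_index = build_index({**cat_posts, **dog_posts})
```

Searching cat_index for "small cute" can never return a dog post, so no type filter is needed afterwards; the price is that every post is indexed twice.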

BTW3: a better use case for this feature: e.g. you have a big text that you index with a hash (equality) index. You don't want to store the same text about cats twice just to get the same hash twice, once for cat search and once for general search.

I don’t see the problem here. I would map both to a single edge @dgraph(pred:"rating")

Then you could do something in DQL like

{
  n(func: has(rating), orderdesc: rating, first: 10) @filter(type(Restaurants)) {
    uid
    rating
    # other fields
  }
}

Umm… I don’t think you understand the complexity here. You are talking about indexing two things that might be distributed across different servers, for starters.


This is exactly the query I don't want to run. Won't @filter(type(Restaurants)) increase latency? Dgraph will first do the quick index lookup to get the businesses with the highest ratings, and then check whether those businesses are even restaurants. Isn't having 3 indexes (one for restaurants, one for hotels, and one for both) the better solution?

BTW, wasn't there this 'bug' with first: 10 where, e.g., if the best 10 businesses are hotels and not restaurants, I get NO results back, because filtering happens after fetching the first 10 results?

No, you are thinking about the cascade w/pagination problem. Pagination works with these secondary filters. I just tested it again to be sure.

No. It gets the galaxy first, but the additional filters and pagination are applied to the received galaxy. I see the confusion: pagination is actually applied after the @filter, but looking at the order of the syntax it would appear otherwise.

The root function just defines the smallest part of the galaxy to start from, so that the filter performs better by having less data to work through.

Think of it like a logical IF block that only lets the inner logic fire if it really needs to. This way the @filter is not trying to work on the whole universe, just the nodes in the current galaxy.
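
A toy Python model of that evaluation order may help. It only mimics the order described above (root function, then @filter, then ordering and pagination) on an in-memory list; it is an assumption-laden illustration, not Dgraph's actual execution engine, and the node dictionaries are invented.

```python
def query(nodes, node_type, first):
    """Toy model of: func: has(rating), orderdesc: rating, first: N
    combined with @filter(type(...)).
    """
    # Root function: every node that has a rating (the "galaxy").
    galaxy = [n for n in nodes if "rating" in n]
    # @filter: keep only nodes of the requested type.
    filtered = [n for n in galaxy if n["type"] == node_type]
    # Ordering and pagination run on the filtered set, not before it.
    filtered.sort(key=lambda n: n["rating"], reverse=True)
    return filtered[:first]

# Hypothetical data: 10 hotels rated 5, then 3 lower-rated restaurants.
nodes = [{"name": "hotel-%d" % i, "type": "Hotel", "rating": 5} for i in range(10)]
nodes += [{"name": "restaurant-%d" % i, "type": "Restaurant", "rating": 4 - i} for i in range(3)]

top_restaurants = query(nodes, "Restaurant", 2)
```

Even though the 10 best-rated nodes are all hotels, the query still returns the two best restaurants, because the filter runs before the first: N cut.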


thanks a lot!!!

But isn't doing an IF check on every value to see whether it is a restaurant slower than having 3 indexes? Checking something through an index is faster than checking it through filters/IF checks, isn't it?

Is creating a separate index/galaxy for restaurants the better/faster approach or not?

If you want my thorough thoughts on the subject of types, check out:

And more specifically to answer

Maybe I didn't explain it well enough, but I am done for the night. I've wasted too many hours tonight on Dgraph.

Doing the IF check with a single index (maybe it would be better with an edge ^^^?) is the only way to do it, because it is illogical to build an index across three types that may not coexist.


E.g. I have 1,000,000 hotels in my system and just 100 restaurants.
Wouldn't @filter(type(Restaurants)) be a bottleneck? That would be 1 million internal IF checks that Dgraph performs just to check whether a node is a restaurant, or not?
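
The worry can be put into numbers with a small Python model. It assumes a naive engine that walks nodes in descending rating order and tests the type of each node one by one. Dgraph may well evaluate the filter differently (for example by intersecting UID lists), so treat the count as an illustration of the concern, not as a measurement.

```python
def checks_with_filter(sorted_types, wanted, k):
    """Walk nodes in rating order, counting type checks until k matches."""
    checks = 0
    found = 0
    for node_type in sorted_types:
        checks += 1
        if node_type == wanted:
            found += 1
            if found == k:
                break
    return checks

# Worst case from the post: every hotel outranks every restaurant.
sorted_types = ["Hotel"] * 1_000_000 + ["Restaurant"] * 100
worst = checks_with_filter(sorted_types, "Restaurant", 10)
print(worst)  # 1000010
```

Under this naive model, a dedicated restaurant index would only need to read the 10 entries it returns, which is the trade-off being asked about.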

I didn't understand that correctly because of my English. Do you mean that it is not good practice to have 3 indexes (one for restaurants, one for hotels, and one for both)? Yes, it would be a waste of resources, but wouldn't it give better performance?

To be honest, I still don't understand why several identical indexes are bad, or what exactly you meant :sweat_smile: Regarding post #7 by Juri in this thread, can you maybe tell me if that is a bottleneck?

Anthony’s reply:

Let's break it down as if it were SQL. In SQL you can index 2 columns in the same table, but can you index two columns together from different tables? No. Could you index 2 columns that are not only in different tables but not even in the same database? Of course not. And we are getting close. Now the absurd question: could you index across 2 columns that were not just in different tables or databases, but not even on the same server? Definitely not. This is how Dgraph distributes: it spreads “columns” across multiple servers. So how exactly do you propose writing and updating an index in any kind of performant way when the two predicates that make up the index cannot be guaranteed to even coexist on the same server? The index would be more complex to build than refactoring the data into something more feasible.
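
To picture the point about distribution, here is a tiny Python sketch. The predicate-to-group mapping is entirely hypothetical (real placement is decided by the cluster); it only shows that an index merged from two predicates would have to be fed from more than one server group, while a separate business.rating predicate lives in one place.

```python
def groups_touched(assignment, predicates):
    """Return the set of server groups that hold the given predicates."""
    return {assignment[p] for p in predicates}

# Hypothetical placement: each predicate lives on exactly one group.
assignment = {
    "hotel.rating": 1,
    "restaurant.rating": 2,
    "business.rating": 3,
}

# A real, separate predicate is served by a single group.
single = groups_touched(assignment, ["business.rating"])
# An index "merged" from two predicates would span two groups.
merged = groups_touched(assignment, ["hotel.rating", "restaurant.rating"])
```

Every write to either source predicate would then have to update an index that cannot be local to both groups at once.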

Ahhh, now I FINALLY understand what you mean! Yes, that is definitely true. I should have written an example GraphQL schema to better show what I mean:

type Restaurant {
  id: ID!
  name: String
  restaurantrating: Int @search
  businessrating: Int @search @dgraph(pred: "business.rating")
}

type Hotel {
  id: ID!
  name: String
  hotelrating: Int @search
  businessrating: Int @search @dgraph(pred: "business.rating")
}

E.g. the new rating of a hotel is 4.5.
I then update hotelrating to 4.5 and businessrating to 4.5 in one transaction,
so I'd basically store duplicate values. This is OK for a small number like 4.5 (which would strictly need Float rather than Int), since it takes up hardly any space. But think of the dog and cat blog example (a website where you can write blog posts about cats, dogs, and hamsters): with fulltext search, duplicating the text would be wasteful.

type Catblogpost {
  id: ID!
  text: String @search(by: [fulltext])
  # second copy of the same text; renamed, since a type cannot declare "text" twice
  generaltext: String @search(by: [fulltext]) @dgraph(pred: "general.text")
}

type Dogblogpost {
  id: ID!
  text: String @search(by: [fulltext])
  generaltext: String @search(by: [fulltext]) @dgraph(pred: "general.text")
}

type Hamsterblogpost {
  id: ID!
  text: String @search(by: [fulltext])
  generaltext: String @search(by: [fulltext]) @dgraph(pred: "general.text")
}

For faster search we want a separate index for every animal, e.g. when I want to search only the cat blog, but we also want a general search across everything.

So my problem here is that the blog text is stored twice. What I want is something like this:

type Catblogpost {
  id: ID!
  text: String @search(by: [fulltext]) @dgraph(pred: "general.text", "catblogpost.text")
}

type Dogblogpost {
  id: ID!
  text: String @search(by: [fulltext]) @dgraph(pred: "general.text", "dogblogpost.text")
}

type Hamsterblogpost {
  id: ID!
  text: String @search(by: [fulltext]) @dgraph(pred: "general.text", "hamsterblogpost.text")
}

That way I wouldn't have to store values twice and always update two identical values.
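
What is being asked for can be sketched as a toy Python store in which each text is kept once while several named word indexes point at it. The class and its methods are invented for illustration; nothing like the multi-predicate @dgraph syntax above is known to exist in Dgraph.

```python
from collections import defaultdict

class MultiIndexStore:
    """Toy store: one copy of each text, many named word indexes over it."""

    def __init__(self):
        self.texts = {}  # uid -> text, stored exactly once
        self.indexes = defaultdict(lambda: defaultdict(set))  # name -> word -> uids

    def put(self, uid, text, index_names):
        # Drop postings of a previous version so updates stay consistent.
        old = self.texts.get(uid)
        if old is not None:
            for index in self.indexes.values():
                for word in old.lower().split():
                    index[word].discard(uid)
        self.texts[uid] = text
        # One write updates every requested index.
        for name in index_names:
            for word in text.lower().split():
                self.indexes[name][word].add(uid)

    def search(self, index_name, word):
        return set(self.indexes[index_name][word.lower()])

store = MultiIndexStore()
store.put("0x1", "small cute cats", ["catblogpost.text", "general.text"])
store.put("0x2", "small cute dogs", ["dogblogpost.text", "general.text"])
```

Here store.search("general.text", "cute") finds both posts and store.search("catblogpost.text", "cute") finds only the cat post, while store.texts holds each text a single time.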

What I want is really high-end, first-world-problems optimization.