Understanding why `count(~fooPredicate) @filter(eq(dgraph.type, "fooType"))` is slow

I just need some clarification on my understanding.

Schema:

type Hotel {
	name
}

type Room {
	hotel
	name
}

type Ledger {
	hotel
	room
	createdTs
	amount
}

type Cafe {
    name
    hotel
}

name: string @index(exact, term) .
hotel: uid @reverse .
room: uid @reverse .
createdTs: datetime @index(hour) .
amount: int @index(int) .

Query:

{
  getRoomCountForAllHotels(func: eq(dgraph.type, "Hotel")) {
    roomCount: count(~hotel) @filter(eq(dgraph.type, "Room"))
  }
}

I’ve got 100 hotels, and each hotel has 1000 rooms and 1 cafe.
After a few minutes, this returns a `context deadline exceeded` error.

I’m assuming that, because Badger is inspired by RocksDB (which was itself inspired by LevelDB), the storage engine would be something like a hexastore. Is this true?

So then my intuition told me:

  1. Perhaps the query planner would use the indexed hexastore to find the hotels via Predicate(dgraph.type)-Object(“Hotel”)-Subject(uid) and load the 100 hotel uids into memory;
  2. then, for each of these hotel uids, we’d go into a nested loop and resolve its ~hotel edges via Predicate(hotel)-Object(hotel_uid)-Subject(room_uid or cafe_uid), which would bring up 100K room_uids and 100 cafe_uids in total, all loaded into memory;
  3. then we’d have to filter these 100K + 100 records by looking up each of them (my goodness) via Subject(room_uid or cafe_uid)-Predicate(dgraph.type)-Object(“Room”) to make sure their dgraph.type is Room and not Cafe;
  4. finally, we’d sum them up per hotel uid (the RDBMS equivalent of GROUP BY) and return the values.
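To make the four steps above concrete, here is a minimal Python sketch of that mental model. It assumes a toy hexastore-style layout that keeps only two of the six orderings (P-O to S, and S-P to O) in plain dicts. The names `build_indexes`, `make_toy_triples` and `room_count_per_hotel` are hypothetical, and this is not how Badger or Dgraph actually store data or execute queries.

```python
from collections import defaultdict


def build_indexes(triples):
    """Build two hexastore-style orderings: (P, O) -> subjects and (S, P) -> objects."""
    pos = defaultdict(list)  # (predicate, object) -> subjects
    spo = defaultdict(list)  # (subject, predicate) -> objects
    for s, p, o in triples:
        pos[(p, o)].append(s)
        spo[(s, p)].append(o)
    return pos, spo


def make_toy_triples(n_hotels, rooms_per_hotel):
    """Hypothetical data shaped like the question: N rooms and 1 cafe per hotel."""
    triples, uid = [], 0
    for _ in range(n_hotels):
        uid += 1
        hotel = uid
        triples.append((hotel, "dgraph.type", "Hotel"))
        for _ in range(rooms_per_hotel):
            uid += 1
            triples += [(uid, "dgraph.type", "Room"), (uid, "hotel", hotel)]
        uid += 1
        triples += [(uid, "dgraph.type", "Cafe"), (uid, "hotel", hotel)]
    return triples


def room_count_per_hotel(pos, spo):
    """Steps 1-4 of the nested-loop plan; returns (counts, number of type lookups)."""
    counts, type_lookups = {}, 0
    for hotel in pos[("dgraph.type", "Hotel")]:       # step 1: P-O -> hotel uids
        rooms = 0
        for child in pos[("hotel", hotel)]:            # step 2: P-O -> rooms + cafe
            type_lookups += 1                          # step 3: one S-P lookup per child
            if "Room" in spo[(child, "dgraph.type")]:
                rooms += 1
        counts[hotel] = rooms                          # step 4: group by hotel uid
    return counts, type_lookups


pos, spo = build_indexes(make_toy_triples(n_hotels=3, rooms_per_hotel=4))
print(room_count_per_hotel(pos, spo))  # ({1: 4, 7: 4, 13: 4}, 15)
```

The cost in this model is dominated by step 3: one type lookup per child, so 100 hotels × (1,000 rooms + 1 cafe) would mean 100,100 lookups.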

Am I sort of correct?

This filtering step is essentially an intersection of two sorted lists, where the two lists are:

List 1 => uids reached via ~hotel for one hotel (its rooms plus its cafe)
List 2 => uids of all nodes whose dgraph.type is Room

This intersection happens once per hotel, so 100 times for your data.
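A minimal Python sketch of that per-hotel intersection, assuming each list is an ascending, duplicate-free list of integer uids. `intersect_linear` is the classic two-pointer merge; `intersect_binary` is a swapped-in alternative that binary-searches the larger list for each element of the smaller one, which pays off when one list (about 1,001 children) is much shorter than the other (100,000 rooms). Both function names and the toy uids are hypothetical; this is not Dgraph's own intersection code.

```python
from bisect import bisect_left


def intersect_linear(a, b):
    """Two-pointer merge of two ascending, duplicate-free uid lists: O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_binary(small, large):
    """Binary-search `large` for each uid in `small`: O(len(small) * log len(large))."""
    out, lo = [], 0
    for uid in small:
        # Both lists are sorted, so resume searching where the previous uid landed.
        lo = bisect_left(large, uid, lo)
        if lo == len(large):
            break
        if large[lo] == uid:
            out.append(uid)
    return out


# Hypothetical toy data: ~hotel children per hotel (rooms plus one cafe each)
# and the sorted uids of every node whose dgraph.type is Room.
hotel_children = {1: [2, 3, 4, 5, 6], 7: [8, 9, 10, 11, 12]}
all_rooms = [2, 3, 4, 5, 8, 9, 10, 11]

# One intersection per hotel.
linear_counts = {h: len(intersect_linear(c, all_rooms)) for h, c in hotel_children.items()}
binary_counts = {h: len(intersect_binary(c, all_rooms)) for h, c in hotel_children.items()}
print(linear_counts, binary_counts)  # {1: 4, 7: 4} {1: 4, 7: 4}
```

With the linear merge, each hotel walks up to 1,001 + 100,000 entries, roughly 10.1 million steps across 100 hotels; the binary-search variant needs about 1,001 × log2(100,000) ≈ 17,000 comparisons per hotel.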

I’m still surprised that your query doesn’t return even after several minutes on such a small amount of data. Could you provide a sample data set so we can dig deeper into the issue? Another suggestion: wouldn’t it be better if your schema looked like this instead?

type Hotel {
	name
	rooms
	ledgers
	cafes
}

type Room {
	name
}

type Ledger {
	rooms
	createdTs
	amount
}

type Cafe {
    name
}

name: string @index(exact, term) .
rooms: [uid] @reverse .
ledgers: [uid] @reverse .
cafes: [uid] @reverse .
createdTs: datetime @index(hour) .
amount: int @index(int) .

Then your query won’t need the `@filter` on the reverse edge at all:

{
  getRoomCountForAllHotels(func: eq(dgraph.type, "Hotel")) {
    roomCount: count(rooms)   
  }
}
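A small Python sketch of why the suggested schema helps, assuming a toy in-memory model where each hotel holds its room uids directly under `rooms`. The dict layout and `room_count_per_hotel` are hypothetical, and this says nothing about how Dgraph evaluates `count` internally. Because rooms and cafes sit under separate edges, the count is just the length of each hotel's `rooms` list, with no per-child type lookup and no intersection.

```python
# Hypothetical model of the suggested schema: rooms and cafes sit under
# separate forward edges on each hotel, so they never share a list.
hotels = {
    1: {"rooms": [2, 3, 4, 5], "cafes": [6]},
    7: {"rooms": [8, 9, 10, 11], "cafes": [12]},
}


def room_count_per_hotel(hotels):
    # count(rooms) reduces to the size of each hotel's rooms list.
    return {uid: len(node["rooms"]) for uid, node in hotels.items()}


print(room_count_per_hotel(hotels))  # {1: 4, 7: 4}
```

The trade-off is that the relationship is now maintained from the hotel side; the `@reverse` on `rooms` still lets you walk from a room back to its hotel via `~rooms`.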