Optimizing sum() performance: Dgraph takes 10s (16 vCPU, 128 GB RAM), PostgreSQL takes 1.12s (4 vCPU, 16 GB RAM)

Given this schema:

type Hotel {
	name
}

type Room {
	hotel
	name
}

type Ledger {
	hotel
	room
	createdTs
	amount
}

name: string @index(exact, term) .
hotel: uid @reverse .
room: uid @reverse .
createdTs: datetime @index(hour) .
amount: int @index(int) .

And there being:

  1. 100 Hotels,
  2. 1000 Rooms per Hotel, and
  3. 2000 Ledgers per Room (i.e. 100,000 Rooms and 200 million Ledgers in total);

this query took 10.09s:

{
  getAmountSumForAllLedgersOfHotel99(func: eq(name, "Hotel99")) {
    uid
    ~hotel @groupby(hotel) {
      sum(amount)
    }
  }
}

and this query timed out:

{
  getLedgerSumForHotel99WithTimeRange(func: eq(name, "Hotel99")) {
    ~hotel @filter(ge(createdTs, "2019-03-01") AND le(createdTs, "2019-03-31")) @groupby(hotel) {
      sum(amount)
    }
  }
}

10.09s is too long: PostgreSQL and MSSQL can both run the equivalent query in under 1.5s:

SELECT Ht.NAME, SUM(lgr.Amount) AS TotalAmount, COUNT(lgr.LedgerID) AS TotalBill
FROM Hotel Ht
JOIN Room Rm ON Ht.HotelID = Rm.HotelID
JOIN Ledger lgr ON Ht.HotelID = lgr.HotelID AND Rm.RoomID = lgr.RoomID
WHERE Ht.NAME = 'Hotel99'
GROUP BY Ht.NAME

I have 2 questions:

  1. How should I optimize my queries? With these perf numbers… my management will likely drop Dgraph from consideration. :frowning:
  2. Is hotel: uid @reverse @count . permitted? (See the sketch below for what I'd use it for.)
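
For context, what I'd use it for is a count-at-root query over the reverse edge. A rough sketch, assuming @reverse and @count can be combined on a uid predicate and that the count index covers the reverse edge (please correct me if not):

hotel: uid @reverse @count .

{
  # hypothetical query; gt(count(...)) at root needs the @count index
  busyHotels(func: gt(count(~hotel), 1000)) {
    name
  }
}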

Hi @geoyws, thanks for reaching out to us with the above numbers. We would like to look more closely at what is causing the slowdown. Could you somehow share your alpha's p directory with us, if it's testing data?

Hi @ashishgoswami, how should I share that directory? It's possibly about 400GB in size… Yes, it's just randomly generated test data, nothing sensitive, don't worry. Would you like SSH access to the server itself?

Hi @geoyws, would it be possible for you to share your data generation script? If yes, please send it to my email: ashish@dgraph.io.
If possible, please also share a CPU profile taken while Dgraph is running the above query: https://dgraph.io/docs/howto/#profiling-information
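
For reference, the profile can be grabbed with something like this (assuming the default alpha HTTP port of 8080; adjust for your setup):

go tool pprof http://localhost:8080/debug/pprof/profile

The profile endpoint samples for 30 seconds by default, so run the slow query while it is collecting.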

@ashishgoswami I've sent it to your email address. I'll message you directly about how to use it.

I’d suspect that groupby might be taking time. It happens serially.

Some things you could try to narrow that down (rough sketches follow the list):

  1. Try a query without groupby and with sum.
  2. Try a query with groupby and without sum.
  3. Try a query without either.

And see how the latencies vary. That would help identify which part is slow.
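
Roughly, the three variants could look like this. These are untested sketches against your schema above; the block names are arbitrary, and the first one assumes the value-variable aggregation pattern applies here:

1. Without groupby, with sum:

{
  var(func: eq(name, "Hotel99")) {
    ~hotel {
      # ~hotel reaches both Rooms and Ledgers; only Ledgers have amount
      amt as amount
    }
  }

  sumOnly() {
    totalAmount: sum(val(amt))
  }
}

2. With groupby, without sum:

{
  withGroupby(func: eq(name, "Hotel99")) {
    ~hotel @groupby(hotel) {
      count(uid)
    }
  }
}

3. Without either:

{
  neither(func: eq(name, "Hotel99")) {
    count(~hotel)
  }
}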

@mrjn We’re having a meeting with @dereksfoster99 at 9pm PST to figure out the best way to run our sum queries… we’ll try your suggestion (you’re welcome to join in!)

Created this GitHub issue: Optimizing sum() performance · Issue #5432 · dgraph-io/dgraph

@dmai Hi Daniel, this thread is also related to the script I provided you; I forgot to mention that it came from here. I appreciate y’all taking the time. Really hoping to use Dgraph extensively here.

Hey @geoyws. Thanks for sharing your test scripts. It’ll take time for us to dig deeper and optimize Dgraph to perform well for your specific queries. It’s not a simple task, so I’d expect at least three months before any update on this.

No worries man. Will be here rooting for you guys.

@geoyws, would you share your data generation script in this thread? Thanks.


@questiondgraph Apologies for the late reply. I updated its README.md with some instructions on how to use it.

The script was hastily put together, so there’s no multi-threading.

Thank you for sharing.