Jeff Dean's talk about rapid response times

Hey @staff,

Take time to watch this video and read his paper. Jeff Dean, along with Sanjay Ghemawat, is a pioneer at Google, having led almost all of its externally popular infrastructure technologies, e.g. MapReduce, GFS, Bigtable, and now the neural network stuff.



Watching the video right now.

Amazing point about sending backup requests to replicas, but only if the original request takes over X ms. From his benchmarks:

No backups

  • Avg: 33 ms
  • Std Dev: 1524 ms
  • 99.9%ile: 994 ms

Backup after 10 ms

  • Avg: 14 ms
  • Std Dev: 4 ms (!!)
  • 99.9%ile: 50 ms (!!!)
  • Increase in request load: < 5% extra requests

This is a huge difference! And it's selective, in the sense that you only send an extra request to a replica if the original one doesn't reply within X ms. That's a great sweet spot: you control the number of network calls (and hence throughput) while still improving tail latencies dramatically. Love it! We'll have to figure out what X is for us.
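For context, a minimal sketch of this backup-request pattern using asyncio. The `fetch` stub, replica names, and simulated latencies are all made up for illustration; a real version would make an RPC instead:

```python
import asyncio

async def fetch(replica: str, key: str) -> str:
    # Stand-in for a real RPC; simulate a slow primary and a fast backup.
    await asyncio.sleep(0.002 if replica == "replica-2" else 0.050)
    return f"{replica}:{key}"

async def hedged_get(key: str, backup_after: float = 0.010) -> str:
    """Ask the primary; if it hasn't replied within `backup_after` seconds,
    also ask a backup replica and take whichever answer arrives first."""
    primary = asyncio.create_task(fetch("replica-1", key))
    try:
        # shield() keeps the primary running even if the timeout fires.
        return await asyncio.wait_for(asyncio.shield(primary), backup_after)
    except asyncio.TimeoutError:
        backup = asyncio.create_task(fetch("replica-2", key))
        done, pending = await asyncio.wait(
            {primary, backup}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()  # drop the slower request
        return done.pop().result()

print(asyncio.run(hedged_get("user:42")))
```

In the common case the primary answers within the threshold and no extra request is ever sent, which is why the measured load increase stays under 5%.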


He wrote an article on CACM:

It doesn’t open for me right now, but this magazine looks really interesting. Might be worth subscribing to. They have an article on Parallel Graph Processing in the current May edition.

The conference that he gave this talk in

Yeah, I remember subscribing to it, so we’ll get notified when they announce the next conference.


Same slides, but different conference this time:

In the end, he basically recommends doing this:

  • Send the request to the first replica, telling it that it's also going to be sent to a second one.
  • 2 ms later, send the request to the second replica, telling it that it has already been sent to the first one.
  • Whichever one starts processing the request first sends a cancellation directly to its peer.
  • If the peer hasn't started processing the request yet, it just cancels it.
  • Otherwise, both of them process it and do twice the work overall. This should be really rare, precisely because latencies improve under this scheme.

I think this is even more powerful than the first approach I pointed out above, because it's something we can throw at any cluster in any environment (GCE, Amazon, our own datacenter) and it should achieve good results.
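A toy sketch of that cancellation handshake, with in-process threads standing in for the two replicas (the class and method names are mine, not from the talk):

```python
import queue
import threading
import time

class Replica:
    """Toy replica: the first one to start working tells its peer to cancel."""
    def __init__(self, name: str):
        self.name = name
        self.peer = None                  # wired up after construction
        self.cancelled = set()            # request ids cancelled by the peer
        self.lock = threading.Lock()

    def cancel(self, req_id: str) -> None:
        with self.lock:
            self.cancelled.add(req_id)

    def handle(self, req_id: str, results: queue.Queue) -> None:
        with self.lock:
            if req_id in self.cancelled:
                return                    # peer got there first: drop it
        self.peer.cancel(req_id)          # we're starting: cancel peer's copy
        time.sleep(0.05)                  # simulate the actual work
        results.put((self.name, req_id))

r1, r2 = Replica("r1"), Replica("r2")
r1.peer, r2.peer = r2, r1

results = queue.Queue()
t1 = threading.Thread(target=r1.handle, args=("req-7", results))
t2 = threading.Thread(target=r2.handle, args=("req-7", results))
t1.start()
time.sleep(0.02)                          # the small stagger from the talk
t2.start()
t1.join(); t2.join()
print(list(results.queue))                # only r1 ends up doing the work
```

If both replicas squeeze past the cancelled-check at the same moment, both do the work; that's exactly the rare "twice the work" case from the last bullet, and the stagger is what makes it rare.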

Amazing presentation!


Did he say more about how he came up with the 2ms time?

Technically, it's left as an exercise for the audience; he asked them this question in the 45-minute talk at Ricon.

The answer is: if the machines are just sitting idle, which happens pretty frequently, the first replica will pick the request up almost immediately, so a 2 ms delay is enough for its cancellation to reach the peer before the peer starts. That avoids the last case above, where both process the request and do twice the work overall.

Update: Found the slides:


Just saw the talk. He also says they use protocol buffers for all internal communication. They should work for us too if they work for Google.


Well, they’re too deep into it and have been for a while. We’re starting afresh, so we can use better protocols.


Update: We ended up switching to Protocol Buffers after FlatBuffers proved to be slower for our data. The FlatBuffers API is also complicated and ugly, so once the benchmarks confirmed it was slower, moving to Protocol Buffers was an easy decision.