Take time to watch this video, and read his paper. Jeff Dean, along with Sanjay Ghemawat, is a pioneer at Google, having led almost all of its externally popular infrastructure technologies: MapReduce, GFS, Bigtable, and now the neural network stuff.
Amazing point about sending backup requests to replicas only if the original one takes over X ms. From his benchmarks:
No backups:
- Avg: 33 ms
- Std Dev: 1524 ms
- 99.9%ile: 994 ms

Backup after 10 ms:
- Avg: 14 ms
- Std Dev: 4 ms (!!)
- 99.9%ile: 50 ms (!!!)
- Increase in request load: < 5% extra requests
This is a huge difference! And it’s selective, in the sense that you only send out requests to replicas if the original one doesn’t reply within X ms. That’s a great sweet spot between controlling the number of network calls (and hence throughput) and still improving tail latencies dramatically. Love it! We’ll have to figure out what X is for us.
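Here’s a minimal Go sketch of the idea, assuming a hypothetical sendRequest RPC helper (not from the talk); whichever replica answers first wins, and the shared context abandons the loser:

```go
// Package hedge is a minimal sketch of backup ("hedged") requests.
package hedge

import (
	"context"
	"time"
)

// Response is a placeholder for whatever your RPC layer returns.
type Response struct{ Data []byte }

// sendRequest stands in for the real RPC call; it is hypothetical.
func sendRequest(ctx context.Context, replica string) (*Response, error) {
	// ... real network call would go here ...
	return &Response{}, nil
}

// Do sends the request to the primary replica. If no reply arrives within
// hedgeDelay (the "X ms" above), it fires a backup request to a second
// replica and returns whichever response comes back first.
func Do(ctx context.Context, primary, backup string, hedgeDelay time.Duration) (*Response, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // abandon the slower call once we have a winner

	type result struct {
		resp *Response
		err  error
	}
	// Buffered so the losing goroutine can send its result and exit.
	ch := make(chan result, 2)

	go func() {
		resp, err := sendRequest(ctx, primary)
		ch <- result{resp, err}
	}()

	select {
	case r := <-ch: // primary replied within the hedge delay
		return r.resp, r.err
	case <-time.After(hedgeDelay):
		// Primary is slow; fire the backup and take whoever finishes first.
		go func() {
			resp, err := sendRequest(ctx, backup)
			ch <- result{resp, err}
		}()
		r := <-ch
		return r.resp, r.err
	}
}
```

A natural choice for the delay is somewhere around your 95th-percentile latency, so the backup fires for only a few percent of requests, which lines up with the < 5% extra load in the numbers above.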
It doesn’t open for me right now, but this magazine looks really interesting. Might be worth subscribing to. They have an article on Parallel Graph Processing in the current May edition.
Send the request to the first replica, telling it that the request is also going to be sent to a second one.
2 ms later, send the request to the second replica, telling it that the request has already been sent to the first one.
When one of them starts processing the request, it sends a cancellation directly to its peer.
If the peer hasn’t started processing the request yet, it just drops it.
Otherwise, both of them process it and overall do twice the work. This last case should be really rare, because the cancellation usually arrives while the request is still sitting in the peer’s queue. A sketch of this dance follows below.
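Here’s a toy, single-process Go model of that cancellation dance; the Server type, in-memory maps, and request IDs are all hypothetical scaffolding (in a real system Process and Cancel would be RPCs between two machines), but the control flow follows the steps above:

```go
// Toy model of tied requests with cross-server cancellation.
package main

import (
	"sync"
	"time"
)

type Server struct {
	mu      sync.Mutex
	started map[string]bool // requests this server has begun processing
	dropped map[string]bool // requests cancelled by the peer while still queued
}

func NewServer() *Server {
	return &Server{started: make(map[string]bool), dropped: make(map[string]bool)}
}

// Cancel is invoked by the peer once it starts processing the shared request.
func (s *Server) Cancel(reqID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.started[reqID] {
		s.dropped[reqID] = true // still queued here, so just drop it
	}
	// If we already started, it's too late: both servers do the work.
}

// Process dequeues the request, tells the peer to cancel, then does the work.
func (s *Server) Process(reqID string, peer *Server, work func()) {
	s.mu.Lock()
	if s.dropped[reqID] {
		s.mu.Unlock()
		return // the peer got to it first; nothing to do
	}
	s.started[reqID] = true
	s.mu.Unlock()

	peer.Cancel(reqID) // tell the other replica to drop its queued copy
	work()
}

func main() {
	a, b := NewServer(), NewServer()
	work := func() { time.Sleep(10 * time.Millisecond) }

	go a.Process("req-1", b, work)   // request reaches the first replica
	time.Sleep(2 * time.Millisecond) // the 2 ms stagger from the steps above
	b.Process("req-1", a, work)      // a has already cancelled b's copy: no-op

	time.Sleep(20 * time.Millisecond) // crude wait for the demo goroutine
}
```

Note that the race here is benign by design: if both replicas grab the request at the same instant, each Cancel arrives after the other has set started, and you land in the "both do the work" case from the last step.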
I think this is even more powerful than the first technique I pointed out above, because it’s something we can throw at any cluster in any environment (GCE, Amazon, our own datacenter) and get good results.
Technically, why the 2 ms delay matters is left as an exercise to the audience; he posed that question in the 45-minute talk at Ricon. The answer is that if the machines are just sitting idle, which happens pretty frequently, both replicas would pick the request up almost simultaneously, and you’d hit the last case, where both process it and overall do twice the work. The small stagger gives the first replica a head start, so its cancellation usually lands before the second replica begins.
Update: We ended up switching to Protocol Buffers after Flatbuffers proved to be slower for our data. The Flatbuffers API is also complicated and ugly, so once the benchmarks were in, moving to PB was an easy decision.