Same slides, but different conference this time:
In the end, he basically recommends doing this:
- Send the request to the first replica, telling it that the same request will also be sent to a second replica.
- 2 ms later, send the request to the second replica, telling it that it has already been sent to the first.
- When one of them starts processing the request, it sends a cancellation directly to its peer.
- If the peer hasn’t started processing yet, it simply drops the request.
- Otherwise, both of them process it and you do twice the work overall. This should be rare, since the cancellation usually arrives while the duplicate copy is still sitting in the peer’s queue.
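The protocol above can be sketched as a small simulation (a rough sketch; the class names, delays, and wiring are my own, not from the slides). One replica is given a longer queueing delay so the cancellation path fires:

```python
import threading
import time

class Replica:
    """One server replica that accepts tied (duplicated) requests."""
    def __init__(self, name, startup_delay):
        self.name = name
        self.startup_delay = startup_delay  # simulated time spent in the queue
        self.peer = None
        self.lock = threading.Lock()
        self.state = {}       # request_id -> "queued" | "processing" | "cancelled"
        self.processed = []

    def submit(self, request_id):
        with self.lock:
            self.state[request_id] = "queued"
        threading.Thread(target=self._work, args=(request_id,)).start()

    def _work(self, request_id):
        time.sleep(self.startup_delay)       # request waits in the queue
        with self.lock:
            if self.state[request_id] == "cancelled":
                return                       # the peer got there first
            self.state[request_id] = "processing"
        self.peer.cancel(request_id)         # tell the peer to drop its copy
        self.processed.append(request_id)    # do the actual work

    def cancel(self, request_id):
        with self.lock:
            if self.state.get(request_id) == "queued":
                self.state[request_id] = "cancelled"

# Wire up two replicas; B is more heavily loaded in this run.
a = Replica("A", startup_delay=0.005)
b = Replica("B", startup_delay=0.05)
a.peer, b.peer = b, a

a.submit("req-1")    # first copy, hinted that a second is coming
time.sleep(0.002)    # the 2 ms stagger
b.submit("req-1")    # second copy, hinted about the first

time.sleep(0.2)
print(a.processed, b.processed)
```

Here A dequeues first, cancels B's copy while it is still queued, and the request is processed exactly once; only when both replicas dequeue within the same narrow window does the work get duplicated.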
I think this is even more powerful than the first technique I pointed out above, because it's something you can throw at any cluster in any environment (GCE, Amazon, your own datacenter) and still get good results.