As per the documentation of golang, a Pool is a set of temporary objects that may be individually saved and retrieved and it is concurrency safe. Its main purpose is to cache unused items for later reuse and thus taking away some burden from garbage collector. It has been proven that using pools rightly can improve performance by many folds.
But, a small mistake while using it can lead to a microscopic bug which can get super hard to find and debug. One such bug was recently found in the K-Shortest Paths query in dgraph.
The bug
The weird response of k-shortest paths was reported by an user in this post. A shortest path query was returning a path which was completely garbage. It didn’t contain the destination node as its last node and its length was random. But how did this bug stayed under our nose for so long? It is because this bug show up once in every 20k-30k queries and that too for some specific kind of graph and query.
What was causing this?
It was caused by mishandling of objects taken from the pool. Consider the following sample code.
type route struct {
path []int
weight int
}
type shortestRoute struct {
routeVal route
idx int
}
var pathPool = sync.Pool{
New: func() interface{} {
return route{}
},
}
func main() {
var kroutes []shortestRoute
// create a shortestRoute object
curRoute := route{
path: []int{1, 2, 3, 4},
weight: 1,
}
curRoute.path = append(curRoute.path, 5)
sroute := shortestRoute{
idx: 1,
routeVal: curRoute,
}
// append sroute to kroutes and put curRoute in pool
kroutes = append(kroutes, sroute)
pathPool.Put(curRoute)
for i := 0; i < 2; i++ {
// get newRoute from pool and if it's path can hold one more element
// then use it otherwise create a new one
newRoute := pathPool.Get().(route)
if cap(newRoute.path) >= len(newRoute.path)+1 {
newRoute.path = newRoute.path[:len(newRoute.path)+1]
} else {
newRoute = route{}
for j := 1; j <= i+len(kroutes[len(kroutes)-1].routeVal.path); j++ {
newRoute.path[j] = j
}
}
// perform some operations on the path
newRoute.path[len(newRoute.path)-1] = len(newRoute.path)
newRoute.weight = i + 2
// create a shortestRoute struct
sroute := shortestRoute{}
sroute.idx = i + 2
sroute.routeVal = newRoute
// append sroute in kroutes and put back newRoute in the pool
kroutes = append(kroutes, sroute)
pathPool.Put(newRoute)
}
fmt.Println(kroutes)
}
In the above code, we create curRoute
, a route structure with the path as 1 - 2 - 3 - 4 - 5
and weight = 1
and then create a shortestRoute struct sroute
using it. We then append sroute
to the kroutes
slice and put curRoute
in the pool. Later in the for loop we reuse the item from pool if its capacity is sufficient to contain one more element otherwise we create a new route struct. And then append the next element in the newRoute.path
and finally the new shortestRoute
is appended in the kroutes.
We get the shortestRoute in kroutes finally as
{
path: [1 2 3 4 5]
weight: 1
}
{
path: [1 2 3 4 5 6]
weight: 2
}
{
path: [1 2 3 4 5 6 7]
weight: 3
}
]
The above result is exactly what we wanted. Now, lets change the code a little by adding one more change in the newRoute.path
, we make newRoute.path[0] = i + 100
newRoute.path[len(newRoute.path)-1] = len(newRoute.path)
newRoute.path[0] = i + 100
We want the result to be:
{
path: [1 2 3 4 5]
weight: 1
}
{
path: [100 2 3 4 5 6]
weight: 2
}
{
path: [101 2 3 4 5 6 7]
weight: 3
}
]
but we actually get:
{
path: [101 2 3 4 5]
weight: 1
}
{
path: [101 2 3 4 5 6]
weight: 2
}
{
path: [101 2 3 4 5 6 7]
weight: 3
}
]
We get the weight
as expected but something get messed up with the path. This happened because the assignment sroute.routeVal = newRoute
didn’t deep copy the whole []path
but just the pointer (the underlying array for the slice remains the same). Although, the case is not the same with the weight
. Any operations on newRoute.path
is reflected on all the previous items appended in kroutes. The issue shows up rarely because we used the pools rarely (depending upon if the item we get from pool has sufficient capacity as required).
Take away
The take away for @core-devs is that we should be careful while assigning an item from pools to a variable and processing the variable later, specially with items like slices. This would help us keep the bugs away which can live under our noses for a long time. These bugs brings non-determinism in the response and hence causes it difficult to replicate and debug.