Floating-point Number Parsing With Perfect Accuracy at a Gigabyte Per Second
Parsing decimal numbers from strings of characters into binary types is a common but relatively expensive task. Parsing a single number can require hundreds of instructions and dozens of branches. Standard C functions may parse numbers at 200 MB/s while recent disks have bandwidths in the gigabytes per second. Number parsing becomes the bottleneck when ingesting CSV, JSON, or XML files containing numerical data. We consider the problem of rounding exactly to the nearest floating-point value. The general problem requires variable-precision arithmetic. We show that a relatively simple approach can be many times faster than the conventional algorithms often present in standard C and C++ libraries. We break the gigabyte per second barrier without sacrificing safety or accuracy. To ensure reproducibility, our work is available as open-source software. Our approach has been adopted by the standard library of the Go programming language for its ParseFloat function.
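As a small illustration of what "rounding exactly to the nearest floating-point value" means in practice, the sketch below calls Go's strconv.ParseFloat on a few decimal strings. It only exercises the public API and says nothing about how the parser works internally. The helper name parseExact and the sample inputs are assumptions chosen for this example, not part of the talk.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseExact is a hypothetical helper for this sketch: it parses a decimal
// string into the nearest 64-bit floating-point value using the Go
// standard library.
func parseExact(s string) (float64, error) {
	return strconv.ParseFloat(s, 64)
}

func main() {
	// 9007199254740993 is 2^53 + 1, which lies exactly halfway between two
	// representable doubles; a correctly rounding parser must pick the
	// neighbor with an even significand, 2^53.
	inputs := []string{"0.1", "1e23", "9007199254740993"}
	for _, s := range inputs {
		f, err := parseExact(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		// Print the shortest decimal string that round-trips to f.
		fmt.Println(s, "->", strconv.FormatFloat(f, 'g', -1, 64))
	}
}
```

The halfway case is the kind of input where a parser that rounds carelessly can be off by one unit in the last place, which is why exact rounding matters for reproducible data ingestion.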
About Daniel
Daniel is a computer science professor at the University of Quebec, contributor to major data-science open-source projects, and long-time blogger. He is @lemire on Twitter, and he blogs at https://lemire.me/.
Feel free to follow and connect with Daniel on Twitter, LinkedIn, and GitHub!