Daniel Lemire - Floating-point Number Parsing With Perfect Accuracy at a Gigabyte Per Second

phoebe · November 19, 2020, 2:09am

Floating-point Number Parsing With Perfect Accuracy at a Gigabyte Per Second

Parsing decimal numbers from strings of characters into binary types is a common but relatively expensive task. Parsing a single number can require hundreds of instructions and dozens of branches. Standard C functions may parse numbers at 200 MB/s while recent disks have bandwidths in the gigabytes per second. Number parsing becomes the bottleneck when ingesting CSV, JSON, or XML files containing numerical data. We consider the problem of rounding exactly to the nearest floating-point value. The general problem requires variable-precision arithmetic. We show that a relatively simple approach can be many times faster than the conventional algorithms often present in standard C and C++ libraries. We break the gigabyte per second barrier without sacrificing safety or accuracy. To ensure reproducibility, our work is available as open-source software. Our approach has been adopted by the standard library of the Go programming language for its ParseFloat function.
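As a small illustration of what "rounding exactly to the nearest floating-point value" means in practice, here is a minimal sketch that calls Go's standard `strconv.ParseFloat`, the function the abstract says adopted this approach. The helper name `parseNearest` and the sample inputs are assumptions made for this example. They are not taken from the talk, and the sketch does not show the fast algorithm itself.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseNearest is a hypothetical helper for this example. It wraps
// strconv.ParseFloat, which returns the float64 nearest to the decimal
// input (ties are rounded to even).
func parseNearest(s string) (float64, error) {
	return strconv.ParseFloat(s, 64)
}

func main() {
	// Sample inputs chosen for illustration only:
	// "9007199254740993" is 2^53 + 1, which lies exactly halfway between
	// two adjacent float64 values, so the parser must break the tie.
	inputs := []string{"0.1", "1e23", "9007199254740993", "not-a-number"}
	for _, s := range inputs {
		v, err := parseNearest(s)
		if err != nil {
			fmt.Printf("%q: error: %v\n", s, err)
			continue
		}
		// Format 'g' with precision -1 prints the shortest decimal
		// string that parses back to the same float64.
		fmt.Printf("%q -> %s\n", s, strconv.FormatFloat(v, 'g', -1, 64))
	}
}
```

The halfway case is the kind of input where a careless parser can land on the wrong neighbor. A correct parser must return the same float64 no matter how the decimal digits are written.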

About Daniel

Daniel is a computer science professor at the University of Quebec, contributor to major data-science open-source projects, and long-time blogger. He is @lemire on Twitter, and he blogs at https://lemire.me/.

Feel free to follow and connect with Daniel on Twitter, LinkedIn, and GitHub!

Have questions for Daniel?

Please submit them in this thread. Daniel would love to answer them!

Haven’t signed up for the free conference yet?

Grab your free tickets here

