The previous float-parsing method was lacking in a lot of areas. This
commit introduces a state-of-the art implementation that is both
accurate and fast to std.
Code is derived from working repo https://github.com/tiehuis/zig-parsefloat.
This includes more test-cases and performance numbers that are present
in this commit.
* Accuracy
The primary testing regime has been using test-data found at
https://github.com/tiehuis/parse-number-fxx-test-data. This is a fork of
upstream with support for f128 test-cases added. This data has been
verified against other independent implementations and represents
accurate round-to-even IEEE-754 floating point semantics.
* Performance
Compared to the existing parseFloat implementation there is ~5-10x
performance improvement using the above corpus. (f128 parsing excluded
in below measurements).
** Old
$ time ./test_all_fxx_data
3520298/5296694 succeeded (1776396 fail)
________________________________________________________
Executed in 28.68 secs fish external
usr time 28.48 secs 0.00 micros 28.48 secs
sys time 0.08 secs 694.00 micros 0.08 secs
** This Implementation
$ time ./test_all_fxx_data
5296693/5296694 succeeded (1 fail)
________________________________________________________
Executed in 4.54 secs fish external
usr time 4.37 secs 515.00 micros 4.37 secs
sys time 0.10 secs 171.00 micros 0.10 secs
Further performance numbers can be seen using the
https://github.com/tiehuis/simple_fastfloat_benchmark/ repository, which
compares against some other well-known string-to-float conversion
functions. A breakdown can be found here:
0d9f020f1a/PERFORMANCE.md (commit-b15406a0d2e18b50a4b62fceb5a6a3bb60ca5706)
In summary, we are within 20% of the C++ reference implementation and
have about ~600-700MB/s throughput on a Intel I5-6500 3.5Ghz.
* F128 Support
Finally, f128 is now completely supported with full accuracy. This does
use a slower path which is possible to improve in future.
* Behavioural Changes
There are a few behavioural changes to note.
- `parseHexFloat` is now redundant and these are now supported directly
in `parseFloat`.
- We implement round-to-even in all parsing routines. This is as
specified by IEEE-754. Previous code used different rounding
mechanisms (standard was round-to-zero, hex-parsing looked to use
round-up) so there may be subtle differences.
Closes#2207.
Fixes#11169.
* unify the logic for exporting math functions from compiler-rt,
with the appropriate suffixes and prefixes.
- add all missing f128 and f80 exports. Functions with missing
implementations call other functions and have TODO comments.
- also add f16 functions
* move math functions from freestanding libc to compiler-rt (#7265)
* enable all the f128 and f80 code in the stage2 compiler and behavior
tests (#11161).
* update std lib to use builtins rather than `std.math`.
This also addresses a nit from #10133 where IntT might be a confusing
name because it might imply signed integer (iX, not uX). We settled on
TBits for math/float.zig so I've applied that change here too.
When I originally wrote ldexp() I copied the name from parse_hex_float.
I consider this an interim workaround/hack until #1299 is finished.
There is a bug in the original C implementation of the errol3 (and errol4)
algorithm that can result in undefined behavior or an obviously incorrect
result (leading ':' in the output)
This change checks for those two problems and uses a slower fallback
path if they occur. I can't guarantee that this will always produce
the correct result, but since the workaround is only used if the original
algorithm is guaranteed to fail, it should never turn a previously-correct
result into an incorrect one.
Fixes#11283
This commit fixes an out of bounds write that can occur when
formatting certain float values. The write messes up the stack and
causes incorrect results, segfaults, or nothing at all, depending on the
optimization mode used.
The `errol` function writes the digits of the float into `buffer`
starting from index 1, leaving index 0 untouched, and returns `buffer[1..]`
and the exponent. This is because `roundToPrecision` relies on index 0 being
unused in case the rounding adds a digit (e.g rounding 999.99
to 1000.00). When this happens, pointer arithmetic is used
[here](0e6d2184ca/lib/std/fmt/errol.zig (L61-L65))
to access index 0 and put the ones digit in the right place.
However, `errol3u` contains two special cases: `errolInt` and `errolFixed`,
which return from the function early. For these two special cases
index 0 was never reserved, and the return value contains `buffer`
instead of `buffer[1..]`. This causes the pointer arithmetic in
`roundToPrecision` to write out of bounds, which in the case of
`std.fmt.formatFloatDecimal` messes up the stack and causes undefined behavior.
The fix is to move the slicing of `buffer` to `buffer[1..]` from `errol3u`
to `errol` so that both the default and the special cases operate on the sliced
buffer.
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.
Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
* Add support for recursive objects to std.json.parse
* Remove previously defined error set
* Try with function which returns an error set
* Don't analyze already inferred types
* Add comptime to inferred_type parameter
* Make ParseInternalError to accept only a single argument
* Add public `ParseError` for `parse` function
* Use error.Foo syntax for errors instead of a named error set
* Better formatting
* Update to latest code changes
Should be good enough to unblock progress on the stage2 compiler.
Unifying this parser and the regular one (and perhaps rewrite it, #2207)
is left as an exercise for the reader.
Comparisons with absolute epsilons are usually useful when comparing
numbers to zero, for non-zero numbers it's advised to switch to relative
epsilons instead to obtain meaningful results (check [1] for more
details).
The new API introduces approxEqAbs and approxEqRel, where the former
aliases and deprecated the old `approxEq`, allowing the user to pick the
right tool for the job.
The documentation is meant to guide the user in the choice of the
correct alternative.
[1] https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/