For x86_64, LLVMABIAlignmentOfType(i128) reports 8. However I think 16
is a better number for two reasons:
1. Better machine code when loading into SIMD register.
2. The C ABI wants 16 for extern structs.
ZIR instructions updated: atomic_load, atomic_rmw, atomic_store, cmpxchg
These no longer construct a pointer type as the result location. This
solves a TODO that was preventing the pointer from possibly being
volatile, as well as properly handling allowzero and addrspace.
It also allows the pointer to be over-aligned, which may be needed
depending on the target. As a consequence, the element type needs to be
communicated in the ZIR. This is done by strategically making one of the
operands be ResultLoc.ty instead of ResultLoc.coerced_ty if possible, or
otherwise explicitly adding elem_type into the ZIR encoding, such as in
the case of atomic_load.
The pointer type of atomic operations is now checked in Sema by coercing
it to an expected pointer type, that maybe over-aligned according to
target requirements.
Together with the previous commit, Zig now has smaller alignment for
large integers, depending on the target, and yet still has type safety
for atomic operations that specially require higher alignment.
Prior to this commit, the logic for ABI size and ABI alignment for
integers was naive and incorrect. This results in wasted hardware as
well as undefined behavior in the LLVM backend when we memset an
incorrect number of bytes to 0xaa due to disagreeing with LLVM about the
ABI size of integers.
This commit introduces a "max int align" value which is different per
Target. This value is used to derive the ABI size and alignment of all
integers.
This commit makes an interesting change from stage1, which treats
128-bit integers as 16-bytes aligned for x86_64-linux. stage1 is
incorrect. The maximum integer alignment on this system is only 8 bytes.
This change breaks the behavior test called "128-bit cmpxchg" because on
that target, 128-bit cmpxchg does require a 16-bytes aligned pointer to
a 128 bit integer. However, this alignment property does not belong on
*all* 128 bit integers - only on the pointer type in the `@cmpxchg`
builtin function prototype. The user can then use an alignment override
annotation on a 128-bit integer variable or struct field to obtain such
a pointer.
With this improved iterator, type of test is now inferred from
the filename, enabling us to put all cases in one common parent
directory, and iterate over that, thus automating a lot of tasks.
Adding `unreachable` prevents the futex code from being inspected during
a single-threaded build. Without futex, first draft BYOS packages don't
need to implement `nanosleep` to get a single-threaded "hello world"
program working.
Use of `assert()` did not achieve the desired effect of avoiding futex
in a single-threaded build.
This was a bit trickier than it should be due to symbol conflicts with
zig's compiler-rt implementation. We attempt to use weak linkage in
our compiler-rt, but this does not seem to be working in all cases. I
manually disabled export of the problematic compiler-rt math functions
in order to cross compile musl's libc.so for all targets as input to
`tools/gen_stubs.zig`.
Other than that, this update went fairly smoothly. Quite a few
additional symbols were added to the blacklist in `tools/gen_stubs.zig`
due to recent reorganization of zig's compiler-rt.
* outputs can have names and be referenced with template replacements
the same as inputs.
* fix print_air.zig not decoding correctly.
* LLVM backend: use a table for template names for simplicity
The previous float-parsing method was lacking in a lot of areas. This
commit introduces a state-of-the art implementation that is both
accurate and fast to std.
Code is derived from working repo https://github.com/tiehuis/zig-parsefloat.
This includes more test-cases and performance numbers that are present
in this commit.
* Accuracy
The primary testing regime has been using test-data found at
https://github.com/tiehuis/parse-number-fxx-test-data. This is a fork of
upstream with support for f128 test-cases added. This data has been
verified against other independent implementations and represents
accurate round-to-even IEEE-754 floating point semantics.
* Performance
Compared to the existing parseFloat implementation there is ~5-10x
performance improvement using the above corpus. (f128 parsing excluded
in below measurements).
** Old
$ time ./test_all_fxx_data
3520298/5296694 succeeded (1776396 fail)
________________________________________________________
Executed in 28.68 secs fish external
usr time 28.48 secs 0.00 micros 28.48 secs
sys time 0.08 secs 694.00 micros 0.08 secs
** This Implementation
$ time ./test_all_fxx_data
5296693/5296694 succeeded (1 fail)
________________________________________________________
Executed in 4.54 secs fish external
usr time 4.37 secs 515.00 micros 4.37 secs
sys time 0.10 secs 171.00 micros 0.10 secs
Further performance numbers can be seen using the
https://github.com/tiehuis/simple_fastfloat_benchmark/ repository, which
compares against some other well-known string-to-float conversion
functions. A breakdown can be found here:
0d9f020f1a/PERFORMANCE.md (commit-b15406a0d2e18b50a4b62fceb5a6a3bb60ca5706)
In summary, we are within 20% of the C++ reference implementation and
have about ~600-700MB/s throughput on a Intel I5-6500 3.5Ghz.
* F128 Support
Finally, f128 is now completely supported with full accuracy. This does
use a slower path which is possible to improve in future.
* Behavioural Changes
There are a few behavioural changes to note.
- `parseHexFloat` is now redundant and these are now supported directly
in `parseFloat`.
- We implement round-to-even in all parsing routines. This is as
specified by IEEE-754. Previous code used different rounding
mechanisms (standard was round-to-zero, hex-parsing looked to use
round-up) so there may be subtle differences.
Closes#2207.
Fixes#11169.
* edwards25519: fix X coordinate of the base point
Reported by @OfekShochat -- Thanks!
* edwards25519: reduce public scalar when the top bit is set, not cleared
This is an optimization for the unexpected case of a scalar
larger than the field size.
Fixes#11563
* edwards25519: add a test implicit reduction of invalid scalars
Instead of doing heterogeneous comparison at comptime. This makes the
following test pass (as it already does for stage1):
```zig
test {
const x: f64 = 12.34;
expect(x == 12.34);
}
```
There is already behavior test coverage for this, however, other bugs in
`std.fmt.parseFloat` are masking the failures.
From a language specification perspective, this makes sense because it
makes comptime comparisons with comptime_float work the same way they
work with runtime comparisons.
Maybe after we have incremental compilation metadata serialization
and non-LLVM backends, it will make sense to switch this back. For now,
however, this makes successive `zig build` commands much faster.
Just like for Struct in 8238d4b33585a715c58ab559cd001dd3ea1db55b, in the
case of ErrorUnion struct we need to return a compound literal "(T){...}"
instead of just "{}", which is invalid code when used in e.g. a "return"
expression.
* Sema: Correctly determine whether array_cat lhs and rhs are single ptrs
Many-pointers are also not single-pointers and wouldn't be considered
here. This commit makes the conditions use the appropriately-named
isSinglePointer instead.
* Sema: Correctly obtain ArrayInfo for many-pointer concatenation
Many-pointers at comptime have a known size like slices and can be used
in array concatenation. This fixes a stage1 regression.
* test: Add comptime manyptr concatenation test
Co-authored-by: sin-ack <sin-ack@users.noreply.github.com>
Instead, just return ChildProcess directly. This structure does not
require a stable address, so we can put it on the stack just fine. If
someone wants it on the heap they should do.
const proc = try allocator.create(ChildProcess);
proc.* = ChildProcess.init(args, allocator);