This ensures that the Zig version will be re-computed when jumping
through the source tree, which is especially important when bisecting
across AstGen changes or other changes that must not use the old cache.
* Add tagName to Value, which behaves like @tagName.
* Add hashUncoerced to Value as an alternative to hash, for when we
  want values that can coerce to each other to produce the same hash
  (see the sketch after this list).
* Hash owner_decl instead of module_fn in Sema.instantiateGenericCall
since Module.Decl.Index is not affected by ASLR like *Module.Fn was,
and also because GenericCallAdapter.eql was already doing this.
* Use Value.hashUncoerced in Sema.instantiateGenericCall because
  GenericCallAdapter.eql uses Value.eqlAdvanced to compare args, which
  ignores coercions.
* Add the missing cases this revealed to Value.eqlAdvanced.
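A minimal sketch of the property hashUncoerced must uphold (the helper
below is hypothetical, not the compiler's actual implementation):
values that coerce to each other, such as the integer 5 as a u16 and
as a u32, have to feed identical bytes to the hasher.

```zig
const std = @import("std");

// Hypothetical helper, for illustration only: hash a canonical widened
// form of the integer instead of its typed memory representation, so
// coercible values hash identically.
fn hashUncoercedInt(hasher: *std.hash.Wyhash, x: anytype) void {
    std.hash.autoHash(hasher, @as(u64, x));
}

test "coercible ints hash the same" {
    var a = std.hash.Wyhash.init(0);
    var b = std.hash.Wyhash.init(0);
    hashUncoercedInt(&a, @as(u16, 5));
    hashUncoercedInt(&b, @as(u32, 5));
    try std.testing.expectEqual(a.final(), b.final());
}
```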
Without these changes, we were breaking the hash contract for
monomorphed_funcs and generating different hashes for values that
compared equal. This resulted in a 0.2% chance, when compiling the
self-hosted compiler, of producing a different output, depending on
fingerprint collisions of hashes that were affected by ASLR. Normally,
the differing hashes would merely cause equality checks to be skipped,
but on a fingerprint collision the truth would be revealed and the
compiler's behavior would diverge.
These functions have a very error-prone API. They are essentially
`all(cmp(op, ...))` but that's not reflected in the name.
This renames them to `compareAllAgainstZero...` etc. for clarity and
fixes more than 20 call sites where the predicate was incorrect.
In the future, the scalar `compare` should probably be split off
from the vector comparison. Rank-polymorphic programming is great,
but a proper implementation in Zig would decouple comparison and
reduction, which then needs a way to fuse ops at comptime.
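As a standalone illustration of the `all(cmp(op, ...))` semantics
(hedged: this sketch is not the compiler's Value API), the comparison
and the reduction decouple naturally over vectors:

```zig
const std = @import("std");

// Sketch: "compare all lanes against zero" is a lane-wise comparison
// followed by an And-reduction over the resulting bool vector.
fn compareAllAgainstZero(comptime op: std.math.CompareOperator, v: @Vector(4, i32)) bool {
    const zero = @Vector(4, i32){ 0, 0, 0, 0 };
    const mask = switch (op) {
        .lt => v < zero,
        .lte => v <= zero,
        .eq => v == zero,
        .gte => v >= zero,
        .gt => v > zero,
        .neq => v != zero,
    };
    // The reduction step, kept separate from the comparison.
    return @reduce(.And, mask);
}

test "all lanes against zero" {
    try std.testing.expect(compareAllAgainstZero(.gt, @Vector(4, i32){ 1, 2, 3, 4 }));
    try std.testing.expect(!compareAllAgainstZero(.gt, @Vector(4, i32){ 1, -2, 3, 4 }));
}
```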
Adds optimizations for by-ref types to:
- .struct_field_val
- .slice_elem_val
- .ptr_elem_val
I would have expected LLVM to be able to optimize away these
temporaries since we don't leak pointers to them and they are fed
straight from def to use, but empirically it does not.
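An illustrative source pattern (sizes approximate and hedged; this is
not one of the linked test cases):

```zig
const std = @import("std");

const S = struct { big: [64]u8, tag: u32 };

// S is passed by reference under the hood because it is large. Without
// the optimization, `s.tag` first materialized a full temporary copy
// of `s` on the stack just to read 4 bytes out of it.
fn tagOf(s: S) u32 {
    return s.tag;
}

test "field read from a by-ref struct" {
    const s = S{ .big = undefined, .tag = 7 };
    try std.testing.expectEqual(@as(u32, 7), tagOf(s));
}
```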
Resolves https://github.com/ziglang/zig/issues/12713
Resolves https://github.com/ziglang/zig/issues/12638
This change makes each of the `*_val` instructions check whether it is
safe to elide the copy for by-ref types, rather than performing the
elision blindly.
AIR instructions fixed:
- .array_elem_val
- .struct_field_val
- .unwrap_errunion_payload
- .try
- .optional_payload
These now all respect value semantics, as expected.
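A minimal example of the value semantics being preserved (a distilled
sketch of the pattern, not a copy of the linked test cases):

```zig
const std = @import("std");

const Pair = struct { a: u32, b: u32 };

test ".array_elem_val produces a copy, not an alias" {
    var arr = [_]Pair{ .{ .a = 1, .b = 2 }, .{ .a = 3, .b = 4 } };
    const x = arr[0]; // must behave as a copy even if Pair is by-ref
    arr[0].a = 99;
    try std.testing.expectEqual(@as(u32, 1), x.a);
}
```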
P.S. Thanks to Andrew for the new way to approach this. Many of the
lines here are from his recommended change, which comes with the
significant advantage that loads are now as small as the intervening
memory access allows.
Co-authored-by: Andrew Kelley <andrew@ziglang.org>
This is a follow-up to 9dc98fba, which made comptime initialization
patterns for union/struct more robust, especially when storing to
comptime-known pointers (and globals).
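A small example of the pattern class in question (hedged: a
representative snippet, not the regression test itself):

```zig
const U = union(enum) { a: u32, b: f32 };

comptime {
    var u: U = undefined;
    const p = &u; // a comptime-known pointer
    p.* = .{ .a = 42 }; // union initialization through the pointer
    if (u.a != 42) @compileError("bad comptime union init");
}
```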
Resolves #13063.
These constants were read as a block count in initForBlockCount(),
but at the same time as a byte size in update().
The unit could be blocks or bytes, but the same one should be used
everywhere.
So, use blocks, as intended.
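A hedged sketch of the invariant (all names and numbers hypothetical):
once the limit is defined in blocks, every consumer converts before
comparing.

```zig
const std = @import("std");

// Hypothetical constants: the limit is a *block* count, so update()
// must convert its byte length to blocks before checking it.
const block_len: usize = 64;
const max_blocks: usize = 1 << 20;

fn checkUpdate(bytes_len: usize) !void {
    const blocks = std.math.divCeil(usize, bytes_len, block_len) catch unreachable;
    if (blocks > max_blocks) return error.InputTooLong;
}

test "byte lengths are converted to blocks" {
    try checkUpdate(block_len * max_blocks); // exactly at the limit
    try std.testing.expectError(error.InputTooLong, checkUpdate(block_len * max_blocks + 1));
}
```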
Fixes#13506
On Windows, lld-link resolves PDB output paths using `/` and embeds the
result in the final executable, which breaks some native tooling like
WPR/WPA. This commit overrides the default behavior of lld-link by
explicitly setting the output PDB filename and binary-embedded path.
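A hedged sketch of the argv construction (the helper and variable names
are illustrative, not the actual linker code; /pdb: and /pdbaltpath:
are lld-link's MSVC-style options for the output filename and the
embedded path, respectively):

```zig
const std = @import("std");

// Illustrative only: append the two PDB-related arguments explicitly
// instead of letting lld-link derive them.
fn appendPdbArgs(
    argv: *std.ArrayList([]const u8),
    arena: std.mem.Allocator,
    pdb_out_path: []const u8,
    pdb_basename: []const u8,
) !void {
    try argv.append(try std.fmt.allocPrint(arena, "/pdb:{s}", .{pdb_out_path}));
    try argv.append(try std.fmt.allocPrint(arena, "/pdbaltpath:{s}", .{pdb_basename}));
}
```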
We postpone emitting debug info until *after* we generate the function
so that we have an idea of the consumed stack space. The stack offsets
encoded within DWARF are with respect to the frame pointer `.fp`.
- The meaning of packed structs changed in Zig 0.10; adjust
  accordingly. Use "extern struct" for the cases that directly map to
  C structs (see the example below this list).
- Add new type info kinds, like enum64 and DeclTag.
- Change the Type enum to use the canonical names from libbpf. This is
  more predictable when comparing against external BPF documentation
  than invented synonyms that need to be guessed.
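For example (the field set is abridged and illustrative, not the full
libbpf definition):

```zig
// `extern struct` guarantees C ABI layout, matching the C header;
// a Zig 0.10 `packed struct` is an integer-backed bit layout instead.
const btf_header = extern struct {
    magic: u16,
    version: u8,
    flags: u8,
    hdr_len: u32,
};
```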
This service stopped working two days ago for unknown reasons. Until it
is determined how to get it working again, or we switch to a different
CI provider for aarch64, this CI test coverage is disabled so that
we can continue to use the CI for other targets.
* std.crypto: make ghash faster, esp. for small messages
Aggregated reduction requires 5 additional multiplications (to
precompute the powers of H) in order to save 2 multiplications per
batch. So, only use large batches when that's actually worthwhile
(see the sketch after this item).
For the last blocks, reuse the precomputations in order to perform
a single reduction.
Also, even in .ReleaseSmall, allow 2-block aggregation.
The speedup is worth it, and the code increase is reasonable.
And in .ReleaseFast, bump the upper batch size up to 16.
Along the way, leverage comptime instead of duplicating code.
std/crypto/benchmark.zig on Apple M1:
Zig 0.10.0: 2769 MiB/s
Before: 6014 MiB/s
After: 7334 MiB/s
Also normalize function names.
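For a 4-block batch, aggregated reduction trades a chain of dependent
multiply-reduce steps for independent multiplications against the
precomputed powers of H, followed by a single reduction (a sketch of
the standard identity, not of the exact code structure):

serial:     X4 = ((((X0 ⊕ m1)·H ⊕ m2)·H ⊕ m3)·H ⊕ m4)·H    (4 reductions)
aggregated: X4 = (X0 ⊕ m1)·H^4 ⊕ m2·H^3 ⊕ m3·H^2 ⊕ m4·H    (1 reduction)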
* Change clmul() to accept the half to be processed
This avoids a bunch of truncate() calls.
* Add more ghash tests to check all code paths
* crypto.core.aes: process 6 blocks in parallel instead of 8 on aarch64
At least on Apple Silicon, this is slightly faster than 8 blocks.
* AES: add parallel blocks for tigerlake, rocketlake, alderlake, zen3
...instead of hard-coding it to 20.
- This is consistent with the ChaCha implementation.
- NaCl and libsodium, which this API is designed to interoperate with,
  also support 8- and 12-round variants. The 12-round variant, in
  particular, provides the same security level as the 20-round variant,
  but is obviously faster.
- scrypt currently uses its own non-optimized version of Salsa, just
  because it uses 8 rounds instead of 20. This will help remove code
  duplication.
No behavior or public API changes. Salsa20 and XSalsa20 still
represent the 20-round variant.
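A hedged sketch of the shape of the change (names illustrative, not
the actual std.crypto internals):

```zig
const std = @import("std");

// Illustrative only: parameterize the core on the round count so the
// 8-, 12-, and 20-round variants can share one implementation.
fn SalsaWithRounds(comptime rounds: comptime_int) type {
    comptime std.debug.assert(rounds % 2 == 0); // Salsa runs double-rounds
    return struct {
        pub const round_count = rounds;
        // The permutation / keystream code would loop `rounds / 2`
        // times over the double-round here.
    };
}

// The public names keep meaning the 20-round variant.
pub const Salsa20Sketch = SalsaWithRounds(20);
pub const Salsa12Sketch = SalsaWithRounds(12);
```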