* Update the AEGIS specification URL to the current draft
* std.crypto.auth: add AEGIS MAC
The Pelican-based authentication function of the AEGIS construction
can be used independently of authenticated encryption, as a faster
and more secure alternative to GHASH/POLYVAL/Poly1305.

We already expose GHASH, POLYVAL and Poly1305 for use outside AES-GCM
and ChaChaPoly, so there is no reason not to expose the MAC from AEGIS
as well.
As with other 128-bit hash functions, finding a collision requires
only ~2^64 attempts or inputs, which may still be acceptable for many
practical applications.
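To put the ~2^64 figure in context, the standard birthday approximation can be evaluated directly. The sketch below is not part of this change: it is written in Python for brevity, the function name `collision_probability` is hypothetical, and it assumes tags behave like uniformly random 128-bit values, which is an idealization.

```python
import math


def collision_probability(num_inputs: int, tag_bits: int) -> float:
    """Birthday approximation: p ~= 1 - exp(-k(k-1) / 2^(n+1)).

    Assumes tags are uniformly random n-bit values (an idealization).
    """
    exponent = num_inputs * (num_inputs - 1) / 2 ** (tag_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for log2_inputs in (32, 48, 64):
        p = collision_probability(2 ** log2_inputs, 128)
        print(f"2^{log2_inputs} inputs -> collision probability ~ {p:.3g}")
```

Under this idealization, about 2^64 inputs give a collision probability of roughly 39%, while 2^48 inputs give roughly 1.2e-10.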
Benchmark (Apple M1):

    siphash128-1-3:  3222 MiB/s
    ghash:           8682 MiB/s
    aegis-128l mac: 12544 MiB/s

Benchmark (Zen 2):

    siphash128-1-3:  4732 MiB/s
    ghash:           5563 MiB/s
    aegis-128l mac: 19270 MiB/s
PR #12837 handled control flow for `break` and `return`, but I forgot
about `continue`. This is effectively another break, so we just
need another `.restore_err_ret_index` ZIR instruction.

Resolves #13618.
* Rely on libSystem when targeting macOS.
* Make tools/gen_outline_atomics.zig more idiomatic.
* Remove the CPU detection / auxval checking from compiler_rt. This
  functionality belongs in a different component. Zig's compiler_rt
  must not rely on constructors. Instead, it will export a symbol for
  setting the value, and start code can detect and activate it.
* Remove the separate logic for inline assembly when the target does or
  does not have LSE support. `.inst` works in both cases.
POLYVAL is GHASH's little brother, required by the AES-GCM-SIV
construction. It's defined in RFC 8452.

The irreducible polynomial is a mirror of GHASH's (which changes
nothing in our implementation, since it didn't reverse the raw bits
to start with).

But most importantly, POLYVAL encodes byte strings as little-endian
instead of big-endian, which makes it a little bit faster on the
vast majority of modern CPUs.

So, both share the same code, just with comptime magic to use the
correct endianness and only double the key for GHASH.
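The byte-order difference can be illustrated in isolation. The sketch below is in Python rather than Zig, covers only how a 16-byte block becomes a 128-bit integer, and does not implement the field multiplication or the key doubling. `load_block` and its `big_endian` flag are hypothetical names standing in for the comptime parameter.

```python
BLOCK_LENGTH = 16  # GHASH and POLYVAL both operate on 128-bit blocks


def load_block(block: bytes, big_endian: bool) -> int:
    """Interpret a 16-byte block as a 128-bit integer in the given byte order."""
    if len(block) != BLOCK_LENGTH:
        raise ValueError("block must be exactly 16 bytes")
    return int.from_bytes(block, "big" if big_endian else "little")


if __name__ == "__main__":
    block = bytes(range(16))
    print(f"GHASH-style (big-endian):      {load_block(block, True):032x}")
    print(f"POLYVAL-style (little-endian): {load_block(block, False):032x}")
```

On a little-endian CPU, the little-endian load matches the native memory layout, so no byte swap is needed.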
Closes #7484. UEFI targets currently use an alignment of 32 for no
reason other than to support a rare bytecode. As this is far from the
standard case, removing this alignment and using the default one, as
most toolchains do, should be the desired behavior.
I wasn't able to create a reduced test case for this, but the reasoning
can be seen in `abiAlignmentAdvancedUnion`: if `strat` was lazy,
`hasRuntimeBitsAdvanced` would be given `null` instead of `sema`,
which would cause eager evaluation when it is not valid or desired.
This also modifies the inline assembly to be more optimizable: instead
of doing explicit movs, we communicate to LLVM which registers we
would like to, somehow, have the correct values. This is how the x86_64
code already worked, and thus allows the code to be unified across the
two architectures.

As a bonus, I threw in x86 support.