* move `ptrBitWidth` from Arch to Target since it needs to know about the ABI (see the sketch after this list)
* double isn't always 8 bytes
* AVR uses 1-byte alignment for everything in GCC
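A minimal sketch of the new call site, assuming the builtin target is
used; ptrBitWidth() hangs off std.Target rather than Target.Cpu.Arch
because ILP32-style ABIs can shrink pointers below the architecture's
native width:

    const std = @import("std");
    const builtin = @import("builtin");

    pub fn main() void {
        // The ABI is part of the Target, so the answer can differ
        // from the architecture's register width.
        const bits = builtin.target.ptrBitWidth();
        std.debug.print("pointer width: {d} bits\n", .{bits});
    }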
Also get rid of the TTY wrapper struct, which was exclusively used as a
namespace - this is done by the tty.zig root struct now.
detectTTYConfig has been renamed to just detectConfig, which is enough
given the new namespace. Additionally, a doc comment has been added.
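A usage sketch, assuming the namespace is reachable as std.io.tty:

    const std = @import("std");

    pub fn main() !void {
        // Was std.debug.detectTTYConfig(); the TTY prefix is now
        // redundant since tty.zig is its own namespace.
        const stderr = std.io.getStdErr();
        const config = std.io.tty.detectConfig(stderr);
        try config.setColor(stderr.writer(), .red);
        try stderr.writer().writeAll("error: something went wrong\n");
        try config.setColor(stderr.writer(), .reset);
    }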
* std.crypto: faster ghash and polyval on WebAssembly
Before: 91 MiB/s
After : 243 MiB/s
Some other platforms might benefit from this, but WebAssembly is
the obvious one (simd128 doesn't make a difference).
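The API is unchanged by the speedup; for reference, a usage sketch of
the one-time authenticator interface (key and message values here are
illustrative):

    const std = @import("std");
    const Ghash = std.crypto.onetimeauth.Ghash;

    pub fn main() void {
        // Illustrative key: GHASH keys must be uniformly random
        // and single-use in a real protocol.
        const key = [_]u8{0x42} ** Ghash.key_length;
        var mac: [Ghash.mac_length]u8 = undefined;
        Ghash.create(&mac, "message to authenticate", &key);
        std.debug.print("{s}\n", .{std.fmt.fmtSliceHexLower(&mac)});
    }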
Also remove all the wasi-libc files that we shipped but never compiled.
The latest wasi-libc HEAD has an extra commit (a6f871343313220b76009827ed0153586361c0d5), which makes preopen initialization lazy.
Unfortunately, that breaks quite a lot of things on our end. Applications now need to explicitly call __wasilibc_populate_preopens() everywhere when the libc is linked. That can wait until after 0.11.
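Once that commit is picked up, the explicit call would look roughly
like this (a sketch: the extern declaration mirrors the wasi-libc
symbol, the void signature is assumed, and the file path is
illustrative):

    const std = @import("std");

    // Declared by wasi-libc; fills in the preopen table that used
    // to be populated automatically at startup.
    extern fn __wasilibc_populate_preopens() void;

    pub fn main() !void {
        // Call before any path-based filesystem access when the
        // lazy-preopen wasi-libc is linked.
        __wasilibc_populate_preopens();
        var file = try std.fs.cwd().openFile("data.txt", .{});
        defer file.close();
    }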
Support for 64-bit counters was a hack built upon the version with
a 32-bit counter, which emulated a larger counter by splitting the
input into large blocks.
This is fragile, particularly if the initial counter is set to
a non-default value and if we have parallelism.
Simply add a comptime parameter to select a 32-bit or a 64-bit
counter instead.
Also convert a couple of while() loops to for(), and change @panic()
to @compileError().
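A sketch of the pattern; names are illustrative, not the actual
std.crypto internals:

    // Select the counter width with a comptime parameter instead of
    // emulating a 64-bit counter on top of a 32-bit one.
    fn ChaChaWith(comptime counter_bits: u16) type {
        return struct {
            const Counter = switch (counter_bits) {
                32 => u32,
                64 => u64,
                else => @compileError("counter must be 32 or 64 bits"),
            };
            counter: Counter,

            fn incrementCounter(self: *@This()) void {
                self.counter +%= 1;
            }
        };
    }

    const ChaCha32 = ChaChaWith(32); // IETF variant
    const ChaCha64 = ChaChaWith(64); // original variant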
These operations are constant-time on most, if not all, currently
supported architectures. However, even if they are not, this is not
a big deal in the case of Poly1305, as the key is added at the end.
The final addition remains protected.
SalsaPoly and ChaChaPoly do encrypt-then-mac, so side channels would
not leak anything about the plaintext anyway.
* Apple Silicon (M1)
Before: 2048 MiB/s
After : 2823 MiB/s
* AMD Ryzen 7
Before: 3165 MiB/s
After : 4774 MiB/s
* std.crypto.chacha: support larger vectors on AVX2 and AVX512 targets
Ryzen 7 7700, ChaCha20/8 stream, long outputs:
Generic: 3268 MiB/s
AVX2 : 6023 MiB/s
AVX512 : 8086 MiB/s
Bump the rand.chacha buffer a tiny bit to take advantage of this.
More than 8 blocks doesn't seem to make any measurable difference.
ChaChaPoly also gets a small performance boost from this, although
Poly1305 remains the bottleneck.
Generic: 707 MiB/s
AVX2 : 981 MiB/s
AVX512 : 1202 MiB/s
aarch64 appears to generally benefit from 4-way vectorization.
Verified on Apple Silicon, but also on a Cortex A72.
A minimal set of simple, safe functions for Montgomery arithmetic,
designed for cryptographic primitives.
Also update the current RSA cert validation to use it, getting rid
of the FixedBuffer hack and the previous limitations.
Also make the check of the RSA public key a little stricter along
the way.
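For context, the core operation looks roughly like this single-limb
sketch (illustrative, not the new module's API; it assumes m < 2^63 so
the intermediate sum fits in a u128, and a real implementation would
make the final subtraction constant-time):

    /// Montgomery multiplication with R = 2^64: returns a*b*R^-1 mod m.
    /// Requires a, b < m and m_inv == -m^-1 mod 2^64.
    fn montMul(a: u64, b: u64, m: u64, m_inv: u64) u64 {
        const t = @as(u128, a) * b;
        // k is chosen so that t + k*m is divisible by 2^64.
        const k = @as(u64, @truncate(t)) *% m_inv;
        const u = (t + @as(u128, k) * m) >> 64;
        return if (u >= m) @as(u64, @intCast(u - m)) else @as(u64, @intCast(u));
    }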