Apple M1/M2 have an EOR3 instruction that can XOR 2 operands with
another one, and LLVM knows how to take advantage of it.
However, two EOR can't be automatically combined into an EOR3 if
one of them is in an assembly block.
That simple change speeds up ciphers doing an AES round immediately
followed by a XOR operation on Apple Silicon.
Before:
aegis-128l mac: 12534 MiB/s
aegis-256 mac: 6722 MiB/s
aegis-128l: 10634 MiB/s
aegis-256: 6133 MiB/s
aes128-gcm: 3890 MiB/s
aes256-gcm: 3122 MiB/s
aes128-ocb: 2832 MiB/s
aes256-ocb: 2057 MiB/s
After:
aegis-128l mac: 15667 MiB/s
aegis-256 mac: 8240 MiB/s
aegis-128l: 12656 MiB/s
aegis-256: 7214 MiB/s
aes128-gcm: 3976 MiB/s
aes256-gcm: 3202 MiB/s
aes128-ocb: 2835 MiB/s
aes256-ocb: 2118 MiB/s
Today I found out that posix_spawn is trash. It's actually implemented
on top of fork/exec inside of libc (or libSystem in the case of macOS).
So, anything posix_spawn can do, we can do better. In particular, what
we can do better is handle spawning of child processes that are
potentially foreign binaries. If you try to spawn a wasm binary, for
example, posix spawn does the following:
* Goes ahead and creates a child process.
* The child process writes "foo.wasm: foo.wasm: cannot execute binary file"
to stderr (yes, it prints the filename twice).
* The child process then exits with code 126.
This behavior is indistinguishable from the binary being successfully
spawned, and then printing to stderr, and exiting with a failure -
something that is an extremely common occurrence.
Meanwhile, using the lower level fork/exec will simply return ENOEXEC
code from the execve syscall (which is mapped to zig error.InvalidExe).
The posix_spawn behavior means the zig build runner can't tell the
difference between a failure to run a foreign binary, and a binary that
did run, but failed in some other fashion. This is unacceptable, because
attempting to excecve is the proper way to support things like Rosetta.
The TurboSHAKE paper just got published:
https://eprint.iacr.org/2023/342.pdf
and unlike the previous K12 paper, suggests 0x1F instead of 0x01
as the default value for "D".
* Fix SHA3 with streaming
Leftover bytes should be added to the buffer, not to the state.
(or, always to the state; we can and probably should eventually get
rid of the buffer)
Fixes#14851
* Add a test for SHA-3 with streaming
Man page for posix lists EMFILE, man page for linux ENFILE.
Also posix says "The mmap() function adds an extra reference to the file
associated with the file descriptor fildes which is not removed by a
subsequent close() on that file descriptor. This reference is removed
when there are no more mappings to the file."
It sounds counter-intuitive, that a process limit but no system limit can
be exceeeded.
As far as I understand, fildes is only used for file descriptor backed mmaps.
In Linux when interacting with the virtual file system when writing
in invalid value to a file the OS will return errno 22 (INVAL).
Instead of triggering an unreachable, this change now returns a
newly introduced error.InvalidArgument.
In Linux when writing to various files in the virtual file system,
for example /sys/fs/cgroup, if you write an invalid value to a file
you'll get errno 16.
This change allows for these specific cases to be caught instead of
being lumped together in UnexpectedError.
Previously, this API had pid, to be used on POSIX systems, and handle,
to be used on Windows.
This commit unifies the API, defining an Id type that is either the pid
or the HANDLE depending on the target OS.
This commit also prepares for the future by allowing one to import via
`std.process.Child` which is the fully qualified namespace that I intend
to migrate to in the future.
Also add `std.fs.has_executable_bit` for doing conditional compilation.
This adds the linux syscalls for chmod and fchmodat, as well as the
extern libc function declarations.
Only `fchmodat` is added to `std.os`, and it is not yet added to std.fs.
Make the Keccak permutation public, as it's useful for more than
SHA-3 (kMAC, SHAKE, TurboSHAKE, TupleHash, etc).
Our Keccak implementation was accepting f as a comptime parameter,
but always used 64-bit words and 200 byte states, so it actually
didn't work with anything besides f=1600.
That has been fixed. The ability to use reduced-round versions
was also added in order to support M14 and K12.
The state was constantly converted back and forth between bytes
and words, even though only a part of the state is actually used
for absorbing and squeezing bytes. It was changed to something
similar to the other permutations we have, so we can avoid extra
copies, and eventually add vectorized implementations.
In addition, the SHAKE extendable output function (XOF) was
added (SHAKE128, SHAKE256). It is required by newer schemes,
such as the Kyber post-quantum key exchange mechanism, whose
implementation is currently blocked by SHAKE missing from our
standard library.
Breaking change: `Keccak_256` and `Keccak_512` were renamed to
`Keccak256` and `Keccak512` for consistency with all other
hash functions.
And make it not do any installation, only objcopying. We already have
install steps for doing installation.
This commit also makes ObjCopyStep properly integrate with caching.
And additionally support writing files to source files. This means a
custom build step in zig's own build.zig is no longer needed for copying
zig.h because it is handled by WriteFileStep.