* std.crypto: faster ghash and polyval on WebAssembly
Before: 91 MiB/s
After : 243 MiB/s
Some other platforms might benefit from this, but WebAssembly is
the obvious one (simd128 doesn't make a difference).
Also remove all the wasi-libc files we used to ship, but never compile.
The latest wasi-libc HEAD has an extra commit (a6f871343313220b76009827ed0153586361c0d5), which makes preopen initialization lazy.
Unfortunately, that breaks quite a lot of things on our end. Applications now need to explicitly call __wasilibc_populate_preopens() everywhere when the libc is linked. That can wait after 0.11.
Support for 64-bit counters was a hack built upon the version with
a 32-bit counter, that emulated a larger counter by splitting the
input into large blocks.
This is fragile, particularily if the initial counter is set to
a non-default value and if we have parallelism.
Simply add a comptime parameter to check if we have a 32 bit or a
64 bit counter instead.
Also convert a couple while() loops to for(), and change @panic()
to @compileError().
These operations are constant-time on most, if not all currently
supported architectures. However, even if they are not, this is not
a big deal in the case on Poly1305, as the key is added at the end.
The final addition remains protected.
SalsaPoly and ChaChaPoly do encrypt-then-mac, so side channels would
not leak anything about the plaintext anyway.
* Apple Silicon (M1)
Before: 2048 MiB/s
After : 2823 MiB/s
* AMD Ryzen 7
Before: 3165 MiB/s
After : 4774 MiB/s
* std.crypto.chacha: support larger vectors on AVX2 and AVX512 targets
Ryzen 7 7700, ChaCha20/8 stream, long outputs:
Generic: 3268 MiB/s
AVX2 : 6023 MiB/s
AVX512 : 8086 MiB/s
Bump the rand.chacha buffer a tiny bit to take advantage of this.
More than 8 blocks doesn't seem to make any measurable difference.
ChaChaPoly also gets a small performance boost from this, albeit
Poly1305 remains the bottleneck.
Generic: 707 MiB/s
AVX2 : 981 MiB/s
AVX512 : 1202 MiB/s
aarch64 appears to generally benefit from 4-way vectorization.
Verified on Apple Silicon, but also on a Cortex A72.
A minimal set of simple, safe functions for Montgomery arithmetic,
designed for cryptographic primitives.
Also update the current RSA cert validation to use it, getting rid
of the FixedBuffer hack and the previous limitations.
Make the check of the RSA public key a little bit more strict by
the way.
This commit removes the `field_call_bind` and `field_call_bind_named` ZIR
instructions, replacing them with a `field_call` instruction which does the bind
and call in one.
`field_call_bind` is an unfortunate instruction. It's tied into one very
specific usage pattern - its result can only be used as a callee. This means
that it creates a value of a "pseudo-type" of sorts, `bound_fn` - this type used
to exist in Zig, but now we just hide it from the user and have AstGen ensure
it's only used in one way. This is quite silly - `Type` and `Value` should, as
much as possible, reflect real Zig types and values.
It makes sense to instead encode the `a.b()` syntax as its own ZIR instruction,
so that's what we do here. This commit introduces a new instruction,
`field_call`. It's like `call`, but rather than a callee ref, it contains a ref
to the object pointer (`&a` in `a.b()`) and the string field name (`b`). This
eliminates `bound_fn` from the language, and slightly decreases the size of
generated ZIR - stats below.
This commit does remove a few usages which used to be allowed:
- `@field(a, "b")()`
- `@call(.auto, a.b, .{})`
- `@call(.auto, @field(a, "b"), .{})`
These forms used to work just like `a.b()`, but are no longer allowed. I believe
this is the correct choice for a few reasons:
- `a.b()` is a purely *syntactic* form; for instance, `(a.b)()` is not valid.
This means it is *not* inconsistent to not allow it in these cases; the
special case here isn't "a field access as a callee", but rather this exact
syntactic form.
- The second argument to `@call` looks much more visually distinct from the
callee in standard call syntax. To me, this makes it seem strange for that
argument to not work like a normal expression in this context.
- A more practical argument: it's confusing! `@field` and `@call` are used in
very different contexts to standard function calls: the former normally hints
at some comptime machinery, and the latter that you want more precise control
over parts of a function call. In these contexts, you don't want implicit
arguments adding extra confusion: you want to be very explicit about what
you're doing.
Lastly, some stats. I mentioned before that this change slightly reduces the
size of ZIR - this is due to two instructions (`field_call_bind` then `call`)
being replaced with one (`field_call`). Here are some numbers:
+--------------+----------+----------+--------+
| File | Before | After | Change |
+--------------+----------+----------+--------+
| Sema.zig | 4.72M | 4.53M | -4% |
| AstGen.zig | 1.52M | 1.48M | -3% |
| hash_map.zig | 283.9K | 276.2K | -3% |
| math.zig | 312.6K | 305.3K | -2% |
+--------------+----------+----------+--------+
This is in preparation of removing indirect lowering again. Also
modifies constant() to accept a repr so that both direct as well
as indirect representations can be generated. Indirect is not yet
used, but will be used for globals.
Previously the tag type was generated even if it was nonexistant,
triggering an assertion that an integer type should never have
zero bits. Now its only generated when the tag type is actually emitted.
The first dereference of PtrAccessChain returns a pointer of the same type
as the base pointer, in contrast to AccessChain, where the first dereference
returns a pointer of the dereferenced type of the base pointer.
This instruction is not really working well in the LLVM SPIRV translator,
as it is not implemented.
This commit also intruces the constructStruct helper function to initialize
structs at runtime. This is ALSO buggy in the translator, and we must work
around OpCompositeConstruct not working when some of the constituents are
runtime-known only.
Some other improvements are made:
- improved variable() so that it is more useful and no longer requires the
address space. It always puts values in the Function address space,
and returns a pointer to the Generic address space
- adds a boolToInt utility function
This ensures that we can also cast enums and error sets here. In the future
this function will need to be changed to support composite and strange
integers, but that is fine.
It turns out that the Khronos LLVM SPIRV translator does not support OpPtrEqual.
Therefore, this instruction is emitted using a series of conversions.
This commit breaks intToEnum, because enum was removed from the arithmetic type
info. The enum should be converted to an int before this function is called.
When initializing a packed struct, we must ensure the result local
is zero'd. Previously we would do this by ensuring a new local is
allocated. Although a local is always zero by default, it meant that
if such an initialization was being done inside a loop, it would re-
use that very same local that could potentially still hold a different
value. Because this value is `or`'d with the value, it would result
in a miscompilation. By manually setting this result to 0, we guarantee
the correct behavior.