Comptime code can't execute assembly code, so we need some way to
force comptime code to use the generic path. This should be replaced
with whatever is implemented for #868, when that day comes.
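For illustration only, the dispatch being described could look roughly like the sketch below; it uses the `@inComptime()` builtin from later Zig versions, which this commit predates, so treat the mechanism (and the helper name) as an assumption rather than the actual workaround:
```zig
const std = @import("std");
const builtin = @import("builtin");

// Illustration: pick the hardware SHA path at runtime, but always fall back
// to the generic software path at comptime, since comptime cannot run asm.
fn useHardwareSha() bool {
    if (@inComptime()) return false; // force the generic path at comptime
    return switch (builtin.cpu.arch) {
        .x86_64 => std.Target.x86.featureSetHas(builtin.cpu.features, .sha),
        .aarch64 => std.Target.aarch64.featureSetHas(builtin.cpu.features, .sha2),
        else => false,
    };
}

test "hardware path is never selected at comptime" {
    comptime std.debug.assert(!useHardwareSha());
    _ = useHardwareSha();
}
```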
I am seeing that the result for the hash is incorrect in stage1 and
crashes stage2, so presumably this never worked correctly. I will follow
up on that soon.
This gets us most of the way back to the performance I had when
I was using the LLVM intrinsics:
- Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz:
190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s
- AMD EPYC 7763 (VM) @ 2.45 GHz:
240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s
- Apple M1:
216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s
Minor changes to this source can swing performance from 400 MB/s to
1400 MB/s or... 20 MB/s, depending on how it interacts with the
optimizer. I have a sneaking suspicion that despite LLVM inheriting
GCC's extremely strict inline assembly semantics, its passes are
rather skittish around inline assembly (and almost certainly, its
instruction cost models can assume nothing).
There's probably plenty of room to optimize these further in the
future, but for the moment this gives ~3x improvement on Intel
x86-64 processors, ~5x on AMD, and ~10x on M1 Macs.
These extensions are very new; most processors prior to 2020 do
not support them.
AVX-512 is a slightly older alternative that we could use on Intel
for a much bigger performance bump, but it's been fused off on
Intel's latest hybrid architectures and it relies on computing
independent SHA hashes in parallel. In contrast, these SHA intrinsics
provide the usual single-threaded, single-stream interface, and should
continue working on new processors.
AArch64 also has SHA-512 intrinsics that we could take advantage
of in the future.
Packed memory has a well-defined layout, so we can read from it
directly without first converting to an integer. Let's use it :-)
This change means that for bitcasting to/from a packed value that
is N layers deep, we no longer have to create N temporary big-ints
and perform N copies.
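As a rough illustration (this example is mine, not taken from the commit, and it uses the current single-argument `@bitCast` syntax), a nested packed value can now be reinterpreted in one conceptual step:
```zig
const std = @import("std");

const Inner = packed struct { a: u3, b: u5 };
const Outer = packed struct { x: Inner, y: u8 };

test "round-trip a nested packed value through @bitCast" {
    const v = Outer{ .x = .{ .a = 5, .b = 1 }, .y = 0xAA };
    // The packed layout is well-defined, so this reinterpretation needs no
    // per-layer big-int temporaries.
    const bits: u16 = @bitCast(v);
    const back: Outer = @bitCast(bits);
    try std.testing.expectEqual(v, back);
}
```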
Other miscellaneous improvements:
- Adds support for casting to packed enums and vectors
- Fixes bitcasting to/from vectors outside of a packed struct
- Adds a fast path for bitcasting <= u/i64
- Fixes a bug when bitcasting f80 that would clear the following fields
This also changes the bitcast memory layout of exotic integers on
big-endian systems to match what's empirically observed on our targets.
Technically, this layout is not guaranteed by LLVM so we should probably
ban bitcasts that reveal these padding bits, but for now this is an
improvement.
Similar to what was done for EdDSA, allow incremental creation
and verification of ECDSA signatures.
Doing so for ECDSA is trivial, and can be useful for TLS as well
as the future package manager.
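A rough usage sketch, assuming the new interface mirrors the EdDSA-style signer()/verifier() streaming API; the names, parameters, and error handling here are my guess at the shape, not lifted from the commit:
```zig
const std = @import("std");
const Ecdsa = std.crypto.sign.ecdsa.EcdsaP256Sha256;

test "incremental ECDSA signing and verification (assumed API shape)" {
    const kp = try Ecdsa.KeyPair.create(null); // random key pair
    var signer = try kp.signer(null); // null = random noise
    signer.update("hello, ");
    signer.update("world");
    const sig = try signer.finalize();

    var verifier = try sig.verifier(kp.public_key);
    verifier.update("hello, world");
    try verifier.verify();
}
```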
This commit accepts unusual parameter combinations like EcdsaP384Sha256.
Some certificates (those below are in /etc/ssl/certs/ca-certificates.crt on Ubuntu 22.04) use EcdsaP384Sha256 to sign themselves.
- Subject: C=GR, L=Athens, O=Hellenic Academic and Research Institutions Cert. Authority, CN=Hellenic Academic and Research Institutions ECC RootCA 2015
- Subject: C=US, ST=Texas, L=Houston, O=SSL Corporation, CN=SSL.com EV Root Certification Authority ECC
- Subject: C=US, ST=Texas, L=Houston, O=SSL Corporation, CN=SSL.com Root Certification Authority ECC
In verify(), the hash array `h` is allocated to be at least as large as
scalar.encoded_length. The array is treated as big-endian: the hash bytes
fill the back of the array, and the remaining bytes at the front are zeroed.
In sign(), the hash array is allocated and filled the same way as in verify().
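A minimal sketch of that padding, with a hypothetical helper name (not the actual std code):
```zig
const std = @import("std");

// Hypothetical helper: write the digest into the tail of a big-endian
// buffer sized for the scalar encoding and leave the leading bytes zero.
fn padDigestBigEndian(comptime out_len: usize, digest: []const u8) [out_len]u8 {
    var h = [_]u8{0} ** out_len;
    const n = @min(digest.len, out_len);
    @memcpy(h[out_len - n ..], digest[digest.len - n ..]);
    return h;
}

test "short digest is left-padded with zeros" {
    const h = padDigestBigEndian(48, &([_]u8{0xff} ** 32));
    try std.testing.expect(h[0] == 0 and h[15] == 0 and h[16] == 0xff);
}
```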
In deterministicScalar(), the hash bytes alone are insufficient to generate `k`.
To generate `k` without narrowing its value range,
this commit uses step h of "Section 3.2 Generation of k" in RFC 6979.
When an object file or binary contains the target_features section,
we can now parse and dump its contents in string format, so our
linker tests can use them to verify the features section.
These ifs were missing a case for f80, which should have shifted by one,
but we can just compute the correct value instead. Also, we want the
fractional bit count to be a multiple of four, not the mantissa bit count,
since the mantissa may include an explicit leading one that we want kept
separate.
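To illustrate the distinction, here is a sketch (mine, not the patched code) that rounds the fractional bit count, rather than the mantissa bit count, up to a whole number of hex digits:
```zig
const std = @import("std");

// floatFractionalBits excludes f80's explicit leading integer bit, while
// floatMantissaBits would include it and throw the rounding off by one.
fn hexFractionalBits(comptime T: type) comptime_int {
    const frac = std.math.floatFractionalBits(T); // 10, 23, 52, 63, 112
    return (frac + 3) / 4 * 4;
}

comptime {
    std.debug.assert(hexFractionalBits(f64) == 52);
    std.debug.assert(hexFractionalBits(f80) == 64);
}
```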
The Zig LLVM backend emits calls to softfloat methods with the "standard
compiler-rt" names. Rather than add complexity to the backend and
have to synchronize the naming scheme across all targets, the simplest
fix is just to export these symbols under both the "standard" and the
platform-specific naming convention.
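A hedged sketch of the pattern; the routine and alias chosen here are just one example of a standard/platform-specific pair, not necessarily symbols this commit touches:
```zig
// Standard compiler-rt name for f64 -> f16 truncation.
export fn __truncdfhf2(a: f64) f16 {
    return @floatCast(a);
}

// The ARM runtime ABI historically expects the same routine under this name.
export fn __aeabi_d2h(a: f64) f16 {
    return __truncdfhf2(a);
}
```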
This documents the status of the routines and adds the next work item,
"Decimal float library routines", which are only recommended for
binary data. The complete absence of tests is also documented.
This does not document the various aliases, e.g. those for ARM.
Missing Integer library routines (a sketch of the trapping contract these implement follows this list):
- __addvsi3
- __addvdi3
- __addvti3
- __subvsi3
- __subvdi3
- __subvti3
- __mulvsi3
- __mulvdi3
- __mulvti3
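These __addv*/__subv*/__mulv* names are libgcc's trapping-arithmetic routines; a hedged sketch of the expected contract, using __addvsi3 (the Zig body is mine, not from compiler-rt):
```zig
// Sketch only: return a + b, trapping on signed overflow. libgcc's
// __addvsi3 calls abort(); @panic stands in for that here.
fn __addvsi3(a: i32, b: i32) i32 {
    const sum = @addWithOverflow(a, b);
    if (sum[1] != 0) @panic("signed integer overflow");
    return sum[0];
}
```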
Missing floating library routines:
- __powisf2
- __powidf2
- __powitf2
- __powixf2
Missing routines for symbol-level compatibility with gcc:
- __ashlsi3
- __ashrsi3
- __lshrsi3
cmp.zig was accidentally being referenced twice, rather than importing
memcmp.zig. This means that its symbols were also not included in
the generated compiler-rt output.
This function is redundant with CType.sizeInBits(), and until the
previous commit they disagreed about the correct long double type
for several targets. Although they're all synced up now, it's much
simpler just to have a single source of truth.
These updates were made by testing against the `sizeof/_Alignof` reported
by Clang for all supported arch-OS-ABI combinations and correcting any
discrepancies.
This is bound to have a few errors (the recent long double fix for i386
Android is one example), but Clang is certainly not a bad place to start,
especially for our most popular targets.
Previously, we'd overwrite the errors in a circular buffer. Now that
error return traces are intended to follow a stack discipline, we no
longer have to support the index rolling over. By treating the trace
like a saturating stack, any pop/restore code still behaves correctly
past-the-end of the trace.
As a bonus, this adds a small blurb to let the user know when the trace
has saturated and how many frames were dropped.
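A hedged sketch of the saturating-push behavior described above (not the actual implementation; only the std.builtin.StackTrace field names are real):
```zig
const std = @import("std");

// Write the frame if there is room, but keep counting either way so the
// number of dropped frames can be reported later as (index - capacity).
fn pushErrorReturnFrame(trace: *std.builtin.StackTrace, return_address: usize) void {
    if (trace.index < trace.instruction_addresses.len) {
        trace.instruction_addresses[trace.index] = return_address;
    }
    trace.index += 1;
}
```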
This change extends the "lifetime" of the error return trace associated
with an error so that it continues throughout the block in which the error
is assigned to a `const` variable.
This is necessary to support patterns like this one in test_runner.zig:
```zig
const result = foo();
if (result) |_| {
    // ... success logic
} else |err| {
    // `foo()` should be included in the error trace here
    return error.TestFailed;
}
```
To make this happen, the majority of the error return trace popping logic
needed to move into Sema, since `const x = foo();` cannot be examined
syntactically to determine whether it modifies the error return trace. We
also have to make sure not to delete pertinent block information before it
makes it to Sema, so that Sema can pop/restore around blocks correctly.
*Why do this only for `const` and not `var`?*
There is room to relax things for `var`, but only a little bit. We could
do the same thing we do for const and keep the error trace alive for the
remainder of the block where the *assignment* happens. Any wider scope
would violate the stack discipline for traces, so it's not viable.
In the end, I decided the most consistent behavior for the user is just
to kill all error return traces assigned to a mutable `var`.
In order to enforce a strict stack discipline for error return traces,
we cannot track error return traces that are stored in variables:
```zig
const x = errorable(); // errorable()'s error return trace is killed here
// v-- error trace starts here instead
return x catch error.UnknownError;
```
In order to propagate error return traces, function calls need to be passed
directly to an error-handling expression (`if`, `catch`, `try` or `return`):
```zig
// When passed directly to `catch`, the return trace is propagated
return errorable() catch error.UnknownError;
// Using a break also works
return blk: {
    // code here
    break :blk errorable();
} catch error.UnknownError;
```
Why do we need this restriction? Without it, multiple errors can co-exist
with their own error traces. Handling that situation correctly means either:
a. Dynamically allocating trace memory and tracking lifetimes, OR
b. Allowing the production of one error to interfere with the trace of another
(which is the current status quo)
This is piece (3/3) of https://github.com/ziglang/zig/issues/1923#issuecomment-1218495574
This makes it possible to query the memory map size from the EFI firmware
without making any allocation beforehand, which in turn makes it possible
to be precise about the size of the allocation that will own a copy of
the memory map in the UEFI application.
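A rough sketch of the calling pattern this enables; whether getMemoryMap accepts a null buffer for the size-only query, and its exact parameter list, are my assumptions based on the description above rather than the actual std.os.uefi signature:
```zig
const std = @import("std");
const uefi = std.os.uefi;

// Assumed shape: with a zero-sized buffer the firmware returns
// BufferTooSmall and writes the required byte count into `size`.
fn queryMemoryMapSize(boot_services: *uefi.tables.BootServices) usize {
    var size: usize = 0;
    var map_key: usize = undefined;
    var descriptor_size: usize = undefined;
    var descriptor_version: u32 = undefined;
    _ = boot_services.getMemoryMap(&size, null, &map_key, &descriptor_size, &descriptor_version);
    return size;
}
```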
This reverts commit 80ac022c4667e1995ccdf70fff90e5af26b6eb97.
I changed my mind on this one, sorry. I don't think this belongs in the
standard library.