I wasn't able to create a reduced test case for this, but the reasoning
can be seen in `abiAlignmentAdvancedUnion`: if `strat` was lazy,
`hasRuntimeBitsAdvanced` would be given `null` instead of `sema`,
which would cause eager evaluation when that is neither valid nor desired.
This also makes the inline assembly more optimizable: instead of
doing explicit movs, we communicate to LLVM which registers need to
hold which values and let it arrange that itself. This is how the x86_64
code already worked, and it allows the code to be unified across the
two architectures.
As a bonus, I threw in x86 support.
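To illustrate the technique, here is a minimal sketch of register-constraint inline assembly in Zig, written as a hypothetical single-argument x86_64 syscall wrapper (`syscall1` and its registers are illustrative, not the code touched by this change): the constraints tell LLVM where each value has to end up, so it can schedule or elide the moves instead of the asm template hard-coding them.

```zig
// Hypothetical example: the constraints ("={rax}", "{rax}", "{rdi}") ask LLVM
// to place `number` in rax and `arg1` in rdi, and to read the result from rax,
// instead of the asm body performing explicit movs itself.
pub fn syscall1(number: usize, arg1: usize) usize {
    return asm volatile ("syscall"
        : [ret] "={rax}" (-> usize),
        : [number] "{rax}" (number),
          [arg1] "{rdi}" (arg1),
        : "rcx", "r11", "memory"
    );
}
```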
There was no check for linker errors after flushing,
which meant that if the link failed, the build would
continue, try to copy the non-existent exe, and
write the manifest as if linking had succeeded.
This also adds parsing of lld output, which is surfaced at the
end of the compilation with the other errors instead
of via stderr.
While this is already mentioned on the `items` fields of the structs, it is
worth noting on every method that may invalidate pointers to items
that it can do so.
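As a hypothetical illustration of why these comments matter (a sketch against the managed `std.ArrayList` API of this era; the values and test name are made up):

```zig
const std = @import("std");

test "append may invalidate element pointers" {
    var list = std.ArrayList(u32).init(std.testing.allocator);
    defer list.deinit();

    try list.append(1);
    const first = &list.items[0]; // points into the list's current buffer

    // append() may grow the buffer via reallocation, so `first` must be
    // considered invalid afterwards: exactly what the doc comments warn about.
    try list.append(2);
    _ = first; // do not dereference after a potentially invalidating call
}
```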
fixes #12877
The current implementation (before this fix) observes the number of waiters when
a broadcast occurs and then makes that number of wakeups.
If multiple threads are waiting for a wakeup and immediately go back into
wait when the wakeup is not meant for them (as described in the issue), the
same thread can get multiple wakeups while others get none.
That is not consistent with the documented behavior of condition variable
broadcast: `Unblocks all threads currently blocked in a call to wait()
or timedWait() with a given Mutex.`
This fix ensures that a thread waiting on the futex is woken up on futex wake.
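A minimal sketch of the documented contract (hypothetical code, not the test from this change): after a single `broadcast()`, every thread blocked in `wait()` must eventually wake up and observe the predicate.

```zig
const std = @import("std");

var mutex = std.Thread.Mutex{};
var cond = std.Thread.Condition{};
var ready = false;

fn waiter() void {
    mutex.lock();
    defer mutex.unlock();
    // Spurious wakeups are allowed, so the predicate is re-checked in a loop.
    while (!ready) cond.wait(&mutex);
}

pub fn main() !void {
    var threads: [4]std.Thread = undefined;
    var i: usize = 0;
    while (i < threads.len) : (i += 1) {
        threads[i] = try std.Thread.spawn(.{}, waiter, .{});
    }

    mutex.lock();
    ready = true;
    mutex.unlock();

    // Must unblock every thread currently blocked in wait(); none may be left
    // sleeping because another waiter consumed "its" wakeup.
    cond.broadcast();

    for (threads) |t| t.join();
}
```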
* std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs
Carryless multiplication was slow on older Intel CPUs, justifying
the use of Karatsuba multiplication.
This is not the case anymore; using 4 multiplications to multiply
two 128-bit numbers is actually faster than 3 multiplications plus
shifts and additions (the identity involved is sketched below, after these notes).
This is also true on aarch64.
Keep using Karatsuba only when targeting x86 (granted, this is a bit
of a brutal shortcut; we should really list all the CPU models that
had a slow clmul instruction).
Also remove the useless agg_2 threshold and restore the ability to
precompute only H and H^2 in ReleaseSmall.
Finally, avoid using u256. Using 128-bit registers is actually faster.
* Use a switch, add some comments
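The sketch referenced above: a slow software stand-in for the clmul instruction, used to check the Karatsuba identity that trades one multiplication for extra XORs and shifts (function names and test values here are illustrative, not taken from the GHASH code).

```zig
const std = @import("std");

// Bit-by-bit carryless multiply of two u64s; a stand-in for pclmulqdq/pmull.
fn clmul64(a: u64, b: u64) u128 {
    var acc: u128 = 0;
    var x: u128 = a;
    var y: u64 = b;
    while (y != 0) : (y >>= 1) {
        if ((y & 1) != 0) acc ^= x;
        x <<= 1;
    }
    return acc;
}

test "Karatsuba recovers the middle term with one multiplication instead of two" {
    // Two 128-bit operands given as 64-bit halves (x1:x0 and y1:y0).
    const x1: u64 = 0x0123456789abcdef;
    const x0: u64 = 0xfedcba9876543210;
    const y1: u64 = 0x0f1e2d3c4b5a6978;
    const y0: u64 = 0x8796a5b4c3d2e1f0;

    const hi = clmul64(x1, y1);
    const lo = clmul64(x0, y0);

    // Schoolbook: the middle term costs two more multiplications (4 in total)...
    const mid_schoolbook = clmul64(x1, y0) ^ clmul64(x0, y1);
    // ...while Karatsuba derives it from a single one (3 in total), at the
    // price of extra XORs and, in the real code, the shifts and additions
    // needed to recombine the partial products.
    const mid_karatsuba = clmul64(x1 ^ x0, y1 ^ y0) ^ hi ^ lo;

    try std.testing.expectEqual(mid_schoolbook, mid_karatsuba);
}
```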
The packed struct example was mistakenly applying an endianness conversion where it
shouldn't have been. This wasn't being caught because we don't currently
test the examples on big-endian systems.
I updated the test to remove the endianness conversion where it didn't apply, and
added a new part of the test to demonstrate when it does apply.
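For reference, a minimal sketch of the point being fixed (not the doc example itself, and written with the post-0.11 `@bitCast` syntax; the struct and values are illustrative): fields map to the backing integer from least to most significant bit on every target, so no endianness conversion belongs in a plain bit-cast; byte order only matters once that backing integer is viewed as bytes in memory.

```zig
const std = @import("std");

const Divided = packed struct {
    half1: u8,
    quarter3: u4,
    quarter4: u4,
};

test "packed struct bit layout does not depend on host endianness" {
    const d = Divided{ .half1 = 0x34, .quarter3 = 0x2, .quarter4 = 0x1 };
    // `half1` always occupies the least significant bits of the u16 backing
    // integer, on little- and big-endian targets alike, so the expected value
    // needs no byteSwap here.
    const backing: u16 = @bitCast(d);
    try std.testing.expectEqual(@as(u16, 0x1234), backing);
}
```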
perf_event_attr.type needs to take a runtime-defined value to enable
dynamic PMUs, such as kprobe and uprobe. This value can exceed the
predefined values in the Linux headers.
Reference: the perf_event_open(2) man page.
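As a hypothetical illustration (the helper name and the exact sysfs path are assumptions, not part of this change): a dynamic PMU's type id is only known at runtime, read from sysfs, so it cannot be restricted to the PERF_TYPE_* constants.

```zig
const std = @import("std");

// Reads a dynamic PMU's perf_event type id, e.g. from
// /sys/bus/event_source/devices/kprobe/type. The value is assigned by the
// kernel at runtime and is typically larger than any PERF_TYPE_* constant,
// which is why perf_event_attr.type has to accept an arbitrary u32.
fn readPmuType(path: []const u8) !u32 {
    var buf: [32]u8 = undefined;
    const contents = try std.fs.cwd().readFile(path, &buf);
    const trimmed = std.mem.trim(u8, contents, " \t\n");
    return std.fmt.parseInt(u32, trimmed, 10);
}

test "read a dynamic PMU type id" {
    // Skip silently when the PMU is not present (e.g. non-Linux hosts).
    const ty = readPmuType("/sys/bus/event_source/devices/kprobe/type") catch return;
    _ = ty;
}
```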