Frank Denis 7cfeae1ce7
std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs (#13566)
* std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs

Carryless multiplication was slow on older Intel CPUs, justifying
the need for using Karatsuba multiplication.

This is not the case any more; using 4 multiplications to multiply
two 128-bit numbers is actually faster than 3 multiplications +
shifts and additions.

This is also true on aarch64.

Keep using Karatsuba only when targeting x86 (granted, this is a bit
of a brutal shortcut, we should really list all the CPU models that
had a slow clmul instruction).

Also remove useless agg_2 treshold and restore the ability to
precompute only H and H^2 in ReleaseSmall.

Finally, avoid using u256. Using 128-bit registers is actually faster.

* Use a switch, add some comments
2022-11-17 13:07:07 +01:00
..
2022-10-18 10:18:09 -07:00
2022-08-04 18:02:01 -07:00
2022-08-04 18:09:10 -07:00
2022-11-04 00:09:27 +03:30
2022-11-04 00:09:27 +03:30