Frank Denis 5af89b3dcc
std.crypto.chacha: support larger vectors on AVX2 and AVX512 targets (#15809)

Ryzen 7 7700, ChaCha20/8 stream, long outputs:

Generic: 3268 MiB/s
AVX2   : 6023 MiB/s
AVX512 : 8086 MiB/s

Bump the rand.chacha buffer a tiny bit to take advantage of this.
More than 8 blocks doesn't seem to make any measurable difference.

ChaChaPoly also gets a small performance boost from this, although
Poly1305 remains the bottleneck.

Generic:  707 MiB/s
AVX2   :  981 MiB/s
AVX512 : 1202 MiB/s

aarch64 appears to generally benefit from 4-way vectorization.

Verified on Apple Silicon and on a Cortex-A72.
2023-05-22 20:33:35 +02:00