Frank Denis 72064eba23 std/crypto: vectorize BLAKE3
Gives a ~40% speedup on x86_64.

However, the generic code remains faster on aarch64.

For now, this still processes only one block at a time.

I'm pretty confident that processing more blocks per round
will eventually give a substantial performance improvement on
all platforms with vector units.
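
As an illustration of what vectorizing a single block can look like, here is
a minimal sketch of one BLAKE3 round done row-wise: the 16-word state is
treated as four 4-lane rows, so one vector G call covers all four columns
(and, after rotating the rows, all four diagonals). This is an assumption
about the general technique, not a transcription of this commit; the actual
change is in Zig, and the helper names below (g_rows, rot_lanes, round_rows)
are hypothetical. Plain Python lists stand in for vector registers.

```python
MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def g(v, a, b, c, d, mx, my):
    # Scalar BLAKE3 G function (rotations 16, 12, 8, 7), in place on v.
    v[a] = (v[a] + v[b] + mx) & MASK
    v[d] = rotr(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr(v[b] ^ v[c], 12)
    v[a] = (v[a] + v[b] + my) & MASK
    v[d] = rotr(v[d] ^ v[a], 8)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr(v[b] ^ v[c], 7)

def round_scalar(v, m):
    # Reference round: four column G calls, then four diagonal G calls.
    v = list(v)
    g(v, 0, 4, 8, 12, m[0], m[1])
    g(v, 1, 5, 9, 13, m[2], m[3])
    g(v, 2, 6, 10, 14, m[4], m[5])
    g(v, 3, 7, 11, 15, m[6], m[7])
    g(v, 0, 5, 10, 15, m[8], m[9])
    g(v, 1, 6, 11, 12, m[10], m[11])
    g(v, 2, 7, 8, 13, m[12], m[13])
    g(v, 3, 4, 9, 14, m[14], m[15])
    return v

# Lane-wise helpers standing in for 4 x u32 vector instructions.
def vadd(x, y):
    return [(p + q) & MASK for p, q in zip(x, y)]

def vxor(x, y):
    return [p ^ q for p, q in zip(x, y)]

def vrotr(x, n):
    return [rotr(p, n) for p in x]

def rot_lanes(x, k):
    # Rotate lanes left by k positions (a vector shuffle).
    return x[k:] + x[:k]

def g_rows(r0, r1, r2, r3, mx, my):
    # Same steps as g(), applied to four columns at once.
    r0 = vadd(vadd(r0, r1), mx)
    r3 = vrotr(vxor(r3, r0), 16)
    r2 = vadd(r2, r3)
    r1 = vrotr(vxor(r1, r2), 12)
    r0 = vadd(vadd(r0, r1), my)
    r3 = vrotr(vxor(r3, r0), 8)
    r2 = vadd(r2, r3)
    r1 = vrotr(vxor(r1, r2), 7)
    return r0, r1, r2, r3

def round_rows(v, m):
    v, m = list(v), list(m)
    r0, r1, r2, r3 = v[0:4], v[4:8], v[8:12], v[12:16]
    # Column step: even-indexed message words are mx, odd-indexed are my.
    r0, r1, r2, r3 = g_rows(r0, r1, r2, r3, m[0:8:2], m[1:8:2])
    # Diagonalize: shift rows so each diagonal lines up as a column.
    r1, r2, r3 = rot_lanes(r1, 1), rot_lanes(r2, 2), rot_lanes(r3, 3)
    r0, r1, r2, r3 = g_rows(r0, r1, r2, r3, m[8:16:2], m[9:16:2])
    # Undo the shifts to restore the original layout.
    r1, r2, r3 = rot_lanes(r1, 3), rot_lanes(r2, 2), rot_lanes(r3, 1)
    return r0 + r1 + r2 + r3
```

With this layout each round costs two vector G calls plus six lane shuffles
instead of eight scalar G calls. Processing several blocks per round would
instead put the same state word of different blocks in each lane, which
avoids the shuffles entirely.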
2020-10-25 21:13:14 -04:00