Faster BLAKE3 implementation (#25574)

This is a rewrite of the BLAKE3 implementation, with vectorization.

On Apple Silicon, the new implementation is about twice as fast as the previous one.

With AVX2, it is more than 4x faster.

With AVX512, it is more than 7.5x faster than the previous implementation (from 678 MB/s to 5086 MB/s).
Frank Denis 2025-10-15 14:03:56 +02:00 committed by GitHub
parent 70c21fdbab
commit 6669885aa2
