mirror of
https://github.com/ziglang/zig.git
synced 2026-01-26 09:15:24 +00:00
Gives a ~40% speedup on x86_64. However, the generic code remains faster on aarch64. For now, this still processes only one block at a time. I'm pretty confident that processing more blocks per round will eventually give a substantial performance improvement on all platforms with vector units.