ML-DSA is a post-quantum signature scheme that was recently
standardized by NIST.
Keys and signatures are fairly large, so it is not a drop-in
replacement for classical signature schemes.
But if you are shipping keys that may still be in use in 10 years,
or whenever quantum computers large enough to break ECC arrive
(if that ever happens), and you cannot replace those keys,
ML-DSA is for you.
Performance is great; verification is faster than Ed25519 / ECDSA.
I tried manual vectorization, but it wasn't worth it: the compiler
already does a good job at auto-vectorization.
This change removes ref_start_index from the possible enum values of
Index and OptionalIndex. It is not really an index, but a constant that
gives the offset of static Refs, so let's move it to where such a
constant belongs, i.e. to Ref.
It was not obvious that the KT128/KT256 customization string can be
used to set a key, or what it was designed to be used for at all.
Also properly use key_length and not digest_length for the BLAKE3
key length (no practical changes as they are both 32, but that was
confusing).
While at it, remove unneeded simd_degree copies; simd_degree doesn't
need to be in the public interface either.
If we use `undefined`, then `netReceive` can `@intCast` the
control slice len to msghdr controllen, which is sometimes `u32`,
even on 64-bit platforms.
`init` just avoids this entirely by setting `control` to an empty
slice rather than undefined.
68d2f68ed introduced special handling for StructInit fields
containing multiline strings to prevent inserting whitespace after =.
However, this logic didn't handle cases without a trailing comma,
which resulted in unwanted trailing whitespace.
The subsystem detection was flaky and often incorrect and was not
actually needed by the compiler or standard library. The actual
subsystem won't be known until link time, so it doesn't make
sense to try to determine it at compile time.
* threaded K12: separate context computation from thread spawning
Compute all contexts and store them in a pre-allocated array,
then spawn threads using the pre-computed contexts.
This ensures each context is fully materialized in memory with the
correct values before any thread tries to access it.
* kt128: unroll the permutation rounds only twice
This appears to deliver the best performance thanks to improved cache
utilization, and it’s consistent with what we already do for SHA3.
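The separation described in the threaded K12 item above (compute all contexts first, then spawn) can be sketched as follows. This is only an illustration in Python, not the actual implementation: `LeafContext`, `hash_leaves`, the chunking scheme, and the use of SHA3-256 as the per-chunk hash are all assumptions made for the example.

```python
import hashlib
import threading
from dataclasses import dataclass


@dataclass
class LeafContext:
    # Hypothetical per-thread work description; field names are illustrative.
    index: int
    start: int
    end: int


def hash_leaves(data: bytes, chunk_size: int) -> list:
    """Hash fixed-size chunks of `data` in parallel, one thread per chunk."""
    count = (len(data) + chunk_size - 1) // chunk_size

    # Phase 1: compute every context up front and keep them all in one list,
    # so each context holds its final values before any thread exists.
    contexts = [
        LeafContext(i, i * chunk_size, min((i + 1) * chunk_size, len(data)))
        for i in range(count)
    ]
    results = [b""] * count

    def worker(ctx: LeafContext) -> None:
        # Each thread only reads its own, already complete context.
        # SHA3-256 is a stand-in for the real leaf hash.
        results[ctx.index] = hashlib.sha3_256(data[ctx.start:ctx.end]).digest()

    # Phase 2: spawn threads using the pre-computed contexts.
    threads = [threading.Thread(target=worker, args=(ctx,)) for ctx in contexts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The point of the two phases is that no thread can observe a context that is still being filled in, which is the hazard when contexts are built inside the same loop that spawns the threads.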
KT128 and KT256 are fast, secure cryptographic hash functions based on Keccak (SHA-3).
They can be seen as the modern version of SHA-3, and an evolution of SHAKE, with better performance.
After the SHA-3 competition, the Keccak team proposed these variants in 2016, and the constructions underwent 8 years of public scrutiny before being standardized in October 2025 as RFC 9861.
They use a tree-hashing mode on top of TurboSHAKE, providing both high security and excellent performance, especially on large inputs.
They support arbitrary-length output and optional customization strings.
Hashing of very large inputs can be done using multiple threads, for high throughput.
KT128 provides 128-bit security strength, equivalent to AES-128 and SHAKE128 and comparable to SHA-256 or BLAKE3, which is sufficient for virtually all applications.
KT256 provides 256-bit security strength, equivalent to SHA-512.
For small inputs, TurboSHAKE128 and TurboSHAKE256 (which KT128 and KT256 are based on) can be used instead as they have less overhead.
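To illustrate what arbitrary-length output means in practice, here is a minimal sketch using SHAKE128 from Python's `hashlib` as a stand-in. This is an assumption made for the example: `hashlib` does not provide KT128 or TurboSHAKE, and the helper name `shake128_xof` is hypothetical. The property shown (a shorter output is a prefix of a longer one for the same input) is a general property of extendable-output functions.

```python
import hashlib


def shake128_xof(message: bytes, out_len: int) -> bytes:
    # SHAKE128 is used here only as a stand-in XOF; it is not KT128.
    return hashlib.shake_128(message).digest(out_len)


short = shake128_xof(b"hello", 32)
longer = shake128_xof(b"hello", 64)

# The caller picks the output length; the first 32 bytes of the longer
# output are identical to the 32-byte output.
assert len(longer) == 64
assert longer[:32] == short
```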