ML-DSA is a post-quantum signature scheme that was recently
standardized by NIST.
Keys and signatures are pretty large, not making it a drop-in
replacement for classical signature schemes.
But if you are shipping keys that may still be used in 10 years
or whenever large quantum computers able to break ECC arrive,
it that ever happens, and you don't have the ability to replace
these keys, ML-DSA is for you.
Performance is great, verification is faster than Ed25519 / ECDSA.
I tried manual vectorization, but it wasn't worth it, the compiler
does at good job at auto-vectorization already.
This change removes the ref_start_index from the possible enum values of
Index and OptionalIndex. It is not really a index, but a constant that
tells the offset of static Refs, so lets move it where such constant
belongs i.e. to the Ref.
It was not obvious that the KT128/KT256 customization string can be
used to set a key, or what it was designed to be used for at all.
Also properly use key_length and not digest_length for the BLAKE3
key length (no practical changes as they are both 32, but that was
confusing).
Remove unneeded simd_degree copies by the way, and that doesn't need
to be in the public interface.
If we use `undefined`, then `netReceive` can `@intCast` the
control slice len to msghdr controllen, which is sometimes `u32`,
even on 64-bit platforms.
`init` just avoids this entirely by setting `control` to an empty
slice rather than undefined.
68d2f68ed introduced special handling for StructInit fields
containing multiline strings to prevent inserting whitespace after =.
However, this logic didn't handle cases without a trailing comma,
which resulted in unwanted trailing whitespace.
The subsystem detection was flaky and often incorrect and was not
actually needed by the compiler or standard library. The actual
subsystem won't be known until at link time, so it doesn't make
sense to try to determine it at compile time.
* threaded K12: separate context computation from thread spawning
Compute all contexts and store them in a pre-allocated array,
then spawn threads using the pre-computed contexts.
This ensures each context is fully materialized in memory with the
correct values before any thread tries to access it.
* kt128: unroll the permutation rounds only twice
This appears to deliver the best performance thanks to improved cache
utilization, and it’s consistent with what we already do for SHA3.