36 Commits

Author SHA1 Message Date
Andrew Kelley
e7b18a7ce6 std.crypto: remove inline from most functions
To quote the language reference,

It is generally better to let the compiler decide when to inline a
function, except for these scenarios:

* To change how many stack frames are in the call stack, for debugging
  purposes.
* To force comptime-ness of the arguments to propagate to the return
  value of the function, as in the above example.
* Real world performance measurements demand it. Don't guess!

Note that inline actually restricts what the compiler is allowed to do.
This can harm binary size, compilation speed, and even runtime
performance.

`zig run lib/std/crypto/benchmark.zig -OReleaseFast`
[-before-] vs {+after+}

              md5:        [-990-]        {+998+} MiB/s
             sha1:       [-1144-]       {+1140+} MiB/s
           sha256:       [-2267-]       {+2275+} MiB/s
           sha512:        [-762-]        {+767+} MiB/s
         sha3-256:        [-680-]        {+683+} MiB/s
         sha3-512:        [-362-]        {+363+} MiB/s
        shake-128:        [-835-]        {+839+} MiB/s
        shake-256:        [-680-]        {+681+} MiB/s
   turboshake-128:       [-1567-]       {+1570+} MiB/s
   turboshake-256:       [-1276-]       {+1282+} MiB/s
          blake2s:        [-778-]        {+789+} MiB/s
          blake2b:       [-1071-]       {+1086+} MiB/s
           blake3:       [-1148-]       {+1137+} MiB/s
            ghash:      [-10044-]      {+10033+} MiB/s
          polyval:       [-9726-]      {+10033+} MiB/s
         poly1305:       [-2486-]       {+2703+} MiB/s
         hmac-md5:        [-991-]        {+998+} MiB/s
        hmac-sha1:       [-1134-]       {+1137+} MiB/s
      hmac-sha256:       [-2265-]       {+2288+} MiB/s
      hmac-sha512:        [-765-]        {+764+} MiB/s
      siphash-2-4:       [-4410-]       {+4438+} MiB/s
      siphash-1-3:       [-7144-]       {+7225+} MiB/s
   siphash128-2-4:       [-4397-]       {+4449+} MiB/s
   siphash128-1-3:       [-7281-]       {+7374+} MiB/s
  aegis-128x4 mac:      [-73385-]      {+74523+} MiB/s
  aegis-256x4 mac:      [-30160-]      {+30539+} MiB/s
  aegis-128x2 mac:      [-66662-]      {+67267+} MiB/s
  aegis-256x2 mac:      [-16812-]      {+16806+} MiB/s
   aegis-128l mac:      [-33876-]      {+34055+} MiB/s
    aegis-256 mac:       [-8993-]       {+9087+} MiB/s
         aes-cmac:       2036 MiB/s
           x25519:      [-20670-]      {+16844+} exchanges/s
          ed25519:      [-29763-]      {+29576+} signatures/s
       ecdsa-p256:       [-4762-]       {+4900+} signatures/s
       ecdsa-p384:       [-1465-]       {+1500+} signatures/s
  ecdsa-secp256k1:       [-5643-]       {+5769+} signatures/s
          ed25519:      [-21926-]      {+21721+} verifications/s
          ed25519:      [-51200-]      {+50880+} verifications/s (batch)
 chacha20Poly1305:       [-1189-]       {+1109+} MiB/s
xchacha20Poly1305:       [-1196-]       {+1107+} MiB/s
 xchacha8Poly1305:       [-1466-]       {+1555+} MiB/s
 xsalsa20Poly1305:        [-660-]        {+620+} MiB/s
      aegis-128x4:      [-76389-]      {+78181+} MiB/s
      aegis-128x2:      [-53946-]      {+53495+} MiB/s
       aegis-128l:      [-27219-]      {+25621+} MiB/s
      aegis-256x4:      [-49351-]      {+49542+} MiB/s
      aegis-256x2:      [-32390-]      {+32366+} MiB/s
        aegis-256:       [-8881-]       {+8944+} MiB/s
       aes128-gcm:       [-6095-]       {+6205+} MiB/s
       aes256-gcm:       [-5306-]       {+5427+} MiB/s
       aes128-ocb:       [-8529-]      {+13974+} MiB/s
       aes256-ocb:       [-7241-]       {+9442+} MiB/s
        isapa128a:        [-204-]        {+214+} MiB/s
    aes128-single:  [-133857882-]  {+134170944+} ops/s
    aes256-single:   [-96306962-]   {+96408639+} ops/s
         aes128-8: [-1083210101-] {+1073727253+} ops/s
         aes256-8:  [-762042466-]  {+767091778+} ops/s
           bcrypt:      0.009 s/ops
           scrypt:      [-0.018-]      {+0.017+} s/ops
           argon2:      [-0.037-]      {+0.060+} s/ops
      kyber512d00:     [-206057-]     {+205779+} encaps/s
      kyber768d00:     [-156074-]     {+150711+} encaps/s
     kyber1024d00:     [-116626-]     {+115469+} encaps/s
      kyber512d00:     [-181149-]     {+182046+} decaps/s
      kyber768d00:     [-136965-]     {+135676+} decaps/s
     kyber1024d00:     [-101307-]     {+100643+} decaps/s
      kyber512d00:     [-123624-]     {+123375+} keygen/s
      kyber768d00:      [-69465-]      {+70828+} keygen/s
     kyber1024d00:      [-43117-]      {+43208+} keygen/s
2025-07-13 18:26:13 +02:00
Alex Rønne Petersen
9d534790eb std.Target: Introduce Cpu convenience functions for feature tests.
Before:

* std.Target.arm.featureSetHas(target.cpu.features, .has_v7)
* std.Target.x86.featureSetHasAny(target.cpu.features, .{ .sse, .avx, .cmov })
* std.Target.wasm.featureSetHasAll(target.cpu.features, .{ .atomics, .bulk_memory })

After:

* target.cpu.has(.arm, .has_v7)
* target.cpu.hasAny(.x86, &.{ .sse, .avx, .cmov })
* target.cpu.hasAll(.wasm, &.{ .atomics, .bulk_memory })
2025-06-05 06:12:00 +02:00
Jacob Young
8c8dfb35f3 x86_64: fix crashes compiling the compiler and tests 2025-01-16 20:47:30 -05:00
Frank Denis
636308a17d
std.crypto.aes: introduce AES block vectors (#22023)
* std.crypto.aes: introduce AES block vectors

Modern Intel CPUs with the VAES extension can handle more than a
single AES block per instruction.

So can some ARM and RISC-V CPUs. Software implementations with
bitslicing can also greatly benefit from this.

Implement low-level operations on AES block vectors, and the
parallel AEGIS variants on top of them.

AMD Zen4:

      aegis-128x4:      73225 MiB/s
      aegis-128x2:      51571 MiB/s
       aegis-128l:      25806 MiB/s
      aegis-256x4:      46742 MiB/s
      aegis-256x2:      30227 MiB/s
        aegis-256:       8436 MiB/s
       aes128-gcm:       5926 MiB/s
       aes256-gcm:       5085 MiB/s

AES-GCM, and anything based on AES-CTR are also going to benefit
from this later.

* Make AEGIS-MAC twice a fast
2024-11-22 10:00:49 +01:00
Frank Denis
acba2645f7
crypto.aes.soft: use std.atomic.cache_line instead of a harcoded value (#22026) 2024-11-20 03:48:18 +00:00
mlugg
018262d537
std: update eval branch quotas after bdbc485
Also, update `std.math.Log2Int[Ceil]` to more efficient implementations
that don't use up so much damn quota!
2024-08-21 01:30:46 +01:00
Andrew Kelley
3fc6fc6812 std.builtin.Endian: make the tags lower case
Let's take this breaking change opportunity to fix the style of this
enum.
2023-10-31 21:37:35 -04:00
Jacob Young
d890e81761 mem: fix ub in writeInt
Use inline to vastly simplify the exposed API.  This allows a
comptime-known endian parameter to be propogated, making extra functions
for a specific endianness completely unnecessary.
2023-10-31 21:37:35 -04:00
Frank Denis
a0b35249a2
Replace hand-written endian-specific loads with std.mem.readInt*() (#16431)
And when we have the choice, favor little-endian because it's 2023.

Gives a slight performance improvement:

   md5: 552 -> 555 MiB/s
  sha1: 768 -> 786 MiB/s
sha512: 211 -> 217 MiB/s
2023-07-18 00:40:31 +02:00
mlugg
f26dda2117 all: migrate code to new cast builtin syntax
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:

* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
2023-06-24 16:56:39 -07:00
Frank Denis
0000b34a2d
crypto.aes: define optimal_parallel_blocks for more CPUs (#15829) 2023-05-23 19:47:11 +00:00
Veikka Tuominen
ebbc521a87 workaround AstGen's love for copying arrays 2023-05-16 11:37:25 +03:00
Frank Denis
5a12d00708
Move std.crypto.config options to std.options (#14906)
Options have been moved to a single namespace.
2023-03-14 06:40:23 +00:00
Frank Denis
9622991578
Add configurable side channels mitigations; enable them on soft AES (#13739)
* Add configurable side channels mitigations; enable them on soft AES

Our software AES implementation doesn't have any mitigations against
side channels.

Go's generic implementation is not protected at all either, and even
OpenSSL only has minimal mitigations.

Full mitigations against cache-based attacks (bitslicing, fixslicing)
come at a huge performance cost, making AES-based primitives pretty
much useless for many applications. They also don't offer any
protection against other classes of side channel attacks.

In practice, partially protected, or even unprotected implementations
are not as bad as it sounds. Exploiting these side channels requires
an attacker that is able to submit many plaintexts/ciphertexts and
perform accurate measurements. Noisy measurements can still be
exploited, but require a significant amount of attempts. Wether this
is exploitable or not depends on the platform, application and the
attacker's proximity.

So, some libraries made the choice of minimal mitigations and some
use better mitigations in spite of the performance hit. It's a
tradeoff (security vs performance), and there's no one-size-fits all
implementation.

What applies to AES applies to other cryptographic primitives.

For example, RSA signatures are very sensible to fault attacks,
regardless of them using the CRT or not. A mitigation is to verify
every produced signature. That also comes with a performance cost.
Wether to do it or not depends on wether fault attacks are part of
the threat model or not.

Thanks to Zig's comptime, we can try to address these different
requirements.

This PR adds a `side_channels_protection` global, that can later
be complemented with `fault_attacks_protection` and possibly other
knobs.

It can have 4 different values:

- `none`: which doesn't enable additional mitigations.
"Additional", because it only disables mitigations that don't have
a big performance cost. For example, checking authentication tags
will still be done in constant time.

- `basic`: which enables mitigations protecting against attacks in
a common scenario, where an attacker doesn't have physical access to
the device, cannot run arbitrary code on the same thread, and cannot
conduct brute-force attacks without being throttled.

- `medium`: which enables additional mitigations, offering practical
protection in a shared environement.

- `full`: which enables all the mitigations we have.

The tradeoff is that the more mitigations we enable, the bigger the
performance hit will be. But this let applications choose what's
best for their use case.

`medium` is the default.

Currently, this only affects software AES, but that setting can
later be used by other primitives.

For AES, our implementation is a traditional table-based, with 4
32-bit tables and a sbox.

Lookups in that table have been replaced by function calls. These
functions can add a configurable noise level, making cache-based
attacks more difficult to conduct.

In the `none` mitigation level, the behavior is exactly the same
as before. Performance also remains the same.

In other levels, we compress the T tables into a single one, and
read data from multiple cache lines (all of them in `full` mode),
for all bytes in parallel. More precise measurements and way more
attempts become necessary in order to find correlations.

In addition, we use distinct copies of the sbox for key expansion
and encryption, so that they don't share the same L1 cache entries.

The best known attacks target the first two AES round, or the last
one.

While future attacks may improve on this, AES achieves full
diffusion after 4 rounds. So, we can relax the mitigations after
that. This is what this implementation does, enabling mitigations
again for the last two rounds.

In `full` mode, all the rounds are protected.

The protection assumes that lookups within a cache line are secret.
The cachebleed attack showed that it can be circumvented, but
that requires an attacker to be able to abuse hyperthreading and
run code on the same core as the encryption, which is rarely a
practical scenario.

Still, the current AES API allows us to transparently switch to
using fixslicing/bitslicing later when the `full` mitigation level
is enabled.

* Software AES: use little-endian representation.

Virtually all platforms are little-endian these days, so optimizing
for big-endian CPUs doesn't make sense any more.
2023-03-13 22:18:26 +01:00
Frank Denis
1d96a17af4
crypto.aescrypto.encrypt: do not add the round key in an asm block (#14899)
Apple M1/M2 have an EOR3 instruction that can XOR 2 operands with
another one, and LLVM knows how to take advantage of it.

However, two EOR can't be automatically combined into an EOR3 if
one of them is in an assembly block.

That simple change speeds up ciphers doing an AES round immediately
followed by a XOR operation on Apple Silicon.

Before:

   aegis-128l mac:      12534 MiB/s
    aegis-256 mac:       6722 MiB/s
       aegis-128l:      10634 MiB/s
        aegis-256:       6133 MiB/s
       aes128-gcm:       3890 MiB/s
       aes256-gcm:       3122 MiB/s
       aes128-ocb:       2832 MiB/s
       aes256-ocb:       2057 MiB/s

After:

   aegis-128l mac:      15667 MiB/s
    aegis-256 mac:       8240 MiB/s
       aegis-128l:      12656 MiB/s
        aegis-256:       7214 MiB/s
       aes128-gcm:       3976 MiB/s
       aes256-gcm:       3202 MiB/s
       aes128-ocb:       2835 MiB/s
       aes256-ocb:       2118 MiB/s
2023-03-13 07:06:27 +00:00
Andrew Kelley
4dd958d585 improve error message for byref capture of byval array 2023-02-18 19:20:19 -07:00
Andrew Kelley
aeaef8c0ff update std lib and compiler sources to new for loop syntax 2023-02-18 19:17:21 -07:00
Frank Denis
32563e6829
crypto.core.aes: process 6 block in parallel instead of 8 on aarch64 (#13473)
* crypto.core.aes: process 6 block in parallel instead of 8 on aarch64

At least on Apple Silicon, this is slightly faster than 8 blocks.

* AES: add parallel blocks for tigerlake, rocketlake, alderlake, zen3
2022-11-07 12:28:37 +01:00
Helio Machado
e0be22b6c0
std.crypto: cosmetic improvement to AES multiplication algorithm (#11616)
std.crypto: cosmetic improvement to AES multiplication algorithm (#11616)
2022-05-25 19:23:49 +02:00
Helio Machado
941b6830b1
std.crypto: generate AES constants at compile time (#11612)
* std/crypto: generate AES constants at compile time

* Apply suggestions from code review

Co-authored-by: Frank Denis <124872+jedisct1@users.noreply.github.com>

* Update lib/std/crypto/aes/soft.zig

* Separate encryption and decryption tables

* Run `zig fmt`

* Increase branch quota and remove redundant align

* Update lib/std/crypto/aes/soft.zig

Co-authored-by: Frank Denis <124872+jedisct1@users.noreply.github.com>

* Rename identifiers and simplify dataflow

* Increase branch quota (again) and fix comment

Co-authored-by: Frank Denis <124872+jedisct1@users.noreply.github.com>
2022-05-09 18:50:01 +02:00
Meghan
b73cf97c93
replace other uses of std.meta.Vector with @Vector (#11346) 2022-03-30 14:12:14 -04:00
Andrew Kelley
6115cf2240 migrate from std.Target.current to @import("builtin").target
closes #9388
closes #9321
2021-10-04 23:48:55 -07:00
jdmichaud
49c9975484
zig fmt: respect trailing commas in inline assembly 2021-08-29 11:57:32 +02:00
Andrew Kelley
d29871977f remove redundant license headers from zig standard library
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.

Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
2021-08-24 12:25:09 -07:00
Jacob G-W
641ecc260f std, src, doc, test: remove unused variables 2021-06-21 17:03:03 -07:00
Isaac Freund
5b850d5c92
Run zig fmt on src/ and lib/std/
This replaces callconv(.Inline) with the more idiomatic inline keyword.
2021-05-20 17:14:18 +02:00
Frank Denis
0423f0f7d8 std/crypto/aes: fix AES {encrypt,decrypt}Wide
These functions are  not used by anything yet, but run the last
round only once.
2021-02-28 21:55:58 +02:00
Tadeo Kondrak
5dfe0e7e8f
Convert inline fn to callconv(.Inline) everywhere 2021-02-10 20:06:12 -07:00
Frank Denis
6c2e0c2046 Year++ 2020-12-31 15:45:24 -08:00
Frank Denis
0adc144f88 std/crypto: adjust aesni parallelism to CPU models
Intel keeps changing the latency & throughput of the aes* and clmul
instructions every time they release a new model.

Adjust `optimal_parallel_blocks` accordingly, keeping 8 as a safe
default for unknown data.
2020-10-28 21:44:00 +02:00
Frank Denis
91a1c20e74 Fix a typo (s/multple/multiple/) 2020-10-24 07:57:34 +02:00
Frank Denis
fa17447090 std/crypto: make the whole APIs more consistent
- use `PascalCase` for all types. So, AES256GCM is now Aes256Gcm.
- consistently use `_length` instead of mixing `_size` and `_length` for the
constants we expose
- Use `minimum_key_length` when it represents an actual minimum length.
Otherwise, use `key_length`.
- Require output buffers (for ciphertexts, macs, hashes) to be of the right
size, not at least of that size in some functions, and the exact size elsewhere.
- Use a `_bits` suffix instead of `_length` when a size is represented as a
number of bits to avoid confusion.
- Functions returning a constant-sized slice are now defined as a slice instead
of a pointer + a runtime assertion. This is the case for most hash functions.
- Use `camelCase` for all functions instead of `snake_case`.

No functional changes, but these are breaking API changes.
2020-10-17 18:53:08 -04:00
Frank Denis
60d1e675d2 aes/aesni is not based on a Go implementation, only aes/soft is
Don't blame them for our bugs :)
2020-10-08 14:55:11 +02:00
Frank Denis
f39dc00ed4 std/crypto/aes: add AES hardware acceleration on aarch64 2020-10-08 14:55:08 +02:00
Frank Denis
9f274e1f7d std/crypto: add the AEGIS128L AEAD
Showcase that Zig can be a great option for high performance cryptography.

The AEGIS family of authenticated encryption algorithms was selected for
high-performance applications in the final portfolio of the CAESAR
competition.

They reuse the AES core function, but are substantially faster than the
CCM, GCM and OCB modes while offering a high level of security.

AEGIS algorithms are especially fast on CPUs with built-in AES support, and
the 128L variant fully takes advantage of the pipeline in modern Intel CPUs.

Performance of the Zig implementation is on par with libsodium.
2020-09-29 17:10:04 +02:00
Frank Denis
bd89bd6fdb Revamp crypto/aes
* Reorganize crypto/aes in order to separate parameters, implementations and
modes.
* Add a zero-cost abstraction over the internal representation of a block,
so that blocks can be kept in vector registers in optimized implementations.
* Add architecture-independent aesenc/aesdec/aesenclast/aesdeclast operations,
so that any AES-based primitive can be implemented, including these that don't
use the original key schedule (AES-PRF, AEGIS, MeowHash...)
* Add support for parallelization/wide blocks to take advantage of hardware
implementations.
* Align T-tables to cache lines in the software implementations to slightly
reduce side channels.
* Add an optimized implementation for modern Intel CPUs with AES-NI.
* Add new tests (AES256 key expansion).
* Reimplement the counter mode to work with any block cipher, any endianness
and to take advantage of wide blocks.
* Add benchmarks for AES.
2020-09-24 13:16:00 -04:00