28 Commits

Author SHA1 Message Date
Frank Denis
4dd061a7ac ghash: handle the .hi_lo case when no CLMUL acceleration is present, too 2022-11-17 23:54:21 +01:00
Frank Denis
3051e279a5 Reapply "std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs (#13566)"
This reapplies commit 72d3f4b5dc0dda9fd0a048c2391f03604f4b30ac.
2022-11-17 23:52:58 +01:00
Andrew Kelley
72d3f4b5dc Revert "std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs (#13566)"
This reverts commit 7cfeae1ce7aa9f1b3a219d032c43bc2e694ba63b which
is causing std lib tests to fail on wasm32-wasi.
2022-11-17 15:37:37 -07:00
Frank Denis
7cfeae1ce7
std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs (#13566)
* std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs

Carryless multiplication was slow on older Intel CPUs, which justified
using Karatsuba multiplication.

This is no longer the case: using 4 multiplications to multiply
two 128-bit numbers is now faster than 3 multiplications plus the
extra shifts and additions (see the sketch after this entry).

This is also true on aarch64.

Keep using Karatsuba only when targeting x86 (granted, this is a bit
of a brutal shortcut; we should really list all the CPU models that
had a slow clmul instruction).

Also remove the useless agg_2 threshold and restore the ability to
precompute only H and H^2 in ReleaseSmall.

Finally, avoid using u256. Using 128-bit registers is actually faster.

* Use a switch, add some comments
2022-11-17 13:07:07 +01:00
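For reference, a minimal sketch of the two multiplication strategies the message above contrasts. This is not the commit's code: clmul64() is a portable bit-by-bit stand-in for the hardware carry-less multiply instruction (PCLMULQDQ on x86_64, PMULL on aarch64), and the helper names are illustrative.

    const std = @import("std");

    /// Portable bit-by-bit carry-less multiply of two 64-bit polynomials.
    /// Only here to make the sketch self-contained; real code uses the
    /// hardware instruction when available.
    fn clmul64(a: u64, b: u64) u128 {
        var acc: u128 = 0;
        var x: u128 = a;
        var y: u64 = b;
        while (y != 0) : (y >>= 1) {
            if ((y & 1) == 1) acc ^= x;
            x <<= 1;
        }
        return acc;
    }

    const U256 = struct { hi: u128, lo: u128 };

    /// Schoolbook 128x128 carry-less multiply: 4 clmuls.
    /// Inputs are given as (hi, lo) 64-bit halves.
    fn mulSchoolbook(a1: u64, a0: u64, b1: u64, b0: u64) U256 {
        const lo = clmul64(a0, b0);
        const hi = clmul64(a1, b1);
        const mid = clmul64(a0, b1) ^ clmul64(a1, b0);
        return .{ .lo = lo ^ (mid << 64), .hi = hi ^ (mid >> 64) };
    }

    /// Karatsuba variant: 3 clmuls, at the price of extra xors.
    fn mulKaratsuba(a1: u64, a0: u64, b1: u64, b0: u64) U256 {
        const lo = clmul64(a0, b0);
        const hi = clmul64(a1, b1);
        const mid = clmul64(a0 ^ a1, b0 ^ b1) ^ lo ^ hi;
        return .{ .lo = lo ^ (mid << 64), .hi = hi ^ (mid >> 64) };
    }

    test "schoolbook and Karatsuba agree" {
        const r1 = mulSchoolbook(0x0123456789abcdef, 0xfedcba9876543210, 0x1111, 0x2222);
        const r2 = mulKaratsuba(0x0123456789abcdef, 0xfedcba9876543210, 0x1111, 0x2222);
        try std.testing.expectEqual(r1.lo, r2.lo);
        try std.testing.expectEqual(r1.hi, r2.hi);
    }
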
Naoki MATSUMOTO
b29057b6ab
std.crypto.ghash: fix uninitialized polynomial use (#13527)
In the 'remaining blocks' step,
the length of the processed message can range from 1 to 79 bytes,
and the value of 'n-1' ranges from 0 to 3.
So st.hx[i] must be initialized at least from st.hx[0] to st.hx[3].
2022-11-14 16:35:08 +01:00
Frank Denis
59af6417bb
crypto.ghash: define aggregate thresholds as blocks, not bytes (#13507)
These constants were read as a block count in initForBlockCount(),
but as a byte size in update().

The unit could be blocks or bytes, but we should use the same one
everywhere.

So, use blocks as intended.

Fixes #13506
2022-11-10 19:00:00 +01:00
Frank Denis
36e618aef1 crypto.ghash: compatibility with stage1
Defining the selector enum outside the function definition is
required for stage1.
2022-11-08 16:59:53 +01:00
Frank Denis
7d48cb1138
std.crypto: make ghash faster, esp. for small messages (#13464)
* std.crypto: make ghash faster, esp. for small messages

Aggregated reduction requires 5 additional multiplications (to
precompute the powers of H) in order to save 2 multiplications
per batch.

So, only use large batches when the savings actually outweigh that
cost (see the sketch after this entry).

For the last blocks, reuse the precomputations in order to perform
a single reduction.

Also, even in .ReleaseSmall, allow 2-block aggregation.
The speedup is worth it, and the code increase is reasonable.

And in .ReleaseFast, bump the upper batch size up to 16.

Also, leverage comptime instead of duplicating code.

std/crypto/benchmark.zig on Apple M1:

    Zig 0.10.0: 2769 MiB/s
        Before: 6014 MiB/s
         After: 7334 MiB/s

Also normalize function names.

* Change clmul() to accept the half to be processed

This avoids a bunch of truncate() calls.

* Add more ghash tests to check all code paths
2022-11-07 21:45:29 +01:00
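As a reference for the trade-off described above, this is the standard aggregated-reduction identity for GHASH (general background, not code from the commit): the per-block Horner update reduces after every multiplication, while the aggregated form performs a single reduction per batch of n blocks, at the cost of precomputing H^2 .. H^n once per key.

    % Per-block (Horner) update: one modular reduction per block
    Y_i = (Y_{i-1} \oplus X_i) \cdot H \bmod P(x)

    % Aggregated over a batch of n blocks: a single reduction at the end
    Y_n = \left( (Y_0 \oplus X_1) \cdot H^n \oplus X_2 \cdot H^{n-1} \oplus \cdots \oplus X_n \cdot H \right) \bmod P(x)

    % where P(x) = x^{128} + x^7 + x^2 + x + 1 is the GHASH reduction polynomial
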
Frank Denis
0d192ee9ef
std.crypto.onetimeauth.Ghash: make GHASH 2 - 2.5x faster (#13374)
Rewrite GHASH to use 128-bit multiplication over non-reversed
integers, and aggregated reduction over up to 8 blocks.

lib/std/crypto/benchmark.zig results:

Xeon E5:
  Before: 1604 MiB/s
   After: 4005 MiB/s

Apple M1:
  Before: 2769 MiB/s
   After: 6014 MiB/s

As a side effect, this also makes AES-GCM faster.
2022-11-01 13:49:13 -04:00
Veikka Tuominen
62ff8871ed stage2+stage1: remove type parameter from bit builtins
Closes #12529
Closes #12511
Closes #6835
2022-08-22 11:19:20 +03:00
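For context, a minimal illustration of the new call form for the bit builtins (not taken from this commit; @clz and @popCount as examples):

    const std = @import("std");

    test "bit builtins no longer take a type parameter" {
        const x: u32 = 0x0000_ffff;
        // Previously written as `@clz(u32, x)` and `@popCount(u32, x)`;
        // the result type is now inferred from the operand.
        try std.testing.expectEqual(@as(u6, 16), @clz(x));
        try std.testing.expectEqual(@as(u6, 16), @popCount(x));
    }
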
Meghan
b73cf97c93
replace other uses of std.meta.Vector with @Vector (#11346) 2022-03-30 14:12:14 -04:00
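A minimal example of the replacement (illustrative, not from the commit):

    const std = @import("std");

    test "use @Vector instead of std.meta.Vector" {
        // Previously: `const V4 = std.meta.Vector(4, u32);`
        const V4 = @Vector(4, u32);
        const v: V4 = .{ 1, 2, 3, 4 };
        try std.testing.expectEqual(@as(u32, 3), v[2]);
    }
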
Andrew Kelley
6115cf2240 migrate from std.Target.current to @import("builtin").target
closes #9388
closes #9321
2021-10-04 23:48:55 -07:00
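A minimal example of the new spelling (illustrative, not from the commit):

    const std = @import("std");
    const builtin = @import("builtin");

    test "querying the native target" {
        // Previously written as `std.Target.current.cpu.arch`.
        const native_arch = builtin.target.cpu.arch;
        try std.testing.expectEqual(builtin.cpu.arch, native_arch);
    }
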
jdmichaud
49c9975484
zig fmt: respect trailing commas in inline assembly 2021-08-29 11:57:32 +02:00
Andrew Kelley
d29871977f remove redundant license headers from zig standard library
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.

Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
2021-08-24 12:25:09 -07:00
Isaac Freund
5b850d5c92
Run zig fmt on src/ and lib/std/
This replaces callconv(.Inline) with the more idiomatic inline keyword.
2021-05-20 17:14:18 +02:00
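A minimal example of the change (illustrative, not from the commit):

    const std = @import("std");

    // Previously: `fn double(x: u64) callconv(.Inline) u64 { ... }`
    inline fn double(x: u64) u64 {
        return x * 2;
    }

    test "inline function" {
        try std.testing.expectEqual(@as(u64, 8), double(4));
    }
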
Andrew Kelley
5619ce2406 Merge remote-tracking branch 'origin/master' into stage2-whole-file-astgen
Conflicts:
 * doc/langref.html.in
 * lib/std/enums.zig
 * lib/std/fmt.zig
 * lib/std/hash/auto_hash.zig
 * lib/std/math.zig
 * lib/std/mem.zig
 * lib/std/meta.zig
 * test/behavior/alignof.zig
 * test/behavior/bitcast.zig
 * test/behavior/bugs/1421.zig
 * test/behavior/cast.zig
 * test/behavior/ptrcast.zig
 * test/behavior/type_info.zig
 * test/behavior/vector.zig

The master branch added `try` to a bunch of testing function calls, and
some lines had also changed how the native architecture and other
`@import("builtin")` information is referenced.
2021-05-08 14:45:21 -07:00
Veikka Tuominen
fd77f2cfed std: update usage of std.testing 2021-05-08 15:15:30 +03:00
Andrew Kelley
c60d8f017e std: remove redundant comptime keyword
@g-w1's fancy new compile error in action
2021-04-28 22:58:12 -07:00
Tadeo Kondrak
5dfe0e7e8f
Convert inline fn to callconv(.Inline) everywhere 2021-02-10 20:06:12 -07:00
Frank Denis
6c2e0c2046 Year++ 2020-12-31 15:45:24 -08:00
Frank Denis
bd07154242 Add mem.timingSafeEql() for constant-time array comparison
This is a trivial implementation that just does an or[xor] loop
(sketched below).

However, this pattern is used by virtually all crypto libraries and
in practice, even without assembly barriers, LLVM never turns it into
code with conditional jumps, even if one of the parameters is constant.

This has been verified to still be the case with LLVM 11.0.0.
2020-11-07 20:18:43 +01:00
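A minimal sketch of the or[xor] pattern the message describes; the function name and signature are illustrative, not the actual std API:

    const std = @import("std");

    /// Accumulate every byte difference with `|=` of `a[i] ^ b[i]` and test
    /// the accumulator once at the end, so there is no data-dependent branch.
    fn timingSafeEqlBytes(comptime n: usize, a: [n]u8, b: [n]u8) bool {
        var acc: u8 = 0;
        var i: usize = 0;
        while (i < n) : (i += 1) {
            acc |= a[i] ^ b[i];
        }
        return acc == 0;
    }

    test "constant-time comparison sketch" {
        try std.testing.expect(timingSafeEqlBytes(3, .{ 1, 2, 3 }, .{ 1, 2, 3 }));
        try std.testing.expect(!timingSafeEqlBytes(3, .{ 1, 2, 3 }, .{ 1, 2, 4 }));
    }
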
Frank Denis
fa17447090 std/crypto: make the whole APIs more consistent
- use `PascalCase` for all types. So, AES256GCM is now Aes256Gcm.
- consistently use `_length` instead of mixing `_size` and `_length` for the
constants we expose
- Use `minimum_key_length` when it represents an actual minimum length.
Otherwise, use `key_length`.
- Require output buffers (for ciphertexts, MACs, hashes) to be exactly the right
size, rather than at least that size in some functions and the exact size in others.
- Use a `_bits` suffix instead of `_length` when a size is represented as a
number of bits to avoid confusion.
- Functions returning a constant-sized slice are now defined as a slice instead
of a pointer + a runtime assertion. This is the case for most hash functions.
- Use `camelCase` for all functions instead of `snake_case`.

No functional changes, but these are breaking API changes.
2020-10-17 18:53:08 -04:00
Frank Denis
1bc2b68916 ghash: add pmull support on aarch64 2020-10-08 18:09:23 -04:00
Frank Denis
d343b75e7f ghash & poly1305: fix handling of partial blocks and add pad()
pad() aligns the next input to the first byte of a block, which is
useful to implement the IETF version of ChaCha20Poly1305 and AES-GCM.
2020-10-05 23:50:38 +02:00
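A rough sketch of what pad() does (hypothetical field and method names, not the commit's code): absorb zero bytes until the next input starts on a fresh 16-byte block.

    const std = @import("std");

    /// Hypothetical state with the two members the sketch relies on; the real
    /// Ghash type may name and handle these differently.
    const MockState = struct {
        leftover: usize = 0,
        absorbed: usize = 0,

        fn update(self: *MockState, bytes: []const u8) void {
            self.absorbed += bytes.len;
            self.leftover = (self.leftover + bytes.len) % 16;
        }
    };

    /// Absorb zero bytes until the next input starts on a fresh 16-byte block.
    fn padToBlock(st: anytype) void {
        const block_length = 16;
        const rem = st.leftover % block_length;
        if (rem == 0) return; // already on a block boundary
        const zeros = [_]u8{0} ** block_length;
        st.update(zeros[0 .. block_length - rem]);
    }

    test "pad aligns the next input" {
        var st = MockState{};
        st.update("some aad"); // 8 bytes buffered
        padToBlock(&st); // absorbs 8 zero bytes
        try std.testing.expectEqual(@as(usize, 0), st.leftover);
    }
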
Frank Denis
97fd0974b9 ghash: add pclmul support on x86_64 2020-10-01 02:05:11 +02:00
Frank Denis
8161de7fa4 Implement ghash aggregated reduction
Performance increases from ~400 MiB/s to 450 MiB/s at the expense of
extra code. Thus, aggregation is disabled on ReleaseSmall.

Since the multiplication cost is significant compared to the reduction,
aggregating more than 2 blocks is probably not worth it.
2020-10-01 02:05:07 +02:00
Frank Denis
f1ad94437b ghash & poly1305: use pointer to slices for keys and output 2020-10-01 02:04:30 +02:00
Frank Denis
58873ed3f9 std/crypto: add GHASH implementation
GHASH is required to implement AES-GCM.

Optimized implementations for CPUs with instructions for carry-less
multiplication will be added next.
2020-10-01 02:04:30 +02:00