Many of the Managed methods accepted by-val parameters which could
reference Limb slices that became invalid memory after any
ensureCapacity calls. Now, Managed methods accept `*const Managed`
parameters so that if the function allows aliasing and the
ensure-capacity call resizes the Limb slice, it also affects the
aliased parameters, avoiding use-after-free bugs.
This is a breaking change that reduces the requirement for callsites to
manually make the ensure-capacity changes prior to calling many of the
Managed methods.
Closes#11897
Big-int functions were updated to respect the provided abi_size, rather
than inferring a potentially incorrect abi_size implicitly.
In combination with the convention that any required padding bits are
added on the MSB end, this means that exotic integers can potentially
have a well-defined memory layout.
This adds a new path which avoids using compiler_rt generated div
udivmod instructions in the case that a divisor is less than half the
max usize value. Two half-limb divisions are performed instead which
ensures that non-emulated division instructions are actually used. This
does not improve the udivmod code which should still be reviewed
independently of this issue.
Notably this improves the performance of the toString implementation of
non-power-of-two bases considerably.
Division performance is improved ~1000% based on some coarse testing.
The following test code is used to provide a rough comparison between
the old vs. new method.
```
const std = @import("std");
const Managed = std.math.big.int.Managed;
const allocator = std.heap.c_allocator;
fn fib(a: *Managed, n: usize) !void {
var b = try Managed.initSet(allocator, 1);
defer b.deinit();
var c = try Managed.init(allocator);
defer c.deinit();
var i: usize = 0;
while (i < n) : (i += 1) {
try c.add(a.toConst(), b.toConst());
a.swap(&b);
b.swap(&c);
}
}
pub fn main() !void {
var a = try Managed.initSet(allocator, 0);
defer a.deinit();
try fib(&a, 1_000_000);
// Note: Next two lines (and printed digit count) omitted on no-print version.
const as = try a.toString(allocator, 10, .lower);
defer allocator.free(as);
std.debug.print("fib: digit count: {}, limb count: {}\n", .{ as.len, a.limbs.len });
}
```
```
==> time.no-print <==
limb count: 10849
________________________________________________________
Executed in 10.60 secs fish external
usr time 10.44 secs 0.00 millis 10.44 secs
sys time 0.02 secs 1.12 millis 0.02 secs
==> time.old <==
fib: digit count: 208988, limb count: 10849
________________________________________________________
Executed in 22.78 secs fish external
usr time 22.43 secs 1.01 millis 22.43 secs
sys time 0.03 secs 0.13 millis 0.03 secs
==> time.optimized <==
fib: digit count: 208988, limb count: 10849
________________________________________________________
Executed in 11.59 secs fish external
usr time 11.56 secs 1.03 millis 11.56 secs
sys time 0.03 secs 0.12 millis 0.03 secs
```
Perf data for non-optimized and optimized, verifying no udivmod is
generated by the new code.
```
$ perf report -i perf.data.old --stdio
- Total Lost Samples: 0
-
- Samples: 90K of event 'cycles:u'
- Event count (approx.): 71603695208
-
- Overhead Command Shared Object Symbol
- ........ ....... ................ ...........................................
-
52.97% t t [.] compiler_rt.udivmod.udivmod
45.97% t t [.] std.math.big.int.Mutable.addCarry
0.83% t t [.] main
0.08% t libc-2.33.so [.] __memmove_avx_unaligned_erms
0.08% t t [.] __udivti3
0.03% t [unknown] [k] 0xffffffff9a0010a7
0.02% t t [.] std.math.big.int.Managed.ensureCapacity
0.01% t libc-2.33.so [.] _int_malloc
0.00% t libc-2.33.so [.] __malloc_usable_size
0.00% t libc-2.33.so [.] _int_free
0.00% t t [.] 0x0000000000004a80
0.00% t t [.] std.heap.CAllocator.resize
0.00% t libc-2.33.so [.] _mid_memalign
0.00% t libc-2.33.so [.] sysmalloc
0.00% t libc-2.33.so [.] __posix_memalign
0.00% t t [.] std.heap.CAllocator.alloc
0.00% t ld-2.33.so [.] do_lookup_x
$ perf report -i perf.data.optimized --stdio
- Total Lost Samples: 0
-
- Samples: 46K of event 'cycles:u'
- Event count (approx.): 36790112336
-
- Overhead Command Shared Object Symbol
- ........ ....... ................ ...........................................
-
79.98% t t [.] std.math.big.int.Mutable.addCarry
15.14% t t [.] main
4.58% t t [.] std.math.big.int.Managed.ensureCapacity
0.21% t libc-2.33.so [.] __memmove_avx_unaligned_erms
0.05% t [unknown] [k] 0xffffffff9a0010a7
0.02% t libc-2.33.so [.] _int_malloc
0.01% t t [.] std.heap.CAllocator.alloc
0.01% t libc-2.33.so [.] __malloc_usable_size
0.00% t libc-2.33.so [.] systrim.constprop.0
0.00% t libc-2.33.so [.] _mid_memalign
0.00% t t [.] 0x0000000000000c7d
0.00% t libc-2.33.so [.] malloc
0.00% t ld-2.33.so [.] check_match
```
Closes#10630.
AIR:
* div is renamed to div_trunc.
* Add div_float, div_floor, div_exact.
- Implemented in Sema and LLVM codegen. C backend has a stub.
Improvements to std.math.big.Int:
* Add `eqZero` function to `Mutable`.
* Fix incorrect results for `divFloor`.
Compiler-rt:
* Add muloti4 to the stage2 section.
This function can be used to initialize a big integer to either the upper
or lower limit of a 2s-complement integer. Note that the result is still
in sign-magnitude representation, though in order to convert it into twos
complement all one has to do is take the absolute value.
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.
Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
After a right shift, top limbs may be all zero. However, without
normalization, the number of limbs is not going to change.
In order to check if a big number is zero, we used to assume that the
number of limbs is 1. Which may not be the case after right shifts,
even if the actual value is zero.
- Normalize after a right shift
- Add a test for that issue
- Check all the limbs in `eqlZero()`. It may not be necessary if
callers always remember to normalize before calling the function.
But checking all the limbs is very cheap and makes the function less
bug-prone.
* Add an optimized squaring routine under the `sqr` name.
Algorithms for squaring bigger numbers efficiently will come in a
PR later.
* Fix a bug where a multiplication was done twice if the threshold for
the use of Karatsuba algorithm was crossed. Add a test to make sure
this won't happen again.
* Streamline `pow` method, take a `Const` parameter.
* Minor tweaks to `pow`, avoid bit-reversing the exponent.
* Correctly scan all the exponent bits, this caused the incorrect result
to be computed for exponents being powers of two.
* Allocate enough limbs to make llmulacc stop whining.
Implemented following Knuth's "Evaluation of Powers" chapter in TAOCP,
some extra complexity is needed to make sure there's no aliasing and
avoid allocating too many limbs.
A brief example to illustrate why the last point is important:
consider 10^123, since 10 is well within the limits of a single limb we
can safely say that the result will surely fit in:
⌈log2(10)⌉ bit * 123 = 492 bits = 7 limbs
A naive calculation using only the number of limbs yields:
1 limb * 123 = 123 limbs
The space savings are noticeable.
Now there are 3 types:
* std.math.big.int.Const
- the memory is immutable, only stores limbs and is_positive
- all methods operating on constant data go here
* std.math.big.int.Mutable
- the memory is mutable, stores capacity in addition to limbs and
is_positive
- methods here have some Mutable parameters and some Const
parameters. These methods expect callers to pre-calculate the
amount of resources required, and asserts that the resources are
available.
* std.math.big.int.Managed
- the memory is mutable and additionally stores an allocator.
- methods here perform the resource calculations for the programmer.
- this is the high level abstraction from before
Each of these 3 types can be converted to the other ones.
You can see the use case for this in the self-hosted compiler, where we
only store limbs, and construct the big ints as needed.
This gets rid of the hack where the allocator was optional and the
notion of "fixed" versions of the struct. Such things are now modeled
with the `big.int.Const` type.