* link: add a virtual function `lowerUnnamedConsts`, similar to
`updateFunc` or `updateDecl` which needs to be implemented by the
linker backend in order to be used with the `CodeGen` code
* elf: implement `lowerUnnamedConsts` specialization where we
lower unnamed constants to `.rodata` section. We keep track of the
atoms encompassing the lowered unnamed consts in a global table
indexed by parent `Decl`. When the `Decl` is updated or destroyed,
we clear the unnamed consts referenced within the `Decl`.
* macho: implement `lowerUnnamedConsts` specialization where we
lower unnamed constants to `__TEXT,__const` section. We keep track of the
atoms encompassing the lowered unnamed consts in a global table
indexed by parent `Decl`. When the `Decl` is updated or destroyed,
we clear the unnamed consts referenced within the `Decl`.
* x64: change `MCValue.linker_sym_index` into two `MCValue`s: `.got_load` and
`.direct_load`. The former signifies to the emitter that it should
emit a GOT load relocation, while the latter that it should emit
a direct load (`SIGNED`) relocation.
* x64: lower `struct` instantiations
In accordance with the requesting issue (#10750):
- `zig test` skips any tests that it cannot spawn, returning success
- `zig run` and `zig build` exit with failure, reporting the command the cannot be run
- `zig clang`, `zig ar`, etc. already punt directly to the appropriate clang/lld main(), even before this change
- Native `libc` Detection is not supported
Additionally, `exec()` and related Builder functions error at run-time, reporting the command that cannot be run
Currently Zig lowers `@intToFloat` for f80 incorrectly on non-x86
targets:
```
broken LLVM module found:
UIToFP result must be FP or FP vector
%62 = uitofp i64 %61 to i128
SIToFP result must be FP or FP vector
%66 = sitofp i64 %65 to i128
```
This happens because on such targets, we use i128 instead of x86_fp80 in
order to avoid "LLVM ERROR: Cannot select". `@intToFloat` must be
lowered differently to account for this difference as well.
We're going to remove the first parameter from this function in the
future. Stage2 already ignores the first parameter. So we put an `@as`
in here to make it work for both.
This adds a new path which avoids using compiler_rt generated div
udivmod instructions in the case that a divisor is less than half the
max usize value. Two half-limb divisions are performed instead which
ensures that non-emulated division instructions are actually used. This
does not improve the udivmod code which should still be reviewed
independently of this issue.
Notably this improves the performance of the toString implementation of
non-power-of-two bases considerably.
Division performance is improved ~1000% based on some coarse testing.
The following test code is used to provide a rough comparison between
the old vs. new method.
```
const std = @import("std");
const Managed = std.math.big.int.Managed;
const allocator = std.heap.c_allocator;
fn fib(a: *Managed, n: usize) !void {
var b = try Managed.initSet(allocator, 1);
defer b.deinit();
var c = try Managed.init(allocator);
defer c.deinit();
var i: usize = 0;
while (i < n) : (i += 1) {
try c.add(a.toConst(), b.toConst());
a.swap(&b);
b.swap(&c);
}
}
pub fn main() !void {
var a = try Managed.initSet(allocator, 0);
defer a.deinit();
try fib(&a, 1_000_000);
// Note: Next two lines (and printed digit count) omitted on no-print version.
const as = try a.toString(allocator, 10, .lower);
defer allocator.free(as);
std.debug.print("fib: digit count: {}, limb count: {}\n", .{ as.len, a.limbs.len });
}
```
```
==> time.no-print <==
limb count: 10849
________________________________________________________
Executed in 10.60 secs fish external
usr time 10.44 secs 0.00 millis 10.44 secs
sys time 0.02 secs 1.12 millis 0.02 secs
==> time.old <==
fib: digit count: 208988, limb count: 10849
________________________________________________________
Executed in 22.78 secs fish external
usr time 22.43 secs 1.01 millis 22.43 secs
sys time 0.03 secs 0.13 millis 0.03 secs
==> time.optimized <==
fib: digit count: 208988, limb count: 10849
________________________________________________________
Executed in 11.59 secs fish external
usr time 11.56 secs 1.03 millis 11.56 secs
sys time 0.03 secs 0.12 millis 0.03 secs
```
Perf data for non-optimized and optimized, verifying no udivmod is
generated by the new code.
```
$ perf report -i perf.data.old --stdio
- Total Lost Samples: 0
-
- Samples: 90K of event 'cycles:u'
- Event count (approx.): 71603695208
-
- Overhead Command Shared Object Symbol
- ........ ....... ................ ...........................................
-
52.97% t t [.] compiler_rt.udivmod.udivmod
45.97% t t [.] std.math.big.int.Mutable.addCarry
0.83% t t [.] main
0.08% t libc-2.33.so [.] __memmove_avx_unaligned_erms
0.08% t t [.] __udivti3
0.03% t [unknown] [k] 0xffffffff9a0010a7
0.02% t t [.] std.math.big.int.Managed.ensureCapacity
0.01% t libc-2.33.so [.] _int_malloc
0.00% t libc-2.33.so [.] __malloc_usable_size
0.00% t libc-2.33.so [.] _int_free
0.00% t t [.] 0x0000000000004a80
0.00% t t [.] std.heap.CAllocator.resize
0.00% t libc-2.33.so [.] _mid_memalign
0.00% t libc-2.33.so [.] sysmalloc
0.00% t libc-2.33.so [.] __posix_memalign
0.00% t t [.] std.heap.CAllocator.alloc
0.00% t ld-2.33.so [.] do_lookup_x
$ perf report -i perf.data.optimized --stdio
- Total Lost Samples: 0
-
- Samples: 46K of event 'cycles:u'
- Event count (approx.): 36790112336
-
- Overhead Command Shared Object Symbol
- ........ ....... ................ ...........................................
-
79.98% t t [.] std.math.big.int.Mutable.addCarry
15.14% t t [.] main
4.58% t t [.] std.math.big.int.Managed.ensureCapacity
0.21% t libc-2.33.so [.] __memmove_avx_unaligned_erms
0.05% t [unknown] [k] 0xffffffff9a0010a7
0.02% t libc-2.33.so [.] _int_malloc
0.01% t t [.] std.heap.CAllocator.alloc
0.01% t libc-2.33.so [.] __malloc_usable_size
0.00% t libc-2.33.so [.] systrim.constprop.0
0.00% t libc-2.33.so [.] _mid_memalign
0.00% t t [.] 0x0000000000000c7d
0.00% t libc-2.33.so [.] malloc
0.00% t ld-2.33.so [.] check_match
```
Closes#10630.
AstGen: Fixed bug where f80 types in source were triggering illegal
behavior.
Value: handle f80 in floating point arithmetic functions.
Value: implement floatRem and floatMod
This commit introduces dependencies on compiler-rt that are not
implemented. Those are a prerequisite to merging this branch.
When setting the break value in an if expression we must explicitly
check if a result location type coercion that needs to happen. This was
already done for switch expression, so let's just imitate that check
and fix for if expressions. To make this possible, we now also propagate
`rl_ty_inst` to sub scopes.
These options were removed in 5e63baae8 (CLI: remove --verbose-ast and
--verbose-tokenize, 2021-06-09) but some remainders were left in.
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
For PIE targets, we defer getting an address of value until the linker
has allocated all atoms and performed the relocations. In codegen,
we represent this via `MCValue.linker_sym_index` value.
- use usize to decide if register size is big enough to store
multiplication result or if division is necessary
- multiplication routine with check of integer bounds
- wrapping multipliation and division routine from Hacker's Delight
fieldVal handles pointer to pointer to array. This can happen for
example, if a pointer to an array is used as the condition expression of
a for loop.
resolveStructFully handles tuples (by doing nothing).
fixed Type comparison for tuples to handle comptime fields properly.
* resolve_inferred_alloc now gives a proper mutability attribute to the
corresponding alloc instruction. Previously, it would fail to mark
things const.
* slicing: fix the detection for when the end index equals the length
of the underlying object. Previously it was using `end - start` but
it should just use the end index directly. It also takes into account
when slicing a comptime-known slice.
* `Type.sentinel`: fix not handling all slice tags
Singular tests (such as in the bug ones) are moved to top level with exclusions for non-passing backends.
The big behavior tests such as array_llvm and slice are moved to the inner scope with the C backend disabled.
They all pass for the wasm backend now
We now calculate the total stack size required for the current frame.
The default alignment of the stack is 16 bytes, and will be overwritten when the alignment
of a given type is larger than that.
After we have generated all instructions for the body, we calculate the total stack size
by forward aligning the stack size while accounting for the max alignment.
We then insert a prologue into the body, where we substract this size from the stack pointer
and save it inside a bottom stackframe local. We use this local then, to calculate
the stack pointer locals of all variables we allocate into the stack.
In a future iteration we can improve this further by storing the offsets as a new `stack_offset` `WValue`.
This has the benefit of not having to spend runtime cost of storing those offsets, but instead we append
those offsets whenever we need the value that lives in the stack.
Implements the instruction `vector_init` for structs and arrays.
For arrays, it checks if the element must be passed by reference or not.
When not, it can simply use the `offset` field of a store instruction to copy the values
into the array. When it is byref, it will move the pointer by the element size, and then perform
a store operation. This ensures types like structs will be moved into the right position.
For structs we will always move the pointer, as we currently cannot verify if all fields are
not by ref.
This implements lowering elem_ptr for decl's and constants.
To generate the correct pointer, we perform a relocation by using the addend
that represents the offset. The offset is calculated by taking the element's size
and multiplying that by the index.
For constants this generates a single immediate instruction, and for decl's
this generates a single pointer address.