15 Commits

Author SHA1 Message Date
Matthew Lugg
69f39868b4
Air.Legalize: revert to loops for scalarizations
I had tried unrolling the loops to avoid requiring the
`vector_store_elem` instruction, but it's arguably a problem to generate
O(N) code for an operation on `@Vector(N, T)`. In addition, that
lowering emitted a lot of `.aggregate_init` instructions, an operation
which is itself quite difficult to codegen.

This requires reintroducing runtime vector indexing internally. However,
I've put it in a couple of instructions which are intended only for use
by `Air.Legalize`, named `legalize_vec_elem_val` (like `array_elem_val`,
but for indexing a vector with a runtime-known index) and
`legalize_vec_store_elem` (like the old `vector_store_elem`
instruction). These are explicitly documented as *not* being emitted by
Sema, so need only be implemented by backends if they actually use an
`Air.Legalize.Feature` which emits them (otherwise they can be marked as
`unreachable`).
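
The code-size argument above can be illustrated with a toy model. This is only a sketch: the tuples below are hypothetical pseudo-instructions, not real AIR, and the exact shape of either lowering in `Air.Legalize` is an assumption. Only the names `legalize_vec_elem_val`, `legalize_vec_store_elem`, and `aggregate_init` come from the commit message.

```python
def scalarize_unrolled(n):
    """Toy unrolled lowering: one group of pseudo-instructions per element,
    followed by a single aggregate_init, so the output grows linearly with n."""
    code = []
    for i in range(n):
        code.append(("elem_val", "lhs", i))   # comptime-known index
        code.append(("elem_val", "rhs", i))
        code.append(("scalar_op", i))
    code.append(("aggregate_init", n))
    return code


def scalarize_loop(n):
    """Toy loop lowering: a fixed-size body that indexes the vectors with a
    runtime counter, so the output size does not depend on n."""
    return [
        ("init_counter", 0),
        ("loop_begin",),
        ("legalize_vec_elem_val", "lhs", "i"),       # runtime-known index
        ("legalize_vec_elem_val", "rhs", "i"),
        ("scalar_op", "i"),
        ("legalize_vec_store_elem", "result", "i"),
        ("increment_and_branch_if_less", n),
        ("loop_end",),
    ]


if __name__ == "__main__":
    for n in (4, 64):
        print(n, len(scalarize_unrolled(n)), len(scalarize_loop(n)))
```

In this model the unrolled form emits `3 * n + 1` pseudo-instructions while the loop form always emits the same eight, which is the trade-off the message describes.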
2025-11-12 16:00:16 +00:00
Matthew Lugg
6576c3b898
x86_64: spill eflags when initializing bool vector 2025-11-12 16:00:16 +00:00
Matthew Lugg
c091e27aac
compiler: spring cleaning
I started this diff trying to remove a little dead code from the C
backend, but ended up finding a bunch of dead code sprinkled all over
the place:

* `packed` handling in the C backend which was made dead by `Legalize`
* Representation of pointers to runtime-known vector indices
* Handling for the `vector_store_elem` AIR instruction (now removed)
* Old tuple handling from when they used the InternPool repr of structs
* Straightforward unused functions
* TODOs in the LLVM backend for features which Zig just does not support
2025-11-12 16:00:15 +00:00
Jacob Young
a6d444c271 x86_64: implement split vector stores
Closes #25809
2025-11-04 22:45:54 -05:00
Jacob Young
52a029e503 x86_64: continue hacking around unimplemented linker logic
Closes #25666
2025-10-29 19:31:44 -04:00
Jacob Young
2e31077fe0 Coff: implement threadlocal variables 2025-10-10 22:47:47 -07:00
Jacob Young
07c3f9ef8e x86_64: fix bool vector init register clobber
Closes #25439
2025-10-03 12:18:53 -04:00
Jacob Young
d5f09f56e0 x86_64: fix windows calling convention abi 2025-10-02 15:59:51 -04:00
Jacob Young
a896a22932 x86_64: fix @mulAdd miscomp 2025-09-27 20:10:32 -04:00
Jacob Young
a744fbd22f x86_64: fix ~/! miscomps 2025-09-27 18:30:52 -04:00
Jacob Young
b206b0626a x86_64: fix @floatFromInt miscomps 2025-09-27 18:30:52 -04:00
mlugg
611c38e6da x86_64: fix unencodable rem lowerings
The memory operand might use one of the extended GPRs R8 through R15 and
hence require a REX prefix, but having a REX prefix makes the high-byte
register AH unencodable as the src operand. This latent bug was exposed
by this branch, presumably because `select` now happens to be putting
something in an extended GPR instead of a legacy GPR.

In theory this could be fixed with minimal cost by introducing a way to
communicate to `select` that neither the destination memory nor the
other temporary can be in an extended GPR. However, I just went for the
simple solution which comes at a cost of one trivial instruction: copy
the remainder from AH to AL, and *then* copy AL to the destination.
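
The encoding constraint and the workaround can be sketched as follows. This is a simplified model, not the backend's code: the helper names are hypothetical, and it only accounts for R8 through R15 forcing a REX prefix (other causes of a REX prefix are ignored here).

```python
HIGH_BYTE_REGS = {"ah", "ch", "dh", "bh"}
EXTENDED_GPRS = {f"r{n}" for n in range(8, 16)}  # r8 .. r15


def is_encodable(regs):
    """An instruction cannot use a high-byte register together with any
    register that forces a REX prefix."""
    regs = set(regs)
    return not (regs & HIGH_BYTE_REGS and regs & EXTENDED_GPRS)


def lower_remainder_store(mem_base):
    """Store the 8-bit remainder (left in AH by the division) to memory
    addressed by `mem_base`, going through AL as the commit describes."""
    return [
        ("mov", ["al", "ah"]),      # no extended GPR involved, so no REX needed
        ("mov", [mem_base, "al"]),  # AL is encodable with or without REX
    ]


if __name__ == "__main__":
    print(is_encodable(["r9", "ah"]))  # the direct store this commit avoids
    print(lower_remainder_store("r9"))
```

The extra `mov al, ah` is the "one trivial instruction" of cost mentioned above; in exchange, the lowering no longer depends on which register `select` picks for the memory operand.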
2025-09-27 18:30:52 -04:00
mlugg
77fca1652f x86_64: fix miscompilation of mul on vectors of large ints 2025-09-27 18:30:52 -04:00
mlugg
0c476191a4 x86_64: generate better constant memcpy code
`rep movsb` isn't usually a great idea here. This commit makes the logic
which tentatively existed in `genInlineMemcpy` apply in more cases, and
in particular applies it to the "new" backend logic. Put simply, all
copies of 128 bytes or fewer will now attempt this path first,
where---provided there is an SSE register and/or a general-purpose
register available---we will lower the operation using a sequence of 32,
16, 8, 4, 2, and 1 byte copy operations.
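
As a rough illustration of that sequence, a constant length can be split greedily into the listed chunk sizes. This is a sketch under assumptions: the function name and the largest-first, non-overlapping strategy are hypothetical, and the real `genInlineMemcpy` logic may choose or order its copies differently depending on which registers are available.

```python
CHUNK_SIZES = (32, 16, 8, 4, 2, 1)
MAX_INLINE_LEN = 128  # threshold taken from the commit message


def plan_copy(length):
    """Return (offset, size) pairs covering `length` bytes, largest chunks
    first, or None when the length is above the inline threshold."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length > MAX_INLINE_LEN:
        return None  # caller would fall back to another strategy
    plan = []
    offset = 0
    for size in CHUNK_SIZES:
        while length - offset >= size:
            plan.append((offset, size))
            offset += size
    return plan


if __name__ == "__main__":
    print(plan_copy(45))
```

For a 45-byte copy this yields one chunk each of 32, 8, 4, and 1 bytes at offsets 0, 32, 40, and 44.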

The feedback I got on this diff was "Push it to master and if it
miscomps I'll revert it", so don't blame me when it explodes.
2025-09-27 18:30:52 -04:00
Alex Rønne Petersen
86077fe6bd compiler: move self-hosted backends from src/arch to src/codegen 2025-09-26 02:02:07 +02:00