125 Commits

Author SHA1 Message Date
Jacob Young
1f98c98fff x86_64: increase passing test coverage on windows
Now that codegen has no references to linker state this is much easier.

Closes #24153
2025-06-19 18:41:12 -04:00
Jacob Young
917640810e Target: pass and use locals by pointer instead of by value
This struct is larger than 256 bytes and code that copies it
consistently shows up in profiles of the compiler.
2025-06-19 11:45:06 -04:00
Jacob Young
d312dfc1f2
codegen: make threadlocal logic consistent 2025-06-12 17:51:29 +01:00
Jacob Young
ba53b14028
x86_64: remove linker references from codegen 2025-06-12 13:55:41 +01:00
Jacob Young
c95b1bf2d3
x86_64: remove air references from mir 2025-06-12 13:55:41 +01:00
mlugg
5ab307cf47
compiler: get most backends compiling again
As of this commit, every backend other than self-hosted Wasm and
self-hosted SPIR-V compiles and (at least somewhat) functions again.
Those two backends are currently disabled with panics.

Note that `Zcu.Feature.separate_thread` is *not* enabled for the fixed
backends. Avoiding linker references from codegen is a non-trivial task,
and can be done after this branch.
2025-06-12 13:55:40 +01:00
mlugg
2fb6f5c1ad
link: divorce LLD from the self-hosted linkers
Similar to the previous commit, this commit untangles LLD integration
from the self-hosted linkers. Despite the big network of functions which
were involved, it turns out what was going on here is quite simple. The
LLD linking logic is actually very self-contained; it requires a few
flags from the `link.File.OpenOptions`, but that's really about it. We
don't need any of the mutable state on `Elf`/`Coff`/`Wasm`, for
instance. There was some legacy code trying to handle support for using
self-hosted codegen with LLD, but that's not a supported use case, so
I've just stripped it out.

For now, I've just pasted the logic for linking the 3 targets we
currently support using LLD for into this new linker implementation,
`link.Lld`; however, it's almost certainly possible to combine some of
the logic and simplify this file a bit. But to be honest, it's not
actually that bad right now.

This commit ends up eliminating the distinction between `flush` and
`flushZcu` (formerly `flushModule`) in linkers, where the latter
previously meant something along the lines of "flush, but if you're
going to be linking with LLD, just flush the ZCU object file, don't
actually link"?. The distinction here doesn't seem like it was properly
defined, and most linkers seem to treat them as essentially identical
anyway. Regardless, all calls to `flushZcu` are gone now, so it's
deleted -- one `flush` to rule them all!

The end result of this commit and the preceding one is that LLVM and LLD
fit into the pipeline much more sanely:

* If we're using LLVM for the ZCU, that state is on `zcu.llvm_object`
* If we're using LLD to link, then the `link.File` is a `link.Lld`
* Calls to "ZCU link functions" (e.g. `updateNav`) lower to calls to the
  LLVM object if it's available, or otherwise to the `link.File` if it's
  available (neither is available under `-fno-emit-bin`)
* After everything is done, linking is finalized by calling `flush` on
  the `link.File`; for `link.Lld` this invokes LLD, for other linkers it
  flushes self-hosted linker state

There's one messy thing remaining, and that's how self-hosted function
codegen in a ZCU works; right now, we process AIR with a call sequence
something like this:

* `link.doTask`
* `Zcu.PerThread.linkerUpdateFunc`
* `link.File.updateFunc`
* `link.Elf.updateFunc`
* `link.Elf.ZigObject.updateFunc`
* `codegen.generateFunction`
* `arch.x86_64.CodeGen.generate`

So, we start in the linker, take a scenic detour through `Zcu`, go back
to the linker, into its implementation, and then... right back out, into
code which is generic over the linker implementation, and then dispatch
on the *backend* instead! Of course, within `arch.x86_64.CodeGen`, there
are some more places which switch on the `link` implementation being
used. This is all pretty silly... so it shall be my next target.
2025-06-12 13:55:39 +01:00
mlugg
3743c3e39c
compiler: slightly untangle LLVM from the linkers
The main goal of this commit is to make it easier to decouple codegen
from the linkers by being able to do LLVM codegen without going through
the `link.File`; however, this ended up being a nice refactor anyway.

Previously, every linker stored an optional `llvm.Object`, which was
populated when using LLVM for the ZCU *and* linking an output binary;
and `Zcu` also stored an optional `llvm.Object`, which was used only
when we needed LLVM for the ZCU (e.g. for `-femit-llvm-bc`) but were not
emitting a binary.

This situation was incredibly silly. It meant there were N+1 places the
LLVM object might be instead of just 1, and it meant that every linker
had to start a bunch of methods by checking for an LLVM object, and just
dispatching to the corresponding method on *it* instead if it was not
`null`.

Instead, we now always store the LLVM object on the `Zcu` -- which makes
sense, because it corresponds to the object emitted by, well, the Zig
Compilation Unit! The linkers now mostly don't make reference to LLVM.
`Compilation` makes sure to emit the LLVM object if necessary before
calling `flush`, so it is ready for the linker. Also, all of the
`link.File` methods which act on the ZCU -- like `updateNav` -- now
check for the LLVM object in `link.zig` instead of in every single
individual linker implementation. Notably, the change to LLVM emit
improves this rather ludicrous call chain in the `-fllvm -flld` case:

* Compilation.flush
* link.File.flush
* link.Elf.flush
* link.Elf.linkWithLLD
* link.Elf.flushModule
* link.emitLlvmObject
* Compilation.emitLlvmObject
* llvm.Object.emit

Replacing it with this one:

* Compilation.flush
* llvm.Object.emit

...although we do currently still end up in `link.Elf.linkWithLLD` to do
the actual linking. The logic for invoking LLD should probably also be
unified at least somewhat; I haven't done that in this commit.
2025-06-12 13:55:39 +01:00
Jacob Young
c04be630d9 Legalize: introduce a new pass before liveness
Each target can opt into different sets of legalize features.
By performing these transformations before liveness, instructions
that become unreferenced will have up-to-date liveness information.
2025-05-29 03:57:48 -04:00
mlugg
37a9a4e0f1
compiler: refactor Zcu.File and path representation
This commit makes some big changes to how we track state for Zig source
files. In particular, it changes:

* How `File` tracks its path on-disk
* How AstGen discovers files
* How file-level errors are tracked
* How `builtin.zig` files and modules are created

The original motivation here was to address incremental compilation bugs
with the handling of files, such as #22696. To fix this, a few changes
are necessary.

Just like declarations may become unreferenced on an incremental update,
meaning we suppress analysis errors associated with them, it is also
possible for all imports of a file to be removed on an incremental
update, in which case file-level errors for that file should be
suppressed. As such, after AstGen, the compiler must traverse files
(starting from analysis roots) and discover the set of "live files" for
this update.

Additionally, the compiler's previous handling of retryable file errors
was not very good; the source location the error was reported as was
based only on the first discovered import of that file. This source
location also disappeared on future incremental updates. So, as a part
of the file traversal above, we also need to figure out the source
locations of imports which errors should be reported against.

Another observation I made is that the "file exists in multiple modules"
error was not implemented in a particularly good way (I get to say that
because I wrote it!). It was subject to races, where the order in which
different imports of a file were discovered affects both how errors are
printed, and which module the file is arbitrarily assigned, with the
latter in turn affecting which other files are considered for import.
The thing I realised here is that while the AstGen worker pool is
running, we cannot know for sure which module(s) a file is in; we could
always discover an import later which changes the answer.

So, here's how the AstGen workers have changed. We initially ensure that
`zcu.import_table` contains the root files for all modules in this Zcu,
even if we don't know any imports for them yet. Then, the AstGen
workers do not need to be aware of modules. Instead, they simply ignore
module imports, and only spin off more workers when they see a by-path
import.

During AstGen, we can't use module-root-relative paths, since we don't
know which modules files are in; but we don't want to unnecessarily use
absolute files either, because those are non-portable and can make
`error.NameTooLong` more likely. As such, I have introduced a new
abstraction, `Compilation.Path`. This type is a way of representing a
filesystem path which has a *canonical form*. The path is represented
relative to one of a few special directories: the lib directory, the
global cache directory, or the local cache directory. As a fallback, we
use absolute (or cwd-relative on WASI) paths. This is kind of similar to
`std.Build.Cache.Path` with a pre-defined list of possible
`std.Build.Cache.Directory`, but has stricter canonicalization rules
based on path resolution to make sure deduplicating files works
properly. A `Compilation.Path` can be trivially converted to a
`std.Build.Cache.Path` from a `Compilation`, but is smaller, has a
canonical form, and has a digest which will be consistent across
different compiler processes with the same lib and cache directories
(important when we serialize incremental compilation state in the
future). `Zcu.File` and `Zcu.EmbedFile` both contain a
`Compilation.Path`, which is used to access the file on-disk;
module-relative sub paths are used quite rarely (`EmbedFile` doesn't
even have one now for simplicity).

After the AstGen workers all complete, we know that any file which might
be imported is definitely in `import_table` and up-to-date. So, we
perform a single-threaded graph traversal; similar to what
`resolveReferences` plays for `AnalUnit`s, but for files instead. We
figure out which files are alive, and which module each file is in. If a
file turns out to be in multiple modules, we set a field on `Zcu` to
indicate this error. If a file is in a different module to a prior
update, we set a flag instructing `updateZirRefs` to invalidate all
dependencies on the file. This traversal also discovers "import errors";
these are errors associated with a specific `@import`. With Zig's
current design, there is only one possible error here: "import outside
of module root". This must be identified during this traversal instead
of during AstGen, because it depends on which module the file is in. I
tried also representing "module not found" errors in this same way, but
it turns out to be much more useful to report those in Sema, because of
use cases like optional dependencies where a module import is behind a
comptime-known build option.

For simplicity, `failed_files` now just maps to `?[]u8`, since the
source location is always the whole file. In fact, this allows removing
`LazySrcLoc.Offset.entire_file` completely, slightly simplifying some
error reporting logic. File-level errors are now directly built in the
`std.zig.ErrorBundle.Wip`. If the payload is not `null`, it is the
message for a retryable error (i.e. an error loading the source file),
and will be reported with a "file imported here" note pointing to the
import site discovered during the single-threaded file traversal.

The last piece of fallout here is how `Builtin` works. Rather than
constructing "builtin" modules when creating `Package.Module`s, they are
now constructed on-the-fly by `Zcu`. The map `Zcu.builtin_modules` maps
from digests to `*Package.Module`s. These digests are abstract hashes of
the `Builtin` value; i.e. all of the options which are placed into
"builtin.zig". During the file traversal, we populate `builtin_modules`
as needed, so that when we see this imports in Sema, we just grab the
relevant entry from this map. This eliminates a bunch of awkward state
tracking during construction of the module graph. It's also now clearer
exactly what options the builtin module has, since previously it
inherited some options arbitrarily from the first-created module with
that "builtin" module!

The user-visible effects of this commit are:
* retryable file errors are now consistently reported against the whole
  file, with a note pointing to a live import of that file
* some theoretical bugs where imports are wrongly considered distinct
  (when the import path moves out of the cwd and then back in) are fixed
* some consistency issues with how file-level errors are reported are
  fixed; these errors will now always be printed in the same order
  regardless of how the AstGen pass assigns file indices
* incremental updates do not print retryable file errors differently
  between updates or depending on file structure/contents
* incremental updates support files changing modules
* incremental updates support files becoming unreferenced

Resolves: #22696
2025-05-18 17:37:02 +01:00
Andrew Kelley
eb3c7f5706 zig build fmt 2025-02-22 17:09:20 -08:00
Alex Rønne Petersen
f87b443af1 link.MachO: Add support for the -x flag (discard local symbols).
This can also be extended to ELF later as it means roughly the same thing there.

This addresses the main issue in #21721 but as I don't have a macOS machine to
do further testing on, I can't confirm whether zig cc is able to pass the entire
cgo test suite after this commit. It can, however, cross-compile a basic program
that uses cgo to x86_64-macos-none which previously failed due to lack of -x
support. Unlike previously, the resulting symbol table does not contain local
symbols (such as C static functions).

I believe this satisfies the related donor bounty: https://ziglang.org/news/second-donor-bounty
2025-02-22 06:35:19 +01:00
Andrew Kelley
94648a0383 fix merge conflicts with updating line numbers 2025-01-15 15:11:36 -08:00
Andrew Kelley
b3ecdb21ee switch to ArrayListUnmanaged for machine code 2025-01-15 15:11:35 -08:00
Andrew Kelley
e521879e47 rewrite wasm/Emit.zig
mainly, rework how relocations works. This is the point at which symbol
indexes are known - not before. And don't emit unnecessary relocations!
They're only needed when emitting an object file.

Changes wasm linker to keep MIR around long-lived so that fixups can be
reapplied after linker garbage collection.

use labeled switch while we're at it
2025-01-15 15:11:35 -08:00
Andrew Kelley
943dac3e85 compiler: add type safety for export indices 2025-01-15 15:11:35 -08:00
Andrew Kelley
9bf715de74 rework error handling in the backends 2025-01-15 15:11:35 -08:00
Andrew Kelley
da25ed95fc macho linker conforms to explicit error sets, again 2025-01-15 15:11:35 -08:00
Andrew Kelley
16180f525a macho linker: conform to explicit error sets
Makes linker functions have small error sets, required to report
diagnostics properly rather than having a massive error set that has a
lot of codes.

Other linker implementations are not ported yet.

Also the branch is not passing semantic analysis yet.
2025-01-15 15:11:35 -08:00
Andrew Kelley
795e7c64d5 wasm linker: aggressive DODification
The goals of this branch are to:
* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker
  state to disk.
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do
  linker garbage collection.
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls

In order to accomplish this goals, this removes the ZigObject
abstraction, as well as Symbol and Atom. These abstractions resulted
in overly generic code, doing unnecessary work, and needless
complications that simply go away by creating a better in-memory data
model and emitting more things lazily.

For example, this makes wasm codegen emit MIR which is then lowered to
wasm code during linking, with optimal function indexes etc, or
relocations are emitted if outputting an object. Previously, this would
always emit relocations, which are fully unnecessary when emitting an
executable, and required all function calls to use the maximum size LEB
encoding.

This branch introduces the concept of the "prelink" phase which occurs
after all object files have been parsed, but before any Zcu updates are
sent to the linker. This allows the linker to fully parse all objects
into a compact memory model, which is guaranteed to be complete when Zcu
code is generated.

This commit is not a complete implementation of all these goals; it is
not even passing semantic analysis.
2025-01-15 15:11:35 -08:00
Jacob Young
dde3116e50
Dwarf: implement new incremental line number update API 2025-01-05 02:20:56 +00:00
mlugg
065e10c95c
link: new incremental line number update API 2025-01-05 02:20:56 +00:00
mlugg
3afda4322c
compiler: analyze type and value of global declaration separately
This commit separates semantic analysis of the annotated type vs value
of a global declaration, therefore allowing recursive and mutually
recursive values to be declared.

Every `Nav` which undergoes analysis now has *two* corresponding
`AnalUnit`s: `.{ .nav_val = n }` and `.{ .nav_ty = n }`. The `nav_val`
unit is responsible for *fully resolving* the `Nav`: determining its
value, linksection, addrspace, etc. The `nav_ty` unit, on the other
hand, resolves only the information necessary to construct a *pointer*
to the `Nav`: its type, addrspace, etc. (It does also analyze its
linksection, but that could be moved to `nav_val` I think; it doesn't
make any difference).

Analyzing a `nav_ty` for a declaration with no type annotation will just
mark a dependency on the `nav_val`, analyze it, and finish. Conversely,
analyzing a `nav_val` for a declaration *with* a type annotation will
first mark a dependency on the `nav_ty` and analyze it, using this as
the result type when evaluating the value body.

The `nav_val` and `nav_ty` units always have references to one another:
so, if a `Nav`'s type is referenced, its value implicitly is too, and
vice versa. However, these dependencies are trivial, so, to save memory,
are only known implicitly by logic in `resolveReferences`.

In general, analyzing ZIR `decl_val` will only analyze `nav_ty` of the
corresponding `Nav`. There are two exceptions to this. If the
declaration is an `extern` declaration, then we immediately ensure the
`Nav` value is resolved (which doesn't actually require any more
analysis, since such a declaration has no value body anyway).
Additionally, if the resolved type has type tag `.@"fn"`, we again
immediately resolve the `Nav` value. The latter restriction is in place
for two reasons:

* Functions are special, in that their externs are allowed to trivially
  alias; i.e. with a declaration `extern fn foo(...)`, you can write
  `const bar = foo;`. This is not allowed for non-function externs, and
  it means that function types are the only place where it is possible
  for a declaration `Nav` to have a `.@"extern"` value without actually
  being declared `extern`. We need to identify this situation
  immediately so that the `decl_ref` can create a pointer to the *real*
  extern `Nav`, not this alias.
* In certain situations, such as taking a pointer to a `Nav`, Sema needs
  to queue analysis of a runtime function if the value is a function. To
  do this, the function value needs to be known, so we need to resolve
  the value immediately upon `&foo` where `foo` is a function.

This restriction is simple to codify into the eventual language
specification, and doesn't limit the utility of this feature in
practice.

A consequence of this commit is that codegen and linking logic needs to
be more careful when looking at `Nav`s. In general:

* When `updateNav` or `updateFunc` is called, it is safe to assume that
  the `Nav` being updated (the owner `Nav` for `updateFunc`) is fully
  resolved.
* Any `Nav` whose value is/will be an `@"extern"` or a function is fully
  resolved; see `Nav.getExtern` for a helper for a common case here.
* Any other `Nav` may only have its type resolved.

This didn't seem to be too tricky to satisfy in any of the existing
codegen/linker backends.

Resolves: #131
2024-12-24 02:18:41 +00:00
Jacob Young
5c76e08f49 lldb: add pretty printer for intern pool indices 2024-12-20 22:51:20 -05:00
Alex Rønne Petersen
762ad4c6f4
link: Use defaultFunctionAlignment() when function alignment is unspecified.
max(user_align, minFunctionAlignment()) is only appropriate when the user has
actually given an explicit, non-zero alignment value.
2024-10-20 09:21:17 +02:00
Andrew Kelley
13fb68c064 link: consolidate diagnostics
By organizing linker diagnostics into this struct, it becomes possible
to share more code between linker backends, and more importantly it
becomes possible to pass only the Diag struct to some functions, rather
than passing the entire linker state object in. This makes data
dependencies more obvious, making it easier to rearrange code and to
multithread.

Also fix MachO code abusing an atomic variable. Not only was it using
the wrong atomic operation, it is unnecessary additional state since
the state is already being protected by a mutex.
2024-10-11 10:36:19 -07:00
Andrew Kelley
14c8e270bb link: fix false positive crtbegin/crtend detection
Embrace the Path abstraction, doing more operations based on directory
handles rather than absolute file paths. Most of the diff noise here
comes from this one.

Fix sorting of crtbegin/crtend atoms. Previously it would look at all
path components for those strings.

Make the C runtime path detection partially a pure function, and move
some logic to glibc.zig where it belongs.
2024-10-10 14:21:52 -07:00
Linus Groh
8588964972 Replace deprecated default initializations with decl literals 2024-09-12 16:01:23 +01:00
Jacob Young
e046977354 codegen: implement output to the .debug_info section 2024-09-10 12:27:57 -04:00
mlugg
0fe3fd01dd
std: update std.builtin.Type fields to follow naming conventions
The compiler actually doesn't need any functional changes for this: Sema
does reification based on the tag indices of `std.builtin.Type` already!
So, no zig1.wasm update is necessary.

This change is necessary to disallow name clashes between fields and
decls on a type, which is a prerequisite of #9938.
2024-08-28 08:39:59 +01:00
David Rubin
f777b29832
fix up merge conflicts with master 2024-08-25 22:43:57 -07:00
David Rubin
1c1feba08e
remove mod aliases for Zcus 2024-08-25 15:17:40 -07:00
David Rubin
863f74dcd2
comp: rename module to zcu 2024-08-25 15:17:21 -07:00
David Rubin
b4bb64ce78
sema: rework type resolution to use Zcu when possible 2024-08-25 15:16:42 -07:00
Jacob Young
eaa227449c Dwarf: fix issues with inline call sites 2024-08-20 15:08:23 -04:00
Jakub Konka
96441bd829 macho: update codegen and linker to distributed jump table approach 2024-08-17 08:14:38 +02:00
Jacob Young
ef11bc9899 Dwarf: rework self-hosted debug info from scratch
This is in preparation for incremental and actually being able to debug
executables built by the x86_64 backend.
2024-08-16 15:22:55 -04:00
mlugg
548a087faf
compiler: split Decl into Nav and Cau
The type `Zcu.Decl` in the compiler is problematic: over time it has
gained many responsibilities. Every source declaration, container type,
generic instantiation, and `@extern` has a `Decl`. The functions of
these `Decl`s are in some cases entirely disjoint.

After careful analysis, I determined that the two main responsibilities
of `Decl` are as follows:
* A `Decl` acts as the "subject" of semantic analysis at comptime. A
  single unit of analysis is either a runtime function body, or a
  `Decl`. It registers incremental dependencies, tracks analysis errors,
  etc.
* A `Decl` acts as a "global variable": a pointer to it is consistent,
  and it may be lowered to a specific symbol by the codegen backend.

This commit eliminates `Decl` and introduces new types to model these
responsibilities: `Cau` (Comptime Analysis Unit) and `Nav` (Named
Addressable Value).

Every source declaration, and every container type requiring resolution
(so *not* including `opaque`), has a `Cau`. For a source declaration,
this `Cau` performs the resolution of its value. (When #131 is
implemented, it is unsolved whether type and value resolution will share
a `Cau` or have two distinct `Cau`s.) For a type, this `Cau` is the
context in which type resolution occurs.

Every non-`comptime` source declaration, every generic instantiation,
and every distinct `extern` has a `Nav`. These are sent to codegen/link:
the backends by definition do not care about `Cau`s.

This commit has some minor technically-breaking changes surrounding
`usingnamespace`. I don't think they'll impact anyone, since the changes
are fixes around semantics which were previously inconsistent (the
behavior changed depending on hashmap iteration order!).

Aside from that, this changeset has no significant user-facing changes.
Instead, it is an internal refactor which makes it easier to correctly
model the responsibilities of different objects, particularly regarding
incremental compilation. The performance impact should be negligible,
but I will take measurements before merging this work into `master`.

Co-authored-by: Jacob Young <jacobly0@users.noreply.github.com>
Co-authored-by: Jakub Konka <kubkon@jakubkonka.com>
2024-08-11 07:29:41 +01:00
Jakub Konka
8f82a66019 MachO/ZigObject: handle ref to an extern in getDeclVAddr 2024-08-10 22:25:57 +02:00
Jakub Konka
421b105d85 macho: ensure we only ever put named symbols in the symtab 2024-08-10 17:19:37 +02:00
Jakub Konka
669f285943 elf: move ownership of atoms into objects 2024-07-30 10:00:50 +02:00
Jakub Konka
8541119e9b macho: handle empty string in ZigObject.getString 2024-07-22 13:35:55 +02:00
Jakub Konka
06a0da3e8a macho: cache string len 2024-07-22 12:06:02 +02:00
Jakub Konka
2b84592858 macho: run more things in parallel 2024-07-22 12:06:02 +02:00
Jakub Konka
34f34dbe32 macho: reinstate duplicate definition checking 2024-07-18 09:13:09 +02:00
Jakub Konka
e9328e7da8 macho: fix 32bit compilation issues 2024-07-18 09:13:09 +02:00
Jakub Konka
3338813077 macho: use isec for working out getAtomData in ZigObject 2024-07-18 09:13:09 +02:00
Jakub Konka
a9e3088d9c macho: extract testing logic for TLS into a helper 2024-07-18 09:13:08 +02:00
Jakub Konka
103c16c879 macho: clean up atom+symbol creation logic in ZigObject 2024-07-18 09:13:08 +02:00
Jakub Konka
e117e05768 macho: ensure we always name decls like LLVM to avoid confusion 2024-07-18 09:13:08 +02:00