This was done by regex substitution with `sed`. I then manually went
over the entire diff and fixed any incorrect changes.
This diff also changes a lot of `callconv(.C)` to `callconv(.c)`, since
my regex happened to also trigger here. I opted to leave these changes
in, since they *are* a correct migration, even if they're not the one I
was trying to do!
mainly, rework how relocations works. This is the point at which symbol
indexes are known - not before. And don't emit unnecessary relocations!
They're only needed when emitting an object file.
Changes wasm linker to keep MIR around long-lived so that fixups can be
reapplied after linker garbage collection.
use labeled switch while we're at it
The goals of this branch are to:
* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker
state to disk.
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do
linker garbage collection.
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls
In order to accomplish this goals, this removes the ZigObject
abstraction, as well as Symbol and Atom. These abstractions resulted
in overly generic code, doing unnecessary work, and needless
complications that simply go away by creating a better in-memory data
model and emitting more things lazily.
For example, this makes wasm codegen emit MIR which is then lowered to
wasm code during linking, with optimal function indexes etc, or
relocations are emitted if outputting an object. Previously, this would
always emit relocations, which are fully unnecessary when emitting an
executable, and required all function calls to use the maximum size LEB
encoding.
This branch introduces the concept of the "prelink" phase which occurs
after all object files have been parsed, but before any Zcu updates are
sent to the linker. This allows the linker to fully parse all objects
into a compact memory model, which is guaranteed to be complete when Zcu
code is generated.
This commit is not a complete implementation of all these goals; it is
not even passing semantic analysis.
Rather than `Zcu.BuiltinDecl.Memoized` being a struct with fields, it
can instead just be an array, indexed by the enum. This allows runtime
indexing, avoiding a few now-unnecessary `inline` switch cases.
This commit reworks how values like the panic handler function are
memoized during a compiler invocation. Previously, the value was
resolved by whichever analysis requested it first, and cached on `Zcu`.
This is problematic for incremental compilation, as after the initial
resolution, no dependencies are marked by users of this memoized state.
This is arguably acceptable for `std.builtin`, but it's definitely not
acceptable for the panic handler/messages, because those can be set by
the user (`std.builtin.Panic` checks `@import("root").Panic`).
So, here we introduce a new kind of `AnalUnit`, called `memoized_state`.
There are 3 such units:
* `.{ .memoized_state = .va_list }` resolves the type `std.builtin.VaList`
* `.{ .memoized_state = .panic }` resolves `std.Panic`
* `.{ .memoized_state = .main }` resolves everything else we want
These units essentially "bundle" the resolution of their corresponding
declarations, storing the results into fields on `Zcu`. This way, when,
for instance, a function wants to call the panic handler, it simply runs
`ensureMemoizedStateResolved`, registering one dependency, and pulls the
values from the `Zcu`. This "bundling" minimizes dependency edges. The 3
units are separated to allow them to act independently: for instance,
the panic handler can use `std.builtin.Type` without triggering a
dependency loop.
This commit separates semantic analysis of the annotated type vs value
of a global declaration, therefore allowing recursive and mutually
recursive values to be declared.
Every `Nav` which undergoes analysis now has *two* corresponding
`AnalUnit`s: `.{ .nav_val = n }` and `.{ .nav_ty = n }`. The `nav_val`
unit is responsible for *fully resolving* the `Nav`: determining its
value, linksection, addrspace, etc. The `nav_ty` unit, on the other
hand, resolves only the information necessary to construct a *pointer*
to the `Nav`: its type, addrspace, etc. (It does also analyze its
linksection, but that could be moved to `nav_val` I think; it doesn't
make any difference).
Analyzing a `nav_ty` for a declaration with no type annotation will just
mark a dependency on the `nav_val`, analyze it, and finish. Conversely,
analyzing a `nav_val` for a declaration *with* a type annotation will
first mark a dependency on the `nav_ty` and analyze it, using this as
the result type when evaluating the value body.
The `nav_val` and `nav_ty` units always have references to one another:
so, if a `Nav`'s type is referenced, its value implicitly is too, and
vice versa. However, these dependencies are trivial, so, to save memory,
are only known implicitly by logic in `resolveReferences`.
In general, analyzing ZIR `decl_val` will only analyze `nav_ty` of the
corresponding `Nav`. There are two exceptions to this. If the
declaration is an `extern` declaration, then we immediately ensure the
`Nav` value is resolved (which doesn't actually require any more
analysis, since such a declaration has no value body anyway).
Additionally, if the resolved type has type tag `.@"fn"`, we again
immediately resolve the `Nav` value. The latter restriction is in place
for two reasons:
* Functions are special, in that their externs are allowed to trivially
alias; i.e. with a declaration `extern fn foo(...)`, you can write
`const bar = foo;`. This is not allowed for non-function externs, and
it means that function types are the only place where it is possible
for a declaration `Nav` to have a `.@"extern"` value without actually
being declared `extern`. We need to identify this situation
immediately so that the `decl_ref` can create a pointer to the *real*
extern `Nav`, not this alias.
* In certain situations, such as taking a pointer to a `Nav`, Sema needs
to queue analysis of a runtime function if the value is a function. To
do this, the function value needs to be known, so we need to resolve
the value immediately upon `&foo` where `foo` is a function.
This restriction is simple to codify into the eventual language
specification, and doesn't limit the utility of this feature in
practice.
A consequence of this commit is that codegen and linking logic needs to
be more careful when looking at `Nav`s. In general:
* When `updateNav` or `updateFunc` is called, it is safe to assume that
the `Nav` being updated (the owner `Nav` for `updateFunc`) is fully
resolved.
* Any `Nav` whose value is/will be an `@"extern"` or a function is fully
resolved; see `Nav.getExtern` for a helper for a common case here.
* Any other `Nav` may only have its type resolved.
This didn't seem to be too tricky to satisfy in any of the existing
codegen/linker backends.
Resolves: #131
The new representation is often more compact. It is also more
straightforward to understand: for instance, `extern` is represented on
the `declaration` instruction itself rather than using a special
instruction. The same applies to `var`, making both of these far more
compact.
This commit also separates the type and value bodies of a `declaration`
instruction. This is a prerequisite for #131.
In general, `declaration` now directly encodes details of the syntax
form used, and the embedded ZIR bodies are for actual expressions. The
only exception to this is functions, where ZIR is effectively designed
as if we had #1717. `extern fn` declarations are modeled as
`extern const` with a function type, and normal `fn` definitions are
modeled as `const` with a `func{,_fancy,_inferred}` instruction. This
may change in the future, but improving on this was out of scope for
this commit.
The goal here is to support both levels of unwind tables (sync and async) in
zig cc and zig build. Previously, the LLVM backend always used async tables
while zig cc was partially influenced by whatever was Clang's default.
It doesn't appear that targeting bridgeOS is meaningfully supported by Apple.
Even LLVM/Clang appear to have incomplete support for it, suggesting that Apple
never bothered to upstream that support. So there's really no sense in us
pretending to support this.
Previously, stepping from the single statement within the loop would
always exit the loop because all of the code unrolled from the loop is
associated with the same line and treated by the debugger as one line.
This is necessary since isGnuLibC() is true for hurd, so we need to be able to
represent a glibc version for it.
Also add an Os.TaggedVersionRange.gnuLibCVersion() convenience function.
From `zig build-exe --help`:
-fno-builtin Disable implicit builtin knowledge of functions
It seems entirely reasonable and even expected that this option should imply
both no-builtins on functions (which disables transformation of recognized code
patterns to libcalls) and nobuiltin on call sites (which disables transformation
of libcalls to intrinsics). We now match Clang's behavior for -fno-builtin.
In both cases, we're painting with a fairly broad brush by applying this to an
entire module, but it's better than nothing. #21833 proposes a more fine-grained
way to apply nobuiltin.
This option, by its very nature, needs to be attached to a module. If it isn't,
the code in a module could break at random when compiled into an application
that doesn't have this option set.
After this change, skip_linker_dependencies no longer implies no_builtin in the
LLVM backend.
The former prevents recognizing code patterns and turning them into libcalls,
which is what we want for compiler-rt. The latter is meant to be used on call
sites to prevent them from being turned into intrinsics.
Context: https://github.com/ziglang/zig/issues/21833
The old isARM() function was a portability trap. With the name it had, it seemed
like the obviously correct function to use, but it didn't include Thumb. In the
vast majority of cases where someone wants to ask "is the target Arm?", Thumb
*should* be included.
There are exactly 3 cases in the codebase where we do actually need to exclude
Thumb, although one of those is in Aro and mirrors a check in Clang that is
itself likely a bug. These rare cases can just add an extra isThumb() check.
Once we upgrade to LLVM 20, these should be lowered verbatim rather than to
simply musl. Similarly, the special case in llvmMachineAbi() should go away.
Like d1d95294fd657f771657ea671a6984b860347fb0, this is more Apple nonsense where
they abused the arch component of the triple to encode what's really an ABI.
Handling this correctly in Zig's target triple model would take quite a bit of
work. Fortunately, the last Armv7-based Apple Watch was released in 2017 and
these targets are now considered legacy. By the time Zig hits 1.0, they will be
a distant memory. So just remove them.