This commits adds the following distinct integer types to std.zig.Ast:
- OptionalTokenIndex
- TokenOffset
- OptionalTokenOffset
- Node.OptionalIndex
- Node.Offset
- Node.OptionalOffset
The `Node.Index` type has also been converted to a distinct type while
`TokenIndex` remains unchanged.
`Ast.Node.Data` has also been changed to a (untagged) union to provide
safety checks.
Unfortunately, I can't easily add a test for this, because the repro
depends on some details of DWARF layout; but I've confirmed that it
fixes a bug repro on another branch.
The changes from a few commits earlier, where semantic analysis no
longer occurs if any Zig files failed to lower to ZIR, mean `file`
dependencies are no longer necessary! However, we now need them for ZON
files, to be invalidated whenever a ZON file changes.
This came with a big cleanup to `Zcu.PerThread.updateFile` (formerly
`astGenFile`).
Also, change how the cache manifest works for files in the import table.
Instead of being added to the manifest when we call `semaFile` on them,
we iterate the import table after running the AstGen workers and add all
the files to the cache manifest then.
The downside is that this is a bit more eager to include files in the
manifest; in particular, files which are imported but not actually
referenced are now included in analysis. So, for instance, modifying any
standard library file will invalidate all Zig compilations using that
standard library, even if they don't use that file.
The original motivation here was simply that the old logic in `semaFile`
didn't translate nicely to ZON. However, it turns out to actually be
necessary for correctness. Because `@import("foo.zig")` is an
AstGen-level error if `foo.zig` does not exist, we need to invalidate
the cache when an imported but unreferenced file is removed to make sure
this error is triggered when it needs to be.
Resolves: #22746
This is mainly in preparation for integrating ZonGen into the pipeline
properly, although these names are better because `astGenFile` isn't
*necessarily* running AstGen; it may determine that the current ZIR is
up-to-date, or load cached ZIR.
Instead, `source`, `tree`, and `zir` should all be optional. This is
precisely what we're actually trying to model here; and `File` isn't
optimized for memory consumption or serializability anyway, so it's fine
to use a couple of extra bytes on actual optionals here.
This commit allows using ZON (Zig Object Notation) in a few ways.
* `@import` can be used to load ZON at comptime and convert it to a
normal Zig value. In this case, `@import` must have a result type.
* `std.zon.parse` can be used to parse ZON at runtime, akin to the
parsing logic in `std.json`.
* `std.zon.stringify` can be used to convert arbitrary data structures
to ZON at runtime, again akin to `std.json`.
Uses of `@embedFile` register dependencies on the corresponding
`Zcu.EmbedFile`. At the start of every update, we iterate all embedded
files and update them if necessary, and invalidate the dependencies if
they changed.
In order to properly integrate with the lazy analysis model, failed
embed files are now reported by the `AnalUnit` which actually used
`@embedFile`; the filesystem error is stored in the `Zcu.EmbedFile`.
An incremental test is added covering incremental updates to embedded
files, and I have verified locally that dependency invalidation is
working correctly.
* `std.builtin.Panic` -> `std.builtin.panic`, because it is a namespace.
* `root.Panic` -> `root.panic` for the same reason. There are type
checks so that we still allow the legacy `pub fn panic` strategy in
the 0.14.0 release.
* `std.debug.SimplePanic` -> `std.debug.simple_panic`, same reason.
* `std.debug.NoPanic` -> `std.debug.no_panic`, same reason.
* `std.debug.FormattedPanic` is now a function `std.debug.FullPanic`
which takes as input a `panicFn` and returns a namespace with all the
panic functions. This handles the incredibly common case of just
wanting to override how the message is printed, whilst keeping nice
formatted panics.
* Remove `std.builtin.panic.messages`; now, every safety panic has its
own function. This reduces binary bloat, as calls to these functions
no longer need to prepare any arguments (aside from the error return
trace).
* Remove some legacy declarations, since a zig1.wasm update has
happened. Most of these were related to the panic handler, but a quick
grep for "zig1" brought up a couple more results too.
Also, add some missing type checks to Sema.
Resolves: #22584
formatted -> full
The original motivation here was to fix regressions caused by #22414.
However, while working on this, I ended up discussing a language
simplification with Andrew, which changes things a little from how they
worked before #22414.
The main user-facing change here is that any reference to a prior
function parameter, even if potentially comptime-known at the usage
site or even not analyzed, now makes a function generic. This applies
even if the parameter being referenced is not a `comptime` parameter,
since it could still be populated when performing an inline call. This
is a breaking language change.
The detection of this is done in AstGen; when evaluating a parameter
type or return type, we track whether it referenced any prior parameter,
and if so, we mark this type as being "generic" in ZIR. This will cause
Sema to not evaluate it until the time of instantiation or inline call.
A lovely consequence of this from an implementation perspective is that
it eliminates the need for most of the "generic poison" system. In
particular, `error.GenericPoison` is now completely unnecessary, because
we identify generic expressions earlier in the pipeline; this simplifies
the compiler and avoids redundant work. This also entirely eliminates
the concept of the "generic poison value". The only remnant of this
system is the "generic poison type" (`Type.generic_poison` and
`InternPool.Index.generic_poison_type`). This type is used in two
places:
* During semantic analysis, to represent an unknown result type.
* When storing generic function types, to represent a generic parameter/return type.
It's possible that these use cases should instead use `.none`, but I
leave that investigation to a future adventurer.
One last thing. Prior to #22414, inline calls were a little inefficient,
because they re-evaluated even non-generic parameter types whenever they
were called. Changing this behavior is what ultimately led to #22538.
Well, because the new logic will mark a type expression as generic if
there is any change its resolved type could differ in an inline call,
this redundant work is unnecessary! So, this is another way in which the
new design reduces redundant work and complexity.
Resolves: #22494Resolves: #22532Resolves: #22538
This was done by regex substitution with `sed`. I then manually went
over the entire diff and fixed any incorrect changes.
This diff also changes a lot of `callconv(.C)` to `callconv(.c)`, since
my regex happened to also trigger here. I opted to leave these changes
in, since they *are* a correct migration, even if they're not the one I
was trying to do!
Makes linker functions have small error sets, required to report
diagnostics properly rather than having a massive error set that has a
lot of codes.
Other linker implementations are not ported yet.
Also the branch is not passing semantic analysis yet.
The goals of this branch are to:
* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker
state to disk.
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do
linker garbage collection.
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls
In order to accomplish this goals, this removes the ZigObject
abstraction, as well as Symbol and Atom. These abstractions resulted
in overly generic code, doing unnecessary work, and needless
complications that simply go away by creating a better in-memory data
model and emitting more things lazily.
For example, this makes wasm codegen emit MIR which is then lowered to
wasm code during linking, with optimal function indexes etc, or
relocations are emitted if outputting an object. Previously, this would
always emit relocations, which are fully unnecessary when emitting an
executable, and required all function calls to use the maximum size LEB
encoding.
This branch introduces the concept of the "prelink" phase which occurs
after all object files have been parsed, but before any Zcu updates are
sent to the linker. This allows the linker to fully parse all objects
into a compact memory model, which is guaranteed to be complete when Zcu
code is generated.
This commit is not a complete implementation of all these goals; it is
not even passing semantic analysis.
`Sema.explainWhyValueContainsReferenceToComptimeVar` (concise name!)
adds notes to an error explaining how to get from a given `Value` to a
pointer to some `comptime var` (or a comptime field). Previously, this
error could be very opaque in any case where it wasn't obvious where the
comptime var pointer came from; particularly for type captures. Now, the
error notes explain this to the user.
Rather than `Zcu.BuiltinDecl.Memoized` being a struct with fields, it
can instead just be an array, indexed by the enum. This allows runtime
indexing, avoiding a few now-unnecessary `inline` switch cases.
This commit reworks how values like the panic handler function are
memoized during a compiler invocation. Previously, the value was
resolved by whichever analysis requested it first, and cached on `Zcu`.
This is problematic for incremental compilation, as after the initial
resolution, no dependencies are marked by users of this memoized state.
This is arguably acceptable for `std.builtin`, but it's definitely not
acceptable for the panic handler/messages, because those can be set by
the user (`std.builtin.Panic` checks `@import("root").Panic`).
So, here we introduce a new kind of `AnalUnit`, called `memoized_state`.
There are 3 such units:
* `.{ .memoized_state = .va_list }` resolves the type `std.builtin.VaList`
* `.{ .memoized_state = .panic }` resolves `std.Panic`
* `.{ .memoized_state = .main }` resolves everything else we want
These units essentially "bundle" the resolution of their corresponding
declarations, storing the results into fields on `Zcu`. This way, when,
for instance, a function wants to call the panic handler, it simply runs
`ensureMemoizedStateResolved`, registering one dependency, and pulls the
values from the `Zcu`. This "bundling" minimizes dependency edges. The 3
units are separated to allow them to act independently: for instance,
the panic handler can use `std.builtin.Type` without triggering a
dependency loop.
`Zcu.PerThead.ensureTypeUpToDate` is set up in such a way that it only
returns the updated type the first time it is called. In general, that's
okay; however, the exception is that we want the function to continue
returning `error.AnalysisFail` when the type has been lost, or its
number of captures changed.
Therefore, the check for this case now happens before the up-to-date
success return.
For simplicity, the number of captures is now handled by intentionally
losing the instruction in `Zcu.mapOldZirToNew`, since there is nothing
to gain from tracking a type when old instances of it can never be
reused.
To avoid this PR regressing error messages, most of the work here has
gone towards improving error notes for why code was comptime-evaluated.
ZIR `block_comptime` now stores a "comptime reason", the enum for which
is also used by Sema. There are two types in Sema:
* `ComptimeReason` represents the reason we started evaluating something
at comptime.
* `BlockComptimeReason` represents the reason a given block is evaluated
at comptime; it's either a `ComptimeReason` with an attached source
location, or it's because we're in a function which was called at
comptime (and that function's `Block` should be consulted for the
"parent" reason).
Every `Block` stores a `?BlockComptimeReason`. The old `is_comptime`
field is replaced with a trivial `isComptime()` method which returns
whether that reason is non-`null`.
Lastly, the handling for `block_comptime` has been simplified. It was
previously going through an unnecessary runtime-handling path; now, it
is a trivial sub block exited through a `break_inline` instruction.
Resolves: #22296
This commit separates semantic analysis of the annotated type vs value
of a global declaration, therefore allowing recursive and mutually
recursive values to be declared.
Every `Nav` which undergoes analysis now has *two* corresponding
`AnalUnit`s: `.{ .nav_val = n }` and `.{ .nav_ty = n }`. The `nav_val`
unit is responsible for *fully resolving* the `Nav`: determining its
value, linksection, addrspace, etc. The `nav_ty` unit, on the other
hand, resolves only the information necessary to construct a *pointer*
to the `Nav`: its type, addrspace, etc. (It does also analyze its
linksection, but that could be moved to `nav_val` I think; it doesn't
make any difference).
Analyzing a `nav_ty` for a declaration with no type annotation will just
mark a dependency on the `nav_val`, analyze it, and finish. Conversely,
analyzing a `nav_val` for a declaration *with* a type annotation will
first mark a dependency on the `nav_ty` and analyze it, using this as
the result type when evaluating the value body.
The `nav_val` and `nav_ty` units always have references to one another:
so, if a `Nav`'s type is referenced, its value implicitly is too, and
vice versa. However, these dependencies are trivial, so, to save memory,
are only known implicitly by logic in `resolveReferences`.
In general, analyzing ZIR `decl_val` will only analyze `nav_ty` of the
corresponding `Nav`. There are two exceptions to this. If the
declaration is an `extern` declaration, then we immediately ensure the
`Nav` value is resolved (which doesn't actually require any more
analysis, since such a declaration has no value body anyway).
Additionally, if the resolved type has type tag `.@"fn"`, we again
immediately resolve the `Nav` value. The latter restriction is in place
for two reasons:
* Functions are special, in that their externs are allowed to trivially
alias; i.e. with a declaration `extern fn foo(...)`, you can write
`const bar = foo;`. This is not allowed for non-function externs, and
it means that function types are the only place where it is possible
for a declaration `Nav` to have a `.@"extern"` value without actually
being declared `extern`. We need to identify this situation
immediately so that the `decl_ref` can create a pointer to the *real*
extern `Nav`, not this alias.
* In certain situations, such as taking a pointer to a `Nav`, Sema needs
to queue analysis of a runtime function if the value is a function. To
do this, the function value needs to be known, so we need to resolve
the value immediately upon `&foo` where `foo` is a function.
This restriction is simple to codify into the eventual language
specification, and doesn't limit the utility of this feature in
practice.
A consequence of this commit is that codegen and linking logic needs to
be more careful when looking at `Nav`s. In general:
* When `updateNav` or `updateFunc` is called, it is safe to assume that
the `Nav` being updated (the owner `Nav` for `updateFunc`) is fully
resolved.
* Any `Nav` whose value is/will be an `@"extern"` or a function is fully
resolved; see `Nav.getExtern` for a helper for a common case here.
* Any other `Nav` may only have its type resolved.
This didn't seem to be too tricky to satisfy in any of the existing
codegen/linker backends.
Resolves: #131
The `Cau` abstraction originated from noting that one of the two primary
roles of the legacy `Decl` type was to be the subject of comptime
semantic analysis. However, the data stored in `Cau` has always had some
level of redundancy. While preparing for #131, I went to remove that
redundany, and realised that `Cau` now had exactly one field: `owner`.
This led me to conclude that `Cau` is, in fact, an unnecessary level of
abstraction over what are in reality *fundamentally different* kinds of
analysis unit (`AnalUnit`). Types, `Nav` vals, and `comptime`
declarations are all analyzed in different ways, and trying to treat
them as the same thing is counterproductive!
So, these 3 cases are now different alternatives in `AnalUnit`. To avoid
stealing bits from `InternPool`-based IDs, which are already a little
starved for bits due to the sharding datastructures, `AnalUnit` is
expanded to 64 bits (30 of which are currently unused). This doesn't
impact memory usage too much by default, because we don't store
`AnalUnit`s all too often; however, we do store them a lot under
`-fincremental`, so a non-trivial bump to peak RSS can be observed
there. This will be improved in the future when I made
`InternPool.DepEntry` less memory-inefficient.
`Zcu.PerThread.ensureCauAnalyzed` is split into 3 functions, for each of
the 3 new types of `AnalUnit`. The new logic is much easier to
understand, because it avoids conflating the logic of these
fundamentally different cases.
The new representation is often more compact. It is also more
straightforward to understand: for instance, `extern` is represented on
the `declaration` instruction itself rather than using a special
instruction. The same applies to `var`, making both of these far more
compact.
This commit also separates the type and value bodies of a `declaration`
instruction. This is a prerequisite for #131.
In general, `declaration` now directly encodes details of the syntax
form used, and the embedded ZIR bodies are for actual expressions. The
only exception to this is functions, where ZIR is effectively designed
as if we had #1717. `extern fn` declarations are modeled as
`extern const` with a function type, and normal `fn` definitions are
modeled as `const` with a `func{,_fancy,_inferred}` instruction. This
may change in the future, but improving on this was out of scope for
this commit.
This includes function aliases, but not function declarations.
Also, re-introduce a target check for function alignment which was
inadvertently removed in the prior commit.
This code was left over from the legacy Autodoc implementation. No
component of the compiler pipeline actually requires doc comments, so it
is a waste of time and space to store them in ZIR.
The previous commit cast doubt upon the initial report about macOS
kernel behavior, identifying another reason that ENOENT could be
returned from file creation.
However, it is demonstrable that ENOENT can be returned for both cases:
1. create file race
2. handle refers to deleted directory
This commit re-introduces the workaround for the file creation race on
macOS however it does not unconditionally retry - it first tries again
with O_EXCL to disambiguate the error condition that has occurred.
Previous commits
2b0929929d67e222ca6a9523a3a594ed456c4a51
4ea2f441df36cec61e1017f4d795d4037326c98c
had this text:
> There are no dir components, so you would think that this was
> unreachable, however we have observed on macOS two processes racing to
> do openat() with O_CREAT manifest in ENOENT.
This appears to have been a misunderstanding based on the issue
report #12138 and corresponding PR #12139 in which the steps to
reproduce removed the cache directory in a loop which also executed
detached Zig compiler processes.
There is no evidence for the macOS kernel bug however the ENOENT is
easily explained by the removal of the cache directory.
This commit reverts those commits, ultimately reporting the ENOENT as an
error rather than repeating the create file operation. However this
commit also adds an explicit error set to `std.Build.Cache.hit` as well
as changing the `failed_file_index` to a proper diagnostic field that
fully communicates what failed, leading to more informative error
messages on failure to check the cache.
The equivalent failure when occuring for AstGen performs a fatal process
kill, reasoning being that the compiler has an invariant of the cache
directory not being yanked out from underneath it while executing. This
could be made a more granular error in the future but I suspect such
thing is not valuable to pursue.
Related to #18340 but does not solve it.
The introduction of the `extended(astgen_error())` instruction allows a
`test` declaration to be unresolved, i.e. the declaration doesn't even
contain a `func`. I could modify AstGen to not do this, but it makes
more sense to just handle this case when collecting test functions.
Note that tests under incremental compilation are currently broken if
you ever remove all references to a test; this is tracked as a subtask
of #21165.
The previous commit exposed some bugs in incremental compilation. This
commit fixes those, and adds a little more logging for debugging
incremental compilation.
Also, allow `ast-check -t` to dump ZIR when there are non-fatal AstGen
errors.
This commit enhances AstGen to introduce a form of error resilience
which allows valid ZIR to be emitted even when AstGen errors occur.
When a non-fatal AstGen error (e.g. `appendErrorNode`) occurs, ZIR
generation is not affected; the error is added to `astgen.errors` and
ultimately to the errors stored in `extra`, but that doesn't stop us
getting valid ZIR. Fatal AstGen errors (e.g. `failNode`) are a bit
trickier. These errors return `error.AnalysisFail`, which is propagated
up the stack. In theory, any parent expression can catch this error and
handle it, continuing ZIR generation whilst throwing away whatever was
lost. For now, we only do this in one place: when creating declarations.
If a call to `fnDecl`, `comptimeDecl`, `globalVarDecl`, etc, returns
`error.AnalysisFail`, the `declaration` instruction is still created,
but its body simply contains the new `extended(astgen_error())`
instruction, which instructs Sema to terminate semantic analysis with a
transitive error. This means that a fatal AstGen error causes the
innermost declaration containing the error to fail, but the rest of the
file remains intact.
If a source file contains parse errors, or an `error.AnalysisFail`
happens when lowering the top-level struct (e.g. there is an error in
one of its fields, or a name has multiple declarations), then lowering
for the entire file fails. Alongside the existing `Zir.hasCompileErrors`
query, this commit introduces `Zir.loweringFailed`, which returns `true`
only in this case.
The end result here is that files with AstGen failures will almost
always still emit valid ZIR, and hence can undergo semantic analysis on
the parts of the file which are (from AstGen's perspective) valid. This
is a noteworthy improvement to UX, but the main motivation here is
actually incremental compilation. Previously, AstGen failures caused
lots of semantic analysis work to be thrown out, because all `AnalUnit`s
in the file required re-analysis so as to trigger necessary transitive
failures and remove stored compile errors which would no longer make
sense (because a fresh compilation of this code would not emit those
errors, as the units those errors applied to would fail sooner due to
referencing a failed file). Now, this case only applies when a file has
severe top-level errors, which is far less common than something like
having an unused variable.
Lastly, this commit changes a few errors in `AstGen` to become fatal
when they were previously non-fatal and vice versa. If there is still a
reasonable way to continue AstGen and lower to ZIR after an error, it is
non-fatal; otherwise, it is fatal. For instance, `comptime const`, while
redundant syntax, has a clear meaning we can lower; on the other hand,
using an undeclared identifer has no sane lowering, so must trigger a
fatal error.
This commit reworks how anonymous struct literals and tuples work.
Previously, an untyped anonymous struct literal
(e.g. `const x = .{ .a = 123 }`) was given an "anonymous struct type",
which is a special kind of struct which coerces using structural
equivalence. This mechanism was a holdover from before we used
RLS / result types as the primary mechanism of type inference. This
commit changes the language so that the type assigned here is a "normal"
struct type. It uses a form of equivalence based on the AST node and the
type's structure, much like a reified (`@Type`) type.
Additionally, tuples have been simplified. The distinction between
"simple" and "complex" tuple types is eliminated. All tuples, even those
explicitly declared using `struct { ... }` syntax, use structural
equivalence, and do not undergo staged type resolution. Tuples are very
restricted: they cannot have non-`auto` layouts, cannot have aligned
fields, and cannot have default values with the exception of `comptime`
fields. Tuples currently do not have optimized layout, but this can be
changed in the future.
This change simplifies the language, and fixes some problematic
coercions through pointers which led to unintuitive behavior.
Resolves: #16865
these tasks have some shared data dependencies so they cannot be done
simultaneously. Future work should untangle these data dependencies so
that more can be done in parallel.
for now this commit ensures correctness by making linker input parsing
and codegen tasks part of the same queue.