This was done by regex substitution with `sed`. I then manually went
over the entire diff and fixed any incorrect changes.
This diff also changes a lot of `callconv(.C)` to `callconv(.c)`, since
my regex happened to also trigger here. I opted to leave these changes
in, since they *are* a correct migration, even if they're not the one I
was trying to do!
`Sema.explainWhyValueContainsReferenceToComptimeVar` (concise name!)
adds notes to an error explaining how to get from a given `Value` to a
pointer to some `comptime var` (or a comptime field). Previously, this
error could be very opaque in any case where it wasn't obvious where the
comptime var pointer came from; particularly for type captures. Now, the
error notes explain this to the user.
This rewrite improves some error messages, hugely simplifies the logic,
and fixes several bugs. One of these bugs is technically a new rule
which Andrew and I agreed on: if a parameter has a comptime-only type
but is not declared `comptime`, then the corresponding call argument
should not be *evaluated* at comptime; only resolved. Implementing this
required changing how function types work a little, which in turn
required allowing a new kind of function coercion for some generic use
cases: function coercions are now allowed to implicitly *remove*
`comptime` annotations from parameters with comptime-only types. This is
okay because removing the annotation affects only the call site.
Resolves: #22262
`Zcu.PerThead.ensureTypeUpToDate` is set up in such a way that it only
returns the updated type the first time it is called. In general, that's
okay; however, the exception is that we want the function to continue
returning `error.AnalysisFail` when the type has been lost, or its
number of captures changed.
Therefore, the check for this case now happens before the up-to-date
success return.
For simplicity, the number of captures is now handled by intentionally
losing the instruction in `Zcu.mapOldZirToNew`, since there is nothing
to gain from tracking a type when old instances of it can never be
reused.
The old lowering was kind of neat, but it unintentionally allowed the
syntax `for (123) |_| { ... }`, and there wasn't really a way to fix
that. So, instead, we include both the start and the end of the range in
the `for_len` instruction (each operand to `for` now has *two* entries
in this multi-op instruction). This slightly increases the size of ZIR
for loops of predominantly indexables, but the difference is small
enough that it's not worth complicating ZIR to try and fix it.
Some sub-expressions should always be evaluated at comptime -- in
particular, type expressions, e.g. `E` in `E!T`. However, bugs in this
logic are easy to miss, because the parent scope is usually comptime
anyway!
This fixes a bug which exposed a compiler implementation detail (ZIR
alloc elision). Previously, `const` declarations with a runtime-known
value in a comptime scope were permitted only if AstGen was able to
elide the alloc in ZIR, since the error was reported by storing to the
comptime alloc.
This just adds a new instruction to also emit this error when the alloc
is elided.
To avoid this PR regressing error messages, most of the work here has
gone towards improving error notes for why code was comptime-evaluated.
ZIR `block_comptime` now stores a "comptime reason", the enum for which
is also used by Sema. There are two types in Sema:
* `ComptimeReason` represents the reason we started evaluating something
at comptime.
* `BlockComptimeReason` represents the reason a given block is evaluated
at comptime; it's either a `ComptimeReason` with an attached source
location, or it's because we're in a function which was called at
comptime (and that function's `Block` should be consulted for the
"parent" reason).
Every `Block` stores a `?BlockComptimeReason`. The old `is_comptime`
field is replaced with a trivial `isComptime()` method which returns
whether that reason is non-`null`.
Lastly, the handling for `block_comptime` has been simplified. It was
previously going through an unnecessary runtime-handling path; now, it
is a trivial sub block exited through a `break_inline` instruction.
Resolves: #22296
The new representation is often more compact. It is also more
straightforward to understand: for instance, `extern` is represented on
the `declaration` instruction itself rather than using a special
instruction. The same applies to `var`, making both of these far more
compact.
This commit also separates the type and value bodies of a `declaration`
instruction. This is a prerequisite for #131.
In general, `declaration` now directly encodes details of the syntax
form used, and the embedded ZIR bodies are for actual expressions. The
only exception to this is functions, where ZIR is effectively designed
as if we had #1717. `extern fn` declarations are modeled as
`extern const` with a function type, and normal `fn` definitions are
modeled as `const` with a `func{,_fancy,_inferred}` instruction. This
may change in the future, but improving on this was out of scope for
this commit.
(With the exception of x86 since that was available from the beginning.)
These were determined by analyzing the full, reconstructed Git history of the
Linux kernel here: https://landley.net/kdocs/fullhist
Currently, `zig ast-check` fails on ZON files, because it tries to
interpret the file as Zig source code. This commit introduces a new
verification pass, `std.zig.ZonGen`, which applies to an AST in ZON
mode.
Like `AstGen`, this pass also converts the AST into a more helpful
format. Rather than a sequence of instructions like `Zir`, the output
format of `ZonGen` is a new datastructure called `Zoir`. This type is
essentially a simpler form of AST, containing only the information
required for consumers of ZON. It is also far more compact than
`std.zig.Ast`, with the size generally being comparable to the size of
the well-formatted source file.
The emitted `Zoir` is currently not used aside from the `-t` option to
`ast-check` which causes it to be dumped to stdout. However, in future,
it can be used for comptime `@import` of ZON files, as well as for
simpler handling of files like `build.zig.zon`, and even by other parts
of the Zig Standard Library.
Resolves: #22078
This code was left over from the legacy Autodoc implementation. No
component of the compiler pipeline actually requires doc comments, so it
is a waste of time and space to store them in ZIR.
This is, at least today, a very broken target: It doesn't actually build either
musl or wasi-libc even if you use -lc. It does give you musl headers, but that's
it. Those headers are not terribly useful, however, without any implementation
code. You can sort of call some math functions because they just so happen to
have implementations in compiler-rt. But that's only true for a small subset,
and I don't think users should be relying on the ABI surface of a library that
is an implementation detail of the compiler.
Clearly, a freestanding-capable libc of sorts is a useful thing as evidenced by
newlib, picolibc, etc existing. However, calling it "musl" is misleading when it
isn't actually musl-compatible, nor can it ever be because the musl API surface
is inextricably tied to the Linux kernel. In the discussion on #20690, there was
agreement that once we split up the API and ABI components in the target string,
the API component should be about compatibility, not whether you literally get a
particular implementation of it. Also, we decided that Linux musl and wasi-libc
musl shouldn't use the same API tag precisely because they're not actually
compatible.
(And besides, how would any syscall even be implemented in freestanding? Who or
what would we be calling?)
So I think we should remove this triple for now. If we decide to reintroduce
something like this, especially once #2879 gets going, we should come up with a
bespoke name for it rather than using "musl".
Both of these instructions were previously under a special case in
`rvalue` which resulted in every reference to such an instruction adding
a new `ref` instruction. This had the effect that, for instance,
`&a != &a` for parameters. Deduplicating these `ref` instructions was
problematic for different reasons.
For `alloc_inferred`, the problem was that it's not valid to `ref` the
alloc until the allocation has been resolved (`resolve_inferred_alloc`),
but `AstGen.appendBodyWithFixups` would place the `ref` directly after
the `alloc_inferred`. This is solved by bringing
`resolve_inferred_alloc` in line with `make_ptr_const` by having it
*return* the final pointer, rather than modifying `sema.inst_map` of the
original `alloc_inferred`. That way, the `ref` refers to the
`resolve_inferred_alloc` instruction, so is placed immediately after it,
avoiding this issue.
For `param`, the problem is a bit trickier: `param` instructions live in
a body which must contain only `param` instructions, then a
`func{,_inferred,_fancy}`, then a `break_inline`. Moreover, `param`
instructions may be referenced not only by the function body, but also
by other parameters, the return type expression, etc. Each of these
bodies requires separate `ref` instructions. This is solved by pulling
entries out of `ref_table` after evaluating each component of the
function declaration, and appending the refs later on when actually
putting the bodies together. This gives way to another issue: if you
write `fn f(x: T) @TypeOf(x.foo())`, then since `x.foo()` takes a
reference to `x`, this `ref` instruction is now in a comptime context
(outside of the `@TypeOf` ZIR body), so emits a compile error. This is
solved by loosening the rules around `ref` instructions; because they
are not side-effecting, it is okay to allow `ref` of runtime values at
comptime, resulting in a runtime-known value in a comptime scope. We
already apply this mechanism in some cases; for instance, it's why
`runtime_array.len` works in a `comptime` context. In future, we will
want to give similar treatment to many operations in Sema: in general,
it's fine to apply runtime operations at comptime provided they don't
have side effects!
Resolves: #22140
The main change here is to partition tracked instructions found within a
declaration. It's very unlikely that, for instance, a `struct { ... }`
type declaration was intentionally turned into a reification or an
anonymous initialization, so it makes sense to track things in a few
different arrays.
In particular, this fixes an issue where a `func` instruction could
wrongly be mapped to something else if the types of function parameters
changed. This would cause huge problems further down the pipeline; we
expect that if a `declaration` is tracked, and it previously contained a
`func`/`func_inferred`/`func_fancy`, then this instruction is either
tracked to another `func`/`func_inferred`/`func_fancy` instruction, or
is lost.
Also, this commit takes the opportunity to rename the functions actually
doing this logic. `Zir.findDecls` was a name that might have made sense
at some point, but nowadays, it's definitely not finding declarations,
and it's not *exclusively* finding type declarations. Instead, the point
is to find instructions which we want to track; hence the new name,
`Zir.findTrackable`.
Lastly, a nice side effect of partitioning the output of `findTrackable`
is that `Zir.declIterator` no longer needs to accept input instructions
which aren't type declarations (e.g. `reify`, `func`).
This commit enhances AstGen to introduce a form of error resilience
which allows valid ZIR to be emitted even when AstGen errors occur.
When a non-fatal AstGen error (e.g. `appendErrorNode`) occurs, ZIR
generation is not affected; the error is added to `astgen.errors` and
ultimately to the errors stored in `extra`, but that doesn't stop us
getting valid ZIR. Fatal AstGen errors (e.g. `failNode`) are a bit
trickier. These errors return `error.AnalysisFail`, which is propagated
up the stack. In theory, any parent expression can catch this error and
handle it, continuing ZIR generation whilst throwing away whatever was
lost. For now, we only do this in one place: when creating declarations.
If a call to `fnDecl`, `comptimeDecl`, `globalVarDecl`, etc, returns
`error.AnalysisFail`, the `declaration` instruction is still created,
but its body simply contains the new `extended(astgen_error())`
instruction, which instructs Sema to terminate semantic analysis with a
transitive error. This means that a fatal AstGen error causes the
innermost declaration containing the error to fail, but the rest of the
file remains intact.
If a source file contains parse errors, or an `error.AnalysisFail`
happens when lowering the top-level struct (e.g. there is an error in
one of its fields, or a name has multiple declarations), then lowering
for the entire file fails. Alongside the existing `Zir.hasCompileErrors`
query, this commit introduces `Zir.loweringFailed`, which returns `true`
only in this case.
The end result here is that files with AstGen failures will almost
always still emit valid ZIR, and hence can undergo semantic analysis on
the parts of the file which are (from AstGen's perspective) valid. This
is a noteworthy improvement to UX, but the main motivation here is
actually incremental compilation. Previously, AstGen failures caused
lots of semantic analysis work to be thrown out, because all `AnalUnit`s
in the file required re-analysis so as to trigger necessary transitive
failures and remove stored compile errors which would no longer make
sense (because a fresh compilation of this code would not emit those
errors, as the units those errors applied to would fail sooner due to
referencing a failed file). Now, this case only applies when a file has
severe top-level errors, which is far less common than something like
having an unused variable.
Lastly, this commit changes a few errors in `AstGen` to become fatal
when they were previously non-fatal and vice versa. If there is still a
reasonable way to continue AstGen and lower to ZIR after an error, it is
non-fatal; otherwise, it is fatal. For instance, `comptime const`, while
redundant syntax, has a clear meaning we can lower; on the other hand,
using an undeclared identifer has no sane lowering, so must trigger a
fatal error.