This commit starts by making Zir.Inst.Index a nonexhaustive enum rather
than a u32 alias for type safety purposes, and the rest of the changes
are needed to get everything compiling again.
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.
This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for #16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.
As part of these changes, this commit also implements #17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.
This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!
In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.
Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.
Resolves: #16512Resolves: #17194
This also modifies AstGen so that struct types use 1 bit each from the
flags to communicate if there are nonzero inits, alignments, or comptime
fields. This allows adding a struct type to the InternPool without
looking ahead in memory to find out the answers to these questions,
which is easier for CPUs as well as for me, coding this logic right now.
AstGen emits an error when a closure over a known-runtime value crosses
a namespace boundary. This usually makes sense: however, this usage is
actually valid if the capture is within a `@TypeOf` operand. Sema
already has a special case to allow such closure within `@TypeOf` when
AstGen could not determine a value to be runtime-known. This commit
simply introduces analagous logic to AstGen to allow `var`s to cross
namespace boundaries within `@TypeOf`.
This change implements the following syntax into the compiler:
```zig
const x: u32, var y, foo.bar = .{ 1, 2, 3 };
```
A destructure expression may only appear within a block (i.e. not at
comtainer scope). The LHS consists of a sequence of comma-separated var
decls and/or lvalue expressions. The RHS is a normal expression.
A new result location type, `destructure`, is used, which contains
result pointers for each component of the destructure. This means that
when the RHS is a more complicated expression, peer type resolution is
not used: each result value is individually destructured and written to
the result pointers. RLS is always used for destructure expressions,
meaning every `const` on the LHS of such an expression creates a true
stack allocation.
Aside from anonymous array literals, Sema is capable of destructuring
the following types:
* Tuples
* Arrays
* Vectors
A destructure may be prefixed with the `comptime` keyword, in which case
the entire destructure is evaluated at comptime: this means all `var`s
in the LHS are `comptime var`s, every lvalue expression is evaluated at
comptime, and the RHS is evaluated at comptime. If every LHS is a
`const`, this is not allowed: as with single declarations, the user
should instead mark the RHS as `comptime`.
There are a few subtleties in the grammar changes here. For one thing,
if every LHS is an lvalue expression (rather than a var decl), a
destructure is considered an expression. This makes, for instance,
`if (cond) x, y = .{ 1, 2 };` valid Zig code. A destructure is allowed
in almost every context where a standard assignment expression is
permitted. The exception is `switch` prongs, which cannot be
destructures as the comma is ambiguous with the end of the prong.
A follow-up commit will begin utilizing this syntax in the Zig compiler.
Resolves: #498
The `coerce_result_ptr` instruction is highly problematic and leads to
unintentional memory reinterpretation in some cases. It is more correct
to simply not forward result pointers through this builtin.
`coerce_result_ptr` is still used for struct and array initializations,
where it can still cause issues. Eliminating this usage will be a future
change.
Resolves: #16991
The following cast builtins did not previously work on vectors, and have
been made to:
* `@floatCast`
* `@ptrFromInt`
* `@intFromPtr`
* `@floatFromInt`
* `@intFromFloat`
* `@intFromBool`
Resolves: #16267
There are a couple concepts here worth understanding:
Key.UnionType - This type is available *before* resolving the union's
fields. The enum tag type, number of fields, and field names, field
types, and field alignments are not available with this.
InternPool.UnionType - This one can be obtained from the above type with
`InternPool.loadUnionType` which asserts that the union's enum tag type
has been resolved. This one has all the information available.
Additionally:
* ZIR: Turn an unused bit into `any_aligned_fields` flag to help
semantic analysis know whether a union has explicit alignment on any
fields (usually not).
* Sema: delete `resolveTypeRequiresComptime` which had the same type
signature and near-duplicate logic to `typeRequiresComptime`.
- Make opaque types not report comptime-only (this was inconsistent
between the two implementations of this function).
* Implement accepted proposal #12556 which is a breaking change.
The main motivation for this change is eliminating the `block_ptr`
result location and corresponding `store_to_block_ptr` ZIR instruction.
This is achieved through a simple pass over the AST before AstGen which
determines, for AST nodes which have a choice on whether to provide a
result location, which choice to make, based on whether the result
pointer is consumed non-trivially.
This eliminates so much logic from AstGen that we almost break even on
line count! AstGen no longer has to worry about instruction rewriting
based on whether or not a result location was consumed: it always knows
what to do ahead of time, which simplifies a *lot* of logic. This also
incidentally fixes a few random AstGen bugs related to result location
handling, leading to the changes in `test/` and `lib/std/`.
This opens the door to future RLS improvements by making them much
easier to implement correctly, and fixes many bugs. Most ZIR is made
more compact after this commit, mainly due to not having redundant
`store_to_block_ptr` instructions lying around, but also due to a few
bugs in the old system which are implicitly fixed here.
Some builtin types have a special InternPool index (e.g.
`.type_info_type`) so that AstGen can refer to them before semantic
analysis. Unfortunately, this previously led to a second index existing
to refer to the type once it was resolved, complicating Sema by having
the concept of an "unresolved" type index.
This change makes Sema modify these InternPool indices in-place to
contain the expanded representation when resolved. The analysis of the
corresponding decls is caught in `Module.semaDecl`, and a field is set
on Sema telling it which index to place struct/union/enum types at. This
system could break if `std.builtin` contained complex decls which
evaluate multiple struct types, but this will be caught by the
assertions in `InternPool.resolveBuiltinType`.
The AstGen result types which were disabled in 6917a8c have been
re-enabled.
Resolves: #16603
Well, this was a journey!
The original issue I was trying to fix is covered by the new behavior
test in array.zig: in essence, `ty` and `coerced_ty` result locations
were not correctly propagated.
While fixing this, I noticed a similar bug in struct inits: the type was
propagated to *fields* fine, but the actual struct init was
unnecessarily anonymous, which could lead to unnecessary copies. Note
that the behavior test added in struct.zig was already passing - the bug
here didn't change any easy-to-test behavior - but I figured I'd add it
anyway.
This is a little harder than it seems, because the result type may not
itself be an array/struct type: it could be an optional / error union
wrapper. A new ZIR instruction is introduced to unwrap these.
This is also made a little tricky by the fact that it's possible for
result types to be unknown at the time of semantic analysis (due to
`anytype` parameters), leading to generic poison. In these cases, we
must essentially downgrade to an anonymous initialization.
Fixing these issues exposed *another* bug, related to type resolution in
Sema. That issue is now tracked by #16603. As a temporary workaround for
this bug, a few result locations for builtin function operands have been
disabled in AstGen. This is technically a breaking change, but it's very
minor: I doubt it'll cause any breakage in the wild.
This commit does two things which seem unrelated at first, but,
together, solve a miscompilation, and potentially slightly speed up
compiler perf, at the expense of making #2765 trickier to implement in
the future.
Sema: avoid returning a false positive for whether an inferred error set
is comptime-known to be empty.
AstGen: mark function calls as not being interested in a result
location. This prevents the test case "ret_ptr doesn't cause own
inferred error set to be resolved" from being regressed. If we want to
accept and implement #2765 in the future, it will require solving this
problem a different way, but the principle of YAGNI tells us to go ahead
with this change.
Old ZIR looks like this:
%97 = ret_ptr()
%101 = store_node(%97, %100)
%102 = load(%97)
%103 = ret_is_non_err(%102)
New ZIR looks like this:
%97 = ret_type()
%101 = as_node(%97, %100)
%102 = ret_is_non_err(%101)
closes#15669
Fixes#16311
The actual cause of #16311 is the `start_is_zero` special case:
```zig
const range_len = if (end_val == .none or start_is_zero)
end_val
else
try parent_gz.addPlNode(.sub, input, Zir.Inst.Bin{
.lhs = end_val,
.rhs = start_val,
});
```
It only happens if the range start is 0. In that case we would not perform any type checking.
Only in the other cases coincidentally `.sub` performs type checking in Sema, but the errors are still rather poor:
```
$ zig test x.zig
x.zig:9:15: error: invalid operands to binary expression: 'Pointer' and 'Pointer'
for ("abc".."def") |val| {
~~~~~^~~~~~~
```
Note how it's the same as if I use `-`:
```
x.zig:9:11: error: invalid operands to binary expression: 'Pointer' and 'Pointer'
"abc" - "def";
~~~~~~^~~~~~~
```
Now after this PR, the errors are much clearer for both range start and end:
```
x.zig:9:10: error: expected type 'usize', found '*const [3:0]u8'
for ("abc".."def") |val| {
^~~~~
```
This is why I decided to use `.ty` instead of `.coerced_ty` for both range start and end rather than
just perform type checking in that `end_val == .none or start_is_zero` case.
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:
* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
This finishes the process of consolidating switch expressions in ZIR
into as simple and compact a representation as is possible. There are
now just two ZIR tags dedicated to switch expressions: switch_block and
switch_block_ref, with the latter being for an operand passed by
reference.
This is a follow-up to a previous commit which eliminated switch_capture
and switch_capture_ref. All captures are now handled directly by
`switch_block`, which has also eliminated some unnecessary Block data in
Sema.
These tags are unnecessary, as this information can be more efficiently
encoded within the switch_block instruction itself. We also use a neat
little trick to avoid needing a dummy instruction (like is used for
errdefer captures): since the switch_block itself cannot otherwise be
referenced within a prong, we can repurpose its index within prongs to
refer to the captured value.