* zir.Code: introduce a decls array. This is so that `decl_val` and
`decl_ref` instructions can refer to a Decl with a u32 and therefore
they can also store a source location. This is needed for proper
compile error reporting.
* astgen uses a hash map to avoid redundantly adding a Decl to the
decls array.
* fixed reporting "instruction illegal outside function body" instead
of the desired message "unable to resolve comptime value".
* astgen skips emitting dbg_stmt instructions in comptime scopes.
* astgen has some logic to avoid adding unnecessary type coercion
instructions for common values.
Introduce "inline" variants of ZIR tags:
* block => block_inline
* repeat => repeat_inline
* break => break_inline
* condbr => condbr_inline
The inline variants perform control flow at compile-time, and they
utilize the return value of `Sema.analyzeBody`.
`analyzeBody` now returns an Index, not a Ref, which is the ZIR index of
a break instruction. This effectively communicates both the intended
break target block as well as the operand, allowing parent blocks to
find out whether they, in turn, should return the break instruction up the
call stack, or accept the operand as the block's result and continue
analyzing instructions in the block.
Additionally:
* removed the deprecated ZIR tag `block_comptime`.
* removed `break_void_node` so that all break instructions use the same Data.
* zir.Code: remove the `root_start` and `root_len` fields. There is now
implied to be a block at index 0 for the root body. This is so that
`break_inline` has something to point at and we no longer need the
special instruction `break_flat`.
* implement source location byteOffset() for .node_offset_if_cond
.node_offset_for_cond is probably redundant and can be deleted.
We don't have `comptime var` supported yet, so this commit adds a test
that at least makes sure the condition is required to be comptime known
for `inline while`.
* Module.addBreak and addBreakVoid return zir.Inst.Index not Ref
because Index is the simpler type and we never need a Ref for these.
* astgen: make noreturn stuff return the unreachable_value and avoid
unnecessary calls to rvalue()
* breakExpr: avoid unnecessary access into the tokens array
* breakExpr: fix incorrect `@intCast` (previously this unsafely
casted an Index to a Ref)
* Introduce helper functions on Module.WipZirCode and zir.Code
* Move some logic around
* re-introduce ref_start_index
* prefer usize for local variables + `@intCast` at the end.
Empirically this is easier to optimize.
* Avoid using mem.{bytesAsSlice,sliceAsBytes} because it incurs an
unnecessary multiplication/division which may cause problems for the
optimizer.
* Use a regular enum, not packed, for `Ref`. Memory layout is
guaranteed for enums which specify their tag type. Packed enums have
ABI alignment of 1 byte which is too small.
This provides us greatly increased type safety and prevents the common
mistake of using a zir.Inst.Ref where a zir.Inst.Index was expected or
vice-versa. It also increases the ergonomics of using the typed values
which can be directly referenced with a Ref over the previous zir.Const
approach.
The main pain point is casting between a []Ref and []u32, which could be
alleviated in the future with a new std.mem function.
* comment out the failing stage2 test cases
(so that we can uncomment the ones that are newly passing with
further commits)
* Sema: implement negate, negatewrap
* astgen: implement field access, multiline string literals, and
character literals
* Module: when resolving an AST node into a byte offset, use the
main_tokens array, not the firstToken function
* add `Module.setBlockBody` and related functions
* redo astgen for `and` and `or` to use fewer ZIR instructions and
require less processing for comptime known values
* Sema: rework `analyzeBody` function. See the new doc comments in this
commit. Divides ZIR instructions up into 3 categories:
- always noreturn
- never noreturn
- sometimes noreturn
The current plan is to avoid using async and related features in the
stage2 compiler so that we can bootstrap before implementing them.
Having this untested and incomplete code in the codebase increases
friction while working on stage2, in particular when preforming
larger refactors such as the current zir memory layout rework.
Therefore remove all async related code, leaving only error messages
in astgen.
These were previously implemented as a sub/sub_wrap instruction with a
lhs of 0. Making this separate instructions however allows us to save
some memory as there is no need to store a lhs.
* free Module.Fn ZIR code when destroying the owner Decl
* unreachable_safe and unreachable_unsafe are collapsed into one ZIR
instruction with a safety flag.
* astgen: emit an unreachable instruction for unreachable literals
* don't forget to call deinit on ZIR code
* astgen: implement some builtin functions
We are now passing this test:
```zig
export fn _start() noreturn {}
```
```
test.zig:1:30: error: expected noreturn, found void
```
I ran into an issue where we get an integer overflow trying to compute
node index offsets from the containing Decl. The problem is that the
parser adds the Decl node after adding the child nodes. For some things,
it is easy to reserve the node index and then set it later, however, for
this case, it is not a trivial code change, because depending on tokens
after parsing the decl determines whether we want to add a new node or
not.
Possible strategies here:
1. Rework the parser code to make sure that Decl nodes are before
children nodes in the AST node array.
2. Use signed integers for Decl node offsets.
3. Just flip the order of subtraction and addition. Expect Decl Node
index to be greater than children Node indexes.
I opted for (3) because it seems like the simplest thing to do. We'll
want to unify the logic for computing the offsets though because if the
logic gets repeated, it will probably get repeated wrong.
There are some `@panic("TODO")` in there but I'm trying to get the
branch to the point where collaborators can jump in.
Next is to repair the seam between LazySrcLoc and codegen's expected
absolute file offsets.
Next up is reworking the seam between the LazySrcLoc emitted by Sema
and the byte offsets currently expected by codegen.
And then the big one: updating astgen.zig to use the new memory layout.
The memory layout for ZIR instructions is completely reworked. See
zir.zig for those changes. Some new types:
* `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating
each instruction independently, there is now a Tag and 8 bytes of
data available for all ZIR instructions. Small instructions fit
within these 8 bytes; larger ones use 4 bytes for an index into
`extra`. There is also `string_bytes` so that we can have 4 byte
references to strings. `zir.Inst.Tag` describes how to interpret
those 8 bytes of data.
- This is shared by all `Block` scopes.
* `Module.WipZirCode`: represents an in-progress `zir.Code`. In this
structure, the arrays are mutable, and get resized as we add/delete
things. There is extra state to keep track of things. This struct is
stored on the stack. Once it is finished, it produces an immutable
`zir.Code`, which will remain on the heap for the duration of a
function's existence.
- This is shared by all `GenZir` scopes.
* `Sema`: represents in-progress semantic analysis of a `zir.Code`.
This data is stored on the stack and is shared among all `Block`
scopes. It is now the main "self" argument to everything in the file
that was previously named `zir_sema.zig`.
Additionally, I moved some logic that was in `Module` into here.
`Module.Fn` now stores its parameter names inside the `zir.Code`,
instead of inside ZIR instructions. When the TZIR memory layout
reworking time comes, codegen will be able to reference this data
directly instead of duplicating it.
astgen.zig is (so far) almost entirely untouched, but nearly all of it
will need to be reworked to adhere to this new memory layout structure.
I have no benchmarks to report yet, as I am still working through
compile errors and fixing various things that I broke in this branch.
Overhaul of Source Locations:
Previously we used `usize` everywhere to mean byte offset, but sometimes
also mean other stuff. This was error prone and also made us do
unnecessary work, and store unnecessary bytes in memory.
Now there are more types involved into source locations, and more ways
to describe a source location.
* AllErrors.Message: embrace the assumption that files always have less
than 2 << 32 bytes.
* SrcLoc gets more complicated, to model more complicated source
locations.
* Introduce LazySrcLoc, which can model interesting source locations
with very little stored state. Useful for avoiding doing unnecessary
work when no compile errors occur.
Also, previously, we had `src: usize` on every ZIR instruction. This is
no longer the case. Each instruction now determines whether it even cares
about source location, and if so, how that source location is stored.
This requires more careful work inside `Sema`, but it results in fewer
bytes stored on the heap, without compromising accuracy and power of
compile error messages.
Miscellaneous:
* std.zig: string literals have more helpful result values for
reporting errors. There is now a lower level API and a higher level
API.
- side note: I noticed that the string literal logic needs some love.
There is some unnecessarily hacky code there.
* cut & pasted some TZIR logic that was in zir.zig to ir.zig. This
probably broke stuff and needs to get fixed.
* Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't
think this quite how this code will be organized. Need some more
careful planning about how to implement structs, unions, enums. They
need to be independent Decls, just like a top level function.