After this change, the frontend and backend cooperate to keep track of
which Decls are actually emitted into the machine code. When any backend
sees a `decl_ref` Value, it must mark the corresponding Decl `alive`
field to true.
This prevents unused comptime data from spilling into the output object
files. For example, if you do an `inline for` loop, previously, any
intermediate value calculations would have gone into the object file.
Now they are garbage collected immediately after the owner Decl has its
machine code generated.
In the frontend, when it is time to send a Decl to the linker, if it has
not been marked "alive" then it is deleted instead.
Additional improvements:
* Resolve type ABI layouts after successful semantic analysis of a
Decl. This is needed so that the backend has access to struct fields.
* Sema: fix incorrect logic in resolveMaybeUndefVal. It should return
"not comptime known" instead of a compile error for global variables.
* `Value.pointerDeref` now returns `null` in the case that the pointer
deref cannot happen at compile-time. This is true for global
variables, for example. Another example is if a comptime known
pointer has a hard coded address value.
* Binary arithmetic sets the requireRuntimeBlock source location to the
lhs_src or rhs_src as appropriate instead of on the operator node.
* Fix LLVM codegen for slice_elem_val which had the wrong logic for
when the operand was not a pointer.
As noted in the comment in the implementation of deleteUnusedDecl, a
future improvement will be to rework the frontend/linker interface to
remove the frontend's responsibility of calling allocateDeclIndexes.
I discovered some issues with the plan9 linker backend that are related
to this, and worked around them for now.
* AIR no longer has a `variables` array. Instead of the `varptr`
instruction, Sema emits a constant with a `decl_ref`.
* AIR no longer has a `ref` instruction. There is no longer any
instruction that takes a value and returns a pointer to it. If this
is desired, Sema must either create an anynomous Decl and return a
constant `decl_ref`, or in the case of a runtime value, emit an
`alloc` instruction, `store` the value to it, and then return the
`alloc`.
* The `ref_val` Value Tag is eliminated. `decl_ref` should be used
instead. Also added is `eu_payload_ptr` which points to the payload
of an error union, given an error union pointer.
In general, Sema should avoid calling `analyzeRef` if it can be helped.
For example in the case of field_val and elem_val, there should never be
a reason to create a temporary (alloc or decl). Recent previous commits
made progress along that front.
There is a new abstraction in Sema, which looks like this:
var anon_decl = try block.startAnonDecl();
defer anon_decl.deinit();
// here 'anon_decl.arena()` may be used
const decl = try anon_decl.finish(ty, val);
// decl is typically now used with `decl_ref`.
This pattern is used to upgrade `ref_val` usages to `decl_ref` usages.
Additional improvements:
* Sema: fix source location resolution for calling convention
expression.
* Sema: properly report "unable to resolve comptime value" for loads of
global variables. There is now a set of functions which can be
called if the callee wants to obtain the Value even if the tag is
`variable` (indicating comptime-known address but runtime-known value).
* Sema: `coerce` resolves builtin types before checking equality.
* Sema: fix `u1_type` missing from `addType`, making this type have a
slightly more efficient representation in AIR.
* LLVM backend: fix `genTypedValue` for tags `decl_ref` and `variable`
to properly do an LLVMConstBitCast.
* Remove unused parameter from `Value.toEnum`.
After this commit, some test cases are no longer passing. This is due to
the more principled approach to comptime references causing more
anonymous decls to get sent to the linker for codegen. However, in all
these cases the decls are not actually referenced by the runtime machine
code. A future commit in this branch will implement garbage collection
of decls so that unused decls do not get sent to the linker for codegen.
This will make the tests go back to passing.
- generic "struct:L:C" naming if rloc is NodeTypeStructValueField
- generic "struct:L:C" naming if rloc is NodeTypeFnCallExpr
- move some tests from test/behavior/misc to test/behavior/typename
closes#4330closes#9339
While the SysV ABI is not that complicated, LLVM does not allow us
direct access to enforce it. By mimicking the IR generated by clang,
we can trick LLVM into doing the right thing. This involves two main
additions:
1. `AGG` ABI class
This is not part of the spec, but since we have to track class per
eightbyte and not per struct, the current enum is not enough. I
considered adding multiple classes like: `INTEGER_INTEGER`,
`INTEGER_SSE`, `SSE_INTEGER`. However, all of those cases would trigger
the same code path so it's simpler to collapse into one. This class is
only used on SysV.
2. LLVM C ABI type
Clang uses different types in C ABI function signatures than the
original structs passed in, and does conversion. For example, this
struct: `{ i8, i8, float }` would use `{ i16, float }` at ABI boundaries.
When passed as an argument, it is instead split into two arguments `i16`
and `float`. Therefore, for every struct that passes ABI boundaries we
need to keep track of its corresponding ABI type. Here are some more
examples:
```
| Struct | ABI equivalent |
| { i8, i8, i8, i8 } | i32 |
| { float, float } | double |
| { float, i32, i8 } | { float, i64 } |
```
Then, we must update function calls, returns, parameter lists and inits
to properly convert back and forth as needed.
@select(
comptime T: type,
pred: std.meta.Vector(len, bool),
a: std.meta.Vector(len, T),
b: std.meta.Vector(len, T)
) std.meta.Vector(len, T)
Constructs a vector from a & b, based on the values in the predicate vector. For indices where the predicate value is true, the corresponding
element from the a vector is selected, and otherwise from b.
* There is now a main_pkg in addition to root_pkg. They are usually the
same. When using `zig test`, main_pkg is the user's source file and
root_pkg has the test runner.
* scanDecl no longer looks for test decls outside the package being
tested. honoring `--test-filter` is still TODO.
* test runner main function has a void return value rather than
`anyerror!void`
* Sema is improved to generate better AIR for for loops on slices.
* Sema: fix incorrect capacity calculation in zirBoolBr
* Sema: add compile errors for trying to use slice fields as an lvalue.
* Sema: fix type coercion for error unions
* Sema: fix analyzeVarRef generating garbage AIR
* C codegen: fix renderValue for error unions with 0 bit payload
* C codegen: implement function pointer calls
* CLI: fix usage text
Adds 4 new AIR instructions:
* slice_len, slice_ptr: to get the ptr and len fields of a slice.
* slice_elem_val, ptr_slice_elem_val: to get the element value of
a slice, and a pointer to a slice.
AstGen gains a new functionality:
* One of the unused flags of struct decls is now used to indicate
structs that are known to have non-zero size based on the AST alone.
Some macros (for example any macro that uses token pasting) cannot be
directly translated to Zig, but may nevertheless still admit a Zig
implementation. This provides a mechanism for matching macros against
templates and mapping them to functions implemented in c_translation.zig.
A macro matches a template if it contains the same sequence of tokens, except
that the name and parameters may be renamed. No attempt is made to
semantically analyze the macro. For example the following two macros are
considered equivalent:
```C
```
But the following two are not:
```C
```
Use `@` syntax to escape `_` when used as an identifier.
Remove the stage1 astgen prohibition against assigning from `_`
Note: there a few stage1 bugs preventing `_` from being used as an identifier
for a local variable or function parameter; these will be fixed by stage2.
They are unlikely to arise in real C code since identifiers starting with
underscore are reserved for the implementation.
Amends b009aca38a861f74fd5378db19c65db286ad397e.
The PR predated the introduction of unused variable/constant checks,
thus the build checks weren't reporting this failure until later when
merged into master.
* less branching by passing parameters in the main op code switch.
* properly pass the target when asking the type system for int info.
* handle u8, i16, etc when it is represented using
int_unsigned/int_signed tag.
* compile error instead of assertion failure for unimplemented cases
(greater than 64 bits integer).
* control flow cleanups
* zig.h: expand macros into inline functions
* reduce the complexity of the test case by making it one test case
that calls multiple functions. Also fix the problem of c_int max
value mismatch between host and target.
* Inferred error sets are stored in the return Type of the function,
owned by the Module.Fn. So it cleans up that memory in deinit().
* Sema: update the inferred error set in zirRetErrValue
- Update relevant code in wrapErrorUnion
* C backend: improve some some instructions to take advantage of
liveness analysis to avoid being emitted when unused.
* C backend: when an error union has a payload type with no runtime
bits, emit the error union as the same type as the error set.
* implement enough of ret_err_value to pass wasm tests
* only do the proper `@panic` implementation for the backends which
support it, which is currently only the C backend. The other backends
will see `@breakpoint(); unreachable;` same as before.
- I plan to do AIR memory layout reworking as a prerequisite to
fixing other backends, because that will help me put all the
constants up front, which will allow the codegen to lower to memory
without jumps.
* `@panic` is implemented using anon decls for the message. Makes it
easier on the backends. Might want to look into re-using decls for
this in the future.
* implement DWARF .debug_info for pointer-like optionals.
* ZIR: add two instructions:
- ret_err_value_code
- ret_err_value
* AstGen: add countDefers and utilize it to emit more efficient ZIR for
return expressions in the presence of defers.
* AstGen: implement |err| payloads for `errdefer` syntax.
- There is not an "unused capture" error for it yet.
* AstGen: `return error.Foo` syntax gets a hot path in return
expressions, using the new ZIR instructions. This also is part of
implementing inferred error sets, since we need to tell Sema to add
an error value to the inferred error set before it gets coerced.
* Sema: implement `@setCold`.
- Implement `@setCold` support for C backend.
* `@panic` and regular safety panics such as `unreachable` now properly
invoke `std.builtin.panic`.
* C backend: improve pointer and function value rendering.
* C linker: fix redundant typedefs.
* Add Type.error_set_inferred.
* Fix Value.format for enum_literal, enum_field_index, bytes.
* Remove the C backend test that checks for identical text
I measured a 14% reduction in Total ZIR Bytes from master branch
for std/os.zig.
It now displays the byte with proper printability handling. This makes
the relevant compile error test case no longer a regression in quality
from stage1 to stage2.
* Implement "initializing array with struct syntax"
* Implement "'_' used as an identifier without @\"_\" syntax"
* Fix source location of "missing parameter name"
* Update test cases where appropriate
In order to not regress the quality of compile errors, some improvements
had to be made.
* std.zig.parseCharLiteral is improved to return more detailed parse
failure information.
* tokenizer is improved to handle null bytes in the middle of strings,
character literals, and line comments.
* validating how many unicode escape digits in string literals is moved
to std.zig.parseStringLiteral rather than handled in the tokenizer.
* when a tokenizer error occurs, if the reported token is the 'invalid'
tag, an error note is added to point to the invalid byte location.
Further improvements would be:
- Mention the expected set of allowed bytes at this location.
- Display the invalid byte (if printable, print it, otherwise
escape-print it).
The motivation for this commit is that there exists source files which
produce ast-check errors, but crash stage1 or otherwise trigger stage1
bugs. Previously to this commit, Zig would run AstGen, collect the
compile errors, run stage1, report stage1 compile errors and exit if
any, and then report AstGen compile errors.
The main change in this commit is to report AstGen errors prior to
invoking stage1, and in fact if any AstGen errors occur, do not invoke
stage1 at all.
This caused most of the compile error tests to fail due to things such
as unused local variables and mismatched stage1/stage2 error messages.
It was taking a long time to update the test cases one-by-one, so I
took this opportunity to unify the stage1 and stage2 testing harness,
specifically with regards to compile errors. In this way we can start
keeping track of which tests pass for 1, 2, or both.
`zig build test-compile-errors` no longer works; it is now integrated
into `zig build test-stage2`.
This is one step closer to executing compile error tests in parallel; in
fact the ThreadPool object is already in scope.
There are some cases where the stage1 compile errors were actually
better; those are left failing in this commit, to be addressed in a
follow-up commit.
Other changes in this commit:
* build.zig: improve support for -Dstage1 used with the test step.
* AstGen: minor cosmetic changes to error messages.
* stage2: add -fstage1 and -fno-stage1 flags. This now allows one to
download a binary of the zig compiler and use the llvm backend of
self-hosted. This was also needed for hooking up the test harness.
However, I realized that stage1 calls exit() and also has memory
leaks, so had to complicate the test harness by not using this flag
after all and instead invoking as a child process.
- These CLI flags will disappear once we start shipping the
self-hosted compiler as the main compiler. Until then, they can be
used to try out the work-in-progress stage2.
* stage2: select the LLVM backend by default for release modes, as long
as the target architecture is supported by LLVM.
* test harness: support setting the optimize mode
When a local variable has an initialization expression of type
'noreturn', emit a compile error. This brings this branch closer
to parity with master branch.
* AstGen: implement "unreachable code" error for blocks. This works at
the statement level.
* stage1: remove the "unreachable code" error implementation, which
means removing the `is_gen` field from IrInstSrc. This is one small
step towards a smaller memory footprint for stage1. The benefits
won't be realized until a future commit because this flag took
advantage of padding.
There may be a regression here with "union has no associated enum"
error, and there is a regression with the following code:
```zig
const a = noreturn;
```
A future commit will address these regressions.