Namespace types (`struct`, `enum`, `union`, `opaque`) do not use
structural equality - equivalence is based on their Decl index (and soon
will change to AST node + captures). However, we previously stored all
other information in the corresponding `InternPool.Key` anyway. For
logical consistency, it makes sense to have the key only be the true key
(that is, the Decl index) and to load all other data through another
function. This introduces those functions, by the name of
`loadStructType` etc. It's a big diff, but most of it is no-brainer
changes.
In future, it might be nice to eliminate a bunch of the loaded state in
favour of accessor functions on the `LoadedXyzType` types (like how we
have `LoadedUnionType.size()`), but that can be explored at a later
date.
This commit eliminates the `dbg_block_{begin,end}` instructions from
both ZIR and AIR. Instead, lexical scoping of `dbg_var_{ptr,val}`
instructions is decided based on the AIR block they exist within. This
is a much more robust system, and also results in a huge drop in ZIR
bytes - around 7% for Sema.zig.
This required some enhancements to Sema to prevent elision of blocks
when they are required for debug variable scoping. This can be observed
by looking at the AIR for the following simple test program with and
without `-fstrip`:
```zig
export fn f() void {
{
var a: u32 = 0;
_ = &a;
}
{
var a: u32 = 0;
_ = &a;
}
}
```
When `-fstrip` is passed, no AIR blocks are generated. When `-fno-strip`
is passed, the ZIR blocks are lowered to true AIR blocks to give correct
lexical scoping to the debug vars.
The changes here incidentally reolve #19060. A corresponding behavior
test has been added.
Resolves: #19060
This reverts commit 7161ed79c4abcaccdd56fe0b4fbd3d93472d41b8, reversing
changes made to 3f2a65594e1d3c0a4f4943a4ea522e8405db81e0.
Unfortunately, this sat in the PR queue too long and the merge broke the
zig1.wasm bootstrap process.
The motivating problem here was a memory leak in the hash maps of
Module.Namespace.
The commit deletes more of the legacy incremental compilation
implementation. It had things like use of orderedRemove and trying to do
too much OOP-style creation and deletion of objects.
Instead, this commit iterates over all the namespaces on Module deinit
and calls deinit on the hash map fields. This logic is much simpler to
reason about.
Similarly, change global inline assembly to an array hash map since
iterating over the values is a primary use of it, and clean up the
remaining values on Module deinit, solving another memory leak.
After this there are no more memory leaks remaining when using the
x86 backend in a libc-less compiler.
This was regressed in d657b6c0e2ab7c47f5416dc4df1abb2bfbecd4b6, when
the comptime memory model for unions was changed to allow them to have
no defined tag type.
Changes:
- Add `isMangledIdent` to determine if `fmtIdent` would make any edits to the identifier
- Any function that has a mangled identifier is referred to using the mangled identifer
within the current file, but if it is exported the first export will be with the non-mangled name.
- Add `zig_import` to import a symbol under a different name
- Add a level of indirection to float function names. Now, they are referred to as
`zig_float_fn_<float type>_<operation>`. The definitions in zig.h are wrapped
with `zig_import` to import the symbol under the real name.
The specific problem that sparked this change was the combination of
`zig_libc_name_f80(name) __##name##x` with the input `fma`, resulting
in `__fmax`, which is a new intrinsic in recent versions of cl.exe.
With the above changes in place, compiler_rt can output the following:
```
static zig_weak_linkage_fn zig_f80 zig_e___fmax(zig_f80, zig_f80, zig_f80);
zig_export(zig_weak_linkage_fn zig_f80 zig_e___fmax(zig_f80, zig_f80, zig_f80), __fmax, "__fmax");
```
Within compiler_rt, `zig_e___fmax` is used to refer to the function, but consumers
will import `__fmax`, which maps to their `zig_float_fn_f80_fma` definition from zig.h.
The main goal of this commit is to remove the `runtime_value` field from
`InternPool.Key` (and its associated representation), but there are a
few dominos. Specifically, this mostly eliminates the "maybe runtime"
concept from value resolution in Sema: so some resolution functions like
`resolveMaybeUndefValAllowVariablesMaybeRuntime` are gone. This required
a small change to struct/union/array initializers, to no longer
use `runtime_value` if a field was a `variable` - I'm not convinced this
case was even reachable, as `variable` should only ever exist as the
trivial value of a global runtime `var` decl.
Now, the only case in which a `Sema.resolveMaybeUndefVal`-esque function
can return the `variable` key is `resolveMaybeUndefValAllowVariables`,
which is directly called from `Sema.resolveInstValueAllowVariables`
(previously `Sema.resolveInstValue`), which is only used for resolving
the value of a Decl from `Module.semaDecl`.
While changing these functions, I also slightly reordered and
restructured some of them, and updated their doc comments.
Commit 5393e56500d499753dbc39704c0161b47d1e4d5c has a flaw pointed out
by @mlugg: the `ty` field of pointer values changes when comptime values
are pointer-casted. This commit introduces a new encoding which
additionally stores the "original pointer type" which is used to store
the alignment of the anonymous decl, and potentially other information
in the future such as section and pointer address space. However, this
new encoding is only used when the original pointer type differs from
the casted pointer type in a meaningful way.
I was able to make the LLVM backend and the C backend lower anonymous
decls with the appropriate alignment, however I will need some help
figuring out how to do this for the backends that lower anonymous decls
via src/codegen.zig and the wasm backend.
This hashmap value used to be assigned much later during `flushModule`,
which never happens if an error is returned by the backend, and
attempting to the undefined values would cause a crash.
Introduce the new mechanism needed to render anonymous decls to C code
that the frontend is now using.
The current strategy is to collect the set of used anonymous decls into
one ArrayHashMap for the entire compilation, and then render them during
flush().
In the future this may need to be adjusted for incremental compilation
purposes, so that removing a Decl from decl_table means that newly
unused anonymous decls are no longer rendered. However, let's do one
thing at a time. The only goal of this branch is to stop using
Module.Decl objects for unnamed constants.
Start keeping track of dependencies on anon decls for dependency
ordering during flush()
Currently this causes use of undefined symbols because these
dependencies need to get rendered into the output.
Instead of explicitly creating a `Module.Decl` object for each anonymous
declaration, each `InternPool.Index` value is implicitly understood to
be an anonymous declaration when encountered by backend codegen.
The memory management strategy for these anonymous decls then becomes to
garbage collect them along with standard InternPool garbage.
In the interest of a smooth transition, this commit only implements this
new scheme for string literals and leaves all the previous mechanisms in
place.
* add nested packed struct/union behavior tests
* use ptr_info.packed_offset rather than trying to duplicate the logic from Sema.structFieldPtrByIndex()
* use the container_ptr_info.packed_offset to account for non-aligned nested structs.
* dedup type.packedStructFieldBitOffset() and module.structPackedFieldBitOffset()
When the compiler's state lives through multiple Compilation.update()
calls, the C backend stores the rendered C source code for each
decl code body and forward declarations.
With this commit, the state is still stored, but it is managed in one
big array list in link/C.zig rather than many array lists, one for each
decl. This means simpler serialization and deserialization.
When struct types have no field names, the names are implicitly
understood to be strings corresponding to the field indexes in
declaration order. It used to be the case that a NullTerminatedString
would be stored for each field in this case, however, now, callers must
handle the possibility that there are no names stored at all. This
commit introduces `legacyStructFieldName`, a function to fake the
previous behavior. Probably something better could be done by reworking
all the callsites of this function.
Structs were previously using `SegmentedList` to be given indexes, but
were not actually backed by the InternPool arrays.
After this, the only remaining uses of `SegmentedList` in the compiler
are `Module.Decl` and `Module.Namespace`. Once those last two are
migrated to become backed by InternPool arrays as well, we can introduce
state serialization via writing these arrays to disk all at once.
Unfortunately there are a lot of source code locations that touch the
struct type API, so this commit is still work-in-progress. Once I get it
compiling and passing the test suite, I can provide some interesting
data points such as how it affected the InternPool memory size and
performance comparison against master branch.
I also couldn't resist migrating over a bunch of alignment API over to
use the log2 Alignment type rather than a mismash of u32 and u64 byte
units with 0 meaning something implicitly different and special at every
location. Turns out you can do all the math you need directly on the
log2 representation of alignments.
Instead of using actual slices for InternPool.Key.AnonStructType, this
commit changes to use Slice types instead, which store a
long-lived index rather than a pointer.
This is a follow-up to 7ef1eb1c27754cb0349fdc10db1f02ff2dddd99b.