16 Commits

Author SHA1 Message Date
David Rubin
1c1feba08e
remove mod aliases for Zcus 2024-08-25 15:17:40 -07:00
David Rubin
b4bb64ce78
sema: rework type resolution to use Zcu when possible 2024-08-25 15:16:42 -07:00
mlugg
548a087faf
compiler: split Decl into Nav and Cau
The type `Zcu.Decl` in the compiler is problematic: over time it has
gained many responsibilities. Every source declaration, container type,
generic instantiation, and `@extern` has a `Decl`. The functions of
these `Decl`s are in some cases entirely disjoint.

After careful analysis, I determined that the two main responsibilities
of `Decl` are as follows:
* A `Decl` acts as the "subject" of semantic analysis at comptime. A
  single unit of analysis is either a runtime function body, or a
  `Decl`. It registers incremental dependencies, tracks analysis errors,
  etc.
* A `Decl` acts as a "global variable": a pointer to it is consistent,
  and it may be lowered to a specific symbol by the codegen backend.

This commit eliminates `Decl` and introduces new types to model these
responsibilities: `Cau` (Comptime Analysis Unit) and `Nav` (Named
Addressable Value).

Every source declaration, and every container type requiring resolution
(so *not* including `opaque`), has a `Cau`. For a source declaration,
this `Cau` performs the resolution of its value. (When #131 is
implemented, it is unsolved whether type and value resolution will share
a `Cau` or have two distinct `Cau`s.) For a type, this `Cau` is the
context in which type resolution occurs.

Every non-`comptime` source declaration, every generic instantiation,
and every distinct `extern` has a `Nav`. These are sent to codegen/link:
the backends by definition do not care about `Cau`s.

This commit has some minor technically-breaking changes surrounding
`usingnamespace`. I don't think they'll impact anyone, since the changes
are fixes around semantics which were previously inconsistent (the
behavior changed depending on hashmap iteration order!).

Aside from that, this changeset has no significant user-facing changes.
Instead, it is an internal refactor which makes it easier to correctly
model the responsibilities of different objects, particularly regarding
incremental compilation. The performance impact should be negligible,
but I will take measurements before merging this work into `master`.

Co-authored-by: Jacob Young <jacobly0@users.noreply.github.com>
Co-authored-by: Jakub Konka <kubkon@jakubkonka.com>
2024-08-11 07:29:41 +01:00
mlugg
f84a4953d2
Value: eliminate static recursion loop from value printing 2024-07-16 11:38:21 +01:00
Jacob Young
667b4f9054 Zcu: cache fully qualified name on Decl
This avoids needing to mutate the intern pool from backends.
2024-07-10 11:10:49 -04:00
Jacob Young
525f341f33 Zcu: introduce PerThread and pass to all the functions 2024-07-07 22:59:52 -04:00
mlugg
0e5335aaf5
compiler: rework type resolution, fully resolve all types
I'm so sorry.

This commit was just meant to be making all types fully resolve by
queueing resolution at the moment of their creation. Unfortunately, a
lot of dominoes ended up falling. Here's what happened:

* I added a work queue job to fully resolve a type.
* I realised that from here we could eliminate `Sema.types_to_resolve`
  if we made function codegen a separate job. This is desirable for
  simplicity of both spec and implementation.
* This led to a new AIR traversal to detect whether any required type is
  unresolved. If a type in the AIR failed to resolve, then we can't run
  codegen.
* Because full type resolution now occurs by the work queue job, a bug
  was exposed whereby error messages for type resolution were associated
  with the wrong `Decl`, resulting in duplicate error messages when the
  type was also resolved "by" its owner `Decl` (which really *all*
  resolution should be done on).
* A correct fix for this requires using a different `Sema` when
  performing type resolution: we need a `Sema` owned by the type. Also
  note that this fix is necessary for incremental compilation.
* This means a whole bunch of functions no longer need to take `Sema`s.
  * First-order effects: `resolveTypeFields`, `resolveTypeLayout`, etc
  * Second-order effects: `Type.abiAlignmentAdvanced`, `Value.orderAgainstZeroAdvanced`, etc

The end result of this is, in short, a more correct compiler and a
simpler language specification. This regressed a few error notes in the
test cases, but nothing that seems worth blocking this change.

Oh, also, I ripped out the old code in `test/src/Cases.zig` which
introduced a dependency on `Compilation`. This dependency was
problematic at best, and this code has been unused for a while. When we
re-enable incremental test cases, we must rewrite their executor to use
the compiler server protocol.
2024-07-04 21:01:42 +01:00
mlugg
2f0f1efa6f
compiler: type.zig -> Type.zig 2024-07-04 21:01:42 +01:00
Andrew Kelley
0fcd59eada rename src/Module.zig to src/Zcu.zig
This patch is a pure rename plus only changing the file path in
`@import` sites, so it is expected to not create version control
conflicts, even when rebasing.
2024-06-22 22:59:56 -04:00
mlugg
1eaeb4a0a8
Zcu: rework source locations
`LazySrcLoc` now stores a reference to the "base AST node" to which it
is relative. The previous tagged union is `LazySrcLoc.Offset`. To make
working with this structure convenient, `Sema.Block` contains a
convenience `src` method which takes an `Offset` and returns a
`LazySrcLoc`.

The "base node" of a source location is no longer given by a `Decl`, but
rather a `TrackedInst` representing either a `declaration`,
`struct_decl`, `union_decl`, `enum_decl`, or `opaque_decl`. This is a
more appropriate model, and removes an unnecessary responsibility from
`Decl` in preparation for the upcoming refactor which will split it into
`Nav` and `Cau`.

As a part of these `Decl` reworks, the `src_node` field is eliminated.
This change aids incremental compilation, and simplifies `Decl`. In some
cases -- particularly in backends -- the source location of a
declaration is desired. This was previously `Decl.srcLoc` and worked for
any `Decl`. Now, it is `Decl.navSrcLoc` in reference to the upcoming
refactor, since the set of `Decl`s this works for precisely corresponds
to what will in future become a `Nav` -- that is, source-level
declarations and generic function instantiations, but *not* type owner
Decls.

This commit introduces more tags to `LazySrcLoc.Offset` so as to
eliminate the concept of `error.NeededSourceLocation`. Now, `.unneeded`
should only be used to assert that an error path is unreachable. In the
future, uses of `.unneeded` can probably be replaced with `undefined`.

The `src_decl` field of `Sema.Block` no longer has a role in type
resolution. Its main remaining purpose is to handle namespacing of type
names. It will be eliminated entirely in a future commit to remove
another undue responsibility from `Decl`.

It is worth noting that in future, the `Zcu.SrcLoc` type should probably
be eliminated entirely in favour of storing `Zcu.LazySrcLoc` values.
This is because `Zcu.SrcLoc` is not valid across incremental updates,
and we want to be able to reuse error messages from previous updates
even if the source file in question changed. The error reporting logic
should instead simply resolve the location from the `LazySrcLoc` on the
fly.
2024-06-15 00:57:52 +01:00
mlugg
21a6a1b0f2 Sema: cap depth of value printing in type names
Certain types (notably, `std.ComptimeStringMap`) were resulting in excessively
long type names when instantiated, which in turn resulted in excessively long
symbol names. These are problematic for two reasons:

* Symbol names are sometimes read by humans -- they ought to be readable.
* Some other applications (looking at you, xcode) trip on very long symbol names.

To work around this for now, we cap the depth of value printing at 1, as opposed
to the normal 3. This doesn't guarantee anything -- there could still be, for
instance, an incredibly long aggregate -- but it works around the issue in
practice for the time being.
2024-04-17 22:47:54 -07:00
mlugg
d0e74ffe52
compiler: rework comptime pointer representation and access
We've got a big one here! This commit reworks how we represent pointers
in the InternPool, and rewrites the logic for loading and storing from
them at comptime.

Firstly, the pointer representation. Previously, pointers were
represented in a highly structured manner: pointers to fields, array
elements, etc, were explicitly represented. This works well for simple
cases, but is quite difficult to handle in the cases of unusual
reinterpretations, pointer casts, offsets, etc. Therefore, pointers are
now represented in a more "flat" manner. For types without well-defined
layouts -- such as comptime-only types, automatic-layout aggregates, and
so on -- we still use this "hierarchical" structure. However, for types
with well-defined layouts, we use a byte offset associated with the
pointer. This allows the comptime pointer access logic to deal with
reinterpreted pointers far more gracefully, because the "base address"
of a pointer -- for instance a `field` -- is a single value which
pointer accesses cannot exceed since the parent has undefined layout.
This strategy is also more useful to most backends -- see the updated
logic in `codegen.zig` and `codegen/llvm.zig`. For backends which do
prefer a chain of field and elements accesses for lowering pointer
values, such as SPIR-V, there is a helpful function in `Value` which
creates a strategy to derive a pointer value using ideally only field
and element accesses. This is actually more correct than the previous
logic, since it correctly handles pointer casts which, after the dust
has settled, end up referring exactly to an aggregate field or array
element.

In terms of the pointer access code, it has been rewritten from the
ground up. The old logic had become rather a mess of special cases being
added whenever bugs were hit, and was still riddled with bugs. The new
logic was written to handle the "difficult" cases correctly, the most
notable of which is restructuring of a comptime-only array (for
instance, converting a `[3][2]comptime_int` to a `[2][3]comptime_int`.
Currently, the logic for loading and storing work somewhat differently,
but a future change will likely improve the loading logic to bring it
more in line with the store strategy. As far as I can tell, the rewrite
has fixed all bugs exposed by #19414.

As a part of this, the comptime bitcast logic has also been rewritten.
Previously, bitcasts simply worked by serializing the entire value into
an in-memory buffer, then deserializing it. This strategy has two key
weaknesses: pointers, and undefined values. Representations of these
values at comptime cannot be easily serialized/deserialized whilst
preserving data, which means many bitcasts would become runtime-known if
pointers were involved, or would turn `undefined` values into `0xAA`.
The new logic works by "flattening" the datastructure to be cast into a
sequence of bit-packed atomic values, and then "unflattening" it; using
serialization when necessary, but with special handling for `undefined`
values and for pointers which align in virtual memory. The resulting
code is definitely slower -- more on this later -- but it is correct.

The pointer access and bitcast logic required some helper functions and
types which are not generally useful elsewhere, so I opted to split them
into separate files `Sema/comptime_ptr_access.zig` and
`Sema/bitcast.zig`, with simple re-exports in `Sema.zig` for their small
public APIs.

Whilst working on this branch, I caught various unrelated bugs with
transitive Sema errors, and with the handling of `undefined` values.
These bugs have been fixed, and corresponding behavior test added.

In terms of performance, I do anticipate that this commit will regress
performance somewhat, because the new pointer access and bitcast logic
is necessarily more complex. I have not yet taken performance
measurements, but will do shortly, and post the results in this PR. If
the performance regression is severe, I will do work to to optimize the
new logic before merge.

Resolves: #19452
Resolves: #19460
2024-04-17 13:41:25 +01:00
Jacob Young
7611d90ba0 InternPool: remove slice from byte aggregate keys
This deletes a ton of lookups and avoids many UAF bugs.

Closes #19485
2024-04-08 13:24:08 -04:00
Jacob Young
5a41704f7e cbe: rewrite CType
Closes #14904
2024-03-30 20:50:48 -04:00
mlugg
951fc09a7e
print_value: improve value printing
Notably, this improves string printing from
`@as(*[5:0]u8, &@as([5:0]u8, "hello".*))` to `@as(*[5:0]u8, "hello")`,
omitting the pointless ref-deref pair.
2024-03-26 17:06:14 +00:00
mlugg
2a245e3b78
compiler: eliminate TypedValue
The only logic which remained in this file was the Value printing logic.
This has been moved into a new `print_value.zig`.
2024-03-26 13:48:07 +00:00