This commit reworks how anonymous struct literals and tuples work.
Previously, an untyped anonymous struct literal
(e.g. `const x = .{ .a = 123 }`) was given an "anonymous struct type",
which is a special kind of struct which coerces using structural
equivalence. This mechanism was a holdover from before we used
RLS / result types as the primary mechanism of type inference. This
commit changes the language so that the type assigned here is a "normal"
struct type. It uses a form of equivalence based on the AST node and the
type's structure, much like a reified (`@Type`) type.
Additionally, tuples have been simplified. The distinction between
"simple" and "complex" tuple types is eliminated. All tuples, even those
explicitly declared using `struct { ... }` syntax, use structural
equivalence, and do not undergo staged type resolution. Tuples are very
restricted: they cannot have non-`auto` layouts, cannot have aligned
fields, and cannot have default values with the exception of `comptime`
fields. Tuples currently do not have optimized layout, but this can be
changed in the future.
This change simplifies the language, and fixes some problematic
coercions through pointers which led to unintuitive behavior.
Resolves: #16865
these tasks have some shared data dependencies so they cannot be done
simultaneously. Future work should untangle these data dependencies so
that more can be done in parallel.
for now this commit ensures correctness by making linker input parsing
and codegen tasks part of the same queue.
If the "is darwin" check is moved below the libc_installation check
below, error.LibCInstallationMissingCrtDir is returned from
lci.resolveCrtPaths().
This should be revisited because it makes sense to check
libc_installation first even on darwin.
Anyway for now this more closely matches logic from master branch.
* Compilation.objects changes to Compilation.link_inputs which stores
objects, archives, windows resources, shared objects, and strings
intended to be put directly into the dynamic section. Order is now
preserved between all of these kinds of linker inputs. If it is
determined the order does not matter for a particular kind of linker
input, that item should be moved to a different array.
* rename system_libs to windows_libs
* untangle library lookup from CLI types
* when doing library lookup, instead of using access syscalls, go ahead
and open the files and keep the handles around for passing to the
cache system and the linker.
* during library lookup and cache file hashing, use positioned reads to
avoid affecting the file seek position.
* library directories are opened in the CLI and converted to Directory
objects, warnings emitted for those that cannot be opened.
along with the relevant logic, making the libraries within subject to
the same search criteria as all the other libraries.
this unfortunately means doing file system access on all .so files when
targeting ELF to determine if they are linker scripts, however, I have a
plan to address this.
The re-analysis here is a little coarse; it'd be nice in the future to
have a way for an AstGen failure to preserve *all* analysis which
depends on the last success, and just hide the compile errors which
depend on it somehow. But I'm not sure how we'd achieve that, so this
works fine for now.
Resolves: #21223
This not only simplifies the error bundling logic, but also improves
efficiency by allowing the result to be cached between, for instance,
multiple calls to `totalErrorCount`.
Make shared_objects a StringArrayHashMap so that deduping does not
need to happen in flush. That deduping code also was using an O(N^2)
algorithm, which is not allowed in this codebase. There is another
violation of this rule in resolveSymbols but this commit does not
address it.
This required reworking shared object parsing, breaking it into
independent components so that we could access soname earlier.
Shared object parsing had a few problems that I noticed and fixed in
this commit:
* Many instances of incorrect use of align(1).
* `shnum * @sizeOf(elf.Elf64_Shdr)` can overflow based on user data.
* `@divExact` can cause illegal behavior based on user data.
* Strange versyms logic that wasn't present in mold nor lld. The logic
was not commented and there is no git blame information in ziglang/zig
nor kubkon/zld. I changed it to match mold and lld instead.
* Use of ArrayList for slices of memory that are never resized.
* finding DT_VERDEFNUM in a different loop than finding DT_SONAME.
Ultimately I think we should follow mold's lead and ignore this
integer, relying on null termination instead.
* Doing logic based on VER_FLG_BASE rather than ignoring it like mold
and LLD do. No comment explaining why the behavior is different.
* Mutating the original ELF symbols rather than only storing the mangled
name on the new Symbol struct.
I noticed something that I didn't try to address in this commit: Symbol
stores a lot of redundant information that is already present in the ELF
symbols. I suspect that the codebase could benefit from reworking Symbol
to not store redundant information.
Additionally:
* Add some type safety to std.elf.
* Eliminate 1-3 file system reads for determining the kind of input
files, by taking advantage of file name extension and handling error
codes properly.
* Move more error handling methods to link.Diags and make them
infallible and thread-safe
* Make the data dependencies obvious in the parameters of
parseSharedObject. It's now clear that the first two steps (Header and
Parsed) can be done during the main Compilation pipeline, rather than
waiting for flush().
By organizing linker diagnostics into this struct, it becomes possible
to share more code between linker backends, and more importantly it
becomes possible to pass only the Diag struct to some functions, rather
than passing the entire linker state object in. This makes data
dependencies more obvious, making it easier to rearrange code and to
multithread.
Also fix MachO code abusing an atomic variable. Not only was it using
the wrong atomic operation, it is unnecessary additional state since
the state is already being protected by a mutex.
When errors occurred during flush(), incremental cache mode was still
writing a successful cache manifest, making subsequent compilations fail
because they would get a cache hit only to find invalid data.
Embrace the Path abstraction, doing more operations based on directory
handles rather than absolute file paths. Most of the diff noise here
comes from this one.
Fix sorting of crtbegin/crtend atoms. Previously it would look at all
path components for those strings.
Make the C runtime path detection partially a pure function, and move
some logic to glibc.zig where it belongs.
flush() must not do anything more than necessary. Determining the type
of input files must be done only once, before flush. Fortunately, we
don't even need any file system accesses to do this since that
information is statically known in most cases, and in the rest of the
cases can be determined by file extension alone.
This commit also updates the nearby code to conform to the convention
for error handling where there is exactly one error code to represent
the fact that error messages have already been emitted. This had the
side effect of improving the error message for a linker script parse
error.
"positionals" is not a linker concept; it is a command line interface
concept. Zig's linker implementation should not mention "positionals".
This commit deletes that array list in favor of directly making function
calls, eliminating that heap allocation during flush().
The compiler actually doesn't need any functional changes for this: Sema
does reification based on the tag indices of `std.builtin.Type` already!
So, no zig1.wasm update is necessary.
This change is necessary to disallow name clashes between fields and
decls on a type, which is a prerequisite of #9938.
This type is exactly the same as std.Build.Cache.Path, except for
one function which is not used anymore. Therefore we can replace
it without consequences.
A compilation build step for which the binary is not required could not
be compiled previously. There were 2 issues that caused this:
- The compiler communicated only the results of the emitted binary and
did not properly communicate the result if the binary was not emitted.
This is fixed by communicating the final hash of the artifact path (the
hash of the corresponding /o/<hash> directory) and communicating this
instead of the entire path. This changes the zig build --listen protocol
to communicate hashes instead of paths, and emit_bin_path is accordingly
renamed to emit_digest.
- There was an error related to the default llvm object path when
CacheUse.Whole was selected. I'm not really sure why this didn't manifest
when the binary is also emitted.
This was fixed by improving the path handling related to flush() and
emitLlvmObject().
In general, this commit also improves some of the path handling throughout
the compiler and standard library.
This function now has to allocate anyway to resolve references, so we
may as well just build the error bundle and check its length.
Also remove some unnecessary calls of this function for efficiency.
Another big commit, sorry! This commit makes all fixes necessary for
incremental updates of the compiler itself (specifically, adding a
breakpoint to `zirCompileLog`) to succeed, at least on the frontend.
The biggest change here is a reform to how types are handled. It works
like this:
* When a type is first created in `zirStructDecl` etc, its namespace is
scanned. If the type requires resolution, an `interned` dependency is
declared for the containing `AnalUnit`.
* `zirThis` also declared an `interned` dependency for its `AnalUnit` on
the namespace's owner type.
* If the type's namespace changes, the surrounding source declaration
changes hash, so `zirStructDecl` etc will be hit again. We check
whether the namespace has been scanned this generation, and re-scan it
if not.
* Namespace lookups also check whether the namespace in question
requires a re-scan based on the generation. This is because there's no
guarantee that the `zirStructDecl` is re-analyzed before the namespace
lookup is re-analyzed.
* If a type's structure (essentially its fields) change, then the type's
`Cau` is considered outdated. When the type is re-analyzed due to
being outdated, or the `zirStructDecl` is re-analyzed by being
transitively outdated, or a corresponding `zirThis` is re-analyzed by
being transitively outdated, the struct type is recreated at a new
`InternPool` index. The namespace's owner is updated (but not
re-scanned, since that is handled by the mechanisms above), and the
old type, while remaining a valid `Index`, is removed from the map
metadata so it will never be found by lookups. `zirStructDecl` and
`zirThis` store an `interned` dependency on the *new* type.