The parse of `fn foo(a: switch (...) { ... })` was previously handled
incorrectly; `a` was treated as both the parameter name and a label.
The same issue exists for `for` and `while` expressions -- they should
be fixed too, and the grammar amended appropriately. This commit does
not do this: it only aims to avoid introducing regressions from labeled
switch syntax.
The compiler actually doesn't need any functional changes for this: Sema
does reification based on the tag indices of `std.builtin.Type` already!
So, no zig1.wasm update is necessary.
This change is necessary to disallow name clashes between fields and
decls on a type, which is a prerequisite of #9938.
Grepping for `NO_THUMB` in glibc suggests that glibc does not actually support
pure Thumb-2 mode. This is the mode that is implied by these target triples;
mixed Arm/Thumb mode should just use the regular `arm*-linux-gnueabi*` triples.
Implements the accepted proposal to introduce `@branchHint`. This
builtin is permitted as the first statement of a block if that block is
the direct body of any of the following:
* a function (*not* a `test`)
* either branch of an `if`
* the RHS of a `catch` or `orelse`
* a `switch` prong
* an `or` or `and` expression
It lowers to the ZIR instruction `extended(branch_hint(...))`. When Sema
encounters this instruction, it sets `sema.branch_hint` appropriately,
and `zirCondBr` etc are expected to reset this value as necessary. The
state is on `Sema` rather than `Block` to make it automatically
propagate up non-conditional blocks without special handling. If
`@panic` is reached, the branch hint is set to `.cold` if none was
already set; similarly, error branches get a hint of `.unlikely` if no
hint is explicitly provided. If a condition is comptime-known, `cold`
hints from the taken branch are allowed to propagate up, but other hints
are discarded. This is because a `likely`/`unlikely` hint just indicates
the direction this branch is likely to go, which is redundant
information when the branch is known at comptime; but `cold` hints
indicate that control flow is unlikely to ever reach this branch,
meaning if the branch is always taken from its parent, then the parent
is also unlikely to ever be reached.
This branch information is stored in AIR `cond_br` and `switch_br`. In
addition, `try` and `try_ptr` instructions have variants `try_cold` and
`try_ptr_cold` which indicate that the error case is cold (rather than
just unlikely); this is reachable through e.g. `errdefer unreachable` or
`errdefer @panic("")`.
A new API `unwrapSwitch` is introduced to `Air` to make it more
convenient to access `switch_br` instructions. In time, I plan to update
all AIR instructions to be accessed via an `unwrap` method which returns
a convenient tagged union a la `InternPool.indexToKey`.
The LLVM backend lowers branch hints for conditional branches and
switches as follows:
* If any branch is marked `unpredictable`, the instruction is marked
`!unpredictable`.
* Any branch which is marked as `cold` gets a
`llvm.assume(i1 true) [ "cold"() ]` call to mark the code path cold.
* If any branch is marked `likely` or `unlikely`, branch weight metadata
is attached with `!prof`. Likely branches get a weight of 2000, and
unlikely branches a weight of 1. In `switch` statements, un-annotated
branches get a weight of 1000 as a "middle ground" hint, since there
could be likely *and* unlikely *and* un-annotated branches.
For functions, a `cold` hint corresponds to the `cold` function
attribute, and other hints are currently ignored -- as far as I can tell
LLVM doesn't really have a way to lower them. (Ideally, we would want
the branch hint given in the function to propagate to call sites.)
The compiler and standard library do not yet use this new builtin.
Resolves: #21148
* Indices of referenced captures
* Line and column of `@src()`
The second point aligns with a reversal of the "incremental compilation"
section of https://github.com/ziglang/zig/issues/2029#issuecomment-645793168.
This reversal was already done as #17688 (46a6d50), with the idea to
push incremental compilation down the line. My proposal is to keep it as
comptime-known, and simply re-analyze uses of `@src()` whenever their
line/column change.
I think this decision is reasonable for a few reasons:
* The Zig compiler is quite fast. Occasionally re-analyzing a few
functions containing `@src()` calls is perfectly acceptable and won't
noticably impact update times.
* The system described by Andrew in #2029 is currently vaporware.
* The system described by Andrew in #2029 is non-trivial to implement.
In particular, it requires some way to have backends update a single
global in certain cases, without re-doing semantic analysis. There is
no other part of incremental compilation which requires this.
* Having `@src().line` be comptime-known is useful. For instance, #17688
was justified by broken Tracy integration because the source line
couldn't be comptime-known.
* std.c.darwin: add missing CPUFAMILY fields
* std.zig.system.detectNativeCpuAndFeatures: add missing darwin fields
* add comment so the prong isnt lost and easily discoverable during next llvm upgrade
A compilation build step for which the binary is not required could not
be compiled previously. There were 2 issues that caused this:
- The compiler communicated only the results of the emitted binary and
did not properly communicate the result if the binary was not emitted.
This is fixed by communicating the final hash of the artifact path (the
hash of the corresponding /o/<hash> directory) and communicating this
instead of the entire path. This changes the zig build --listen protocol
to communicate hashes instead of paths, and emit_bin_path is accordingly
renamed to emit_digest.
- There was an error related to the default llvm object path when
CacheUse.Whole was selected. I'm not really sure why this didn't manifest
when the binary is also emitted.
This was fixed by improving the path handling related to flush() and
emitLlvmObject().
In general, this commit also improves some of the path handling throughout
the compiler and standard library.
This replaces the constant `Zir.Inst.Ref` tags (and the analagous tags
in `Air.Inst.Ref`, `InternPool.Index`) referring to types in
`std.builtin` with a ZIR instruction `extended(builtin_type(...))` which
instructs Sema to fetch such a type, effectively as if it were a
shorthand for the ZIR for `@import("std").builtin.xyz`.
Previously, this was achieved through constant tags in `Ref`. The
analagous `InternPool` indices began as `simple_type` values, and were
later rewritten to the correct type information. This system was kind of
brittle, and more importantly, isn't compatible with incremental
compilation of std, since incremental compilation relies on the ability
to recreate types at different indices when they change. Replacing the
old system with this instruction slightly increases the size of ZIR, but
it simplifies logic and allows incremental compilation to work correctly
on the standard library.
This shouldn't have a significant impact on ZIR size or compiler
performance, but I will take measurements in the PR to confirm this.
The old logic here had bitrotted, largely because there were some
incorrect `else` cases. This is now implemented correctly for all
current ZIR instructions. This prevents instructions being lost in
incremental updates, which is important for updates to be minimal.
* new .zig-cache subdirectory: 'v'
- stores coverage information with filename of hash of PCs that want
coverage. This hash is a hex encoding of the 64-bit coverage ID.
* build runner
* fixed bug in file system inputs when a compile step has an
overridden zig_lib_dir field set.
* set some std lib options optimized for the build runner
- no side channel mitigations
- no Transport Layer Security
- no crypto fork safety
* add a --port CLI arg for choosing the port the fuzzing web interface
listens on. it defaults to choosing a random open port.
* introduce a web server, and serve a basic single page application
- shares wasm code with autodocs
- assets are created live on request, for convenient development
experience. main.wasm is properly cached if nothing changes.
- sources.tar comes from file system inputs (introduced with the
`--watch` feature)
* receives coverage ID from test runner and sends it on a thread-safe
queue to the WebServer.
* test runner
- takes a zig cache directory argument now, for where to put coverage
information.
- sends coverage ID to parent process
* fuzzer
- puts its logs (in debug mode) in .zig-cache/tmp/libfuzzer.log
- computes coverage_id and makes it available with
`fuzzer_coverage_id` exported function.
- the memory-mapped coverage file is now namespaced by the coverage id
in hex encoding, in `.zig-cache/v`
* tokenizer
- add a fuzz test to check that several properties are upheld
This target triple was weird on multiple levels:
* The `ilp32` ABI is the soft float ABI. This is not the main ABI we want to
support on RISC-V; rather, we want `ilp32d`.
* `gnuilp32` is a bespoke tag that was introduced in Zig. The rest of the world
just uses `gnu` for RISC-V target triples.
* `gnu_ilp32` is already the name of an ILP32 ABI used on AArch64. `gnuilp32` is
too easy to confuse with this.
* We don't use this convention for `riscv64-linux-gnu`.
* Supporting all RISC-V ABIs with this convention will result in combinatorial
explosion; see #20690.
I pointed a fuzzer at the tokenizer and it crashed immediately. Upon
inspection, I was dissatisfied with the implementation. This commit
removes several mechanisms:
* Removes the "invalid byte" compile error note.
* Dramatically simplifies tokenizer recovery by making recovery always
occur at newlines, and never otherwise.
* Removes UTF-8 validation.
* Moves some character validation logic to `std.zig.parseCharLiteral`.
Removing UTF-8 validation is a regression of #663, however, the existing
implementation was already buggy. When adding this functionality back,
it must be fuzz-tested while checking the property that it matches an
independent Unicode validation implementation on the same file. While
we're at it, fuzzing should check the other properties of that proposal,
such as no ASCII control characters existing inside the source code.
Other changes included in this commit:
* Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This
function has an awkward API that is too easy to misuse.
* Make `utf8Decode2` and friends use arrays as parameters, eliminating a
runtime assertion in favor of using the type system.
After this commit, the crash found by fuzzing, which was
"\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea"
no longer causes a crash. However, I did not feel the need to add this
test case because the simplified logic eradicates most crashes of this
nature.