This target triple was weird on multiple levels:
* The `ilp32` ABI is the soft float ABI. This is not the main ABI we want to
support on RISC-V; rather, we want `ilp32d`.
* `gnuilp32` is a bespoke tag that was introduced in Zig. The rest of the world
just uses `gnu` for RISC-V target triples.
* `gnu_ilp32` is already the name of an ILP32 ABI used on AArch64. `gnuilp32` is
too easy to confuse with this.
* We don't use this convention for `riscv64-linux-gnu`.
* Supporting all RISC-V ABIs with this convention will result in a combinatorial
explosion; see #20690.
I pointed a fuzzer at the tokenizer and it crashed immediately. Upon
inspection, I was dissatisfied with the implementation. This commit
removes several mechanisms:
* Removes the "invalid byte" compile error note.
* Dramatically simplifies tokenizer recovery by making recovery always
occur at newlines, and never otherwise.
* Removes UTF-8 validation.
* Moves some character validation logic to `std.zig.parseCharLiteral`.
Removing UTF-8 validation is a regression of #663; however, the existing
implementation was already buggy. When adding this functionality back,
it must be fuzz-tested while checking the property that it matches an
independent Unicode validation implementation on the same file. While
we're at it, fuzzing should check the other properties of that proposal,
such as no ASCII control characters existing inside the source code.
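A minimal sketch of that differential check, using `std.unicode.utf8ValidateSlice`
as the independent reference implementation (the candidate validator here is
only a stand-in):

```zig
const std = @import("std");

/// Differential property from the note above: whatever UTF-8 validation the
/// tokenizer ends up doing must agree with an independent implementation
/// (here `std.unicode.utf8ValidateSlice`) on every input the fuzzer produces.
fn checkUtf8Property(candidate: *const fn ([]const u8) bool, source: []const u8) !void {
    const expected = std.unicode.utf8ValidateSlice(source);
    try std.testing.expectEqual(expected, candidate(source));
}

test "differential UTF-8 validation check" {
    // Stand-in for the real tokenizer-level validator; a fuzzer would drive
    // this with arbitrary byte sequences rather than these fixed samples.
    const candidate = std.unicode.utf8ValidateSlice;
    try checkUtf8Property(candidate, "hello, world");
    try checkUtf8Property(candidate, "\xc3\x28"); // invalid: 0x28 is not a continuation byte
}
```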
Other changes included in this commit:
* Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This
function has an awkward API that is too easy to misuse.
* Make `utf8Decode2` and friends use arrays as parameters, eliminating a
runtime assertion in favor of the type system (see the sketch after this list).
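The array-parameter change follows a general pattern; this is a simplified
sketch of the idea, not the actual `std.unicode` signatures:

```zig
const std = @import("std");

// Slice parameter: the length requirement can only be checked at runtime.
fn decode2Slice(bytes: []const u8) u21 {
    std.debug.assert(bytes.len == 2); // runtime assertion
    return (@as(u21, bytes[0] & 0b0001_1111) << 6) | (bytes[1] & 0b0011_1111);
}

// Array parameter: the length is part of the type, so the assertion goes away
// and a wrong-length argument becomes a compile error instead of a crash.
fn decode2Array(bytes: [2]u8) u21 {
    return (@as(u21, bytes[0] & 0b0001_1111) << 6) | (bytes[1] & 0b0011_1111);
}

test "both decoders agree on a 2-byte sequence" {
    const seq = [2]u8{ 0xc3, 0xa9 }; // "é" (U+00E9)
    try std.testing.expectEqual(@as(u21, 0xe9), decode2Slice(&seq));
    try std.testing.expectEqual(@as(u21, 0xe9), decode2Array(seq));
}
```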
After this commit, the input found by fuzzing, which was
"\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea",
no longer causes a crash. However, I did not feel the need to add this
test case, because the simplified logic eradicates most crashes of this
nature.
This is a misfeature that we inherited from LLVM:
* https://reviews.llvm.org/D61259
* https://reviews.llvm.org/D61939
(`aarch64_32` and `arm64_32` are equivalent.)
I truly have no idea why this triple passed review in LLVM. It is, to date, the
*only* tag in the architecture component that is not, in fact, an architecture.
In reality, it is just an ILP32 ABI for AArch64 (*not* AArch32).
The triples that use `aarch64_32` look like `aarch64_32-apple-watchos`. Yes,
that triple is exactly what you think; it has no ABI component. They really,
seriously did this.
Since only Apple could come up with silliness like this, it should come as no
surprise that no one else uses `aarch64_32`. Later on, a GNU ILP32 ABI for
AArch64 was developed, and support was added to LLVM:
* https://reviews.llvm.org/D94143
* https://reviews.llvm.org/D104931
Here, sanity seems to have prevailed, and a triple using this ABI looks like
`aarch64-linux-gnu_ilp32` as you would expect.
As can be seen from the diffs in this commit, there was plenty of confusion
throughout the Zig codebase about what exactly `aarch64_32` was. So let's just
remove it. In its place, we'll use `aarch64-watchos-ilp32`,
`aarch64-linux-gnuilp32`, and so on. We'll then translate these appropriately
when talking to LLVM. Hence, this commit adds the `ilp32` ABI tag (we already
have `gnuilp32`).
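A rough sketch of the kind of translation this implies, using simplified
stand-in types rather than the compiler's real target representation:

```zig
const std = @import("std");

// Simplified stand-in types; the compiler's actual target code differs.
const Arch = enum { aarch64, aarch64_be };
const Abi = enum { none, gnu, ilp32, gnuilp32 };
const Target = struct { arch: Arch, abi: Abi };

fn llvmTripleComponents(t: Target) struct { arch: []const u8, abi: []const u8 } {
    return switch (t.abi) {
        // Apple's ILP32 variant: LLVM expects the `aarch64_32` pseudo-arch
        // and no ABI component at all.
        .ilp32 => .{ .arch = "aarch64_32", .abi = "" },
        // The GNU ILP32 ABI keeps the real arch and spells the ABI `gnu_ilp32`.
        .gnuilp32 => .{ .arch = @tagName(t.arch), .abi = "gnu_ilp32" },
        else => .{ .arch = @tagName(t.arch), .abi = @tagName(t.abi) },
    };
}

test "ilp32 on aarch64 maps to the aarch64_32 pseudo-arch" {
    const llvm = llvmTripleComponents(.{ .arch = .aarch64, .abi = .ilp32 });
    try std.testing.expectEqualStrings("aarch64_32", llvm.arch);
    try std.testing.expectEqualStrings("", llvm.abi);
}
```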
This was used for LoongArch64, where:
* `gnuf64` -> `ilp32d` / `lp64d` (full hard float)
* `gnuf32` -> `ilp32f` / `lp64f` (hard float for `f32` only)
* `gnusf` -> `ilp32` / `lp64` (soft float)
But Loongson eventually settled on just `gnu` for the first case since that's
what most people will actually be targeting outside embedded scenarios. The
`gnuf32` and `gnusf` specifiers remain in use.
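Expressed as code, the 64-bit half of the mapping above might look like this
(names taken directly from the list; illustrative only):

```zig
const std = @import("std");

const LoongArchAbi = enum { gnu, gnuf32, gnusf };

/// ABI name for the 64-bit case, per the mapping above; a 32-bit target would
/// use the corresponding `ilp32*` spelling instead.
fn lp64AbiName(abi: LoongArchAbi) []const u8 {
    return switch (abi) {
        .gnu => "lp64d", // full hard float (the role `gnuf64` used to fill)
        .gnuf32 => "lp64f", // hard float for `f32` only
        .gnusf => "lp64", // soft float
    };
}

test "gnu maps to the full hard float ABI" {
    try std.testing.expectEqualStrings("lp64d", lp64AbiName(.gnu));
}
```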
Changes the `make` function signature to take an options struct, which
additionally includes `watch: bool`. I am intentionally not exposing
this information to configure-phase logic.
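A sketch of what this looks like from a step implementation's point of view;
apart from `watch: bool`, the names here are placeholders rather than the
actual `std.Build.Step` API:

```zig
// Illustrative only: the real types live in `std.Build.Step`, and everything
// here other than the `watch` field is a placeholder, not the actual API.
const Step = struct { name: []const u8 };

const MakeOptions = struct {
    // ...whatever `make` previously received as separate parameters would
    // also live in this struct (e.g. a progress handle)...
    watch: bool, // true when the build runner is watching the file system
};

// Bundling the arguments into one struct means fields like `watch` can be
// added later without changing every step implementation's signature again.
fn make(step: *Step, options: MakeOptions) !void {
    _ = step;
    if (options.watch) {
        // A step could react here; note that, as mentioned above, this flag
        // is deliberately not exposed to configure-phase logic.
    }
}

test "make receives the watch flag through the options struct" {
    var step: Step = .{ .name = "example" };
    try make(&step, .{ .watch = false });
}
```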
Also adds global zig cache to the compiler cache prefixes.
Closes #20600
Updates the build runner to unconditionally require a zig lib directory
parameter. This parameter is needed in order to correctly understand
file system inputs from zig compiler subprocesses, since they will refer
to "the zig lib directory", and the build runner needs to place file
system watches on directories in there.
The build runner's fanotify file watching implementation now accounts
for when two or more `Cache.Path` instances compare unequal but ultimately
refer to the same directory in the file system.
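One simple way to detect that situation is to compare canonical paths; this is
only an illustration of the problem, not the build runner's actual fanotify
logic:

```zig
const std = @import("std");

/// Two (possibly unequal) path strings refer to the same directory if their
/// canonical forms match. The real implementation has to do this kind of
/// reconciliation before registering file system watches.
fn sameDirectory(gpa: std.mem.Allocator, a: []const u8, b: []const u8) !bool {
    const real_a = try std.fs.cwd().realpathAlloc(gpa, a);
    defer gpa.free(real_a);
    const real_b = try std.fs.cwd().realpathAlloc(gpa, b);
    defer gpa.free(real_b);
    return std.mem.eql(u8, real_a, real_b);
}

test "a directory equals itself through a redundant path component" {
    try std.testing.expect(try sameDirectory(std.testing.allocator, ".", "./."));
}
```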
Breaking change: `std.Build` no longer has a `zig_lib_dir` field. Instead,
there is the `zig_lib_directory` field on `Graph`, and individual `Compile`
steps can still have their zig lib directories overridden. I think this is
unlikely to break anyone's build in practice.
The compiler now sends a "file_system_inputs" message to the build
runner which shares the full set of files that were added to the cache
system with the build system, so that the build runner can watch
properly and redo the `Compile` step. This is implemented for whole cache
mode but not yet for incremental cache mode.
Since we track `reify` instructions across incremental updates, it is
acceptable to treat such an instruction as the baseline for a relative
source location.
This turns out to be a good idea, since it makes it easy to define the
source location for a reified type.
🦀 `src_decl` is gone 🦀
This commit eliminates the `src_decl` field from `Sema.Block`. This
change goes further toward eliminating unnecessary responsibilities of
`Decl` in preparation for its major upcoming refactor.
The two main remaining responsibilities had to do with namespace types:
`src_decl` was used to determine their line number and their name. The
former use case is solved by storing the line number alongside type
declarations (and reifications) in ZIR; this is actually more correct,
since previously the line number assigned to the type was really the
line number of the source declaration it was syntactically contained
within, which does not necessarily line up. Consequently, this change
makes debug info for namespace types more correct, although I am not
sure how debuggers actually utilize this line number, if at all. Naming
types was solved by a new field on `Block`, called `type_name_ctx`. In a
sense, it represents the "namespace" we are currently within, including
comptime function calls etc. We might want to revisit this in future,
since the type naming rules seem to be a bit hand-wavey right now.
As far as I can tell, there isn't any more preliminary work needed before
I can start on the behemoth task of splitting `Zcu.Decl` into the
new `Nav` (Named Addressable Value) and `Cau` (Comptime Analysis Unit)
types. This will be a sweeping change, impacting essentially every part
of the pipeline after `AstGen`.
This is in preparation for some upcoming changes to how we represent
source locations in the compiler. The bulk of the change here is dealing
with the removal of `src()` methods from `Zir` types.
The justification for using relative source nodes in ZIR is that it
allows source locations -- which may be serialized across incremental
updates -- to be relative to the source location of their containing
declaration. However, having those "baseline" instructions themselves be
relative to their own parent is counterproductive, since the source
location updating problem is only being moved to `Decl`. Storing the
absolute node here instead makes more sense, since it allows this
source location update logic to be elided entirely in the future: the
instruction that source locations are resolved relative to can then be
identified by a `TrackedInst.Index` rather than a `Decl.Index`.
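A hypothetical illustration of that tradeoff (the real ZIR and source-location
types are more involved than this):

```zig
const std = @import("std");

// Hypothetical stand-in types, for illustration only.
const NodeIndex = u32;
const NodeOffset = i32;

/// A node stored relative to a baseline can only be interpreted by first
/// resolving that baseline, which itself may move across incremental updates.
fn resolveRelative(baseline: NodeIndex, offset: NodeOffset) NodeIndex {
    return @intCast(@as(i64, baseline) + offset);
}

test "a relative offset is meaningless without its baseline" {
    // The same offset names different nodes depending on the baseline, which
    // is why the baseline itself is better stored as an absolute node.
    try std.testing.expectEqual(@as(NodeIndex, 12), resolveRelative(10, 2));
    try std.testing.expectEqual(@as(NodeIndex, 42), resolveRelative(40, 2));
}
```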
Deprecated aliases that are now compile errors (see the migration sketch after these lists):
- `std.fs.MAX_PATH_BYTES` (renamed to `std.fs.max_path_bytes`)
- `std.mem.tokenize` (split into `tokenizeAny`, `tokenizeSequence`, `tokenizeScalar`)
- `std.mem.split` (split into `splitSequence`, `splitAny`, `splitScalar`)
- `std.mem.splitBackwards` (split into `splitBackwardsSequence`, `splitBackwardsAny`, `splitBackwardsScalar`)
- `std.unicode`
+ `utf16leToUtf8Alloc`, `utf16leToUtf8AllocZ`, `utf16leToUtf8`, `fmtUtf16le` (all renamed to have capitalized `Le`)
+ `utf8ToUtf16LeWithNull` (renamed to `utf8ToUtf16LeAllocZ`)
- `std.zig.CrossTarget` (moved to `std.Target.Query`)
Deprecated `lib/std/std.zig` decls were deleted instead of being made a `@compileError` because the `refAllDecls` in the test block would trigger the `@compileError`. The deleted top-level `std` namespaces are:
- `std.rand` (renamed to `std.Random`)
- `std.TailQueue` (renamed to `std.DoublyLinkedList`)
- `std.ChildProcess` (renamed/moved to `std.process.Child`)
This is not exhaustive. Deprecated aliases that I didn't touch:
+ `std.io.*`
+ `std.Build.*`
+ `std.builtin.Mode`
+ `std.zig.c_translation.CIntLiteralRadix`
+ anything in `src/`
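A small migration sketch covering a few of the renames above, using the
current `std` names:

```zig
const std = @import("std");

test "migrating a few of the renamed std APIs" {
    // std.mem.tokenize -> tokenizeScalar / tokenizeAny / tokenizeSequence
    var it = std.mem.tokenizeScalar(u8, "a b  c", ' ');
    try std.testing.expectEqualStrings("a", it.next().?);
    try std.testing.expectEqualStrings("b", it.next().?);
    try std.testing.expectEqualStrings("c", it.next().?);
    try std.testing.expect(it.next() == null);

    // std.mem.split -> splitScalar / splitAny / splitSequence
    var parts = std.mem.splitScalar(u8, "x,y", ',');
    try std.testing.expectEqualStrings("x", parts.next().?);
    try std.testing.expectEqualStrings("y", parts.next().?);

    // std.rand -> std.Random
    var prng = std.Random.DefaultPrng.init(0);
    _ = prng.random().int(u8);
}
```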
The surrogate code points U+D800 to U+DFFF are valid code points but are not Unicode scalar values. This commit makes the error message more accurately reflect what is actually allowed in `\u` escape sequences.
From https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf:
> D71 High-surrogate code point: A Unicode code point in the range U+D800 to U+DBFF.
> D73 Low-surrogate code point: A Unicode code point in the range U+DC00 to U+DFFF.
>
> 3.9 Unicode Encoding Forms
> D76 Unicode scalar value: Any Unicode code point except high-surrogate and low-surrogate code points.
Related: #20270
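Concretely, in terms of what `\u{...}` accepts (exact diagnostic wording
aside):

```zig
const std = @import("std");

// `\u{...}` escapes must encode Unicode scalar values. Code points directly
// adjacent to the surrogate range are ordinary scalar values and are accepted:
const ok_low: u21 = '\u{D7FF}';
const ok_high: u21 = '\u{E000}';

// A surrogate code point is rejected at compile time; this commit only
// improves the wording of that error:
// const bad: u21 = '\u{D800}';

test "surrogate neighbours are ordinary scalar values" {
    try std.testing.expectEqual(@as(u21, 0x801), ok_high - ok_low);
}
```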
These instructions are not emitted by AstGen. They also would have no
effect even if they did appear in ZIR: the Sema handling for these
instructions creates a `Decl` to which the name strategy is applied, and
then never uses it. This pointless CPU heater is now gone, saving
2 ZIR tags in the process.
The glibc version constraint had been applied to the `aarch64_be` target, but
not to `aarch64`. This constraint is necessary because earlier versions of
glibc do not support the aarch64 architecture.
Also, skip unsupported test cases.
This reverts commit a7de02e05216db9a04e438703ddf1b6b12f3fbef.
This did not implement the accepted proposal, and I did not sign off
on the changes. I would like a chance to review this, please.