I pointed a fuzzer at the tokenizer and it crashed immediately. Upon
inspection, I was dissatisfied with the implementation. This commit
removes several mechanisms:
* Removes the "invalid byte" compile error note.
* Dramatically simplifies tokenizer recovery by making recovery always
occur at newlines, and never otherwise.
* Removes UTF-8 validation.
* Moves some character validation logic to `std.zig.parseCharLiteral`.
Removing UTF-8 validation is a regression of #663, however, the existing
implementation was already buggy. When adding this functionality back,
it must be fuzz-tested while checking the property that it matches an
independent Unicode validation implementation on the same file. While
we're at it, fuzzing should check the other properties of that proposal,
such as no ASCII control characters existing inside the source code.
Other changes included in this commit:
* Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This
function has an awkward API that is too easy to misuse.
* Make `utf8Decode2` and friends use arrays as parameters, eliminating a
runtime assertion in favor of using the type system.
After this commit, the crash found by fuzzing, which was
"\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea"
no longer causes a crash. However, I did not feel the need to add this
test case because the simplified logic eradicates most crashes of this
nature.
* Elaborate on the sub-variants of Variant I.
* Clarify the use of the TCB term.
* Rename a bunch of stuff to be more accurate/descriptive.
* Follow Zig's style around namespacing more.
* Use a structure for the ABI TCB.
No functional change intended.
The code would cause LLVM to emit a jump table for the switch in the loop over
the dynamic tags. That jump table was far enough away that the compiler decided
to go through the GOT, which would of course break at this early stage as we
haven't applied MIPS's local GOT relocations yet, nor can we until we've walked
through the _DYNAMIC array.
The first attempt at rewriting this used code like this:
var sorted_dynv = [_]elf.Addr{0} ** elf.DT_NUM;
But this is also problematic as it results in a memcpy() call. Instead, we
explicitly initialize it to undefined and use a loop of volatile stores to
clear it.
In https://github.com/ziglang/zig/pull/19641, this binding changed from `[*]u16` to `LPWSTR` which made it a sentinel-terminated pointer. This introduced a compiler error in the `std.os.windows.GetModuleFileNameW` wrapper since it takes a `[*]u16` pointer. This commit changes the binding back to what it was before instead of introducing a breaking change to `std.os.windows.GetModuleFileNameW`
Related: https://github.com/ziglang/zig/issues/20858
Accesses to this global variable can require relocations on some platforms (e.g.
MIPS). If we do it before PIE relocations have been applied, we'll crash.
It's actually important for the ABI that r25 (t9) contains the address of the
called function, so that this standard prologue sequence works:
lui $2, %hi(_gp_disp)
addiu $2, $2, %lo(_gp_disp)
addu $gp, $2, $t9
(This is a bit similar to the ToC situation on powerpc that was fixed in
7bc78967b400322a0fc5651f37a1b0428c37fb9d.)