I pointed a fuzzer at the tokenizer and it crashed immediately. Upon
inspection, I was dissatisfied with the implementation. This commit
removes several mechanisms:
* Removes the "invalid byte" compile error note.
* Dramatically simplifies tokenizer recovery by making recovery always
occur at newlines, and never otherwise.
* Removes UTF-8 validation.
* Moves some character validation logic to `std.zig.parseCharLiteral`.
Removing UTF-8 validation is a regression of #663, however, the existing
implementation was already buggy. When adding this functionality back,
it must be fuzz-tested while checking the property that it matches an
independent Unicode validation implementation on the same file. While
we're at it, fuzzing should check the other properties of that proposal,
such as no ASCII control characters existing inside the source code.
Other changes included in this commit:
* Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This
function has an awkward API that is too easy to misuse.
* Make `utf8Decode2` and friends use arrays as parameters, eliminating a
runtime assertion in favor of using the type system.
After this commit, the crash found by fuzzing, which was
"\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea"
no longer causes a crash. However, I did not feel the need to add this
test case because the simplified logic eradicates most crashes of this
nature.
This test was originally introduced in 5f38d6e2e97829ed74f06a96b5d07a2c68516063, where it looked like this:
test "cast *[1][*]const u8 to [*]const ?[*]const u8" {
const window_name = [1][*]const u8{c"window name"};
const x: [*]const ?[*]const u8 = &window_name;
assert(mem.eql(u8, std.cstr.toSliceConst(x[0].?), "window name"));
}
Over the years, this has become more and more obfuscated, to the point that the verbosity of the `expect` call overshadows the point of the example. This commit intends to update this test to match the spirit of the original version of the test, while shedding the obfuscation.
In https://github.com/ziglang/zig/pull/19641, this binding changed from `[*]u16` to `LPWSTR` which made it a sentinel-terminated pointer. This introduced a compiler error in the `std.os.windows.GetModuleFileNameW` wrapper since it takes a `[*]u16` pointer. This commit changes the binding back to what it was before instead of introducing a breaking change to `std.os.windows.GetModuleFileNameW`
Related: https://github.com/ziglang/zig/issues/20858