Before this commit, the WTF-16 command line string would be converted to WTF-8 in `init`, and then a second buffer of the WTF-8 size + 1 would be allocated to store the parsed arguments. The converted WTF-8 command line would then be parsed and the relevant bytes would be copied into the argument buffer before being returned.
After this commit, only the WTF-8 size of the WTF-16 string is calculated (without conversion) which is then used to allocate the buffer for the parsed arguments. Parsing is then done on the WTF-16 slice directly, with the arguments being converted to WTF-8 on-the-fly.
This has a few (minor) benefits:
- Cuts the amount of memory allocated by ArgIteratorWindows in half (or better)
- Makes the total amount of memory allocated by ArgIteratorWindows predictable, since, before, the upfront `wtf16LeToWtf8Alloc` call could end up allocating more-memory-than-necessary temporarily due to its internal use of an ArrayList. Now, the amount of memory allocated is always exactly `calcWtf8Len(cmd_line) + 1`.
Deprecated aliases that are now compile errors:
- `std.fs.MAX_PATH_BYTES` (renamed to `std.fs.max_path_bytes`)
- `std.mem.tokenize` (split into `tokenizeAny`, `tokenizeSequence`, `tokenizeScalar`)
- `std.mem.split` (split into `splitSequence`, `splitAny`, `splitScalar`)
- `std.mem.splitBackwards` (split into `splitBackwardsSequence`, `splitBackwardsAny`, `splitBackwardsScalar`)
- `std.unicode`
+ `utf16leToUtf8Alloc`, `utf16leToUtf8AllocZ`, `utf16leToUtf8`, `fmtUtf16le` (all renamed to have capitalized `Le`)
+ `utf8ToUtf16LeWithNull` (renamed to `utf8ToUtf16LeAllocZ`)
- `std.zig.CrossTarget` (moved to `std.Target.Query`)
Deprecated `lib/std/std.zig` decls were deleted instead of made a `@compileError` because the `refAllDecls` in the test block would trigger the `@compileError`. The deleted top-level `std` namespaces are:
- `std.rand` (renamed to `std.Random`)
- `std.TailQueue` (renamed to `std.DoublyLinkedList`)
- `std.ChildProcess` (renamed/moved to `std.process.Child`)
This is not exhaustive. Deprecated aliases that I didn't touch:
+ `std.io.*`
+ `std.Build.*`
+ `std.builtin.Mode`
+ `std.zig.c_translation.CIntLiteralRadix`
+ anything in `src/`
This makes it so that any other threads which are writing to stderr have
a chance to finish before the process terminates. It also clears the
terminal in case any progress has been written to stderr, while still
accomplishing the goal of not waiting until the update thread exits.
On Windows, the command line arguments of a program are a single WTF-16 encoded string and it's up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.
https://github.com/ziglang/zig/pull/18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`), and the rules around argv[0] were also changed.
This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).
The motivation here is roughly the same as when the same change was made in Rust (https://github.com/rust-lang/rust/pull/87580), that is (paraphrased):
- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward
Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows after being run through a `.bat` file.
Windows paths now use WTF-16 <-> WTF-8 conversion everywhere, which is lossless. Previously, conversion of ill-formed UTF-16 paths would either fail or invoke illegal behavior.
WASI paths must be valid UTF-8, and the relevant function calls have been updated to handle the possibility of failure due to paths not being encoded/encodable as valid UTF-8.
Closes#18694Closes#1774Closes#2565
This adds `ArgIteratorWindows`, which faithfully replicates the quoting and escaping behavior observed in `CommandLineToArgvW` and should make Zig applications play better with processes that abuse these quirks.
Finish the work started in 4c4fb839972f66f55aa44fc0aca5f80b0608c731.
Now the compiler compiles again.
Wire up dependency tree fetching code in the CLI for `zig build`.
Everything is hooked up except for `createDependenciesModule` is not yet
implemented.
- Adds `illumos` to the `Target.Os.Tag` enum. A new function,
`isSolarish` has been added that returns true if the tag is either
Solaris or Illumos. This matches the naming convention found in Rust's
`libc` crate[1].
- Add the tag wherever `.solaris` is being checked against.
- Check for the C pre-processor macro `__illumos__` in CMake to set the
proper target tuple. Illumos distros patch their compilers to have
this in the "built-in" set (verified with `echo | cc -dM -E -`).
Alternatively you could check the output of `uname -o`.
Right now, both Solaris and Illumos import from `c/solaris.zig`. In the
future it may be worth putting the shared ABI bits in a base file, and
mixing that in with specific `c/solaris.zig`/`c/illumos.zig` files.
[1]: 6e02a329a2/src/unix/solarish
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:
* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
- fix getdents return type usize → c_int
- special-case process.zig to use sysctl instead of sysctlbyname
- use struct/field pattern for sysctl HW_* constants
`GetPhysicallyInstalledSystemMemory` uses SMBios to grab the physical
memory size which can lead to unecessary allocation and inacurate
representation of the total memory. Using `System_Basic_Information`
help to retrieve the physical memory which is not reserved for the
kernel/tables. This aligns better with the linux side as `/proc/meminfo`
does the same thing.
Notably the Darwin (XNU) kernel the maxrss field is number of bytes
and not kilobytes (kibibytes) like other platforms (e.g. Linux, BSD).
watchOS and tvOS are not supported because they do not have the ability
to spawn a child process. iOS is enabled but due to OS sandboxing it
should fail with a permission error.
This unit test tested the command line arguments which were passed to
the test runner, which is not really something that unit tests are
supposed to observe.
The proper way to test command line argument parsing is with a
standalone test, where the set of command line arguments being tested
for are also being controlled by the test itself.
Previously, this API had pid, to be used on POSIX systems, and handle,
to be used on Windows.
This commit unifies the API, defining an Id type that is either the pid
or the HANDLE depending on the target OS.
This commit also prepares for the future by allowing one to import via
`std.process.Child` which is the fully qualified namespace that I intend
to migrate to in the future.
I was testing this with wazero, which defaults to not propagate any env
variables. This ensures we don't try to allocate zero length buffers
when there are no results from either function.
Signed-off-by: Adrian Cole <adrian@tetrate.io>