This commit flips usage of PathType.isSep from requiring the caller to convert to native to assuming the input is LE encoded, which is a breaking change. This makes usage a bit nicer, though, and moves the endian conversion work from runtime to comptime.
Previously, fs.path handled a few of the Windows path types, but not all of them, and only a few of them correctly/consistently. This commit aims to make `std.fs.path` correct and consistent in handling all possible Win32 path types.
This commit also slightly nudges the codebase towards a separation of Win32 paths and NT paths, as NT paths are not actually distinguishable from Win32 paths from looking at their contents alone (i.e. `\Device\Foo` could be an NT path or a Win32 rooted path, no way to tell without external context). This commit formalizes `std.fs.path` being fully concerned with Win32 paths, and having no special detection/handling of NT paths.
Resources on Windows path types, and Win32 vs NT paths:
- https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html
- https://chrisdenton.github.io/omnipath/Overview.html
- https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
API additions/changes/deprecations
- `std.os.windows.getWin32PathType` was added (it is analogous to `RtlDetermineDosPathNameType_U`), while `std.os.windows.getNamespacePrefix` and `std.os.windows.getUnprefixedPathType` were deleted. `getWin32PathType` forms the basis on which the updated `std.fs.path` functions operate.
- `std.fs.path.parsePath`, `std.fs.path.parsePathPosix`, and `std.fs.path.parsePathWindows` were added, while `std.fs.path.windowsParsePath` was deprecated. The new `parsePath` functions provide the "root" and the "kind" of a path, which is platform-specific. The now-deprecated `windowsParsePath` did not handle all possible path types, while the new `parsePathWindows` does.
- `std.fs.path.diskDesignator` has been deprecated in favor of `std.fs.path.parsePath`, and same deal with `diskDesignatorWindows` -> `parsePathWindows`
- `relativeWindows` is now a compile error when *not* targeting Windows, while `relativePosix` is now a compile error when targeting Windows. This is because those functions read/use the CWD path which will behave improperly when used from a system with different path semantics (e.g. calling `relativePosix` from a Windows system with a CWD like `C:\foo\bar` will give you a bogus result since that'd be treated as a single relative component when using POSIX semantics). This also allows `relativeWindows` to use Windows-specific APIs for getting the CWD and environment variables to cut down on allocations.
- `componentIterator`/`ComponentIterator.init` have been made infallible. These functions used to be able to error on UNC paths with an empty server component, and on paths that were assumed to be NT paths, but now:
+ We follow the lead of `RtlDetermineDosPathNameType_U`/`RtlGetFullPathName_U` in how it treats a UNC path with an empty server name (e.g. `\\\share`) and allow it, even if it'll be invalid at the time of usage
+ Now that `std.fs.path` assumes paths are Win32 paths and not NT paths, we don't have to worry about NT paths
Behavior changes
- `std.fs.path` generally: any combinations of mixed path separators for UNC paths are universally supported, e.g. `\/server/share`, `/\server\share`, `/\server/\\//share` are all seen as equivalent UNC paths
- `resolveWindows` handles all path types more appropriately/consistently.
+ `//` and `//foo` used to be treated as a relative path, but are now seen as UNC paths
+ If a rooted/drive-relative path cannot be resolved against anything more definite, the result will remain a rooted/drive-relative path.
+ I've created [a script to generate the results of a huge number of permutations of different path types](https://gist.github.com/squeek502/9eba7f19cad0d0d970ccafbc30f463bf) (the result of running the script is also included for anyone that'd like to vet the behavior).
- `dirnameWindows` now treats the drive-relative root as the dirname of a drive-relative path with a component, e.g. `dirname("C:foo")` is now `C:`, whereas before it would return null. `dirnameWindows` also handles local device paths appropriately now.
- `basenameWindows` now handles all path types more appropriately. The most notable change here is `//a` being treated as a partial UNC path now and therefore `basename` will return `""` for it, whereas before it would return `"a"`
- `relativeWindows` will now do its best to resolve against the most appropriate CWD for each path, e.g. relative for `D:foo` will look at the CWD to check if the drive letter matches, and if not, look at the special environment variable `=D:` to get the shell-defined CWD for that drive, and if that doesn't exist, then it'll resolve against `D:\`.
Implementation details
- `resolveWindows` previously looped through the paths twice to build up the relevant info before doing the actual resolution. Now, `resolveWindows` iterates backwards once and keeps track of which paths are actually relevant using a bit set, which also allows it to break from the loop when it's no longer possible for earlier paths to matter.
- A standalone test was added to test parts of `relativeWindows` since the CWD resolution logic depends on CWD information from the PEB and environment variables
Edge cases worth noting
- A strange piece of trivia that I found out while working on this is that it's technically possible to have a drive letter that it outside the intended A-Z range, or even outside the ASCII range entirely. Since we deal with both WTF-8 and WTF-16 paths, `path[0]`/`path[1]`/`path[2]` will not always refer to the same bits of information, so to get consistent behavior, some decision about how to deal with this edge case had to be made. I've made the choice to conform with how `RtlDetermineDosPathNameType_U` works, i.e. treat the first WTF-16 code unit as the drive letter. This means that when working with WTF-8, checking for drive-relative/drive-absolute paths is a bit more complicated. For more details, see the lengthy comment in `std.os.windows.getWin32PathType`
- `relativeWindows` will now almost always be able to return either a fully-qualified absolute path or a relative path, but there's one scenario where it may return a rooted path: when the CWD gotten from the PEB is not a drive-absolute or UNC path (if that's actually feasible/possible?). An alternative approach to this scenario might be to resolve against the `HOMEDRIVE` env var if available, and/or default to `C:\` as a last resort in order to guarantee the result of `relative` is never a rooted path.
- Partial UNC paths (e.g. `\\server` instead of `\\server\share`) are a bit awkward to handle, generally. Not entirely sure how best to handle them, so there may need to be another pass in the future to iron out any issues that arise. As of now the behavior is:
+ For `relative`, any part of a UNC disk designator is treated as the "root" and therefore isn't applicable for relative paths, e.g. calling `relative` with `\\server` and `\\server\share` will result in `\\server\share` rather than just `share` and if `relative` is called with `\\server\foo` and `\\server\bar` the result will be `\\server\bar` rather than `..\bar`
+ For `resolve`, any part of a UNC disk designator is also treated as the "root", but relative and rooted paths are still elligable for filling in missing portions of the disk designator, e.g. `resolve` with `\\server` and `foo` or `\foo` will result in `\\server\foo`
Fixes#25703Closes#25702
added adapter to AnyWriter and GenericWriter to help bridge the gap
between old and new API
make std.testing.expectFmt work at compile-time
std.fmt no longer has a dependency on std.unicode. Formatted printing
was never properly unicode-aware. Now it no longer pretends to be.
Breakage/deprecations:
* std.fs.File.reader -> std.fs.File.deprecatedReader
* std.fs.File.writer -> std.fs.File.deprecatedWriter
* std.io.GenericReader -> std.io.Reader
* std.io.GenericWriter -> std.io.Writer
* std.io.AnyReader -> std.io.Reader
* std.io.AnyWriter -> std.io.Writer
* std.fmt.format -> std.fmt.deprecatedFormat
* std.fmt.fmtSliceEscapeLower -> std.ascii.hexEscape
* std.fmt.fmtSliceEscapeUpper -> std.ascii.hexEscape
* std.fmt.fmtSliceHexLower -> {x}
* std.fmt.fmtSliceHexUpper -> {X}
* std.fmt.fmtIntSizeDec -> {B}
* std.fmt.fmtIntSizeBin -> {Bi}
* std.fmt.fmtDuration -> {D}
* std.fmt.fmtDurationSigned -> {D}
* {} -> {f} when there is a format method
* format method signature
- anytype -> *std.io.Writer
- inferred error set -> error{WriteFailed}
- options -> (deleted)
* std.fmt.Formatted
- now takes context type explicitly
- no fmt string
It caused an assertion failure when building Zig from source.
This reverts commit 0595feb34128db22fbebea843af929de3ede8190, reversing
changes made to 744771d3303e122474a72c8a94b14fe1e9fb480c.
closes#22566closes#22568
Windows paths now use WTF-16 <-> WTF-8 conversion everywhere, which is lossless. Previously, conversion of ill-formed UTF-16 paths would either fail or invoke illegal behavior.
WASI paths must be valid UTF-8, and the relevant function calls have been updated to handle the possibility of failure due to paths not being encoded/encodable as valid UTF-8.
Closes#18694Closes#1774Closes#2565
Before this commit, there were three issues with the makePath implementation:
1. The component iteration did not 'collapse' consecutive path separators; instead, it would treat `a/b//c` as `a/b//c`, `a/b/`, `a/b`, and `a`.
2. Trailing path separators led to redundant `makeDir` calls, e.g. with the path `a/b/` (if `a` doesn't exist), it would try to create `a/b/`, then try `a/b`, then try `a`, then try `a/b`, and finally try `a/b/` again.
3. The iteration did not treat the root of a path specially, so on Windows it could attempt to make a directory with a path like `X:` for something like `X:\a\b\c` if the `X:\` drive doesn't exist. This didn't lead to any problems that I could find, but there's no reason to try to make a drive letter as a directory (or any other root path).
This commit fixes all three issues by introducing a ComponentIterator that is root-aware and handles both sequential path separators and trailing path separators and uses it in `Dir.makePath`. This reduces the number of `makeDir` calls for paths where (1) the root of the path doesn't exist, (2) there are consecutive path separators, or (3) there are trailing path separators
As an example, here are the makeDir calls that would have been made before this commit when calling `makePath` for a relative path like `a/b//c//` (where the full path needs to be created):
a/b//c//
a/b//c/
a/b//c
a/b/
a/b
a
a/b
a/b/
a/b//c
a/b//c/
a/b//c//
And after this commit:
a/b//c
a/b
a
a/b
a/b//c
The majority of these are in comments, some in doc comments which might
affect the generated documentation, and a few in parameter names -
nothing that should be breaking, however.
* docs(std.math): elaborate on difference between absCast and absInt
* docs(std.rand.Random.weightedIndex): elaborate on likelihood
I think this makes it easier to understand.
* langref: add small reminder
* docs(std.fs.path.extension): brevity
* docs(std.bit_set.StaticBitSet): mention the specific types
* std.debug.TTY: explain what purpose this struct serves
This should also make it clearer that this struct is not supposed to provide unrelated terminal manipulation functionality such as setting the cursor position or something because terminals are complicated and we should keep this struct simple and focused on debugging.
* langref(package listing): brevity
* langref: explain what exactly `threadlocal` causes to happen
* std.array_list: link between swapRemove and orderedRemove
Maybe this can serve as a TLDR and make it easier to decide.
* PrefetchOptions.locality: clarify docs that this is a range
This confused me previously and I thought I can only use either 0 or 3.
* fix typos and more
* std.builtin.CallingConvention: document some CCs
* langref: explain possibly cryptic names
I think it helps knowing what exactly these acronyms (@clz and @ctz) and
abbreviations (@popCount) mean.
* variadic function error: add missing preposition
* std.fmt.format docs: nicely hyphenate
* help menu: say what to optimize for
I think this is slightly more specific than just calling it
"optimizations". These are speed optimizations. I used the word
"performance" here.
The resolvePosix and resolveWindows routines changed behaviour in an
earlier commit so that the return value is not always an absolute path.
That caused the relativePosix and relativeWindows to return a relative
path that is not correct.
The change in behaviour mentioned above would cause a local cache-dir to
be created in the wrong directory when --cache-dir was specified for a
build.
In general, we prefer compiler code to use relative paths based on open
directory handles because this is the most portable. However, sometimes
absolute paths are used, and sometimes relative paths are used that go
up a directory.
The recent improvements in 81d2135ca6ebd71b8c121a19957c8fbf7f87125b
regressed the use case when an absolute path is used for the zig lib
directory mixed with a relative path used for the root source file. This
could happen when, for example, running the standard library tests, like
this:
stage3/bin/zig test ../lib/std/std.zig
This happened because the zig lib dir was inferred to be an absolute
directory based on the zig executable directory, while the root source
file was detected as a relative path. There was no common prefix and so
it was not determined that the std.zig file was inside the lib
directory.
This commit adds a function for resolving paths that preserves relative
path names while allowing absolute paths, and converting relative
upwards paths (e.g. "../foo") to absolute paths. This restores the
previous functionality while remaining compatible with systems such as
WASI that cannot deal with absolute paths.
This is a breaking change to the API. Instead of the first path
implicitly being the current working directory, it now asserts that the
number of paths passed is greater than zero.
Importantly, it never calls getcwd(); instead, it can possibly return
".", or a series of "../". This changes the error set to only be
`error{OutOfMemory}`.
closes#13613
Two major changes here:
1. We store the CWD as a simple `[]const u8` and lookup Preopens for
every absolute or CWD-referenced file operation, based on the
Preopen with the longest match (i.e. most specific path)
2. Preorders are normalized to POSIX absolute paths at init time.
Behavior depends on the "cwd_root" parameter of `initPreopensWasi`:
`cwd_root` is used for any Preopens that start with "."
For example:
"./foo/bar" - inits to -> "{cwd_root}/foo/bar"
"foo/bar" - inits to -> "/foo/bar"
"/foo/bar" - inits to -> "/foo/bar"
`cwd_root` must be an absolute path.
Using "/" as `cwd_root` gives behavior similar to wasi-libc.
This adds a special CWD file descriptor, AT.FDCWD (-2), to refer to the
current working directory. The `*at(...)` functions look for this and
resolve relative paths against the stored CWD. Absolute paths are
dynamically matched against the stored Preopens.
"os.initPreopensWasi()" must be called before std.os functions will
resolve relative or absolute paths correctly. This is asserted at
runtime.
Support has been added for: `open`, `rename`, `mkdir`, `rmdir`, `chdir`,
`fchdir`, `link`, `symlink`, `unlink`, `readlink`, `fstatat`, `access`,
and `faccessat`.
This also includes limited support for `getcwd()` and `realpath()`.
These return an error if the CWD does not correspond to a Preopen with
an absolute path. They also do not currently expand symlinks.
UEFI uses `\` for paths exclusively. This changes std.fs.path to use `\`
for UEFI path joining. Also adds a few tests regarding it, specifically
in making sure double-separators do not result from path joining, as the
UEFI spec says to convert any that result from joining into single
separators (UEFI Spec Version 2.7, pg. 448).
GetCurrentDirectory returns a path with a trailing slash iff the cwd is
a root directory, making the code in `resolveWindows` return an invalid
path with two consecutive slashes.
Closes#10093
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.
Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.