This is a major refactor to `Step.Run` which adds new functionality,
primarily to the execution of Zig tests.
* All tests are run, even if a test crashes. This happens through the
same mechanism as timeouts where the test processes is repeatedly
respawned as needed.
* The build status output is more precise. For each unit test, it
differentiates pass, skip, fail, crash, and timeout. Memory leaks are
reported separately, as they do not indicate a test's "status", but
are rather an additional property (a test with leaks may still pass!).
* The number of memory leaks is tracked and reported, both per-test and
for a whole `Run` step.
* Reporting is made clearer when a step is failed solely due to error
logs (`std.log.err`) where every unit test passed.
For now, there is a flag to `zig build` called `--test-timeout-ms` which
accepts a value in milliseconds. If the execution time of any individual
unit test exceeds that number of milliseconds, the test is terminated
and marked as timed out.
In the future, we may want to increase the granularity of this feature
by allowing timeouts to be specified per-step or even per-test. However,
a global option is actually very useful. In particular, it can be used
in CI scripts to ensure that no individual unit test exceeds some
reasonable limit (e.g. 60 seconds) without having to assign limits to
every individual test step in the build script.
Also, individual unit test durations are now shown in the time report
web interface -- this was fairly trivial to add since we're timing tests
(to check for timeouts) anyway.
This commit makes progress on #19821, but does not close it, because
that proposal includes a more sophisticated mechanism for setting
timeouts.
Co-Authored-By: David Rubin <david@vortan.dev>
As well as the exact byte count, include a human-readable value so it's
clearer what the error is actually telling you. The exact byte count
might not be worth keeping, but I decided I would in case it's useful in
any scenario.
This commit replaces the "fuzzer" UI, previously accessed with the
`--fuzz` and `--port` flags, with a more interesting web UI which allows
more interactions with the Zig build system. Most notably, it allows
accessing the data emitted by a new "time report" system, which allows
users to see which parts of Zig programs take the longest to compile.
The option to expose the web UI is `--webui`. By default, it will listen
on `[::1]` on a random port, but any IPv6 or IPv4 address can be
specified with e.g. `--webui=[::1]:8000` or `--webui=127.0.0.1:8000`.
The options `--fuzz` and `--time-report` both imply `--webui` if not
given. Currently, `--webui` is incompatible with `--watch`; specifying
both will cause `zig build` to exit with a fatal error.
When the web UI is enabled, the build runner spawns the web server as
soon as the configure phase completes. The frontend code consists of one
HTML file, one JavaScript file, two CSS files, and a few Zig source
files which are built into a WASM blob on-demand -- this is all very
similar to the old fuzzer UI. Also inherited from the fuzzer UI is that
the build system communicates with web clients over a WebSocket
connection.
When the build finishes, if `--webui` was passed (i.e. if the web server
is running), the build runner does not terminate; it continues running
to serve web requests, allowing interactive control of the build system.
In the web interface is an overall "status" indicating whether a build
is currently running, and also a list of all steps in this build. There
are visual indicators (colors and spinners) for in-progress, succeeded,
and failed steps. There is a "Rebuild" button which will cause the build
system to reset the state of every step (note that this does not affect
caching) and evaluate the step graph again.
If `--time-report` is passed to `zig build`, a new section of the
interface becomes visible, which associates every build step with a
"time report". For most steps, this is just a simple "time taken" value.
However, for `Compile` steps, the compiler communicates with the build
system to provide it with much more interesting information: time taken
for various pipeline phases, with a per-declaration and per-file
breakdown, sorted by slowest declarations/files first. This feature is
still in its early stages: the data can be a little tricky to
understand, and there is no way to, for instance, sort by different
properties, or filter to certain files. However, it has already given us
some interesting statistics, and can be useful for spotting, for
instance, particularly complex and slow compile-time logic.
Additionally, if a compilation uses LLVM, its time report includes the
"LLVM pass timing" information, which was previously accessible with the
(now removed) `-ftime-report` compiler flag.
To make time reports more useful, ZIR and compilation caches are ignored
by the Zig compiler when they are enabled -- in other words, `Compile`
steps *always* run, even if their result should be cached. This means
that the flag can be used to analyze a project's compile time without
having to repeatedly clear cache directory, for instance. However, when
using `-fincremental`, updates other than the first will only show you
the statistics for what changed on that particular update. Notably, this
gives us a fairly nice way to see exactly which declarations were
re-analyzed by an incremental update.
If `--fuzz` is passed to `zig build`, another section of the web
interface becomes visible, this time exposing the fuzzer. This is quite
similar to the fuzzer UI this commit replaces, with only a few cosmetic
tweaks. The interface is closer than before to supporting multiple fuzz
steps at a time (in line with the overall strategy for this build UI,
the goal will be for all of the fuzz steps to be accessible in the same
interface), but still doesn't actually support it. The fuzzer UI looks
quite different under the hood: as a result, various bugs are fixed,
although other bugs remain. For instance, viewing the source code of any
file other than the root of the main module is completely broken (as on
master) due to some bogus file-to-module assignment logic in the fuzzer
UI.
Implementation notes:
* The `lib/build-web/` directory holds the client side of the web UI.
* The general server logic is in `std.Build.WebServer`.
* Fuzzing-specific logic is in `std.Build.Fuzz`.
* `std.Build.abi` is the new home of `std.Build.Fuzz.abi`, since it now
relates to the build system web UI in general.
* The build runner now has an **actual** general-purpose allocator,
because thanks to `--watch` and `--webui`, the process can be
arbitrarily long-lived. The gpa is `std.heap.DebugAllocator`, but the
arena remains backed by `std.heap.page_allocator` for efficiency. I
fixed several crashes caused by conflation of `gpa` and `arena` in the
build runner and `std.Build`, but there may still be some I have
missed.
* The I/O logic in `std.Build.WebServer` is pretty gnarly; there are a
*lot* of threads involved. I anticipate this situation improving
significantly once the `std.Io` interface (with concurrency support)
is introduced.
added adapter to AnyWriter and GenericWriter to help bridge the gap
between old and new API
make std.testing.expectFmt work at compile-time
std.fmt no longer has a dependency on std.unicode. Formatted printing
was never properly unicode-aware. Now it no longer pretends to be.
Breakage/deprecations:
* std.fs.File.reader -> std.fs.File.deprecatedReader
* std.fs.File.writer -> std.fs.File.deprecatedWriter
* std.io.GenericReader -> std.io.Reader
* std.io.GenericWriter -> std.io.Writer
* std.io.AnyReader -> std.io.Reader
* std.io.AnyWriter -> std.io.Writer
* std.fmt.format -> std.fmt.deprecatedFormat
* std.fmt.fmtSliceEscapeLower -> std.ascii.hexEscape
* std.fmt.fmtSliceEscapeUpper -> std.ascii.hexEscape
* std.fmt.fmtSliceHexLower -> {x}
* std.fmt.fmtSliceHexUpper -> {X}
* std.fmt.fmtIntSizeDec -> {B}
* std.fmt.fmtIntSizeBin -> {Bi}
* std.fmt.fmtDuration -> {D}
* std.fmt.fmtDurationSigned -> {D}
* {} -> {f} when there is a format method
* format method signature
- anytype -> *std.io.Writer
- inferred error set -> error{WriteFailed}
- options -> (deleted)
* std.fmt.Formatted
- now takes context type explicitly
- no fmt string
We have no control over memory usage on arbitrary systems in the wild. But we
would still like to get the warnings so we can adjust the values based on
observations in the official ZSF CI.
Closes#23254.
Closes#23638.
Aside from adding comments to document the logic in `Cache.Manifest.hit`
better, this commit fixes two serious bugs.
The first, spotted by Andrew, is that when upgrading from a shared to an
exclusive lock on the manifest file, we do not seek it back to the
start. This is a simple fix.
The second is more subtle, and has to do with the computation of file
digests. Broadly speaking, the goal of the main loop in `hit` is to
iterate the files listed in the manifest file, and check if they've
changed, based on stat and a file hash. While doing this, the
`bin_digest` field of `std.Build.Cache.File`, which is initially
`undefined`, is populated for all files, either straight from the
manifest (if the stat matches) or recomputed from the file on-disk. This
file digest is then used to update `man.hash.hasher`, which is building
the final hash used as, for instance, the output directory name when the
compiler emits into the cache directory. When `hit` returns a cache
miss, it is expected that `man.hash.hasher` includes the digests of all
"initial files"; that is, those which have been already added with e.g.
`addFilePath`, but not those which will later be added with
`addFilePost` (even though the manifest file has told us about some such
files). Previously, `hit` was using the `unhit` function to do this in a
few cases. However, this is incorrect, because `hit` assumes that all
files already have their `bin_digest` field populated; this function is
only valid to call *after* `hit` returns. Instead, we need to actually
compute the hashes which haven't yet been populated. Even if this logic
has been working, there was still a bug here, because we called `unhit`
when upgrading from a shared to an exclusive lock, writing the
(potentially `undefined`) file digests, but the loop itself writes the
file digests *again*! All in all, the hashing logic here was actually
incredibly broken.
I've taken the opportunity to restructure this section of the code into
what I think is a more readable format. A new function,
`hitWithCurrentLock`, uses the open manifest file to try and find a
cache hit. It returns a tagged union which, in the miss case, tells the
caller (`hit`) how many files already have their hash populated. This
avoids redundant work recomputing the same hash multiple times in
situations where the lock needs upgrading. This also eliminates the
outer loop from `hit`, which was a little confusing because it iterated
no more than twice!
The bugs fixed here could manifest in several different ways depending
on how contended file locks were satisfied. Most notably, on a cache
miss, the Zig compiler might have written the compilation output to the
incorrect directory (because it incorrectly constructed a hash using
`undefined` or repeated file digests), resulting in all future hits on
this manifest causing `error.FileNotFound`. This is #23110. I have been
able to reproduce #23110 on `master`, and have not been able to after
this commit, so I am relatively sure this commit resolves that issue.
Resolves: #23110
The previous commit cast doubt upon the initial report about macOS
kernel behavior, identifying another reason that ENOENT could be
returned from file creation.
However, it is demonstrable that ENOENT can be returned for both cases:
1. create file race
2. handle refers to deleted directory
This commit re-introduces the workaround for the file creation race on
macOS however it does not unconditionally retry - it first tries again
with O_EXCL to disambiguate the error condition that has occurred.
Previous commits
2b0929929d67e222ca6a9523a3a594ed456c4a51
4ea2f441df36cec61e1017f4d795d4037326c98c
had this text:
> There are no dir components, so you would think that this was
> unreachable, however we have observed on macOS two processes racing to
> do openat() with O_CREAT manifest in ENOENT.
This appears to have been a misunderstanding based on the issue
report #12138 and corresponding PR #12139 in which the steps to
reproduce removed the cache directory in a loop which also executed
detached Zig compiler processes.
There is no evidence for the macOS kernel bug however the ENOENT is
easily explained by the removal of the cache directory.
This commit reverts those commits, ultimately reporting the ENOENT as an
error rather than repeating the create file operation. However this
commit also adds an explicit error set to `std.Build.Cache.hit` as well
as changing the `failed_file_index` to a proper diagnostic field that
fully communicates what failed, leading to more informative error
messages on failure to check the cache.
The equivalent failure when occuring for AstGen performs a fatal process
kill, reasoning being that the compiler has an invariant of the cache
directory not being yanked out from underneath it while executing. This
could be made a more granular error in the future but I suspect such
thing is not valuable to pursue.
Related to #18340 but does not solve it.
A compilation build step for which the binary is not required could not
be compiled previously. There were 2 issues that caused this:
- The compiler communicated only the results of the emitted binary and
did not properly communicate the result if the binary was not emitted.
This is fixed by communicating the final hash of the artifact path (the
hash of the corresponding /o/<hash> directory) and communicating this
instead of the entire path. This changes the zig build --listen protocol
to communicate hashes instead of paths, and emit_bin_path is accordingly
renamed to emit_digest.
- There was an error related to the default llvm object path when
CacheUse.Whole was selected. I'm not really sure why this didn't manifest
when the binary is also emitted.
This was fixed by improving the path handling related to flush() and
emitLlvmObject().
In general, this commit also improves some of the path handling throughout
the compiler and standard library.
* new .zig-cache subdirectory: 'v'
- stores coverage information with filename of hash of PCs that want
coverage. This hash is a hex encoding of the 64-bit coverage ID.
* build runner
* fixed bug in file system inputs when a compile step has an
overridden zig_lib_dir field set.
* set some std lib options optimized for the build runner
- no side channel mitigations
- no Transport Layer Security
- no crypto fork safety
* add a --port CLI arg for choosing the port the fuzzing web interface
listens on. it defaults to choosing a random open port.
* introduce a web server, and serve a basic single page application
- shares wasm code with autodocs
- assets are created live on request, for convenient development
experience. main.wasm is properly cached if nothing changes.
- sources.tar comes from file system inputs (introduced with the
`--watch` feature)
* receives coverage ID from test runner and sends it on a thread-safe
queue to the WebServer.
* test runner
- takes a zig cache directory argument now, for where to put coverage
information.
- sends coverage ID to parent process
* fuzzer
- puts its logs (in debug mode) in .zig-cache/tmp/libfuzzer.log
- computes coverage_id and makes it available with
`fuzzer_coverage_id` exported function.
- the memory-mapped coverage file is now namespaced by the coverage id
in hex encoding, in `.zig-cache/v`
* tokenizer
- add a fuzz test to check that several properties are upheld
Compilation errors now report a failure on rebuilds triggered by file
system watches.
Compiler crashes now report failure correctly on rebuilds triggered by
file system watches.
The compiler subprocess is restarted if a broken pipe is encountered on
a rebuild.
Changes the `make` function signature to take an options struct, which
additionally includes `watch: bool`. I intentionally am not exposing
this information to configure phase logic.
Also adds global zig cache to the compiler cache prefixes.
Closes#20600
Updates the build runner to unconditionally require a zig lib directory
parameter. This parameter is needed in order to correctly understand
file system inputs from zig compiler subprocesses, since they will refer
to "the zig lib directory", and the build runner needs to place file
system watches on directories in there.
The build runner's fanotify file watching implementation now accounts
for when two or more Cache.Path instances compare unequal but ultimately
refer to the same directory in the file system.
Breaking change: std.Build no longer has a zig_lib_dir field. Instead,
there is the Graph zig_lib_directory field, and individual Compile steps
can still have their zig lib directories overridden. I think this is
unlikely to break anyone's build in practice.
The compiler now sends a "file_system_inputs" message to the build
runner which shares the full set of files that were added to the cache
system with the build system, so that the build runner can watch
properly and redo the Compile step. This is implemented for whole cache
mode but not yet for incremental cache mode.
and add file system watching integration.
`addDirectoryWatchInput` now returns a `bool` which helps remind the
caller to
1. call addDirectoryWatchInputFromPath on any derived paths
2. but only if the dependency is not already captured by a step
dependency edge.
The make function now recursively walks all directories and adds the
found files to the cache hash rather than incorrectly only adding the
directory name to the cache hash.
closes#20571
And use it to implement InstallDir Step watch integration.
I'm not seeing any events triggered when I run `mkdir` in the watched
directory, however, and I have not yet figured out why.
This has been planned for quite some time; this commit finally does it.
Also implements file system watching integration in the make()
implementation for UpdateSourceFiles and fixes the reporting of step
caching for both.
WriteFile does not yet have file system watching integration.
I'm still learning how the fanotify API works but I think after playing
with it in this commit, I finally know how to implement it, at least on
Linux. This commit does not accomplish the goal but I want to take the
code in a different direction and still be able to reference this point
in time by viewing a source control diff.
I think the move is going to be saving the file_handle for the parent
directory, which combined with the dirent names is how we can correlate
the events back to the Step instances that have registered file system
inputs. I predict this to be similar to implementations on other
operating systems.
Now there is `captureChildProcess` which gives access to the
`std.process.Child.RunResult`, which is useful for accessing the stdout.
It also accepts and passes an optional `std.Progress.Node` to the child.
* `doc/langref` formatting
* upgrade `.{ .path = "..." }` to `b.path("...")`
* avoid using arguments named `self`
* make `Build.Step.Id` usage more consistent
* add `Build.pathResolve`
* use `pathJoin` and `pathResolve` everywhere
* make sure `Build.LazyPath.getPath2` returns an absolute path