The problem is that a build may execute so many subprocesses concurrently
that their combined RSS exceeds available memory, causing the OOM killer
to kill something problematic, such as the window manager. Or worse, it
kills nothing, and the system freezes.
This is a real-world problem. For example, when building LLVM, a simple
`ninja install` will bring your system to its knees unless you know to
add `-DLLVM_PARALLEL_LINK_JOBS=1`.
In particular: compiling the zig std lib tests takes about 2 GiB each,
which at 16 jobs at once (8 cores + hyperthreading) uses all 32 GiB of
my RAM, causing the OOM killer to kill my window manager.
The idea here is that you can annotate steps that might use a large
amount of system resources with an upper bound on their peak RSS. So,
for example, I could mark the std lib tests as having an upper bound
peak RSS of 3 GiB.
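A sketch of what that annotation could look like in a project's
`build.zig` (the `max_rss` field name and the surrounding `std.Build`
API details are assumptions, not necessarily the final API):

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Compiling the std lib tests is the memory-hungry part in the
    // example above, so the annotation goes on that step.
    const std_lib_tests = b.addTest(.{
        .root_source_file = .{ .path = "lib/std/std.zig" },
        .target = target,
        .optimize = optimize,
    });

    // Upper bound on peak RSS for this step, so the build runner can
    // throttle concurrency and flag the step if it exceeds the bound.
    // The field name is an assumption for illustration.
    std_lib_tests.step.max_rss = 3 * 1024 * 1024 * 1024; // 3 GiB

    const test_step = b.step("test", "Build the std lib tests");
    test_step.dependOn(&std_lib_tests.step);
}
```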
Then the build system will do two things:
1. ulimit the child process, so that it fails if it would exceed that
memory limit.
2. Notice how much system RAM is available and avoid running so many
concurrent jobs that their combined upper bounds would exceed it.
This implements (1) not with an operating-system-enforced limit, but by
checking the maxrss after a child process exits.
However, it does implement (2) correctly.
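Concretely, the check for (1) boils down to comparing the child's
measured peak RSS against the annotation once `wait()` returns. A
minimal sketch, assuming `std.ChildProcess` exposes the child's rusage
via `request_resource_usage_statistics`/`getMaxRss()`:

```zig
const std = @import("std");

// Sketch only: error handling and the exact ChildProcess API are
// simplifications, not the build runner's actual implementation.
fn runWithRssCheck(
    allocator: std.mem.Allocator,
    argv: []const []const u8,
    max_rss: usize,
) !void {
    var child = std.ChildProcess.init(argv, allocator);
    child.request_resource_usage_statistics = true;

    try child.spawn();
    _ = try child.wait();

    // Not an OS-enforced limit: the peak RSS is only known after the
    // fact, and the step is failed if the annotation was exceeded.
    if (child.resource_usage_statistics.getMaxRss()) |peak_rss| {
        if (peak_rss > max_rss) return error.ExceededMaxRss;
    }
}
```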
The memory budget used by the build system defaults to the total system
memory, regardless of how much of it is already in use by other
processes at the time the build runner is spawned. This value can be
overridden with the new --maxrss flag to `zig build`. This mechanism
ensures that the sum of the upper bound RSS annotations of concurrently
running tasks never exceeds this value.
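The accounting behind (2) is a reservation against that budget: claim a
step's annotated upper bound before spawning it, block while the claim
would overflow the budget, and release it once the step's subprocess
exits. A minimal sketch of that bookkeeping (illustrative names, not the
build runner's actual internals):

```zig
const std = @import("std");

const RssBudget = struct {
    mutex: std.Thread.Mutex = .{},
    cond: std.Thread.Condition = .{},
    total: usize, // --maxrss, defaulting to total system memory
    claimed: usize = 0,

    /// Block until the step's annotated upper bound fits in the budget.
    fn acquire(self: *RssBudget, upper_bound: usize) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        // In this sketch an oversized step may still run alone rather
        // than deadlock; it just never runs concurrently with others.
        while (self.claimed != 0 and self.claimed + upper_bound > self.total) {
            self.cond.wait(&self.mutex);
        }
        self.claimed += upper_bound;
    }

    /// Return the reservation once the step's subprocess has exited.
    fn release(self: *RssBudget, upper_bound: usize) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        self.claimed -= upper_bound;
        self.cond.broadcast();
    }
};
```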
This system lets project maintainers annotate problematic subprocesses,
avoiding bug reports from users, who can blissfully execute `zig build`
without worrying about the project's internals.
Nobody's computer crashes, and the build system uses as much parallelism
as possible without risking OOM. Users do not need to resort to -j1
when the build system can figure this out for them.