`ContainerDeclarations` is an abstraction of `ContainerDeclaration*`.
Removing this abstraction allows the `ContainerMembers` rule to contain
more concrete information without having to look at the definition
of `ContainerDeclarations`.
When `std.mem.indexOf` is called with a single-item needle, use `indexOfScalarPos` which is significantly faster than the more general `indexOfPosLinear`. This can be done without introducing overhead to normal cases (where `needle.len > 1`).
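A minimal sketch of the dispatch, assuming the existing `std.mem.indexOfScalarPos` and `std.mem.indexOfPosLinear` helpers (the actual `indexOf` plumbing differs):

```zig
const std = @import("std");
const mem = std.mem;

// Illustrative only: route single-item needles to the scalar search.
fn indexOfPosSketch(comptime T: type, haystack: []const T, start: usize, needle: []const T) ?usize {
    if (needle.len == 1) {
        // Scalar search skips the generic windowed comparison entirely.
        return mem.indexOfScalarPos(T, haystack, start, needle[0]);
    }
    return mem.indexOfPosLinear(T, haystack, start, needle);
}
```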
* fs/test.zig: use arena allocator more consistently
* fs/test.zig: remove unnecessary type information
Zig can (now?) implicitly coerce `&.{"foo"}` when it is passed to
`fs.path.join()`, so the explicit `[_][]const u8` is unnecessary.
* fs/test.zig: Use fs.path.join() for longer paths
Replace long path constructions (that chain several `++ path_sep ++`)
with a single call to `fs.path.join` (see the sketch after this list).
Seems more readable to me.
* fs/test.zig: fmt
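For illustration, a hypothetical before/after (the concrete paths used in the tests are different):

```zig
const std = @import("std");
const fs = std.fs;

test "fs.path.join instead of manual concatenation (illustrative)" {
    const sep = fs.path.sep_str;
    // before: comptime concatenation with explicit separators
    const manual = "zig-cache" ++ sep ++ "tmp" ++ sep ++ "subdir";
    // after: one call, no separator bookkeeping
    const joined = try fs.path.join(std.testing.allocator, &.{ "zig-cache", "tmp", "subdir" });
    defer std.testing.allocator.free(joined);
    try std.testing.expectEqualStrings(manual, joined);
}
```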
* add Module instances for each package's build.zig and attach them to the
  dependencies.zig module with the hash digest hex string as the name.
* fix skipping the wrong packages when creating dependencies.zig
* a couple more renamings of "package" to "module"
Finish the work started in 4c4fb839972f66f55aa44fc0aca5f80b0608c731.
Now the compiler compiles again.
Wire up dependency tree fetching code in the CLI for `zig build`.
Everything is hooked up except that `createDependenciesModule` is not yet
implemented.
* start renaming "package" to "module" (see #14307)
- the build system gains `main_mod_path`; `main_pkg_path` is still
  there but deprecated.
* eliminate the object-oriented memory management style of what was
previously `*Package`. Now it is `*Package.Module` and all pointers
point to externally managed memory.
* fixes to get the new Fetch.zig code working. The previous commit was
  work-in-progress. There are still two commented-out code paths: the
one that leads to `Compilation.create` and the one for `zig build`
that fetches the entire dependency tree and creates the required
modules for the build runner.
Originally inspired by Go's `utf8.Valid` function. Includes some test cases from Go's test suite.
Further optimized to be faster in all tested cases (short/long ASCII/UTF-8), in all release modes.
Takes advantage of SIMD for the ASCII fast path.
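A rough sketch of the idea behind the ASCII fast path (not the actual std.unicode implementation):

```zig
/// Scan a vector-sized chunk at a time and fall back to the scalar
/// UTF-8 validator as soon as a byte with the high bit set shows up.
fn asciiPrefixLen(s: []const u8) usize {
    const vec_len = 16; // illustrative fixed chunk size
    var i: usize = 0;
    while (i + vec_len <= s.len) : (i += vec_len) {
        const chunk: @Vector(vec_len, u8) = s[i..][0..vec_len].*;
        if (@reduce(.Max, chunk) >= 0x80) break; // non-ASCII in this chunk
    }
    while (i < s.len and s[i] < 0x80) : (i += 1) {}
    return i;
}
```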
`nextAfter()` returns the next representable value after `x` in the direction of `y` and is a standard math library function ([C++](https://en.cppreference.com/w/cpp/numeric/math/nextafter), [Java](https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#nextAfter-double-double-)). It is primarily useful for bitwise incrementing/decrementing floats.
This implementation supports runtime integers, runtime floats and `comptime_int`. `comptime_float` is not supported because NaNs/infinities are intentionally difficult to obtain and because I'm not sure if the fact that it's backed by `f128` is supposed to be an implementation detail. Either way, the user could just call the function with the floating-point type whose behavior they want at comptime and then cast the result to `comptime_float`.
The float implementation was ported from mingw-w64 with some slight changes made possible because the Zig standard library doesn't care about raising FP exceptions.
The number of test cases may seem excessive but they should cover every normal and edge case for every float type and are especially important for verifying that `f80` works.
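A couple of illustrative uses, assuming the signature `nextAfter(comptime T: type, x: T, y: T) T`:

```zig
const std = @import("std");
const math = std.math;

test "nextAfter (illustrative)" {
    // floats step to the adjacent representable value in the direction of y
    try std.testing.expect(math.nextAfter(f32, 1.0, 2.0) > 1.0);
    try std.testing.expect(math.nextAfter(f32, 1.0, 0.0) < 1.0);
    // integers step by exactly one toward y
    try std.testing.expectEqual(@as(i32, 6), math.nextAfter(i32, 5, 100));
}
```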
Until now, we would pass `candidate: NativeTargetInfo`, which creates
a copy of the `NativeTargetInfo.DynamicLinker` buffer. We would then
return this buffer in `bad_dl: []const u8`, which goes out of scope
the moment we leave the function frame, yielding garbage. To fix this,
we just need to remember to pass by const pointer:
`candidate: *const NativeTargetInfo`.
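A self-contained sketch of the hazard (names are made up; the real code returns a slice of the `DynamicLinker` buffer):

```zig
const Info = struct {
    buf: [255]u8 = undefined,
    len: u8 = 0,

    fn get(info: *const Info) []const u8 {
        return info.buf[0..info.len];
    }
};

// Bug: `info` may be a by-value copy living in this frame, so the returned
// slice can point into stack memory that is gone once the function returns.
fn dlOfCopy(info: Info) []const u8 {
    return info.get();
}

// Fix: take a const pointer so the slice always refers to caller-owned memory.
fn dlOfRef(info: *const Info) []const u8 {
    return info.get();
}
```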
Follow-up to #17383. This is a minor optimization that only matters when a small allocation is resized/freed soon after it is allocated.
The only real difference I was able to observe with this was via a synthetic benchmark that allocates a full bucket and then frees all but one of the slots, over and over in a loop:
Debug build:
Benchmark 1 (9 runs): gpa-degen-master.exe
measurement mean ± σ min … max outliers delta
wall_time 575ms ± 5.19ms 569ms … 583ms 0 ( 0%) 0%
peak_rss 43.8MB ± 1.37KB 43.8MB … 43.8MB 1 (11%) 0%
Benchmark 2 (10 runs): gpa-degen-search-cur.exe
measurement mean ± σ min … max outliers delta
wall_time 532ms ± 5.55ms 520ms … 539ms 0 ( 0%) ⚡- 7.5% ± 0.9%
peak_rss 43.8MB ± 65.2KB 43.8MB … 44.0MB 1 (10%) + 0.0% ± 0.1%
ReleaseFast build:
Benchmark 1 (129 runs): gpa-degen-master-release.exe
measurement mean ± σ min … max outliers delta
wall_time 38.9ms ± 1.12ms 36.7ms … 42.4ms 8 ( 6%) 0%
peak_rss 23.2MB ± 2.39KB 23.2MB … 23.2MB 0 ( 0%) 0%
Benchmark 2 (151 runs): gpa-degen-search-cur-release.exe
measurement mean ± σ min … max outliers delta
wall_time 33.2ms ± 999us 31.9ms … 36.3ms 20 (13%) ⚡- 14.7% ± 0.6%
peak_rss 23.2MB ± 2.26KB 23.2MB … 23.2MB 0 ( 0%) + 0.0% ± 0.0%
The idea here is to avoid code bloat by having only one actual io.Reader
implementation, which is type-erased, and then implementing a GenericReader
that preserves type information on top of that as thin glue code.
The strategy here is for that glue code to `@errSetCast` the result of
the type-erased reader functions; however, while trying to do that I
ran into #17343.
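A minimal sketch of the shape this takes (simplified names, only `read`, and a manual error narrowing on the error value instead of casting the whole error union; not the actual std.io code):

```zig
const AnyReader = struct {
    context: *const anyopaque,
    readFn: *const fn (context: *const anyopaque, buffer: []u8) anyerror!usize,

    fn read(self: AnyReader, buffer: []u8) anyerror!usize {
        return self.readFn(self.context, buffer);
    }
};

fn GenericReader(
    comptime Context: type,
    comptime ReadError: type,
    comptime readFn: fn (context: Context, buffer: []u8) ReadError!usize,
) type {
    return struct {
        context: Context,

        const Self = @This();

        fn typeErasedRead(ctx: *const anyopaque, buffer: []u8) anyerror!usize {
            const self: *const Self = @ptrCast(@alignCast(ctx));
            return readFn(self.context, buffer);
        }

        pub fn any(self: *const Self) AnyReader {
            return .{ .context = self, .readFn = typeErasedRead };
        }

        // Thin glue: all real work happens on the type-erased path; the typed
        // wrapper only narrows the error set back down.
        pub fn read(self: *const Self, buffer: []u8) ReadError!usize {
            return self.any().read(buffer) catch |err| {
                return @as(ReadError, @errSetCast(err));
            };
        }
    };
}
```

The typed `read` keeps the caller-facing error set while only one concrete implementation exists behind the function pointer.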
The `makeMem*()` functions crashed under Valgrind in Debug and
ReleaseSafe modes.
The reason is that `doMemCheckClientRequestExpr()` returns `0`
when not running under Valgrind, and `maxInt(usize)` when running
under Valgrind.
Thus, `@as(i1, @intCast(maxInt(usize)))` always fails and these
functions crashed before returning.
That being said, what these functions used to return was quite
unexpected: `0` on error and `-1` on success (i.e. running under Valgrind).
That matches neither Zig nor C conventions.
But that return value doesn't seem to be very useful. Either we are
running under Valgrind or we are not. There's no point in checking this
for every single call. Applications are likely to always discard it.
So, just return a `void` instead.
Also avoid function comments that start with `Similarly, ...`, because
in the context of autodoc or an IDE there is nothing for them to refer to.
Before this commit, GeneralPurposeAllocator could run into incredibly degraded performance in scenarios where the bucket count for a particular size class grew to be large. For example, if exactly `slot_count` allocations of a single size class were performed and then all of them were freed except one, then the bucket for those allocations would have to be kept around indefinitely. If that pattern of allocation were done over and over, then the bucket list for that size class could grow incredibly large.
This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688
In that case, the length of the bucket list for the `128` size class would grow to tens of thousands of buckets and cause Debug runtime to balloon to ~8 minutes whereas with the c_allocator the Debug runtime would be ~3 seconds.
To address this, there are three different changes happening here:
1. std.Treap is used instead of a doubly linked list for the lists of buckets (see the sketch after this list). This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (due to the current bucket being full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), we can therefore always know that any bucket besides the current_bucket will be full, so traversing the list in the hopes of finding an existing non-full bucket is entirely pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.
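A rough sketch of change 1, assuming a treap keyed on a bucket's page address (names here are illustrative, not the actual GeneralPurposeAllocator internals):

```zig
const std = @import("std");

// Buckets for one size class, ordered by page address: O(log n) search,
// insert, and delete instead of an O(n) walk over a linked list.
const BucketTreap = std.Treap(usize, std.math.order);

fn searchBucket(buckets: *BucketTreap, page_addr: usize) ?*BucketTreap.Node {
    // getEntryFor locates the node with this key, if it exists.
    var entry = buckets.getEntryFor(page_addr);
    return entry.node;
}

fn insertBucket(buckets: *BucketTreap, node: *BucketTreap.Node, page_addr: usize) void {
    var entry = buckets.getEntryFor(page_addr);
    entry.set(node); // links the node into the treap under this key
}
```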
The first change is the most relevant and accounts for most of the benefit here. Also note that the overall functionality of GeneralPurposeAllocator is unchanged.
In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.
Benchmark 1: test-master.bat
Time (mean ± σ): 481.263 s ± 5.440 s [User: 479.159 s, System: 1.937 s]
Range (min … max): 477.416 s … 485.109 s 2 runs
Benchmark 2: test-optim-treap.bat
Time (mean ± σ): 19.639 s ± 0.037 s [User: 18.183 s, System: 1.452 s]
Range (min … max): 19.613 s … 19.665 s 2 runs
Summary
'test-optim-treap.bat' ran
24.51 ± 0.28 times faster than 'test-master.bat'
Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.
These changes may or may not introduce a slight performance regression in the average case:
Here's the standard library tests on Windows in Debug mode:
Benchmark 1 (10 runs): std-tests-master.exe
measurement mean ± σ min … max outliers delta
wall_time 16.0s ± 30.8ms 15.9s … 16.1s 1 (10%) 0%
peak_rss 42.8MB ± 8.24KB 42.8MB … 42.8MB 0 ( 0%) 0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
measurement mean ± σ min … max outliers delta
wall_time 16.2s ± 37.6ms 16.1s … 16.3s 0 ( 0%) 💩+ 1.3% ± 0.2%
peak_rss 42.8MB ± 5.18KB 42.8MB … 42.8MB 0 ( 0%) + 0.1% ± 0.0%
And on Linux:
Benchmark 1: ./test-master
Time (mean ± σ): 16.091 s ± 0.088 s [User: 15.856 s, System: 0.453 s]
Range (min … max): 15.870 s … 16.166 s 10 runs
Benchmark 2: ./test-optim-treap
Time (mean ± σ): 16.028 s ± 0.325 s [User: 15.755 s, System: 0.492 s]
Range (min … max): 15.735 s … 16.709 s 10 runs
Summary
'./test-optim-treap' ran
1.00 ± 0.02 times faster than './test-master'
zig fetch [options] <url>
zig fetch [options] <path>
Fetches a package which is found at <url> or <path> into the global
cache directory, printing the package hash to stdout.
Closes #16972
Related to #14280
Additionally, this commit:
* Adds uncompressed .tar support to package fetching
* Introduces symlink support to package fetching
The 5.11 in uname is not something that is ever updated. There is no
versioning of the illumos system in general. Illumos prefers to rely
on feature detection.
I can't say what Solaris does these days, as I do not work at Oracle,
so I left it alone.
- Adds `illumos` to the `Target.Os.Tag` enum. A new function,
  `isSolarish`, has been added that returns true if the tag is either
  Solaris or Illumos (see the sketch after this list). This matches the
  naming convention found in Rust's `libc` crate[1].
- Add the tag wherever `.solaris` is being checked against.
- Check for the C pre-processor macro `__illumos__` in CMake to set the
proper target tuple. Illumos distros patch their compilers to have
this in the "built-in" set (verified with `echo | cc -dM -E -`).
Alternatively you could check the output of `uname -o`.
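The helper is essentially just a tag check; a sketch of the idea (the real definition lives on `std.Target.Os.Tag`):

```zig
const std = @import("std");
const Tag = std.Target.Os.Tag;

// True for either member of the Solaris/illumos family.
fn isSolarish(tag: Tag) bool {
    return tag == .solaris or tag == .illumos;
}

test "isSolarish (illustrative)" {
    try std.testing.expect(isSolarish(.illumos));
    try std.testing.expect(!isSolarish(.linux));
}
```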
Right now, both Solaris and Illumos import from `c/solaris.zig`. In the
future it may be worth putting the shared ABI bits in a base file, and
mixing that in with specific `c/solaris.zig`/`c/illumos.zig` files.
[1]: 6e02a329a2/src/unix/solarish