Before this commit, GeneralPurposeAllocator could run into incredibly degraded performance in scenarios where the bucket count for a particular size class grew to be large. For example, if exactly `slot_count` allocations of a single size class were performed and then all of them were freed except one, then the bucket for those allocations would have to be kept around indefinitely. If that pattern of allocation were done over and over, then the bucket list for that size class could grow incredibly large.
This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688
In that case, the length of the bucket list for the `128` size class would grow to tens of thousands of buckets and cause Debug runtime to balloon to ~8 minutes whereas with the c_allocator the Debug runtime would be ~3 seconds.
To address this, there are three different changes happening here:
1. std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use-case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (due to the current bucket being full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), we can therefore always know that any bucket besides the current_bucket will be full, so traversing the list in the hopes of finding an existing non-full bucket is entirely pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.
The first change is the most relevant and accounts for most of the benefit here. Also note that the overall functionality of GeneralPurposeAllocator is unchanged.
In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.
Benchmark 1: test-master.bat
Time (mean ± σ): 481.263 s ± 5.440 s [User: 479.159 s, System: 1.937 s]
Range (min … max): 477.416 s … 485.109 s 2 runs
Benchmark 2: test-optim-treap.bat
Time (mean ± σ): 19.639 s ± 0.037 s [User: 18.183 s, System: 1.452 s]
Range (min … max): 19.613 s … 19.665 s 2 runs
Summary
'test-optim-treap.bat' ran
24.51 ± 0.28 times faster than 'test-master.bat'
Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.
These changes may or mat not introduce a slight performance regression in the average case:
Here's the standard library tests on Windows in Debug mode:
Benchmark 1 (10 runs): std-tests-master.exe
measurement mean ± σ min … max outliers delta
wall_time 16.0s ± 30.8ms 15.9s … 16.1s 1 (10%) 0%
peak_rss 42.8MB ± 8.24KB 42.8MB … 42.8MB 0 ( 0%) 0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
measurement mean ± σ min … max outliers delta
wall_time 16.2s ± 37.6ms 16.1s … 16.3s 0 ( 0%) 💩+ 1.3% ± 0.2%
peak_rss 42.8MB ± 5.18KB 42.8MB … 42.8MB 0 ( 0%) + 0.1% ± 0.0%
And on Linux:
Benchmark 1: ./test-master
Time (mean ± σ): 16.091 s ± 0.088 s [User: 15.856 s, System: 0.453 s]
Range (min … max): 15.870 s … 16.166 s 10 runs
Benchmark 2: ./test-optim-treap
Time (mean ± σ): 16.028 s ± 0.325 s [User: 15.755 s, System: 0.492 s]
Range (min … max): 15.735 s … 16.709 s 10 runs
Summary
'./test-optim-treap' ran
1.00 ± 0.02 times faster than './test-master'
The SVG looks way better than the pixelated PNG and will adapt best to
whatever screen it is being displayed on. The PNG continues to be used
because Apple Safari does not support SVG favicons yet. All other major
browsers do. See https://caniuse.com/link-icon-svg.
This is a companion PR to ziglang/www.ziglang.org#310.
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.
This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for #16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.
As part of these changes, this commit also implements #17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.
This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!
In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.
Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.
Resolves: #16512Resolves: #17194
previously, in the container view (the type of view that you see when
you look at `std` for example), when listing types and namespaces, we
would only show doc comments places on the direct child decl, which in
the case of the `std` namespace, for example, it's just a bunch of
re-exports.
now, if we don't find a direct doc comment, we chase indirection and
display doc comments placed directly on the definition, if any.
this is the precise priority order:
```
/// 1
pub const Foo = _Foo;
/// 2
const _Foo = struct {
//! 3
};
```
The numbers show the priority order for autodoc.
release/17.x branch, commit 8f4dd44097c9ae25dd203d5ac87f3b48f854bba8
This adds the flag `-D_LIBCPP_PSTL_CPU_BACKEND_SERIAL`. A future
enhancement could possibly pass something different if there is a
compelling parallel implementation. That libdispatch one might be worth
looking into.
* some manual fixes to generated CPU features code. in the future it
would be nice to make the script do those automatically. I suspect
the sm_90a thing is a bug in LLVM.
* add liteos to various target OS switches. I know nothing about this
OS; someone will need to work specifically on support for this OS
when the time comes to support it properly in zig.
* while waiting for the compiler, I went ahead and made more
conservative choices about when to use `inline` in std/Target.zig
Currently, the compiler (like @typeName) writes it `fn(...) Type` but
zig fmt writes it `fn (...) Type` (notice the space after `fn`).
This inconsistency is now resolved and function types are consistently
written the zig fmt way. Before this there were more `fn (...) Type`
occurrences than `fn(...) Type` already.
The _environ variable that is populated when linking libc on Windows does not support Unicode keys/values (or, at least, the encoding is not necessarily UTF-8). So, for Unicode support, _wenviron would need to be used instead. However, this means that the keys/values would be encoded as UTF-16, so they would need to be converted to UTF-8 before being returned by `os.getenv`. This would require allocation which is not part of the `os.getenv` API, so `os.getenv` is not implementable on Windows even when linking libc.
Closes https://github.com/ziglang/zig/issues/8456
this will make s3 re-enable compression for the stdlib's autodoc
and improve loading times (and data usage) for users
alongside this commit the deploy script for the official website is also
being updated
The include directories used when preprocessing .rc files are now separate from the target, and by default will use the system MSVC include paths if the MSVC + Windows SDK are present, otherwise it will fall back to the MinGW includes distributed with Zig. This default behavior can be overridden by the `-rcincludes` option (possible values: any (the default), msvc, gnu, or none).
This behavior is useful because Windows resource files may `#include` files that only exist with in the MSVC include dirs (e.g. in `<MSVC install directory>/atlmfc/include` which can contain other .rc files, images, icons, cursors, etc). So, by defaulting to the `any` behavior (MSVC if present, MinGW fallback), users will by default get behavior that is most-likely-to-work.
It also should be okay that the include directories used when compiling .rc files differ from the include directories used when compiling the main binary, since the .res format is not dependent on anything ABI-related. The only relevant differences would be things like `#define` constants being different values in the MinGW headers vs the MSVC headers, but any such differences would likely be a MinGW bug.
I have updated Felix's ZigAndroidTemplate to work with the latest
version of Zig and we are exploring adding Android support to Mach engine.
`std.c.getcontext` is _referenced_ but not _used_, and Android's bionic libc
does not implement `getcontext`. `std.os.linux.getcontext` also cannot be
used with bionic libc, so it seems prudent to just disable this extern for now.
This may not be the perfect long-term fix, but I have a golden rebuttal to that:
before I was unable to compile Zig applications for Android, and now I can.
<img width="828" alt="image" src="https://github.com/hexops/mach/assets/3173176/1e29142b-0419-4459-9c8b-75d92f87f822">
Signed-off-by: Stephen Gutekanst <stephen@hexops.com>