Before this commit, GeneralPurposeAllocator could suffer severely degraded performance when the bucket count for a particular size class grew large. For example, if exactly `slot_count` allocations of a single size class were performed and then all of them were freed except one, the bucket for those allocations had to be kept around indefinitely. If that pattern of allocation were repeated over and over, the bucket list for that size class could grow extremely large.
This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688
In that case, the bucket list for the `128` size class would grow to tens of thousands of buckets, causing the Debug runtime to balloon to ~8 minutes, whereas with the c_allocator the Debug runtime was ~3 seconds.
To address this, there are three different changes happening here:
1. std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use-case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (because the current bucket is full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), any bucket besides the current_bucket is always known to be full, so traversing the list in the hope of finding an existing non-full bucket is pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.
The first change is the most relevant and accounts for most of the benefit here (a rough sketch of the new bucket lookup follows below). Also note that the overall functionality of GeneralPurposeAllocator is unchanged.
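A minimal sketch of the idea behind change 1, using hypothetical helper names (this is not the actual GeneralPurposeAllocator code): keep the buckets of a size class in a `std.Treap` keyed by each bucket's page-aligned base address, so that finding the bucket that owns a pointer is an O(log n) lookup instead of a walk over a linked list.
```zig
const std = @import("std");

const page_size = std.mem.page_size;

// Treap keyed by each bucket's page-aligned base address.
const BucketTreap = std.Treap(usize, std.math.order);

// Hypothetical helper: register a newly allocated bucket under its page address.
fn insertBucket(buckets: *BucketTreap, node: *BucketTreap.Node, page_addr: usize) void {
    var entry = buckets.getEntryFor(page_addr);
    entry.set(node);
}

// Hypothetical helper: O(log n) lookup of the bucket owning a small allocation.
fn searchBucket(buckets: *BucketTreap, ptr_addr: usize) ?*BucketTreap.Node {
    // Every small allocation lives inside its bucket's page, so aligning the
    // address down to a page boundary recovers the exact treap key.
    const page_addr = std.mem.alignBackward(usize, ptr_addr, page_size);
    const entry = buckets.getEntryFor(page_addr);
    return entry.node; // null if no bucket owns this page
}
```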
In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.
Benchmark 1: test-master.bat
Time (mean ± σ): 481.263 s ± 5.440 s [User: 479.159 s, System: 1.937 s]
Range (min … max): 477.416 s … 485.109 s 2 runs
Benchmark 2: test-optim-treap.bat
Time (mean ± σ): 19.639 s ± 0.037 s [User: 18.183 s, System: 1.452 s]
Range (min … max): 19.613 s … 19.665 s 2 runs
Summary
'test-optim-treap.bat' ran
24.51 ± 0.28 times faster than 'test-master.bat'
Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.
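For reference, a minimal snippet (not from this commit) showing that configuration:
```zig
const std = @import("std");

pub fn main() !void {
    // Disable stack trace collection entirely, as in the 6.7 second measurement above.
    var gpa = std.heap.GeneralPurposeAllocator(.{ .stack_trace_frames = 0 }){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const buf = try allocator.alloc(u8, 128);
    defer allocator.free(buf);
}
```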
These changes may or may not introduce a slight performance regression in the average case.
Here are the standard library tests on Windows in Debug mode:
Benchmark 1 (10 runs): std-tests-master.exe
measurement mean ± σ min … max outliers delta
wall_time 16.0s ± 30.8ms 15.9s … 16.1s 1 (10%) 0%
peak_rss 42.8MB ± 8.24KB 42.8MB … 42.8MB 0 ( 0%) 0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
measurement mean ± σ min … max outliers delta
wall_time 16.2s ± 37.6ms 16.1s … 16.3s 0 ( 0%) 💩+ 1.3% ± 0.2%
peak_rss 42.8MB ± 5.18KB 42.8MB … 42.8MB 0 ( 0%) + 0.1% ± 0.0%
And on Linux:
Benchmark 1: ./test-master
Time (mean ± σ): 16.091 s ± 0.088 s [User: 15.856 s, System: 0.453 s]
Range (min … max): 15.870 s … 16.166 s 10 runs
Benchmark 2: ./test-optim-treap
Time (mean ± σ): 16.028 s ± 0.325 s [User: 15.755 s, System: 0.492 s]
Range (min … max): 15.735 s … 16.709 s 10 runs
Summary
'./test-optim-treap' ran
1.00 ± 0.02 times faster than './test-master'
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents an expression which we want to treat as an lvalue, where
the resulting pointer will be coerced to a given type.
This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for #16865, and also results
in less Sema magic for such initializations, potentially leading to
better runtime code.
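As a hedged, hypothetical illustration of the kind of code that benefits from known result types in initializations (this example is not taken from the commit):
```zig
const std = @import("std");

test "casts inside a typed array initialization" {
    const wide: u32 = 300;
    // The element expressions receive the result type u16, so @intCast can be
    // used directly instead of spelling out @as(u16, @intCast(...)).
    const pair: [2]u16 = .{ @intCast(wide), @intCast(wide + 1) };
    try std.testing.expect(pair[0] == 300 and pair[1] == 301);
}
```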
As part of these changes, this commit also implements #17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for combining these changes - it seemed rather pointless to
try to separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.
This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!
In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.
Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.
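Concretely, the expression in question looks like this (a trivial snippet for illustration):
```zig
test "pointer to anyopaque" {
    var x: u32 = 123;
    // `*anyopaque` says nothing about the pointee type, so the sub-expression
    // `x` is given the generic poison result type by AstGen.
    const p = @as(*anyopaque, &x);
    _ = p;
    x += 1;
}
```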
Resolves: #16512
Resolves: #17194
* some manual fixes to generated CPU features code. in the future it
would be nice to make the script do those automatically. I suspect
the sm_90a thing is a bug in LLVM.
* add liteos to various target OS switches. I know nothing about this
OS; someone will need to work specifically on support for this OS
when the time comes to support it properly in zig.
* while waiting for the compiler, I went ahead and made more
conservative choices about when to use `inline` in std/Target.zig
Currently, the compiler (e.g. @typeName) writes function types as
`fn(...) Type`, but zig fmt writes `fn (...) Type` (note the space after
`fn`). This inconsistency is now resolved: function types are
consistently written the zig fmt way. Even before this change,
`fn (...) Type` occurrences already outnumbered `fn(...) Type`.
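For example, the canonical form of a function type is now:
```zig
// zig fmt keeps a space between `fn` and the parameter list in function types.
const Callback = fn (i32) void;
```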
The _environ variable that is populated when linking libc on Windows does not support Unicode keys/values (or, at least, the encoding is not necessarily UTF-8). So, for Unicode support, _wenviron would need to be used instead. However, this means that the keys/values would be encoded as UTF-16, so they would need to be converted to UTF-8 before being returned by `os.getenv`. This would require allocation which is not part of the `os.getenv` API, so `os.getenv` is not implementable on Windows even when linking libc.
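For callers that do need environment variables on Windows, the allocating std.process API can perform the conversion; a hedged sketch (not part of this change):
```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // getEnvVarOwned allocates, so it is able to convert the UTF-16 strings
    // from the Windows environment into UTF-8 before returning them.
    const path = try std.process.getEnvVarOwned(allocator, "PATH");
    defer allocator.free(path);
    std.debug.print("PATH={s}\n", .{path});
}
```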
Closes https://github.com/ziglang/zig/issues/8456
The include directories used when preprocessing .rc files are now separate from the target; by default, the system MSVC include paths are used if MSVC + the Windows SDK are present, otherwise the MinGW includes distributed with Zig are used as a fallback. This default behavior can be overridden by the `-rcincludes` option (possible values: any (the default), msvc, gnu, or none).
This behavior is useful because Windows resource files may `#include` files that only exist within the MSVC include dirs (e.g. in `<MSVC install directory>/atlmfc/include`, which can contain other .rc files, images, icons, cursors, etc). So, by defaulting to the `any` behavior (MSVC if present, MinGW fallback), users will by default get the behavior that is most likely to work.
It also should be okay that the include directories used when compiling .rc files differ from the include directories used when compiling the main binary, since the .res format is not dependent on anything ABI-related. The only relevant differences would be things like `#define` constants being different values in the MinGW headers vs the MSVC headers, but any such differences would likely be a MinGW bug.
I have updated Felix's ZigAndroidTemplate to work with the latest
version of Zig and we are exploring adding Android support to Mach engine.
`std.c.getcontext` is _referenced_ but not _used_, and Android's bionic libc
does not implement `getcontext`. `std.os.linux.getcontext` also cannot be
used with bionic libc, so it seems prudent to just disable this extern for now.
This may not be the perfect long-term fix, but I have a golden rebuttal to that:
before I was unable to compile Zig applications for Android, and now I can.
<img width="828" alt="image" src="https://github.com/hexops/mach/assets/3173176/1e29142b-0419-4459-9c8b-75d92f87f822">
Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
The new `@dependencies` module contains generated code like the
following (where strings like "abc123" represent hashes):
```zig
pub const root_deps = [_]struct { []const u8, []const u8 }{
    .{ "foo", "abc123" },
};

pub const packages = struct {
    pub const abc123 = struct {
        pub const build_root = "/home/mlugg/.cache/zig/blah/abc123";
        pub const build_zig = @import("abc123");
        pub const deps = [_]struct { []const u8, []const u8 }{
            .{ "bar", "abc123" },
            .{ "name", "ghi789" },
        };
    };
};
```
Each package contains a build root string, the build.zig import, and a
mapping from dependency names to package hashes. There is also such a
mapping for the root package dependencies.
In theory, we could now remove the `dep_prefix` field from `std.Build`,
since its main purpose is now handled differently. I believe this is a
desirable goal, as it doesn't really make sense to assign a single FQN
to any package (because it may appear in many different places in the
package hierarchy). This commit does not remove that field, as it's used
non-trivially in a few places in the build runner and compiler tests:
this will be a future enhancement.
Resolves: #16354
Resolves: #17135
This change implements the following syntax in the compiler:
```zig
const x: u32, var y, foo.bar = .{ 1, 2, 3 };
```
A destructure expression may only appear within a block (i.e. not at
container scope). The LHS consists of a sequence of comma-separated var
decls and/or lvalue expressions. The RHS is a normal expression.
A new result location type, `destructure`, is used, which contains
result pointers for each component of the destructure. This means that
when the RHS is a more complicated expression, peer type resolution is
not used: each result value is individually destructured and written to
the result pointers. RLS is always used for destructure expressions,
meaning every `const` on the LHS of such an expression creates a true
stack allocation.
Aside from anonymous array literals, Sema is capable of destructuring
the following types:
* Tuples
* Arrays
* Vectors
A destructure may be prefixed with the `comptime` keyword, in which case
the entire destructure is evaluated at comptime: this means all `var`s
in the LHS are `comptime var`s, every lvalue expression is evaluated at
comptime, and the RHS is evaluated at comptime. If every LHS is a
`const`, this is not allowed: as with single declarations, the user
should instead mark the RHS as `comptime`.
There are a few subtleties in the grammar changes here. For one thing,
if every LHS is an lvalue expression (rather than a var decl), a
destructure is considered an expression. This makes, for instance,
`if (cond) x, y = .{ 1, 2 };` valid Zig code. A destructure is allowed
in almost every context where a standard assignment expression is
permitted. The exception is `switch` prongs, which cannot contain a
destructure, as the comma would be ambiguous with the end of the prong.
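A small illustration of the accepted forms (a hypothetical example, not taken from the compiler changes themselves):
```zig
const std = @import("std");

test "destructuring" {
    // Destructuring an array into newly declared constants.
    const arr: [3]u8 = .{ 1, 2, 3 };
    const a, const b, const c = arr;
    try std.testing.expect(a == 1 and b == 2 and c == 3);

    // Every LHS here is an lvalue, so the destructure is itself an expression.
    var x: u32 = 0;
    var y: u32 = 0;
    x, y = .{ 10, 20 };
    try std.testing.expect(x + y == 30);
}
```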
A follow-up commit will begin utilizing this syntax in the Zig compiler.
Resolves: #498
Insn.st() can be used with dynamic size just like Insn.stx(), which is
relevant in a code generation context.
Using ImmOrReg caused an error because its fields were ordered
differently from those of Source.
This will allow users to construct e.g. a ComptimeStringMap that uses case-insensitive ASCII comparison.
Note: the previous ComptimeStringMap API is unchanged (i.e. this does not break any existing code).
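As a sketch of the intended usage - note that the exact name `ComptimeStringMapWithEql` and its parameter shape are assumptions here, not quoted from this change:
```zig
const std = @import("std");

// Assumed API shape: a ComptimeStringMap variant that takes a custom string
// equality function; std.ascii.eqlIgnoreCase compares ASCII case-insensitively.
const colors = std.ComptimeStringMapWithEql(u32, .{
    .{ "red", 0xff0000 },
    .{ "green", 0x00ff00 },
    .{ "blue", 0x0000ff },
}, std.ascii.eqlIgnoreCase);

test "case-insensitive lookup" {
    try std.testing.expectEqual(@as(?u32, 0xff0000), colors.get("RED"));
    try std.testing.expectEqual(@as(?u32, null), colors.get("purple"));
}
```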