Follow-up to #17383. This is a minor optimization that only matters when a small allocation is resized or freed soon after it is allocated.
The only real difference I was able to observe with this was via a synthetic benchmark that allocates a full bucket and then frees all but one of the slots, over and over in a loop:
Debug build:
```
Benchmark 1 (9 runs): gpa-degen-master.exe
  measurement      mean ± σ           min … max          outliers   delta
  wall_time        575ms ± 5.19ms     569ms … 583ms      0 ( 0%)    0%
  peak_rss         43.8MB ± 1.37KB    43.8MB … 43.8MB    1 (11%)    0%
Benchmark 2 (10 runs): gpa-degen-search-cur.exe
  measurement      mean ± σ           min … max          outliers   delta
  wall_time        532ms ± 5.55ms     520ms … 539ms      0 ( 0%)    ⚡- 7.5% ± 0.9%
  peak_rss         43.8MB ± 65.2KB    43.8MB … 44.0MB    1 (10%)    + 0.0% ± 0.1%
```
ReleaseFast build:
```
Benchmark 1 (129 runs): gpa-degen-master-release.exe
  measurement      mean ± σ           min … max          outliers   delta
  wall_time        38.9ms ± 1.12ms    36.7ms … 42.4ms    8 ( 6%)    0%
  peak_rss         23.2MB ± 2.39KB    23.2MB … 23.2MB    0 ( 0%)    0%
Benchmark 2 (151 runs): gpa-degen-search-cur-release.exe
  measurement      mean ± σ           min … max          outliers   delta
  wall_time        33.2ms ± 999us     31.9ms … 36.3ms    20 (13%)   ⚡- 14.7% ± 0.6%
  peak_rss         23.2MB ± 2.26KB    23.2MB … 23.2MB    0 ( 0%)    + 0.0% ± 0.0%
```
Before this commit, GeneralPurposeAllocator could run into severely degraded performance in scenarios where the bucket count for a particular size class grew large. For example, if exactly `slot_count` allocations of a single size class were performed and then all but one of them were freed, the bucket for those allocations would have to be kept around indefinitely. Repeating that pattern over and over could grow the bucket list for that size class enormously.
This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688
In that case, the bucket list for the `128` size class would grow to tens of thousands of buckets, ballooning the Debug runtime to ~8 minutes, whereas with the c_allocator it would be ~3 seconds.
To address this, there are three different changes happening here:
1. std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use-case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (due to the current bucket being full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), we can therefore always know that any bucket besides the current_bucket will be full, so traversing the list in the hopes of finding an existing non-full bucket is entirely pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.
The first change is the most relevant and accounts for most of the benefit here; a sketch of the treap usage follows below. Also note that the overall functionality of GeneralPurposeAllocator is unchanged.
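For illustration, here is a minimal sketch of the intrusive std.Treap API the bucket lists now rely on. It keys entries by an address-like integer; note the real searchBucket looks up the bucket whose address range *contains* a pointer, while this sketch only shows exact-key operations:

```zig
const std = @import("std");

// Buckets can be keyed by their base address; std.math.order provides
// the comparison function the treap requires.
const Buckets = std.Treap(usize, std.math.order);

test "treap insert/search/remove sketch" {
    var buckets = Buckets{};

    // Nodes are intrusive: the real allocator embeds one in each bucket.
    var node: Buckets.Node = undefined;

    var entry = buckets.getEntryFor(0x1000); // O(log n) search
    try std.testing.expect(entry.node == null);

    entry.set(&node); // O(log n) insert
    try std.testing.expect(buckets.getEntryFor(0x1000).node == &node);

    entry = buckets.getEntryFor(0x1000);
    entry.set(null); // O(log n) removal
}
```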
In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.
```
Benchmark 1: test-master.bat
  Time (mean ± σ):     481.263 s ±  5.440 s    [User: 479.159 s, System: 1.937 s]
  Range (min … max):   477.416 s … 485.109 s    2 runs
Benchmark 2: test-optim-treap.bat
  Time (mean ± σ):     19.639 s ±  0.037 s    [User: 18.183 s, System: 1.452 s]
  Range (min … max):   19.613 s … 19.665 s    2 runs
Summary
  'test-optim-treap.bat' ran
   24.51 ± 0.28 times faster than 'test-master.bat'
```
Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.
These changes may or may not introduce a slight performance regression in the average case. Here are the standard library tests on Windows in Debug mode:
```
Benchmark 1 (10 runs): std-tests-master.exe
  measurement      mean ± σ           min … max          outliers   delta
  wall_time        16.0s ± 30.8ms     15.9s … 16.1s      1 (10%)    0%
  peak_rss         42.8MB ± 8.24KB    42.8MB … 42.8MB    0 ( 0%)    0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
  measurement      mean ± σ           min … max          outliers   delta
  wall_time        16.2s ± 37.6ms     16.1s … 16.3s      0 ( 0%)    💩+ 1.3% ± 0.2%
  peak_rss         42.8MB ± 5.18KB    42.8MB … 42.8MB    0 ( 0%)    + 0.1% ± 0.0%
```
And on Linux:
```
Benchmark 1: ./test-master
  Time (mean ± σ):     16.091 s ±  0.088 s    [User: 15.856 s, System: 0.453 s]
  Range (min … max):   15.870 s … 16.166 s    10 runs
Benchmark 2: ./test-optim-treap
  Time (mean ± σ):     16.028 s ±  0.325 s    [User: 15.755 s, System: 0.492 s]
  Range (min … max):   15.735 s … 16.709 s    10 runs
Summary
  './test-optim-treap' ran
    1.00 ± 0.02 times faster than './test-master'
```
Now that allocator.resize() is allowed to fail, programs may wish to
test code paths that handle resize() failure. The simplest way to do
this now is to replace the vtable of the testing allocator with one
that uses Allocator.noResize for the 'resize' function pointer.
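A minimal sketch of that vtable swap (assuming the current `Allocator`/`VTable` layout; `failing_resize` and the ArrayList usage are only for illustration):

```zig
const std = @import("std");

test "exercise a resize() failure path" {
    // Copy the testing allocator's vtable, but point 'resize' at
    // Allocator.noResize so every resize() call reports failure.
    var vtable = std.testing.allocator.vtable.*;
    vtable.resize = std.mem.Allocator.noResize;
    const failing_resize = std.mem.Allocator{
        .ptr = std.testing.allocator.ptr,
        .vtable = &vtable,
    };

    var list = std.ArrayList(u8).init(failing_resize);
    defer list.deinit();
    try list.appendNTimes(0, 100);
    list.shrinkAndFree(10); // must cope with resize() returning false
}
```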
An alternative way to support this testing capability is to augment the
FailingAllocator (which is already useful for testing allocation failure
scenarios) to intentionally fail on calls to resize(). To do this, add a
'resize_fail_index' parameter to the FailingAllocator that causes
resize() to fail after the given number of calls.
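Hypothetical usage might then look like this (the config-struct form of init is an assumption for illustration, not a confirmed API):

```zig
const std = @import("std");

test "FailingAllocator resize failure (hypothetical usage)" {
    // NOTE: hypothetical sketch; the exact way resize_fail_index is
    // passed to FailingAllocator may differ from the real API.
    var failing = std.testing.FailingAllocator.init(
        std.testing.allocator,
        .{ .resize_fail_index = 0 }, // fail the very first resize() call
    );

    var list = std.ArrayList(u8).init(failing.allocator());
    defer list.deinit();
    try list.appendNTimes(0, 100);
    list.shrinkAndFree(10); // exercises the resize-failure path
}
```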
Created from a conversation with @andrewrk on IRC: memory leaks when using ArrayList can be inconvenient to debug when the captured stack trace is only 4 frames deep, because the entire printed trace falls within the Zig stdlib rather than in the user's calling code. Increasing this to 6 for Debug builds gives 2 frames of user code. I increased the frame count for tests by the equivalent factor as well, but I'm unconvinced that's actually desirable.
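For reference, the new Debug default corresponds to this explicit configuration (a sketch; `.stack_trace_frames` is the existing GeneralPurposeAllocator config field):

```zig
const std = @import("std");

pub fn main() void {
    // Explicitly request 6 captured frames per allocation, matching the
    // new Debug default; the extra frames reach into the user's call stack.
    var gpa = std.heap.GeneralPurposeAllocator(.{ .stack_trace_frames = 6 }){};
    defer _ = gpa.deinit(); // leak reports now include the longer traces
    const allocator = gpa.allocator();
    _ = allocator; // ... use `allocator` as usual ...
}
```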
Implements issue #6451.
This was needed to support allocation on Plan 9, and other operating systems like DOS can now use it as well.
It is a modified version of the WasmAllocator, since wasm also uses an sbrk-esque allocation system.
This commit also adds the necessary system bits for sbrk to work on plan 9.
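For context, here is a conceptual sketch of the sbrk contract such an allocator builds on (entirely illustrative; not the real std.heap API):

```zig
const std = @import("std");

// Illustrative only: a bump-the-break allocator core. Real sbrk asks the
// OS to extend the heap; here the "heap" is a fixed buffer.
var heap: [4096]u8 = undefined;
var program_break: usize = 0;

fn sbrk(n: usize) ?[]u8 {
    if (program_break + n > heap.len) return null; // out of memory
    const slice = heap[program_break..][0..n];
    program_break += n; // the break only ever moves forward
    return slice;
}

test "sbrk bump allocation" {
    const a = sbrk(16).?;
    const b = sbrk(16).?;
    // Successive allocations are adjacent; reuse is layered on top
    // via free lists, as the WasmAllocator does.
    try std.testing.expect(@intFromPtr(b.ptr) == @intFromPtr(a.ptr) + 16);
}
```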
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:
* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
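As an example of the first point, the cast builtins moved from an explicit destination argument to an inferred result type, roughly like this (a representative sketch):

```zig
const std = @import("std");

fn alignedPtr(bytes: []u8) [*]align(4) u8 {
    // Before the migration: `@alignCast(4, bytes.ptr)`.
    // After: the target alignment is inferred from the result type, which
    // is why `zig fmt` could not rewrite @alignCast/@addrSpaceCast
    // automatically, as noted above.
    return @alignCast(bytes.ptr);
}

test "inferred-alignment cast" {
    var buf: [8]u8 align(4) = undefined;
    const p = alignedPtr(&buf);
    try std.testing.expect(@intFromPtr(p) % 4 == 0);
}
```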
Anecdote 1: The generic version is way more popular than the non-generic
one in Zig codebase:
```
git grep -w alignForward | wc -l
56
git grep -w alignForwardGeneric | wc -l
149
git grep -w alignBackward | wc -l
6
git grep -w alignBackwardGeneric | wc -l
15
```
Anecdote 2: In my project (turbonss) that does much arithmetic and
alignment I exclusively use the Generic functions.
Anecdote 3: we used only the Generic versions in the Macho Man's linker
workshop.
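Assuming the post-change signatures, the generic versions keep the short names and take the integer type explicitly; a minimal sketch of the resulting call sites:

```zig
const std = @import("std");

test "generic alignment helpers" {
    // The generic functions take the integer type as the first argument.
    try std.testing.expectEqual(@as(usize, 24), std.mem.alignForward(usize, 17, 8));
    try std.testing.expectEqual(@as(u32, 16), std.mem.alignBackward(u32, 17, 8));
}
```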
1. When the arena is already empty, resetting with `retain_capacity` no longer
results in allocating a buffer with zero capacity.
This behavior was previously intended by the `(current_capacity == 0)` check,
but wasn't correctly implemented.
2. Resetting with `.{ .retain_with_limit = 0 }` is now equivalent to
`free_all` and a new buffer with zero capacity is no longer created.
This is a useful side-effect of the above fixes.
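A sketch of the reset modes these fixes cover (using the ArenaAllocator API as described above):

```zig
const std = @import("std");

test "arena reset modes" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena.deinit();
    _ = try arena.allocator().alloc(u8, 1000);

    // Keep the existing buffer around for reuse.
    _ = arena.reset(.retain_capacity);

    // With a limit of 0, this now behaves like .free_all.
    _ = arena.reset(.{ .retain_with_limit = 0 });
}
```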
Previously, when the last buffer in `buffer_list` was retained after
deleting all other buffers, `buffer_list` wasn't updated and pointed
to a deleted buffer.
Previously, the buffer reserved with `retain_with_limit` was missing
space for the `BufNode`.
When the user provided a limit smaller than `@sizeOf(BufNode)`,
`reset` would store a new `BufNode` in an allocation smaller than
`BufNode`, leading to a buffer overrun.
The majority of these are in comments, some in doc comments which might
affect the generated documentation, and a few in parameter names -
nothing that should be breaking, however.
Now they use slices or array pointers with any element type instead of
requiring byte pointers.
This is a breaking enhancement to the language.
The safety check for overlapping pointers will be implemented in a
future commit.
closes #14040
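Assuming this refers to the `@memcpy`/`@memset` builtins from #14040, call sites now look roughly like this:

```zig
const std = @import("std");

test "element-typed memcpy/memset" {
    var src = [_]u32{ 1, 2, 3, 4 };
    var dst: [4]u32 = undefined;

    // Destination first; lengths must match, and any element type works.
    @memcpy(&dst, &src);
    try std.testing.expectEqual(@as(u32, 4), dst[3]);

    // @memset takes a typed slice/array pointer and an element value.
    @memset(&src, 0);
    try std.testing.expectEqual(@as(u32, 0), src[0]);
}
```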
* GPA: Catch invalid frees
Fix #14791: Catch cases where an invalid slice is passed to free().
This was silently ignored before but now logs an error. This change
uses an AutoHashMap to keep track of the sizes, which may seem like
overkill but is the easiest way to catch these errors.
* GPA: Add wrong alignment checks to free/resize
Implement @Inkryption's suggestion to catch free/resize with the wrong
alignment. I also changed the naming to match large allocations.
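A sketch of the kind of misuse these checks now catch (the exact log message is not reproduced here):

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const buf = try allocator.alloc(u8, 10);
    // Wrong length: 9 and 10 land in the same small size class, so this
    // used to be silently accepted; it is now reported as an error.
    allocator.free(buf[0..9]);
}
```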
This reverts commit abc9530a88d24350481d9264edcde300f293929a.
This patch implies that the idiomatic Zig way of handling anytype
parameter is to write a bunch of boilerplate instead of directly
accessing type information and relying on the compiler to be useful.
I don't want it to be this way.
It is the compiler's job to make useful error messages when the wrong
field of a type info result is accessed, and it is the zig programmer's
job to understand what it means when a compile error points at the field
access of `@typeInfo` (along with the relevant callsites).
One thing that might be useful would be having the compiler be aware of
module boundaries and highlighting the boundaries of them. The first
reference note after crossing a module boundary is likely the most
interesting one.
Fixes a regression introduced in
e35f297aeb993ec956ae80379ddf7f86069e109b.
Now there is test coverage for ArrayList.shrinkAndFree in the case when
resizing fails.
Now it can refuse to resize when it would disturb the metadata tracking
strategy, resulting in smaller code size, a simpler implementation, and
less fragmentation.
The previous version had a fatal flaw: it did ensureCapacity(1) on the
freelist when allocating, but I neglected to consider that you could
free() twice in a row. Silly!
This strategy allocates an intrusive freelist node with every
allocation, big or small. It also avoids the problems with resize,
because in this case the upper portion of a shrunk allocation can be
pushed onto the corresponding freelist.
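For illustration, a minimal sketch of an intrusive freelist, where the link to the next free slot lives inside the freed memory itself (names are illustrative):

```zig
const std = @import("std");

// The link is stored in the freed slot's own bytes, so maintaining the
// list never requires a separate allocation.
const FreeNode = struct { next: ?*FreeNode };

fn push(head: *?*FreeNode, slot: *FreeNode) void {
    slot.next = head.*;
    head.* = slot;
}

fn pop(head: *?*FreeNode) ?*FreeNode {
    const node = head.* orelse return null;
    head.* = node.next;
    return node;
}

test "intrusive freelist" {
    var head: ?*FreeNode = null;
    var slots: [2]FreeNode = undefined;
    push(&head, &slots[0]);
    push(&head, &slots[1]);
    try std.testing.expect(pop(&head) == &slots[1]); // LIFO reuse
    try std.testing.expect(pop(&head) == &slots[0]);
    try std.testing.expect(pop(&head) == null);
}
```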
* Export invalidFmtErr
To allow consistent use of "invalid format string" compile error
response for badly formatted format strings.
See https://github.com/ziglang/zig/pull/13489#issuecomment-1311759340.
* Replace format compile errors with invalidFmtErr
- Provides more consistent compile errors.
- Gives the user info about the type of the badly formatted value.
* Rename invalidFmtErr as invalidFmtError
For consistency. Zig seems to use “Error” more often than “Err”.
* std: add invalid format string checks to remaining custom formatters
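Inside a custom formatter, the check looks roughly like this (the `Point` type is made up for illustration):

```zig
const std = @import("std");

const Point = struct {
    x: i32,
    y: i32,

    pub fn format(
        self: Point,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = options;
        // Reject any format specifier this type doesn't understand with
        // the shared "invalid format string" compile error.
        if (fmt.len != 0) std.fmt.invalidFmtError(fmt, self);
        try writer.print("({d}, {d})", .{ self.x, self.y });
    }
};
```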
* pass reference-trace to the Compilation when building the build file; fix CheckObjectStep
Instead of making the memory alignment functions more complicated, I
added more API documentation for their existing semantics.
closes #12118, closes #12135
Make `@returnAddress()` return 0 for the BPF target, as the BPF target
for the time being does not support probing for the return address.
Stack traces for the general purpose allocator are also set to not be
captured on the BPF target.
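A sketch of the kind of target-gated default this implies (the exact gating expression is an assumption, not the real code):

```zig
const builtin = @import("builtin");

// Hypothetical: capture no stack trace frames on BPF, where
// @returnAddress() cannot produce useful frames.
const default_stack_trace_frames: usize =
    if (builtin.cpu.arch.isBPF()) 0 else if (builtin.mode == .Debug) 6 else 0;
```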