Ryan Liptak cf3572a66b GeneralPurposeAllocator: Considerably improve worst case performance
Before this commit, GeneralPurposeAllocator could suffer severely degraded performance in scenarios where the bucket count for a particular size class grew large. For example, if exactly `slot_count` allocations of a single size class were performed and then all of them except one were freed, the bucket holding those allocations would have to be kept alive until that last allocation was freed. If that pattern of allocation were repeated over and over, the bucket list for that size class could grow very large.

This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688

In that case, the bucket list for the `128` size class grew to tens of thousands of buckets, causing the Debug runtime to balloon to ~8 minutes, whereas with the c_allocator the Debug runtime was ~3 seconds.

To address this, there are three different changes happening here:

1. std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use-case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (due to the current bucket being full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), we can therefore always know that any bucket besides the current_bucket will be full, so traversing the list in the hopes of finding an existing non-full bucket is entirely pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.
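To make changes 2 and 3 concrete, here is a heavily simplified model of a single size class. It is a hypothetical sketch, not the actual GeneralPurposeAllocator code. `Bucket`, `Slot`, `SizeClass`, `slot_count = 4`, `max_buckets`, and the per-slot `requested_sizes` array are illustrative names. The model also tracks only bookkeeping, whereas real buckets hold memory. It shows why allocation only ever needs to look at the current bucket, and how the "allocate `slot_count`, free all but one" pattern leaves one live bucket behind per round.

```zig
const std = @import("std");

// Hypothetical, heavily simplified model of one size class. It tracks only
// bookkeeping (no real memory) and is not the actual GeneralPurposeAllocator.
const slot_count = 4;
const max_buckets = 16;

const Bucket = struct {
    alive: bool = false,
    // Slots are handed out in order and never reused, so the cursor only
    // moves forward (change 2).
    alloc_cursor: usize = 0,
    used_count: usize = 0,
    // Per-slot requested length stored in the bucket itself instead of in
    // a separate HashMap (change 3).
    requested_sizes: [slot_count]usize = [_]usize{0} ** slot_count,
};

const Slot = struct { bucket: usize, slot: usize };

const SizeClass = struct {
    buckets: [max_buckets]Bucket = [_]Bucket{.{}} ** max_buckets,
    bucket_count: usize = 0,
    current: ?usize = null,

    fn alloc(self: *SizeClass, len: usize) !Slot {
        // Only the current bucket is checked: every other bucket is already
        // full because cursors never move backwards.
        const need_new = if (self.current) |i| self.buckets[i].alloc_cursor == slot_count else true;
        if (need_new) {
            if (self.bucket_count == max_buckets) return error.OutOfMemory;
            self.buckets[self.bucket_count] = .{ .alive = true };
            self.current = self.bucket_count;
            self.bucket_count += 1;
        }
        const b = &self.buckets[self.current.?];
        const slot = b.alloc_cursor;
        b.alloc_cursor += 1;
        b.used_count += 1;
        b.requested_sizes[slot] = len;
        return .{ .bucket = self.current.?, .slot = slot };
    }

    fn free(self: *SizeClass, s: Slot) void {
        const b = &self.buckets[s.bucket];
        b.requested_sizes[s.slot] = 0;
        b.used_count -= 1;
        // A bucket is released only once all of its slots are freed; if it
        // was the current bucket, the next alloc starts a fresh one.
        if (b.used_count == 0) {
            b.alive = false;
            if (self.current) |c| {
                if (c == s.bucket) self.current = null;
            }
        }
    }

    fn aliveBuckets(self: *const SizeClass) usize {
        var n: usize = 0;
        for (self.buckets[0..self.bucket_count]) |b| {
            if (b.alive) n += 1;
        }
        return n;
    }
};
```

In this model, allocation never walks the bucket list, because every bucket other than the current one is full. The remaining per-bucket cost is locating the bucket that owns an address on free or resize, which is what change 1 targets.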

The first change is the most relevant and accounts for most of the benefit here. Also note that the overall functionality of GeneralPurposeAllocator is unchanged.
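The lookup that change 1 speeds up can be sketched as follows: given an address being freed or resized, find the bucket whose memory range contains it. This sketch is a stand-in and not `std.Treap`. It runs a binary search over a sorted slice of hypothetical bucket base addresses, with a fixed, hypothetical `bucket_len`, which gives the same O(log n) search the note in change 1 says any suitable structure would provide.

```zig
const std = @import("std");

// Hypothetical fixed size of each bucket's memory range.
const bucket_len: usize = 4096;

/// Returns the index of the bucket whose range [base, base + bucket_len)
/// contains `addr`, or null. `bases` must be sorted in ascending order.
/// Lookup is O(log n), but inserting into a sorted slice is O(n), which is
/// why a balanced tree such as a treap fits better when buckets come and go.
fn searchBucket(bases: []const usize, addr: usize) ?usize {
    var lo: usize = 0;
    var hi: usize = bases.len;
    // Find the first base strictly greater than addr.
    while (lo < hi) {
        const mid = lo + (hi - lo) / 2;
        if (bases[mid] <= addr) lo = mid + 1 else hi = mid;
    }
    if (lo == 0) return null;
    // The candidate is the last base <= addr; check the range end.
    const i = lo - 1;
    return if (addr < bases[i] + bucket_len) i else null;
}
```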

In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.

Benchmark 1: test-master.bat
  Time (mean ± σ):     481.263 s ±  5.440 s    [User: 479.159 s, System: 1.937 s]
  Range (min … max):   477.416 s … 485.109 s    2 runs

Benchmark 2: test-optim-treap.bat
  Time (mean ± σ):     19.639 s ±  0.037 s    [User: 18.183 s, System: 1.452 s]
  Range (min … max):   19.613 s … 19.665 s    2 runs

Summary
  'test-optim-treap.bat' ran
   24.51 ± 0.28 times faster than 'test-master.bat'

Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.
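For reference, `.stack_trace_frames` is a field of the allocator's comptime config. Here is a minimal sketch of turning stack-trace capture off. It assumes the `std.heap.GeneralPurposeAllocator(config){}` API of the Zig version this commit targets, and that API may differ in later releases.

```zig
const std = @import("std");

pub fn main() !void {
    // With zero frames, no stack traces are captured on alloc/free, so leak
    // and double-free reports no longer include where the memory came from.
    var gpa = std.heap.GeneralPurposeAllocator(.{ .stack_trace_frames = 0 }){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const buf = try allocator.alloc(u8, 128);
    defer allocator.free(buf);
}
```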

These changes may or may not introduce a slight performance regression in the average case:

Here's the standard library tests on Windows in Debug mode:

Benchmark 1 (10 runs): std-tests-master.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.0s  ± 30.8ms    15.9s  … 16.1s           1 (10%)        0%
  peak_rss           42.8MB ± 8.24KB    42.8MB … 42.8MB          0 ( 0%)        0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.2s  ± 37.6ms    16.1s  … 16.3s           0 ( 0%)        💩+  1.3% ±  0.2%
  peak_rss           42.8MB ± 5.18KB    42.8MB … 42.8MB          0 ( 0%)          +  0.1% ±  0.0%

And on Linux:

Benchmark 1: ./test-master
  Time (mean ± σ):     16.091 s ±  0.088 s    [User: 15.856 s, System: 0.453 s]
  Range (min … max):   15.870 s … 16.166 s    10 runs
 
Benchmark 2: ./test-optim-treap
  Time (mean ± σ):     16.028 s ±  0.325 s    [User: 15.755 s, System: 0.492 s]
  Range (min … max):   15.735 s … 16.709 s    10 runs
 
Summary
  './test-optim-treap' ran
    1.00 ± 0.02 times faster than './test-master'
2023-10-03 01:21:51 -07:00

ZIG

A general-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

https://ziglang.org/

Documentation

If you are looking at this README file in a source tree, please refer to the Release Notes, Language Reference, or Standard Library Documentation corresponding to the version of Zig that you are using by following the appropriate link on the download page.

Otherwise, you're looking at a release of Zig, and you can find documentation here:

  • doc/langref.html
  • doc/std/index.html

Installation

A Zig installation is composed of two things:

  1. The Zig executable
  2. The lib/ directory

At runtime, the executable searches up the file system for the lib/ directory, relative to itself:

  • lib/
  • lib/zig/
  • ../lib/
  • ../lib/zig/
  • (and so on)

In other words, you can unpack a release of Zig anywhere, and then begin using it immediately. There is no need to install it globally, although this mechanism supports that use case too (i.e. /usr/bin/zig and /usr/lib/zig/).
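The lookup order above can be sketched as an iterator that walks from the executable's directory toward the root, yielding `lib` and then `lib/zig` for each directory. This is a hypothetical illustration of the listed order only. `LibDirCandidates` is an illustrative name, the sketch assumes POSIX-style paths, and it generates candidate paths without checking whether any of them exist.

```zig
const std = @import("std");

// Hypothetical sketch: yields <dir>/lib, then <dir>/lib/zig, for the
// executable's directory and each parent directory up to the root.
// POSIX-style paths only; nothing is checked on disk.
const LibDirCandidates = struct {
    dir: ?[]const u8,
    want_zig_subdir: bool = false,
    buf: [1024]u8 = undefined,

    /// The returned slice points into `buf` and is overwritten by the next call.
    fn next(self: *LibDirCandidates) ?[]const u8 {
        const dir = self.dir orelse return null;
        const sep: []const u8 = if (std.mem.endsWith(u8, dir, "/")) "" else "/";
        const suffix: []const u8 = if (self.want_zig_subdir) "lib/zig" else "lib";
        const path = std.fmt.bufPrint(&self.buf, "{s}{s}{s}", .{ dir, sep, suffix }) catch return null;
        if (self.want_zig_subdir) {
            // Both candidates for this directory are done; move up one level.
            self.dir = std.fs.path.dirnamePosix(dir);
            self.want_zig_subdir = false;
        } else {
            self.want_zig_subdir = true;
        }
        return path;
    }
};
```

Starting from `/usr/bin`, the candidates include `/usr/lib` and `/usr/lib/zig`, which matches the global-install layout mentioned above.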

Building from Source

Ensure you have the required dependencies:

  • CMake >= 3.5
  • System C/C++ Toolchain
  • LLVM, Clang, LLD development libraries == 17.x

Then it is the standard CMake build process:

mkdir build
cd build
cmake ..
make install

For more options, tips, and troubleshooting, please see the Building Zig From Source page on the wiki.

Contributing

Zig is Free and Open Source Software. We welcome bug reports and patches from everyone. However, keep in mind that Zig governance is BDFN (Benevolent Dictator For Now) which means that Andrew Kelley has final say on the design and implementation of everything.

One of the best ways you can contribute to Zig is to start using it for an open-source personal project.

This leads to discovering bugs and helps flesh out use cases, which lead to further design iterations of Zig. Importantly, each issue found this way comes with real world motivations, making it straightforward to explain the reasoning behind proposals and feature requests.

You will be taken much more seriously on the issue tracker if you have a personal project that uses Zig.

The issue label Contributor Friendly exists to help you find issues that are limited in scope and/or knowledge of Zig internals.

Please note that issues labeled Proposal that do not also have the Accepted label are still under consideration, and efforts to implement such a proposal have a high risk of being wasted. If you are interested in a proposal that is still under consideration, please express your interest in the issue tracker, providing extra insights and considerations that others have not yet expressed. The most highly regarded argument in such a discussion is a real world use case.

For more tips, please see the Contributing page on the wiki.

Community

The Zig community is decentralized. Anyone is free to start and maintain their own space for Zig users to gather. There is no concept of "official" or "unofficial". Each gathering place has its own moderators and rules. Users are encouraged to be aware of the social structures of the spaces they inhabit, and work purposefully to facilitate spaces that align with their values.

Please see the Community wiki page for a public listing of social spaces.

License: MIT