25967 Commits

Author SHA1 Message Date
Ryan Liptak
cf3572a66b GeneralPurposeAllocator: Considerably improve worst case performance
Before this commit, GeneralPurposeAllocator could run into incredibly degraded performance in scenarios where the bucket count for a particular size class grew to be large. For example, if exactly `slot_count` allocations of a single size class were performed and then all of them were freed except one, then the bucket for those allocations would have to be kept around indefinitely. If that pattern of allocation were done over and over, then the bucket list for that size class could grow incredibly large.

This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688

In that case, the length of the bucket list for the `128` size class would grow to tens of thousands of buckets and cause Debug runtime to balloon to ~8 minutes whereas with the c_allocator the Debug runtime would be ~3 seconds.

To address this, there are three different changes happening here:

1. std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use-case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (due to the current bucket being full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), we can therefore always know that any bucket besides the current_bucket will be full, so traversing the list in the hopes of finding an existing non-full bucket is entirely pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.

The first change is the most relevant and accounts for most of the benefit here. Also note that the overall functionality of GeneralPurposeAllocator is unchanged.
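
As a rough illustration of the first change (not the actual Zig implementation), the sketch below contrasts a linear walk over buckets with an ordered lookup keyed by bucket base address. The names `BucketIndex`, `PAGE_SIZE`, `search_linear`, and `search_ordered` are hypothetical, and a sorted Python list with `bisect` stands in for `std.Treap`. It only demonstrates the O(log n) search side, since inserting into a Python list is still O(n).

```python
import bisect

PAGE_SIZE = 4096  # assumed span of one bucket in bytes, for illustration only


class BucketIndex:
    """Hypothetical index of buckets keyed by base address, kept sorted."""

    def __init__(self):
        self.bases = []  # sorted bucket base addresses

    def insert(self, base):
        # Keep the list sorted. A treap or balanced tree would do this in
        # O(log n); a Python list insert is O(n).
        bisect.insort(self.bases, base)

    def search_linear(self, addr):
        # Old approach: visit buckets one by one until one contains addr.
        for base in self.bases:
            if base <= addr < base + PAGE_SIZE:
                return base
        return None

    def search_ordered(self, addr):
        # New approach: binary search for the last base <= addr, then
        # check that addr actually falls inside that bucket.
        i = bisect.bisect_right(self.bases, addr) - 1
        if i >= 0 and addr < self.bases[i] + PAGE_SIZE:
            return self.bases[i]
        return None
```

Both lookups return the same bucket for any address. The difference is that the ordered lookup does not slow down linearly as the number of buckets in a size class grows.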

In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.

Benchmark 1: test-master.bat
  Time (mean ± σ):     481.263 s ±  5.440 s    [User: 479.159 s, System: 1.937 s]
  Range (min … max):   477.416 s … 485.109 s    2 runs

Benchmark 2: test-optim-treap.bat
  Time (mean ± σ):     19.639 s ±  0.037 s    [User: 18.183 s, System: 1.452 s]
  Range (min … max):   19.613 s … 19.665 s    2 runs

Summary
  'test-optim-treap.bat' ran
   24.51 ± 0.28 times faster than 'test-master.bat'

Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.

These changes may or may not introduce a slight performance regression in the average case:

Here are the standard library tests on Windows in Debug mode:

Benchmark 1 (10 runs): std-tests-master.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.0s  ± 30.8ms    15.9s  … 16.1s           1 (10%)        0%
  peak_rss           42.8MB ± 8.24KB    42.8MB … 42.8MB          0 ( 0%)        0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.2s  ± 37.6ms    16.1s  … 16.3s           0 ( 0%)        💩+  1.3% ±  0.2%
  peak_rss           42.8MB ± 5.18KB    42.8MB … 42.8MB          0 ( 0%)          +  0.1% ±  0.0%

And on Linux:

Benchmark 1: ./test-master
  Time (mean ± σ):     16.091 s ±  0.088 s    [User: 15.856 s, System: 0.453 s]
  Range (min … max):   15.870 s … 16.166 s    10 runs
 
Benchmark 2: ./test-optim-treap
  Time (mean ± σ):     16.028 s ±  0.325 s    [User: 15.755 s, System: 0.492 s]
  Range (min … max):   15.735 s … 16.709 s    10 runs
 
Summary
  './test-optim-treap' ran
    1.00 ± 0.02 times faster than './test-master'
2023-10-03 01:21:51 -07:00
Ryan Liptak
da7ecfb2de Treap: Add InorderIterator 2023-10-02 21:11:14 -07:00
Techatrix
2adb932ad6 translate-c: convert clang error messages into std.zig.ErrorBundle 2023-09-25 18:10:44 +03:00
kcbanner
e7bf143b36 type: handle the 0-length array case in abiSizeAdvanced
This fixes a panic in `unionAbiSize` when a 0-length array of a union is used as a struct field.

Because `resolveTypeLayout` does not resolve the `elem_ty` if `arrayLenIncludingSentinel` returns
0 for the array, the child union type is not guaranteed to have a resolved layout at this point.

Fixed this case by just returning 0 here.
2023-09-25 05:24:55 -07:00
Garrett Beck
8fab4f98c4 Prevent hitting a clang assert when dealing with FullSourceLoc 2023-09-25 12:49:23 +03:00
Andrew Kelley
28ac9f8b70
Merge pull request #17253 from ziglang/MultiArrayList-0bit-struct
std.MultiArrayList: add test coverage for 0-bit structs
2023-09-25 02:33:32 -07:00
Jay Petacat
731fd217db Add embedded SVG favicon to reference doc templates
The SVG looks way better than the pixelated PNG and will adapt best to
whatever screen it is being displayed on. The PNG continues to be used
because Apple Safari does not support SVG favicons yet. All other major
browsers do. See https://caniuse.com/link-icon-svg.

This is a companion PR to ziglang/www.ziglang.org#310.
2023-09-25 12:24:06 +03:00
Jan Weidner
c8ba5839f7
std.http.Client: add note about resource management 2023-09-25 12:17:11 +03:00
Andrew Kelley
eb072fa528
Merge pull request #17256 from ziglang/packed-bit-offsets
compiler: packed structs cache bit offsets
2023-09-24 19:42:06 -07:00
Andrew Kelley
6bd54a1d3e
update zig1.wasm
Notable changes in this update:

127198e58cb3dcf2d2287124cf15a23a7d3a9c02 fixes building zig2 artifact on
macOS Sonoma 14.0 (more specifically the SDK 14.0 linker).

a8d2ed806558cc1472f3a532169a4994abe17833 fixed some alignment edge
cases which is needed to do the store_hash=false change in the compiler
source code.

df5f0517b33b5f7bc2a508cf6a0ee62246f02d21 preserves result type
information through the address-of operator.
2023-09-24 15:54:33 -07:00
Andrew Kelley
ac6f9eb2ca InternPool: store_hash=false for FieldMap
This is something I wanted to do a long time ago but was blocked
by #10618 which is now solved.
2023-09-24 15:49:56 -07:00
Andrew Kelley
cc69315f03 std.MultiArrayList: add test coverage for 0-bit structs
closes #10618
solved by #17172
2023-09-24 15:49:56 -07:00
Andrew Kelley
df5f0517b3
Merge pull request #17205 from mlugg/rls-ref
compiler: preserve result type information through address-of operator
2023-09-24 15:19:48 -07:00
Tomasz Lisowski
a9f25c7d64 Update LLVM version in README from 16.x to 17.x 2023-09-24 14:49:29 -07:00
Michael Dusan
127198e58c cbe: support more symbol attributes
implement codegen for:

- decl weak linkage
- decl aliases
- fn decl weak linkage

windows msvc:
- `__declspec(selectany)` is not supported for functions
- skip weak linkage for functions

closes #17050
2023-09-24 14:44:15 -07:00
Andrew Kelley
c08c0fc6ed revert "compiler: packed structs cache bit offsets"
This is mostly a revert of a7088fd9a3edb037f0f51bb402a3c557334634f3.
Measurement revealed the commit actually regressed performance.
2023-09-24 14:37:36 -07:00
Andrew Kelley
a7088fd9a3 compiler: packed structs cache bit offsets
Instead of linear search every time a packed struct field's bit or byte
offset is wanted, they are computed once during resolution of the packed
struct's backing int type, and stored in InternPool for O(1) lookup.
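
A minimal sketch of the idea, assuming a packed struct is described only by a list of field bit sizes. The helper names `bit_offsets` and `bit_offset_linear` are hypothetical and do not correspond to InternPool's API. The offsets are computed once as a running sum, so each later lookup is a plain index.

```python
from itertools import accumulate


def bit_offsets(field_bit_sizes):
    """Return the starting bit offset of every field, computed in one pass."""
    # Running total of the sizes, shifted so the first field starts at bit 0.
    return [0, *accumulate(field_bit_sizes)][:-1]


def bit_offset_linear(field_bit_sizes, index):
    """Previous style of lookup: re-sum all preceding fields on every query."""
    return sum(field_bit_sizes[:index])
```

Caching trades a little memory per packed struct for constant-time lookups. Whether that pays off in practice has to be measured.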

Closes #17178
2023-09-23 23:06:08 -07:00
mlugg
fb6fff2561 resinator: do not include in only_core_functionality builds
This prevents resinator from being included in zig1 and zig2.
2023-09-24 06:57:11 +01:00
antlilja
8eff0a0a66 Support non zig dependencies
Dependencies no longer require a build.zig file.

Adds path function to Dependency struct which
returns a LazyPath into a dependency.
2023-09-24 02:47:21 +01:00
Andrew Kelley
c9413a880b
Merge pull request #17244 from ziglang/elf-vm-mgmt
elf: misc improvements, plus let's actually link against a parsed archive!
2023-09-23 18:32:43 -07:00
Garrett Beck
8b78df403f Allow Step.TranslateC to not link libc 2023-09-23 17:41:11 -07:00
mlugg
9ff872c982
behavior: disable newly failing tests 2023-09-24 00:27:33 +01:00
Loris Cro
78ebf8f577 autodoc: give explicit width to logo
fix #17251
2023-09-24 01:17:06 +02:00
Krzysztof Wolicki
2aa0afd206
autodoc: Update icon for generated html source views (#17200) 2023-09-24 00:52:07 +02:00
mlugg
62d077cfa1
tests: give explicit stack size to module tests on WASI
I have observed the standard library tests overflowing the default WASI
stack as of the previous commit. As best as I can tell, this isn't
directly our fault: LLVM is just emitting less efficient code in debug
builds with the new codegen patterns.
2023-09-23 22:01:08 +01:00
mlugg
09a57583a4
compiler: preserve result type information through address-of operator
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents an expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for #16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements #17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.

Resolves: #16512
Resolves: #17194
2023-09-23 22:01:08 +01:00
mlugg
01906a3ad8
print_zir: speed up ZIR printing
Source location resolution previously made ZIR printing incredibly slow,
since it was O(N^2). Since we usually resolve source locations
approximately in order, it is much more efficient to resolve them using
a "cursor" which navigates the file.

This takes the time for `zig ast-check -t Sema.zig` down from many
minutes (enough that I got bored and killed the process; well over 10)
to a few seconds.
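
The cursor idea can be sketched roughly as follows, assuming source locations are byte offsets that need to be turned into (line, column) pairs. `LineCursor` and `resolve` are hypothetical names, not the ones used in print_zir. Because the cursor remembers where the previous query ended, a sequence of roughly increasing offsets scans the file about once in total rather than once per query.

```python
class LineCursor:
    """Resolve offsets to (line, column), remembering the last position."""

    def __init__(self, source):
        self.source = source
        self.offset = 0
        self.line = 0
        self.column = 0

    def resolve(self, target):
        if target < self.offset:
            # Query went backwards: restart from the top of the file.
            self.offset = self.line = self.column = 0
        # Advance only over the text between the last query and this one.
        for ch in self.source[self.offset:target]:
            if ch == "\n":
                self.line += 1
                self.column = 0
            else:
                self.column += 1
        self.offset = target
        return self.line, self.column
```

Out-of-order queries still work, but each one that goes backwards costs a rescan from the start. The approach therefore relies on locations arriving approximately in order, as the message describes.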
2023-09-23 22:01:08 +01:00
mlugg
fbe9fcd243
Autodoc: prevent infinite recursion when resolving parameter type
Note that this will also be necessary for `switch_block` and
`switch_block_ref` once those instructions are correctly implemented.
2023-09-23 22:01:08 +01:00
mlugg
5fa260ba06
InternPool: do not append sentinel value twice when initializing aggregate of u8 2023-09-23 22:01:08 +01:00
travisstaloch
759b0fe00a
std.testing: expectEqualDeep() - support self-referential structs 2023-09-23 20:25:57 +00:00
Andrew Kelley
80ae27bc84 resinator: fix 32-bit builds
This is unfortunately not caught by the CI because the resinator code is
not enabled unless `-Denable-llvm` is used.
2023-09-23 13:23:26 -07:00
Jakub Konka
5f4c9a7449 sema: fix mem leaks in inferred error set handling 2023-09-23 12:48:08 -07:00
Andrew Kelley
b00287175c
Merge pull request #17174 from Snektron/spirv-stuffies
spirv gaming
2023-09-23 12:37:48 -07:00
Robin Voetter
cff8ab88f5 spirv: fixes 2023-09-23 12:36:56 -07:00
Robin Voetter
572517376a spirv: air dbg_var_val and dbg_var_ptr 2023-09-23 12:36:56 -07:00
Robin Voetter
68c7fc5c59 spirv: fix blocks that return no value 2023-09-23 12:36:56 -07:00
Robin Voetter
63512192de spirv: fix source line numbers 2023-09-23 12:36:56 -07:00
Robin Voetter
075584a4d7 spirv: enable passing tests 2023-09-23 12:36:56 -07:00
Robin Voetter
d9a8c779d8 spirv: constant elem ptr 2023-09-23 12:36:56 -07:00
Robin Voetter
a75300c8d8 spirv: air slice 2023-09-23 12:36:56 -07:00
Robin Voetter
8895025688 spirv: air wrap_errunion_payload 2023-09-23 12:36:56 -07:00
Robin Voetter
4f215a6d28 spirv: air unwrap_errunion_payload 2023-09-23 12:36:56 -07:00
Robin Voetter
48ab11639a spirv: air is_err, is_non_err 2023-09-23 12:36:56 -07:00
Robin Voetter
b845c9d532 spirv: generate module initializer 2023-09-23 12:36:56 -07:00
Robin Voetter
5d844faf7c spirv: air array_elem_val using hack
SPIR-V doesn't support true element indexing, so we probably
need to switch over to isByRef like in llvm for this to work
properly. Currently a temporary is used, which at least
seems to work.
2023-09-23 12:36:56 -07:00
Robin Voetter
26c279cca2 spirv: air aggregate_init for array 2023-09-23 12:36:56 -07:00
Robin Voetter
8d49b2ef4e spirv: air array_to_slice 2023-09-23 12:36:56 -07:00
Robin Voetter
66b1f6c163 spirv: air sub_with_overflow 2023-09-23 12:36:56 -07:00
Robin Voetter
749307dbb2 spirv: air union_init 2023-09-23 12:36:56 -07:00
Robin Voetter
98046b4c3c spirv: air set_union_tag + improve load()/store() 2023-09-23 12:36:56 -07:00