Jacob Young
644d943861 x86_64: implement 128-bit integer comparisons 2023-10-04 16:26:56 -04:00
Jacob Young
2a5335d7b6 x86_64: implement C abi for 128-bit integers 2023-10-04 14:42:35 -04:00
Jacob Young
9748096992 x86_64: fix various crashes 2023-10-04 14:42:35 -04:00
Jacob Young
08c24935b1 behavior: reenable passing x86_64 tests 2023-10-04 14:42:35 -04:00
Jakub Konka
8b4e3b6aee comp: add support for -fdata-sections 2023-10-04 11:21:56 -07:00
Andrew Kelley
a306bfcd8e
Merge pull request #17344 from ziglang/type-erased-reader
std: add type-erased reader; base GenericReader on it
2023-10-04 11:21:19 -07:00
Krzysztof Wolicki
4d1e8ef1eb
autodoc: Add hidden class to [src] link when starting renderApi to only show it for functions (#17322) 2023-10-04 18:42:49 +02:00
Krzysztof Wolicki
9c3165b81b
autodoc: Add handling for array_cat, array_mul, struct_init_empty_result. (#17367)
Fix issue with getting docs for src-less types in main.js
2023-10-04 18:25:00 +02:00
nikneym
9f0d2f9417 linux: add fanotify API 2023-10-04 13:48:22 +03:00
Andrew Kelley
398db54434
Merge pull request #17276 from ziglang/anon-decls
compiler: start handling anonymous decls differently
2023-10-04 03:36:18 -07:00
Ryan Liptak
ec0f76c599 GeneralPurposeAllocator.searchBucket: check current bucket before searching the list
Follow-up to #17383. This is a minor optimization that only matters when a small allocation is resized or freed soon after it is allocated.
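
A minimal sketch of that fast path, using stand-in Bucket/SizeClass types rather than the real GeneralPurposeAllocator internals:

```zig
const std = @import("std");

const Bucket = struct {
    page_addr: usize,
    page_size: usize,

    fn contains(self: *const Bucket, addr: usize) bool {
        return addr >= self.page_addr and addr < self.page_addr + self.page_size;
    }
};

const SizeClass = struct {
    current_bucket: ?*Bucket = null,

    fn searchBucket(self: *SizeClass, addr: usize) ?*Bucket {
        // Fast path: an allocation resized/freed soon after being made
        // almost certainly lives in the bucket it was allocated from.
        if (self.current_bucket) |cur| {
            if (cur.contains(addr)) return cur;
        }
        // Slow path (the treap search) elided in this sketch.
        return null;
    }
};

pub fn main() void {
    var bucket = Bucket{ .page_addr = 0x10000, .page_size = 4096 };
    var size_class = SizeClass{ .current_bucket = &bucket };
    std.debug.assert(size_class.searchBucket(0x10040) == &bucket);
}
```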

The only real difference I was able to observe with this was via a synthetic benchmark that allocates a full bucket and then frees all but one of the slots, over and over in a loop:

Debug build:

Benchmark 1 (9 runs): gpa-degen-master.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           575ms ± 5.19ms     569ms …  583ms          0 ( 0%)        0%
  peak_rss           43.8MB ± 1.37KB    43.8MB … 43.8MB          1 (11%)        0%
Benchmark 2 (10 runs): gpa-degen-search-cur.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           532ms ± 5.55ms     520ms …  539ms          0 ( 0%)        -  7.5% ±  0.9%
  peak_rss           43.8MB ± 65.2KB    43.8MB … 44.0MB          1 (10%)          +  0.0% ±  0.1%

ReleaseFast build:

Benchmark 1 (129 runs): gpa-degen-master-release.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          38.9ms ± 1.12ms    36.7ms … 42.4ms          8 ( 6%)        0%
  peak_rss           23.2MB ± 2.39KB    23.2MB … 23.2MB          0 ( 0%)        0%
Benchmark 2 (151 runs): gpa-degen-search-cur-release.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.2ms ±  999us    31.9ms … 36.3ms         20 (13%)        - 14.7% ±  0.6%
  peak_rss           23.2MB ± 2.26KB    23.2MB … 23.2MB          0 ( 0%)          +  0.0% ±  0.0%
2023-10-04 02:55:54 -07:00
Kai Jellinghaus
11489bb04f
Update IORING_OP to reflect upstream (#17388)
Reference [upstream io_uring.h](cbf3a2cb15/include/uapi/linux/io_uring.h (L234))
2023-10-04 09:18:14 +00:00
Andrew Kelley
9a09651019 update zig1.wasm
Notably, this contains bug fixes related to `@errorCast` which are
required by the changes to `std.io.Reader` in this branch, and the
compiler source code has a dependency on `std.io.Reader`.
2023-10-03 14:58:13 -07:00
Andrew Kelley
8ebebbd134 std.macho: remove alignment from LoadCommandIterator 2023-10-03 14:55:17 -07:00
Andrew Kelley
d8540dd708 std: add type-erased reader; base GenericReader on it
The idea here is to avoid code bloat by having only one actual io.Reader
implementation, which is type-erased, and then implementing a GenericReader
on top of it as thin glue code that preserves type information.

The strategy here is for that glue code to `@errSetCast` the result of
the type-erased reader functions; however, while trying to do that I
ran into #17343.
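
As a rough sketch of the overall pattern (illustrative shapes, not the exact std.io declarations; the narrowing cast is written here as `@errorCast`, which the zig1.wasm update above fixes for error unions):

```zig
const std = @import("std");

// One concrete, type-erased reader: a context pointer plus a function
// pointer, so every GenericReader funnels through the same machine code.
pub const AnyReader = struct {
    context: *const anyopaque,
    readFn: *const fn (context: *const anyopaque, buffer: []u8) anyerror!usize,

    pub fn read(self: AnyReader, buffer: []u8) anyerror!usize {
        return self.readFn(self.context, buffer);
    }
};

// Thin typed glue: restores the concrete error set on top of AnyReader.
pub fn GenericReader(
    comptime Context: type,
    comptime ReadError: type,
    comptime readFn: fn (context: Context, buffer: []u8) ReadError!usize,
) type {
    return struct {
        context: Context,

        pub fn read(self: *const @This(), buffer: []u8) ReadError!usize {
            // Narrow anyerror back down to the concrete error set.
            return @errorCast(self.any().read(buffer));
        }

        pub fn any(self: *const @This()) AnyReader {
            return .{ .context = @ptrCast(&self.context), .readFn = typeErasedReadFn };
        }

        fn typeErasedReadFn(context: *const anyopaque, buffer: []u8) anyerror!usize {
            const ptr: *const Context = @ptrCast(@alignCast(context));
            return readFn(ptr.*, buffer);
        }
    };
}

test "round trip through the erased reader" {
    const Fns = struct {
        fn readFn(context: u8, buffer: []u8) error{Unexpected}!usize {
            if (buffer.len == 0) return error.Unexpected;
            buffer[0] = context;
            return 1;
        }
    };
    const R = GenericReader(u8, error{Unexpected}, Fns.readFn);
    const reader: R = .{ .context = 'z' };
    var buf: [1]u8 = undefined;
    try std.testing.expectEqual(@as(usize, 1), try reader.read(&buf));
    try std.testing.expectEqual(@as(u8, 'z'), buf[0]);
}
```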
2023-10-03 14:55:17 -07:00
Jacob G-W
d634e02d33 plan9: implement linking anon decls 2023-10-03 12:49:45 -07:00
Luuk de Gram
de78caf9c4 wasm: implement lowering anon decls 2023-10-03 12:49:29 -07:00
Jakub Konka
cbdf4858e8 elf: assert llvm_object is null in getAnonDeclVAddr 2023-10-03 12:49:24 -07:00
Jakub Konka
d5b9d18bf0 coff: implement lowering anon decls 2023-10-03 12:49:21 -07:00
Jakub Konka
57eefe0533 macho: implement lowering anon decls 2023-10-03 12:49:17 -07:00
Jakub Konka
0134e5d2a1 codegen: separate getAnonDeclVAddr into lowerAnonDecl and the former
Implement the stub for Elf.

I believe that separating the concerns is preferable: one interface
function responsible only for signalling the linker to lower the anon
decl, and a separate function to obtain the decl's vaddr. This allows
us to handle codegen errors in a simpler way.
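
A toy model of that split (the function names follow the commit; the types and signatures here are stand-ins, not the real src/link.zig interface):

```zig
const std = @import("std");

const AnonDeclIndex = u32; // stand-in for InternPool.Index

const Linker = struct {
    vaddrs: std.AutoHashMap(AnonDeclIndex, u64),
    next_vaddr: u64 = 0x1000,

    // Phase 1: ask the linker to lower the decl. Codegen errors surface
    // here, before anyone needs an address.
    fn lowerAnonDecl(self: *Linker, index: AnonDeclIndex) !void {
        const gop = try self.vaddrs.getOrPut(index);
        if (!gop.found_existing) {
            gop.value_ptr.* = self.next_vaddr;
            self.next_vaddr += 0x100;
        }
    }

    // Phase 2: a pure lookup; lowering is guaranteed to have happened.
    fn getAnonDeclVAddr(self: *Linker, index: AnonDeclIndex) u64 {
        return self.vaddrs.get(index).?;
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    var linker = Linker{ .vaddrs = std.AutoHashMap(AnonDeclIndex, u64).init(gpa.allocator()) };
    defer linker.vaddrs.deinit();

    try linker.lowerAnonDecl(42);
    std.debug.print("anon decl 42 at vaddr 0x{x}\n", .{linker.getAnonDeclVAddr(42)});
}
```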
2023-10-03 12:49:12 -07:00
Andrew Kelley
0483b4a512 link: stub out getAnonDeclVAddr 2023-10-03 12:12:51 -07:00
Andrew Kelley
c4b0b7a30b C backend: render anon decls
Introduce the new mechanism needed for the C backend to render the
anonymous decls that the frontend is now using.

The current strategy is to collect the set of used anonymous decls into
one ArrayHashMap for the entire compilation, and then render them during
flush().
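
A minimal model of that collect-then-flush strategy (assumed shapes, not the actual C backend code; std.AutoArrayHashMap preserves insertion order, which gives a stable rendering order):

```zig
const std = @import("std");

const AnonDeclIndex = u32; // stand-in for InternPool.Index

const CBackend = struct {
    // One map for the whole compilation; insertion order is preserved.
    anon_decls: std.AutoArrayHashMap(AnonDeclIndex, void),

    fn refAnonDecl(self: *CBackend, index: AnonDeclIndex) !void {
        // Idempotent: referencing the same decl twice records it once.
        try self.anon_decls.put(index, {});
    }

    fn flush(self: *CBackend, writer: anytype) !void {
        // Render all collected decls at the end, in first-use order.
        for (self.anon_decls.keys()) |index| {
            try writer.print("/* anon decl {d} rendered here */\n", .{index});
        }
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    var backend = CBackend{ .anon_decls = std.AutoArrayHashMap(AnonDeclIndex, void).init(gpa.allocator()) };
    defer backend.anon_decls.deinit();

    try backend.refAnonDecl(7);
    try backend.refAnonDecl(3);
    try backend.refAnonDecl(7); // duplicate reference, no second entry
    try backend.flush(std.io.getStdOut().writer());
}
```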

In the future this may need to be adjusted for incremental compilation
purposes, so that removing a Decl from decl_table means that newly
unused anonymous decls are no longer rendered. However, let's do one
thing at a time. The only goal of this branch is to stop using
Module.Decl objects for unnamed constants.
2023-10-03 12:12:51 -07:00
Andrew Kelley
9d069d98e3 C backend: start handling anonymous decls
Start keeping track of dependencies on anon decls for dependency
ordering during flush().

Currently this results in references to undefined symbols, because these
dependencies still need to be rendered into the output.
2023-10-03 12:12:51 -07:00
Andrew Kelley
c0b5512544 compiler: start handling anonymous decls differently
Instead of explicitly creating a `Module.Decl` object for each anonymous
declaration, each `InternPool.Index` value is implicitly understood to
be an anonymous declaration when encountered by backend codegen.

The memory management strategy for these anonymous decls then becomes to
garbage collect them along with standard InternPool garbage.

In the interest of a smooth transition, this commit only implements this
new scheme for string literals and leaves all the previous mechanisms in
place.
2023-10-03 12:12:50 -07:00
Krzysztof Wolicki
4df7f7c86a Add -fno-stack-protector to flags when building libzigcpp
This allows both debug and release builds to link to it
without forcing release builds to link to libssp.
2023-10-03 11:21:21 -07:00
Andrew Kelley
87d09edf2d
Merge pull request #17352 from kcbanner/extern_union_comptime_memory
sema: Support reinterpreting extern/packed unions at comptime via field access
2023-10-03 11:20:08 -07:00
ocrap7
c933a7c58c llvm: remove extra copy of wrapped payloads 2023-10-03 11:16:11 -07:00
Andrew Kelley
5c3393dc2e
Merge pull request #17375 from xxxbxxx/packed-struct
codegen: fix field offsets in packed structs
2023-10-03 11:13:40 -07:00
Andrew Kelley
7733894761
Merge pull request #17341 from rzezeski/illumos-updates
Illumos/Solaris updates
2023-10-03 11:04:41 -07:00
Andrew Kelley
47f08605bd
Merge pull request #17383 from squeek502/gpa-optim-treap
GeneralPurposeAllocator: Considerably improve worst case performance
2023-10-03 10:58:48 -07:00
Ian Johnson
6734d2117e Add behavior test for empty tuple type
Closes #16412
2023-10-03 16:01:08 +03:00
Andrew Kelley
df4853a627
Merge pull request #17363 from ziglang/tar-symlinks
introduce the `zig fetch` subcommand and symlink support in zig packages
2023-10-03 03:33:26 -07:00
Frank Denis
4930094e62 valgrind.memcheck: fix makeMem*()
The `makeMem*()` functions crashed under valgrind in Debug and
ReleaseSafe modes.

The reason is that `doMemCheckClientRequestExpr()` returns `0`
when not running under Valgrind, and `maxInt(usize)` when running
under Valgrind.

Thus, `@as(i1, @intCast(maxInt(usize)))` always failed, and these
functions crashed before returning.
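
To illustrate why, a self-contained repro of that cast (the stub below is hypothetical; the real binding lives in std.valgrind):

```zig
const std = @import("std");

// Hypothetical stand-in for doMemCheckClientRequestExpr()'s result
// when running under Valgrind.
fn clientRequestResult() usize {
    return std.math.maxInt(usize);
}

pub fn main() void {
    const result = clientRequestResult();
    // maxInt(usize) is far outside the range of i1, so this
    // safety-checked cast panics in Debug and ReleaseSafe builds.
    const flag = @as(i1, @intCast(result));
    _ = flag;
}
```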

That being said, what these functions used to return was quite
unexpected: `0` on error and `-1` on success (i.e. running under
Valgrind). That doesn't match any Zig or C convention.

But that return value doesn't seem to be very useful. Either we are
running under Valgrind or we are not. There's no point in checking this
for every single call. Applications are likely to always discard it.

So, just return a `void` instead.

Also, avoid function comments that start with `Similarly, ...`, because
they don't refer to anything in the context of autodoc or in IDEs.
2023-10-03 02:51:01 -07:00
Ryan Liptak
95f4c1532a Treap: do not set key to undefined in remove to allow re-use of removed nodes 2023-10-03 01:21:51 -07:00
Ryan Liptak
cf3572a66b GeneralPurposeAllocator: Considerably improve worst case performance
Before this commit, GeneralPurposeAllocator could run into incredibly degraded performance in scenarios where the bucket count for a particular size class grew to be large. For example, if exactly `slot_count` allocations of a single size class were performed and then all of them were freed except one, then the bucket for those allocations would have to be kept around indefinitely. If that pattern of allocation were done over and over, then the bucket list for that size class could grow incredibly large.

This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688

In that case, the length of the bucket list for the `128` size class would grow to tens of thousands of buckets and cause Debug runtime to balloon to ~8 minutes, whereas with the c_allocator the Debug runtime would be ~3 seconds.

To address this, there are three different changes happening here:

1. std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use-case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (due to the current bucket being full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), we always know that any bucket besides the current_bucket will be full, so traversing the list in the hopes of finding an existing non-full bucket is entirely pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.

The first change is the most relevant and accounts for most of the benefit here. Also note that the overall functionality of GeneralPurposeAllocator is unchanged.

In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.
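
As a minimal sketch of change (1), here is how buckets can be keyed by page address in a std.Treap (the Bucket type is a stand-in, not the real GeneralPurposeAllocator.BucketHeader):

```zig
const std = @import("std");

const AddrTreap = std.Treap(usize, std.math.order);

const Bucket = struct {
    node: AddrTreap.Node = undefined,
    page_addr: usize,
};

pub fn main() void {
    var buckets = AddrTreap{};

    var a = Bucket{ .page_addr = 0x1000 };
    var b = Bucket{ .page_addr = 0x3000 };

    // Insert is now O(log n) (it was an O(1) list prepend before).
    var entry_a = buckets.getEntryFor(a.page_addr);
    entry_a.set(&a.node);
    var entry_b = buckets.getEntryFor(b.page_addr);
    entry_b.set(&b.node);

    // The searchBucket equivalent: O(log n) instead of walking a list.
    var lookup = buckets.getEntryFor(b.page_addr);
    const node = lookup.node orelse unreachable;
    const found = @fieldParentPtr(Bucket, "node", node);
    std.debug.assert(found == &b);
}
```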

Benchmark 1: test-master.bat
  Time (mean ± σ):     481.263 s ±  5.440 s    [User: 479.159 s, System: 1.937 s]
  Range (min … max):   477.416 s … 485.109 s    2 runs

Benchmark 2: test-optim-treap.bat
  Time (mean ± σ):     19.639 s ±  0.037 s    [User: 18.183 s, System: 1.452 s]
  Range (min … max):   19.613 s … 19.665 s    2 runs

Summary
  'test-optim-treap.bat' ran
   24.51 ± 0.28 times faster than 'test-master.bat'

Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.
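
For reference, that measurement corresponds to configuring the allocator roughly like this (`stack_trace_frames` is an existing GeneralPurposeAllocator.Config field):

```zig
const std = @import("std");

pub fn main() void {
    // Opt out of allocation stack traces entirely; collecting them
    // dominates the Windows timings above.
    var gpa = std.heap.GeneralPurposeAllocator(.{ .stack_trace_frames = 0 }){};
    defer _ = gpa.deinit();

    const allocator = gpa.allocator();
    _ = allocator;
}
```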

These changes may or may not introduce a slight performance regression in the average case:

Here's the standard library tests on Windows in Debug mode:

Benchmark 1 (10 runs): std-tests-master.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.0s  ± 30.8ms    15.9s  … 16.1s           1 (10%)        0%
  peak_rss           42.8MB ± 8.24KB    42.8MB … 42.8MB          0 ( 0%)        0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.2s  ± 37.6ms    16.1s  … 16.3s           0 ( 0%)        💩+  1.3% ±  0.2%
  peak_rss           42.8MB ± 5.18KB    42.8MB … 42.8MB          0 ( 0%)          +  0.1% ±  0.0%

And on Linux:

Benchmark 1: ./test-master
  Time (mean ± σ):     16.091 s ±  0.088 s    [User: 15.856 s, System: 0.453 s]
  Range (min … max):   15.870 s … 16.166 s    10 runs
 
Benchmark 2: ./test-optim-treap
  Time (mean ± σ):     16.028 s ±  0.325 s    [User: 15.755 s, System: 0.492 s]
  Range (min … max):   15.735 s … 16.709 s    10 runs
 
Summary
  './test-optim-treap' ran
    1.00 ± 0.02 times faster than './test-master'
2023-10-03 01:21:51 -07:00
Veikka Tuominen
0bdbd3e235 Sema: fix issues in @errorCast with error unions 2023-10-03 00:45:48 -07:00
xdBronch
c9c3ee704c correctly detect apple a15 and a16 chips 2023-10-03 00:36:59 -07:00
Xavier Bouchoux
405705cb76 codegen: fix byte-aligned field offsets in unaligned nested packed structs 2023-10-03 05:34:19 +00:00
Xavier Bouchoux
62d178e91a codegen: fix field offsets in packed structs
* add nested packed struct/union behavior tests
* use ptr_info.packed_offset rather than trying to duplicate the logic from Sema.structFieldPtrByIndex()
* use the container_ptr_info.packed_offset to account for non-aligned nested structs
* dedup type.packedStructFieldBitOffset() and module.structPackedFieldBitOffset()
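
An illustrative test in the spirit of those behavior tests (a hypothetical field layout, not copied from the test suite):

```zig
const std = @import("std");

const Inner = packed struct {
    a: u3,
    b: u5,
};

const Outer = packed struct {
    x: u4,
    inner: Inner, // starts at bit 4: nested and not byte-aligned
    y: u4,
};

test "nested packed struct field access" {
    const o = Outer{ .x = 0xF, .inner = .{ .a = 5, .b = 19 }, .y = 0x9 };
    // Field loads must honor the packed bit offsets.
    try std.testing.expectEqual(@as(u3, 5), o.inner.a);
    try std.testing.expectEqual(@as(u5, 19), o.inner.b);
    try std.testing.expectEqual(@as(usize, 4), @bitOffsetOf(Outer, "inner"));
}
```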
2023-10-03 06:39:20 +02:00
Ryan Liptak
da7ecfb2de Treap: Add InorderIterator 2023-10-02 21:11:14 -07:00
Ian Johnson
573a13f8be Support symlinks for git+http(s) dependencies 2023-10-02 18:14:57 -07:00
kcbanner
1b8a50ea5e union: skip failing tests on ppc 2023-10-02 20:39:02 -04:00
Andrew Kelley
21181181bf zig fetch: enhanced error reporting
* Package: use std.tar diagnostics to give detailed error messages
* std.tar: add diagnostic for unsupported file type
2023-10-02 17:02:25 -07:00
Andrew Kelley
ef9966c985 introduce the 'zig fetch' command + symlink support
zig fetch [options] <url>
zig fetch [options] <path>

Fetches a package which is found at <url> or <path> into the global
cache directory, printing the package hash to stdout.

Closes #16972
Related to #14280

Additionally, this commit:

* Adds uncompressed .tar support to package fetching
* Introduces symlink support to package fetching
2023-10-02 17:02:25 -07:00
Andrew Kelley
309c53295f std.fs: give readLink an explicit error set 2023-10-02 17:02:24 -07:00
Andrew Kelley
a4352982b3 compiler: extract package hashing logic to separate file
There are no functional changes in this commit.
2023-10-02 17:02:24 -07:00
Andrew Kelley
a5144d19b7 std.tar: support symlinks
closes #16678
2023-10-02 17:02:24 -07:00
Carl Åstholm
412d863ba5 std.Build: expose -idirafter to the build system 2023-10-02 16:22:07 -07:00
Ryan Zezeski
dd026588d0 illumos: fix dynamic linker path 2023-10-02 16:37:37 -06:00