26106 Commits

Author SHA1 Message Date
Andrew Kelley
47f08605bd
Merge pull request #17383 from squeek502/gpa-optim-treap
GeneralPurposeAllocator: Considerably improve worst case performance
2023-10-03 10:58:48 -07:00
Ian Johnson
6734d2117e Add behavior test for empty tuple type
Closes #16412
2023-10-03 16:01:08 +03:00
Andrew Kelley
df4853a627
Merge pull request #17363 from ziglang/tar-symlinks
introduce the `zig fetch` subcommand and symlink support in zig packages
2023-10-03 03:33:26 -07:00
Frank Denis
4930094e62 valgrind.memcheck: fix makeMem*()
The `makeMem*()` functions crashed under valgrind in Debug and
ReleaseSafe modes.

The reason is that `doMemCheckClientRequestExpr()` returns `0`
when not running under Valgrind, and `maxInt(usize)` when running
under Valgrind.

Thus, `@as(i1, @intCast(maxInt(usize)))` always fails and these
functions crashed before returning.
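
To illustrate the failing cast: `i1` is a one-bit signed integer, so it can hold only `-1` and `0`. The sketch below is not the code from the commit; `fitsInI1` is a hypothetical helper that uses `std.math.cast`, which returns `null` instead of tripping a safety check when the value is out of range.

```zig
const std = @import("std");

/// Hypothetical helper for illustration: reports whether `value`
/// can be represented as an `i1` (whose only values are -1 and 0).
fn fitsInI1(value: usize) bool {
    // std.math.cast returns null when the value does not fit,
    // whereas @intCast would panic in Debug and ReleaseSafe modes.
    return std.math.cast(i1, value) != null;
}
```

Under this sketch, `fitsInI1(0)` is true while `fitsInI1(std.math.maxInt(usize))` is false, which matches the crash described above when running under Valgrind.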

That being said, what these functions used to return was quite
unexpected: `0` on error and `-1` on success (=running under valgrind).
That doesn't match any Zig or C convention.

But that return value doesn't seem to be very useful. Either we are
running under Valgrind or we are not. There's no point in checking this
for every single call. Applications are likely to always discard it.

So, just return a `void` instead.

Also avoid function comments that start with `Similarly, ...` because
that doesn't refer to anything in the context of autodoc or in IDEs.
2023-10-03 02:51:01 -07:00
Ryan Liptak
95f4c1532a Treap: do not set key to undefined in remove to allow re-use of removed nodes 2023-10-03 01:21:51 -07:00
Ryan Liptak
cf3572a66b GeneralPurposeAllocator: Considerably improve worst case performance
Before this commit, GeneralPurposeAllocator could run into incredibly degraded performance in scenarios where the bucket count for a particular size class grew to be large. For example, if exactly `slot_count` allocations of a single size class were performed and then all of them were freed except one, then the bucket for those allocations would have to be kept around indefinitely. If that pattern of allocation were done over and over, then the bucket list for that size class could grow incredibly large.

This allocation pattern has been seen in the wild: https://github.com/Vexu/arocc/issues/508#issuecomment-1738275688

In that case, the length of the bucket list for the `128` size class would grow to tens of thousands of buckets and cause Debug runtime to balloon to ~8 minutes whereas with the c_allocator the Debug runtime would be ~3 seconds.

To address this, there are three different changes happening here:

1. std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n) [before, all new buckets would get added to the head of the list]. Note: Any data structure with O(log n) or better search/insert/delete would also work for this use-case.
2. If the 'current' bucket for a size class is full, the list of buckets is never traversed and instead a new bucket is allocated. Previously, traversing the bucket list could only find a non-full bucket in specific circumstances, and only because of a separate optimization that is no longer needed (before, after any resize/free, the affected bucket would be moved to the head of the bucket list to allow searchBucket to perform better on average). Now, the current_bucket for each size class only changes when either (1) the current bucket is emptied/freed, or (2) a new bucket is allocated (due to the current bucket being full or null). Because each bucket's alloc_cursor only moves forward (i.e. slots within a bucket are never re-used), we can therefore always know that any bucket besides the current_bucket will be full, so traversing the list in the hopes of finding an existing non-full bucket is entirely pointless.
3. Size + alignment information for small allocations has been moved into the Bucket data instead of keeping it in a separate HashMap. This offers an improvement over the HashMap since whenever we need to get/modify the length/alignment of an allocation it's extremely likely we will already have calculated any bucket-related information necessary to get the data.

The first change is the most relevant and accounts for most of the benefit here. Also note that the overall functionality of GeneralPurposeAllocator is unchanged.
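
The degraded allocation pattern can be sketched with a toy model. This is not the real GeneralPurposeAllocator: `Model`, `runPattern`, and the `slot_count` value of 8 are assumptions made for illustration. The model only tracks how many buckets stay alive when each round allocates `slot_count` slots and frees all but one.

```zig
const std = @import("std");

// Assumed number of slots per bucket, chosen only for illustration.
const slot_count: usize = 8;

// Hypothetical model of a single size class; not the real allocator.
const Model = struct {
    live_buckets: usize = 0, // buckets still holding at least one allocation
    cursor: usize = slot_count, // alloc_cursor of the current bucket; starts "full"
    used: usize = 0, // live allocations in the current bucket

    fn alloc(self: *Model) void {
        if (self.cursor == slot_count) {
            // The cursor only moves forward, so a full bucket is never
            // reused. The old bucket stays counted in live_buckets.
            self.live_buckets += 1;
            self.cursor = 0;
            self.used = 0;
        }
        self.cursor += 1;
        self.used += 1;
    }

    fn freeFromCurrent(self: *Model) void {
        self.used -= 1;
        if (self.used == 0) {
            // An emptied bucket is released entirely.
            self.live_buckets -= 1;
            self.cursor = slot_count;
        }
    }
};

/// Runs `rounds` rounds of "allocate slot_count, free all but one"
/// and returns how many buckets remain alive.
fn runPattern(rounds: usize) usize {
    var model = Model{};
    var round: usize = 0;
    while (round < rounds) : (round += 1) {
        var i: usize = 0;
        while (i < slot_count) : (i += 1) model.alloc();
        i = 0;
        while (i < slot_count - 1) : (i += 1) model.freeFromCurrent();
    }
    return model.live_buckets;
}
```

In this model every round pins one more bucket, so the bucket count grows linearly with the number of rounds. That is the situation in which an O(n) bucket search becomes expensive and an O(log n) structure helps.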

In the degraded `arocc` case, these changes bring Debug performance from ~8 minutes to ~20 seconds.

Benchmark 1: test-master.bat
  Time (mean ± σ):     481.263 s ±  5.440 s    [User: 479.159 s, System: 1.937 s]
  Range (min … max):   477.416 s … 485.109 s    2 runs

Benchmark 2: test-optim-treap.bat
  Time (mean ± σ):     19.639 s ±  0.037 s    [User: 18.183 s, System: 1.452 s]
  Range (min … max):   19.613 s … 19.665 s    2 runs

Summary
  'test-optim-treap.bat' ran
   24.51 ± 0.28 times faster than 'test-master.bat'

Note: Much of the time taken on Windows in this particular case is related to gathering stack traces. With `.stack_trace_frames = 0` the runtime goes down to 6.7 seconds, which is a little more than 2.5x slower compared to when the c_allocator is used.

These changes may or may not introduce a slight performance regression in the average case:

Here are the standard library tests on Windows in Debug mode:

Benchmark 1 (10 runs): std-tests-master.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.0s  ± 30.8ms    15.9s  … 16.1s           1 (10%)        0%
  peak_rss           42.8MB ± 8.24KB    42.8MB … 42.8MB          0 ( 0%)        0%
Benchmark 2 (10 runs): std-tests-optim-treap.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          16.2s  ± 37.6ms    16.1s  … 16.3s           0 ( 0%)        💩+  1.3% ±  0.2%
  peak_rss           42.8MB ± 5.18KB    42.8MB … 42.8MB          0 ( 0%)          +  0.1% ±  0.0%

And on Linux:

Benchmark 1: ./test-master
  Time (mean ± σ):     16.091 s ±  0.088 s    [User: 15.856 s, System: 0.453 s]
  Range (min … max):   15.870 s … 16.166 s    10 runs
 
Benchmark 2: ./test-optim-treap
  Time (mean ± σ):     16.028 s ±  0.325 s    [User: 15.755 s, System: 0.492 s]
  Range (min … max):   15.735 s … 16.709 s    10 runs
 
Summary
  './test-optim-treap' ran
    1.00 ± 0.02 times faster than './test-master'
2023-10-03 01:21:51 -07:00
Veikka Tuominen
0bdbd3e235 Sema: fix issues in @errorCast with error unions 2023-10-03 00:45:48 -07:00
xdBronch
c9c3ee704c correctly detect apple a15 and a16 chips 2023-10-03 00:36:59 -07:00
Ryan Liptak
da7ecfb2de Treap: Add InorderIterator 2023-10-02 21:11:14 -07:00
Ian Johnson
573a13f8be Support symlinks for git+http(s) dependencies 2023-10-02 18:14:57 -07:00
Andrew Kelley
21181181bf zig fetch: enhanced error reporting
* Package: use std.tar diagnostics to give detailed error messages
* std.tar: add diagnostic for unsupported file type
2023-10-02 17:02:25 -07:00
Andrew Kelley
ef9966c985 introduce the 'zig fetch' command + symlink support
zig fetch [options] <url>
zig fetch [options] <path>

Fetches a package which is found at <url> or <path> into the global
cache directory, printing the package hash to stdout.

Closes #16972
Related to #14280

Additionally, this commit:

* Adds uncompressed .tar support to package fetching
* Introduces symlink support to package fetching
2023-10-02 17:02:25 -07:00
Andrew Kelley
309c53295f std.fs: give readLink an explicit error set 2023-10-02 17:02:24 -07:00
Andrew Kelley
a4352982b3 compiler: extract package hashing logic to separate file
There are no functional changes in this commit.
2023-10-02 17:02:24 -07:00
Andrew Kelley
a5144d19b7 std.tar: support symlinks
closes #16678
2023-10-02 17:02:24 -07:00
Carl Åstholm
412d863ba5 std.Build: expose -idirafter to the build system 2023-10-02 16:22:07 -07:00
Andrew Kelley
53775b0999 CLI: fix -fno-clang
Aro/Clang detection logic treated `-fno-clang` the same as `-fclang`.
2023-10-01 21:37:02 -07:00
Veikka Tuominen
fc4d53e2ea
Merge pull request #17221 from Vexu/aro-translate-c
Aro translate-c
2023-10-02 07:08:53 +03:00
Jacob Young
0f1652dc60
Merge pull request #17262 from jacobly0/x86_64
x86_64: support operations that are implemented in compiler_rt
2023-10-01 20:45:42 -04:00
kcbanner
62a0fbdaef air_print: fix panic when printing .abs 2023-10-01 15:08:50 -07:00
Veikka Tuominen
5792570197 add Aro sources as a dependency
ref: 5688dbccfb58216468267a0f46b96bed7013715a
2023-10-01 23:51:54 +03:00
Veikka Tuominen
47050fbb7d aro translate-c: update to cast builtin changes 2023-10-01 23:51:54 +03:00
Veikka Tuominen
7ec729b3ae aro-translate-c: move shared types to a common namespace 2023-10-01 23:51:54 +03:00
Veikka Tuominen
31ecf75311 aro-translate-c: translate enums 2023-10-01 23:51:54 +03:00
Veikka Tuominen
fef94da958 add compiler flag for selecting C frontend 2023-10-01 23:51:54 +03:00
Jacob Young
da335f0ee4 x86_64: implement float @sqrt builtin 2023-10-01 15:09:52 -04:00
Jacob Young
fbe5bf469e x86_64: implement float arithmetic builtins 2023-10-01 15:09:52 -04:00
Jacob Young
1eb023908d x86_64: implement float round builtins 2023-10-01 15:09:52 -04:00
Jacob Young
c3042cbe12 x86_64: add missing caller preserved regs
All allocatable registers have to be either callee preserved or caller
preserved.
2023-10-01 15:09:52 -04:00
Jacob Young
8470652f10 x86_64: implement float compare and cast builtins 2023-10-01 15:09:52 -04:00
Jacob Young
6d5cbdb863 behavior: cleanup floatop tests 2023-10-01 15:09:52 -04:00
Jacob Young
3bd1b9e15f x86_64: implement and test unary float builtins 2023-10-01 15:09:52 -04:00
Jakub Konka
af40bce08a x86_64: emit R_X86_64_GOT32 for non-PIC GOT references 2023-10-01 21:09:35 +02:00
Andrew Kelley
8e1421f19e
Merge pull request #17346 from Vexu/errSetCast
Sema: implement `@errSetCast` for error unions
2023-10-01 12:00:17 -07:00
Veikka Tuominen
0b1ba6eb52
update zig1.wasm 2023-10-01 17:16:34 +03:00
Veikka Tuominen
63bd2bff12 Sema: add @errorCast which works for both error sets and error unions
Closes #17343
2023-10-01 17:00:01 +03:00
Jay Petacat
d8bfbbbf25 std.mem.zeroes: Zero out entire extern union, including padding
Fixes #17258
2023-10-01 02:39:05 -07:00
Andrew Kelley
376242e586
Merge pull request #17161 from tiehuis/vectorize-index-of-scalar
std.mem: add vectorized indexOfScalarPos and indexOfSentinel
2023-10-01 00:07:57 -07:00
Ian Johnson
9a001e1f7c Support fetching dependencies over git+http(s)
Closes #14298

This commit adds support for fetching dependencies over git+http(s)
using a minimal implementation of the Git protocols and formats relevant
to fetching repository data.

Git URLs can be specified in `build.zig.zon` as follows:

```zig
.xml = .{
    .url = "git+https://github.com/ianprime0509/zig-xml#7380d59d50f1cd8460fd748b5f6f179306679e2f",
    .hash = "122085c1e4045fa9cb69632ff771c56acdb6760f34ca5177e80f70b0b92cd80da3e9",
},
```

The fragment part of the URL may specify a commit ID (SHA1 hash), branch
name, or tag. It is an error to omit the fragment: if this happens, the
compiler will prompt the user to add it, using the commit ID of the HEAD
commit of the repository (that is, the latest commit of the default
branch):

```
Fetch Packages... xml... /var/home/ian/src/zig-gobject/build.zig.zon:6:20: error: url field is missing an explicit ref
            .url = "git+https://github.com/ianprime0509/zig-xml",
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: try .url = "git+https://github.com/ianprime0509/zig-xml#dfdc044f3271641c7d428dc8ec8cd46423d8b8b6",
```
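
As a rough illustration of the URL shape described above, the sketch below splits a dependency URL at the first `#` into a repository part and an optional ref. It is not the compiler's implementation; `GitUrl` and `splitRef` are hypothetical names, and the sketch does not validate the ref or resolve it against a repository.

```zig
const std = @import("std");

// Hypothetical result type for illustration.
const GitUrl = struct {
    repo: []const u8,
    ref: ?[]const u8, // commit ID, branch name, or tag; null when omitted
};

/// Splits `url` at the first '#'. Everything after it is treated as the ref.
fn splitRef(url: []const u8) GitUrl {
    if (std.mem.indexOfScalar(u8, url, '#')) |i| {
        return .{ .repo = url[0..i], .ref = url[i + 1 ..] };
    }
    // No fragment: the caller would report the "missing an explicit ref" error.
    return .{ .repo = url, .ref = null };
}
```

A caller could treat a `null` ref as the error case shown in the output above.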

This implementation currently supports only version 2 of Git's wire
protocol (documented in
[protocol-v2](https://git-scm.com/docs/protocol-v2)), which was first
introduced in Git 2.19 (2018) and made the default in 2.26 (2020).

The wire protocol behaves similarly when used over other transports,
such as SSH and the "Git protocol" (git:// URLs), so it should be
reasonably straightforward to support fetching dependencies from such
URLs if the necessary transports are implemented (e.g. #14295).
2023-09-30 18:30:43 -07:00
Lucas Santos
303181901b Improve (Unmanaged)ArrayList.insert
(Unmanaged)ArrayList.insert has the same inefficiency as the old insertSlice. With the new addManyAt, the solution is trivial.
Also improves the test "growing memory preserves contents". In the previous implementation, if any changes were made to the ArrayList memory growth policy (function growMemory), the list could end up with enough capacity to not trigger a memory growth, defeating the purpose of the test. The new implementation more robustly triggers a memory growth.
2023-09-30 16:17:22 -07:00
Marc Tiehuis
08635f08a9 fix indexOfSentinel alignment for types larger than 1 byte 2023-09-30 22:15:47 +13:00
Andrew Kelley
937e8cb705
Merge pull request #17328 from ziglang/simplify-cbe-deps
C backend: remove unneeded ordering mechanism
2023-09-30 01:26:00 -07:00
Marc Tiehuis
5b5da0ef8c std.mem: check backend vector support for indexOfSentinel/indexOfScalarPos 2023-09-30 21:22:12 +13:00
Marc Tiehuis
cd766513fe std.mem: add vectorized indexOfScalarPos and indexOfSentinel
These are an order of magnitude quicker than the previous
implementations:

A relative comparison of each, measured by scanning a 1 GiB file:

    Reading 1G (1.0000000009313226GiB)

             std.mem.sliceTo: 281.232ms
          vectorized.sliceTo: 24.769ms
                      strlen: 24.291ms

           std.indexOfScalar: 229.016ms
    vectorized.indexOfScalar: 24.685ms
                      memchr: 24.958ms
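
The general idea of a vectorized scalar search can be sketched as follows. This is a simplified illustration, not the `std.mem` code from this commit: `indexOfScalarVec` is a hypothetical name, the lane count of 16 is an assumption, and the sketch assumes a Zig version with single-argument `@splat` (0.11 or newer). It ignores alignment and backend capability checks.

```zig
const std = @import("std");

/// Hypothetical sketch: finds the first index of `needle` in `haystack`
/// by comparing 16 bytes at a time, then finishing with a scalar loop.
fn indexOfScalarVec(haystack: []const u8, needle: u8) ?usize {
    const lanes = 16; // assumed vector width for illustration
    const Block = @Vector(lanes, u8);
    const needles: Block = @splat(needle);

    var i: usize = 0;
    while (i + lanes <= haystack.len) : (i += lanes) {
        // Load one block and compare every lane against the needle.
        const block: Block = haystack[i..][0..lanes].*;
        const matches = block == needles;
        if (@reduce(.Or, matches)) {
            return i + std.simd.firstTrue(matches).?;
        }
    }
    // Scalar tail for the remaining bytes.
    while (i < haystack.len) : (i += 1) {
        if (haystack[i] == needle) return i;
    }
    return null;
}
```

The speedup in the measurements above comes from testing many bytes per comparison instead of one; the scalar tail only handles the final partial block.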
2023-09-30 21:19:43 +13:00
Jakub Konka
873c695c41
Merge pull request #17319 from ziglang/elf-tls
elf: add basic TLS segment handling
2023-09-30 08:43:33 +02:00
Andrew Kelley
864bb5dc07 C backend: iterate decl_table via slice 2023-09-29 19:14:17 -07:00
Andrew Kelley
7c605ba62c C backend: remove ?*Decl from DeclGen
Another simplification. DeclGen already has `decl_index` which can be
used to retrieve the `*Decl` if needed.
2023-09-29 19:14:17 -07:00
Andrew Kelley
0d841e827a C backend: remove unneeded ordering mechanism
This logic to lower snippets of C code in a dependency order is no
longer needed. Simplify the logic by deleting the mechanism.
2023-09-29 19:14:17 -07:00
Andrew Kelley
101df768a0
Merge pull request #17312 from LucasSantos91/master
Fix inefficiency with ArrayList.insertSlice
2023-09-29 18:15:24 -07:00
Jakub Konka
e72fd185e0 elf: skip writing out zerofill atoms to file 2023-09-30 00:52:10 +02:00