If page aligned requested pagezero size is 0, skip generating
__PAGEZERO segment.
Add misc improvements to the pipeline, and correctly transfer the
requested __PAGEZERO size to the linker.
Pass `-pagezero_size` to the MachO linker. This is the final
"unsupported linker arg" that I could chase that CGo uses. After this
and #11874 we may be able to fail on an "unsupported linker arg" instead
of emiting a warning.
Test case:
zig=/code/zig/build/zig
CGO_ENABLED=1 GOOS=darwin GOARCH=amd64 CC="$zig cc -target x86_64-macos" CXX="$zig c++ -target x86_64-macos" go build -a -ldflags "-s -w" cgo.go
I compiled a trivial CGo program and executed it on an amd64 Darwin
host.
To be honest, I am not entirely sure what this is doing. This feels
right after reading what this argument does in LLVM sources, but I am by
no means qualified to make MachO pull requests. Will take feedback.
After doing performance testing, it seems that multi-compilation-unit
compiler-rt did not bring the performance improvements that we expected
it to. The idea is that it makes linking faster, however, it incurred a
cost in the frontend that was not offset by any gains in linking.
Furthermore, the single-object compiler-rt (with -ffunction-sections and
--gc-sections) ends up being fewer bytes on disk and so it's actually
the same or faster linking speed than the multi-compilation-unit
version.
So we are planning to keep using single-compilation-unit compiler-rt for
the foreseeable future, but may experiment with this again in the
future, in which case this commit can be reverted.
compiler_rt_lib and compiler_rt_obj are extracted from the generic
JobQueue into simple boolean flags, and then handled explicitly inside
performAllTheWork().
Introduced generic handling of allocation failure and made
setMiscFailure not return a possible error.
Building the compiler-rt static library now takes advantage of
Compilation's ThreadPool. This introduced a problem, however, because
now each of the object files of compiler-rt all perform AstGen for the
full standard library and compiler-rt files. Even though all of them end
up being cache hits except for the first ones, this is wasteful - O(N*M)
where N is number of compilation units inside compiler-rt and M is the
number of .zig files in the standard library and compiler-rt combined.
More importantly, however, it causes a deadlock, because each thread
interacts with a file system lock for doing AstGen on files, and threads
end up waiting for each other. This will need to be handled with a
process-level file caching system, or some other creative solution.
The purpose of this branch is to switch to using an object file for each
independent function, in order to make linking simpler - instead of
relying on `-ffunction-sections` and `--gc-sections`, which involves the
linker doing the work of linking everything and then undoing work via
garbage collection, this will allow the linker to only include the
compilation units that are depended on in the first place.
This commit makes progress towards that goal.
This passes -Wl,-no-pie linker arg. Golang uses that. From the `ld(1)`
man page:
Create a position dependent executable. This is the default.
Not adding to the help text, because this is the default.
Instead of always using std.testing.allocator, the test harness now follows
the same logic as self-hosted for choosing an allocator - that is - it
uses C allocator when linking libc, std.testing.allocator otherwise, and
respects `-Dforce-gpa` to override the decision. I did this because
I found GeneralPurposeAllocator to be prohibitively slow when doing
multi-threading, even in the context of a debug build.
There is now a second thread pool which is used to spawn each
test case. The stage2 tests are passed the first thread pool. If it were
only multi-threading the stage1 tests then we could use the same thread
pool for everything. However, the problem with this strategy with stage2
is that stage2 wants to spawn tasks and then call wait() on the main
thread. If we use the same thread pool for everything, we get a deadlock
because all the threads end up all hanging at wait() and nothing is
getting done. So we use our second thread pool to simulate a "process pool"
of sorts.
I spent most of the time working on this commit scratching my head trying
to figure out why I was getting ETXTBSY when spawning the test cases.
Turns out it's a fundamental Unix design flaw, already a known, unsolved
issue by Go and Java maintainers:
https://github.com/golang/go/issues/22315https://bugs.openjdk.org/browse/JDK-8068370
With this change, the following command, executed on my laptop, went from
6m24s to 1m44s:
```
stage1/bin/zig build test-cases -fqemu -fwasmtime -Denable-llvm
```
closes#11818
This comment is now deleted because the task is completed in this
commit:
```
// TODO: Update this to behave like `beginComptimePtrLoad` and properly check/use
// `container_ty` and `array_ty`, instead of trusting that the parent decl type
// matches the type used to derive the elem_ptr/field_ptr/etc.
//
// This is needed because the types will not match if the pointer we're mutating
// through is reinterpreting comptime memory.
```
The main strategy is to change the ComptimePtrMutationKit struct so that
instead of `val: *Value` it now returns a tagged union which can be one
of three possibilities:
* The pointer type matches the actual comptime Value so a direct
modification is possible. Before this commit, the implementation
incorrectly assumed this was always the case.
* In the case of needing to write through a reinterpreted pointer, a
mutable base Value pointer is provided along with a byte offset
pointing to the element value in virtual memory.
* Otherwise, it means a compile error must be emitted because one or
both of the types (the owner of the value, or the pointer type being
used to write through) do not have a well-defined memory layout.
After calling beginComptimePtrMutation, the one callsite now switches on
this tagged union and does the appropriate thing. The main new logic is
for the second case, which involves pointer reinterpretation, which now
takes this strategy:
1. write the base value to a memory buffer.
2. perform the pointer store at the proper byte offset, thereby
modifying a subset of the buffer.
3. read the base value from the memory buffer, overwriting the old base
value.
This commit does not change any behavior, but changes the type of
the runtime_index field from u32 to a non-exhaustive enum. This allows
us to put `std.math.maxInt(u32)` only in the enum type definition and
give it an official meaning.
Rather than storing all the shifts in temporaries, we perform the correct
shifting without temporaries. This makes the runtime code more performant
and also the backend code is simplified as we have a singular abstraction.
This does however not support floats of bitsizes
different than 32 or 64. f16, f80, f126 will require
support for compiler-rt and are out-of-scope for this commit.
Signed integers are currently not supported either.