When targeting WebAssembly, we default to building a single-threaded build
as threads are still experimental. The user however can enable a multi-
threaded build by specifying '-fno-single-threaded'. It's a compile-error
to enable this flag, but not also enable shared-memory.
When a thread is detached from the main thread, we automatically
cleanup any allocated memory. For this we first reset the stack-pointer
to the original stack-pointer of the main-thread so we can safely clear
the memory which also contains the thread's stack.
When `join` detects a thread has completed, it will free the allocated
memory of the thread. For this we must first copy the allocator. This is
required as the allocated memory holds a reference to the original
allocator. If we free the memory, we would end up with UB as the
allocator would free itself.
We now reset the Thread ID to 0 and wake up the main thread listening
for the thread to finish. We use inline assembly as we cannot use
the stack to set the thread ID as it could possibly clobber any
of the memory.
Currently, we leak the memory that was allocated for the thread.
We need to implement a way where we can clean up the memory without
using the stack (as the stack is stored inside this same memory).
We now store the original allocator that was used to allocate the
memory required for the thread. This allocator can then be used
in any cleanup functionality to ensure the memory is freed correctly.
Secondly, we now use a function to set the stack pointer instead of
generating a function using global assembly. This is a lot cleaner
and more readable.
This implements a first version to spawn a WASI-thread. For a new thread
to be created, we calculate the size required to store TLS, the new stack,
and metadata. This size is then allocated using a user-provided allocator.
After a new thread is spawn, the HOST will call into our bootstrap procedure.
This bootstrap procedure will then initialize the TLS segment and set the
newly spawned thread's TID. It will also set the stack pointer to the newly
created stack to ensure we do not clobber the main thread's stack.
When bootstrapping the thread is completed, we will call the user's
function on this new thread.
Implements std's `Futex` for the WebAssembly target using Wasm's
`atomics` instruction set. When the `atomics` cpu feature is disabled
we emit a compile-error.
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:
* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
This reverts commit fa6cea22bf9cd5ff6a9dd882166cf7e479acfd6c.
Apologies for the merge. I thought this was a bug fix, but I see that it
is implementing a proposal that I intended to reject.
(Firstly, I changed `n` to `b`, as that is less confusing. It's not a length, it's a right boundary.)
The invariant maintained is `cur < b`. In the worst case `2*cur + 1` results in a maximum of `2b`. Since `2b` is not guaranteed to be lower than `maxInt`, we have to add one overflow check to `siftDown` to make sure we avoid undefined behavior.
LLVM also seems to have a nicer time compiling this version of the function. It is about 2x faster in my tests (I think LLVM was stumped by the `child += @intFromBool` line), and adding/removing the overflow check has a negligible performance difference on my machine. Of course, we could check `2b <= maxInt` in the parent function, and dispatch to a version of the function without the overflow check in the common case, but that probably is not worth the code size just to eliminate a single instruction.
This makes it so `first_child_index` will not be accessed when it is equal to `self.len`. (i.e. `self.items[self.len]` will not happen) The access itself was "safe" (as in, `self.len < self.items.len`) because we were only calling `doSiftDown` in the case where there was a stale value at `self.items[self.len]`. However, it is still technically a bug, and can manifest by an unnecessary comparison of a value to a copy of itself.
LLVM has trouble compiling the old implementation, (presumably) because `leading_zeros` is thought to be a `u7` rather than a `u6`, which means `63 - clz` is not equivalent to `63 ^ clz`, which means it can't deduce that the final condition can simply be flipped. (I am assuming `usize` is a `u64` here for ease of understanding, but it's the same for any power of 2)
https://zig.godbolt.org/z/Pbj4P7ob3
The new version is slightly better too because `isMinLayer(maxInt(usize))` is now well-defined behavior.
The 'Content-Length' header was inspected by mistake,
which makes it effectively impossible to use chunked
Transfer-Encoding when using the http client.
Tested locally with a HTTP server - data is properly sent
with POST method and the proper encoding declared, after the fix.