1. a fiber can't put itself on a queue that allows it to be rescheduled
2. allow the idle fiber to unlock a mutex held by another fiber by
ignoring reschedule requests originating from the idle fiber
When the previous fiber did not request to be registered as an awaiter,
it may not have actually been a full blown `Fiber`, so only create the
`Fiber` pointer when needed.
Implements deflate compression from scratch. A history window is kept in
the writer's buffer for matching and a chained hash table is used to
find matches. Tokens are accumulated until a threshold is reached and
then outputted as a block. Flush is used to indicate end of stream.
Additionally, two other deflate writers are provided:
* `Raw` writes only in store blocks (the uncompressed bytes). It
utilizes data vectors to efficiently send block headers and data.
* `Huffman` only performs Huffman compression on data and no matching.
The above are also able to take advantage of writer semantics since they
do not need to keep a history.
Literal and distance code parameters in `token` have also been reworked.
Their parameters are now derived mathematically, however the more
expensive ones are still obtained through a lookup table (expect on
ReleaseSmall).
Decompression bit reading has been greatly simplified, taking advantage
of the ability to peek on the underlying reader. Additionally, a few
bugs with limit handling have been fixed.
There were only a few dozen lines of common logic, and they frankly
introduced more complexity than they eliminated. Instead, let's accept
that the implementations of `SelfInfo` are all pretty different and want
to track different state. This probably fixes some synchronization and
memory bugs by simplifying a bunch of stuff. It also improves the DWARF
unwind cache, making it around twice as fast in a debug build with the
self-hosted x86_64 backend, because we no longer have to redundantly go
through the hashmap lookup logic to find the module. Unwinding on
Windows will also see a slight performance boost from this change,
because `RtlVirtualUnwind` does not need to know the module whatsoever,
so the old `SelfInfo` implementation was doing redundant work. Lastly,
this makes it even easier to implement `SelfInfo` on freestanding
targets; there is no longer a need to emulate a real module system,
since the user controls the whole implementation!
There are various other small refactors here in the `SelfInfo`
implementations as well as in the DWARF unwinding logic. This change
turned out to make a lot of stuff simpler!
Apparently the `__eh_frame` in Mach-O binaries doesn't include the
terminator entry, but in all other respects it acts like `.eh_frame`
rather than `.debug_frame`. I have no idea.
By my estimation, these changes speed up DWARF unwinding when using the
self-hosted x86_64 backend by around 7x. There are two very significant
enhancements: we no longer iterate frames which don't fit in the stack
trace buffer, and we cache register rules (in a fixed buffer) to avoid
re-parsing and evaluating CFI instructions in most cases. Alongside this
are a bunch of smaller enhancements, such as pre-caching the result of
evaluating the CIE's initial instructions, avoiding re-parsing of CIEs,
and big simplifications to the `Dwarf.Unwind.VirtualMachine` logic.
This logic was causing some occasional infinite looping on ARM, where
the `.debug_frame` section is often incomplete since the `.exidx`
section is used for unwind information. But the information we're
getting from the compiler is totally *valid*: it's leaving the rule as
the default, which is (as with most architectures) equivalent to
`.undefined`!
This has been a TODO for ages, but in the past it didn't really matter
because stack traces are typically printed to stderr for which a mutex
is held so in practice there was a mutex guarding usage of `SelfInfo`.
However, now that `SelfInfo` is also used for simply capturing traces,
thread safety is needed. Instead of just a single mutex, though, there
are a couple of different mutexes involved; this helps make critical
sections smaller, particularly when unwinding the stack as `unwindFrame`
doesn't typically need to hold any lock at all.