It's easy to do FP unwinding from a CPU context: you just report the
captured ip/pc value first, and then unwind from the captured fp value.
All this really needed was a couple of new functions on the
`std.debug.cpu_context` implementations so that we don't need to rely on
`std.debug.Dwarf` to access the captured registers.
Resolves: #25576
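The frame-pointer walk described above can be modelled in a few lines. This is only a Python sketch over a simulated stack, not the Zig implementation: `CpuContext`, `fp_unwind`, and the frame layout (saved caller fp at `fp`, return address at `fp + 8`) are assumptions made for illustration.

```python
from dataclasses import dataclass

WORD = 8  # assumed 64-bit target


@dataclass
class CpuContext:
    """Hypothetical stand-in for a captured CPU context."""
    pc: int
    fp: int


def fp_unwind(ctx, memory, max_frames):
    """Return up to max_frames addresses: the captured pc, then return addresses.

    `memory` maps an address to the word stored there. Assumed layout:
    memory[fp] holds the caller's fp, memory[fp + WORD] the return address.
    """
    if max_frames == 0:
        return []
    addrs = [ctx.pc]  # report the captured pc first
    fp = ctx.fp
    while len(addrs) < max_frames and fp != 0:
        if fp not in memory or fp + WORD not in memory:
            break  # unreadable frame: stop instead of faulting
        ret = memory[fp + WORD]
        next_fp = memory[fp]
        if ret == 0:
            break
        addrs.append(ret)
        # Caller frames live at higher addresses; anything else is a corrupt chain.
        if next_fp != 0 and next_fp <= fp:
            break
        fp = next_fp
    return addrs


# Three nested frames on a simulated stack.
memory = {
    0x1000: 0x1020, 0x1008: 0x400100,
    0x1020: 0x1040, 0x1028: 0x400200,
    0x1040: 0x0,    0x1048: 0x400300,
}
ctx = CpuContext(pc=0x400050, fp=0x1000)
trace = fp_unwind(ctx, memory, max_frames=16)
short_trace = fp_unwind(ctx, memory, max_frames=2)
```

Here `trace` starts with the captured pc and is followed by the three return addresses, while `short_trace` stops as soon as the buffer of two entries is full.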
For unwinding purposes, we don't care about unsupported registers. Yet because
we added these rules to the cache entry, we'd later try to evaluate them and
thus fail the unwind attempt for no good reason. They'd also take up cache rule
slots that would be better spent on actually relevant registers.
Note that any attempt to read unsupported registers during unwinding will still
fail the unwind attempt as expected.
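A rough model of that behaviour, in Python rather than Zig: rules that target unsupported registers are dropped when the cache entry is built, but a rule that reads one still fails. The supported set (DWARF x86_64 numbers 6, 7, and 16 for rbp, rsp, and the return address), the rule encoding, and all names here are assumptions for the sake of the sketch.

```python
SUPPORTED_REGS = {6, 7, 16}  # assumed: rbp, rsp, return address (x86_64 DWARF numbers)
MAX_CACHED_RULES = 4         # hypothetical number of rule slots per cache entry


class UnsupportedRegister(Exception):
    pass


def build_cache_entry(rules):
    """Keep only rules whose *target* register is supported.

    `rules` is a list of (target_reg, (kind, arg)) pairs.
    Returns None if the remaining rules do not fit in the fixed slots.
    """
    kept = [(reg, rule) for reg, rule in rules if reg in SUPPORTED_REGS]
    if len(kept) > MAX_CACHED_RULES:
        return None
    return kept


def read_register(regs, num):
    """Reading an unsupported register is still an error."""
    if num not in SUPPORTED_REGS:
        raise UnsupportedRegister(num)
    return regs[num]


def apply_entry(entry, regs, cfa, memory):
    """Evaluate cached rules against the current registers and CFA."""
    new_regs = dict(regs)
    for reg, (kind, arg) in entry:
        if kind == "offset":      # value saved at CFA + arg
            new_regs[reg] = memory[cfa + arg]
        elif kind == "register":  # value currently held in another register
            new_regs[reg] = read_register(regs, arg)
    return new_regs


rules = [
    (6, ("offset", -16)),
    (16, ("offset", -8)),
    (17, ("offset", -32)),  # xmm0: unsupported target, silently dropped
]
entry = build_cache_entry(rules)
memory = {0x2000 - 16: 0x3000, 0x2000 - 8: 0x401000}
regs = {6: 0x1FF0, 7: 0x1FE0, 16: 0x400000}
new_regs = apply_entry(entry, regs, 0x2000, memory)
```

The rule for register 17 neither occupies a slot nor gets evaluated, so the missing stack slot at `CFA - 32` never matters.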
There were only a few dozen lines of common logic, and they frankly
introduced more complexity than they eliminated. Instead, let's accept
that the implementations of `SelfInfo` are all pretty different and want
to track different state. This probably fixes some synchronization and
memory bugs by simplifying a bunch of stuff. It also improves the DWARF
unwind cache, making it around twice as fast in a debug build with the
self-hosted x86_64 backend, because we no longer have to redundantly go
through the hashmap lookup logic to find the module. Unwinding on
Windows will also see a slight performance boost from this change,
because `RtlVirtualUnwind` does not need to know the module whatsoever,
so the old `SelfInfo` implementation was doing redundant work. Lastly,
this makes it even easier to implement `SelfInfo` on freestanding
targets; there is no longer a need to emulate a real module system,
since the user controls the whole implementation!
There are various other small refactors here in the `SelfInfo`
implementations as well as in the DWARF unwinding logic. This change
turned out to make a lot of stuff simpler!
Apparently the `__eh_frame` in Mach-O binaries doesn't include the
terminator entry, but in all other respects it acts like `.eh_frame`
rather than `.debug_frame`. I have no idea.
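To make the difference concrete, here is a small Python sketch (not the Zig code) of walking an `.eh_frame`-style section so that both layouts work: stop at a zero-length terminator if there is one, otherwise stop at the end of the section. It assumes little-endian data and 32-bit entry lengths; `iter_frame_entries` is a hypothetical name.

```python
import struct


def iter_frame_entries(section: bytes):
    """Yield (offset, kind, body) for each CIE/FDE in an .eh_frame-style section."""
    pos = 0
    while pos + 4 <= len(section):
        start = pos
        (length,) = struct.unpack_from("<I", section, pos)
        pos += 4
        if length == 0:
            return  # explicit terminator entry, as found in ELF .eh_frame
        if length == 0xFFFFFFFF:
            raise ValueError("64-bit entries are not handled in this sketch")
        end = pos + length
        if length < 4 or end > len(section):
            raise ValueError("truncated entry")
        # In .eh_frame, an ID field of 0 marks a CIE; anything else is an FDE
        # whose field is a relative pointer back to its CIE.
        (cie_id,) = struct.unpack_from("<I", section, pos)
        kind = "CIE" if cie_id == 0 else "FDE"
        yield start, kind, section[pos + 4:end]
        pos = end
    # Falling out of the loop means the section ended without a terminator,
    # which is the Mach-O __eh_frame case described above.


cie = struct.pack("<II", 8, 0) + b"\x01\x00\x00\x00"
fde = struct.pack("<II", 8, 12) + b"\xaa\xbb\xcc\xdd"
elf_style = cie + fde + struct.pack("<I", 0)  # with terminator
macho_style = cie + fde                       # no terminator

elf_entries = list(iter_frame_entries(elf_style))
macho_entries = list(iter_frame_entries(macho_style))
```

Both sections yield the same two entries; only the stopping condition differs.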
By my estimation, these changes speed up DWARF unwinding when using the
self-hosted x86_64 backend by around 7x. There are two very significant
enhancements: we no longer iterate frames which don't fit in the stack
trace buffer, and we cache register rules (in a fixed buffer) to avoid
re-parsing and evaluating CFI instructions in most cases. Alongside this
are a bunch of smaller enhancements, such as pre-caching the result of
evaluating the CIE's initial instructions, avoiding re-parsing of CIEs,
and big simplifications to the `Dwarf.Unwind.VirtualMachine` logic.
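The rule cache idea can be illustrated with a fixed-size, direct-mapped table keyed by pc. This Python sketch is one possible shape for such a cache, not a description of the actual Zig data structure; the slot count, the `pc % slots` index function, and the names are all assumptions.

```python
class RuleCache:
    """Fixed-size, direct-mapped cache from pc to already-computed unwind rules."""

    def __init__(self, slots=8):
        self.slots = [None] * slots  # fixed buffer: no allocation after setup
        self.hits = 0
        self.misses = 0

    def lookup(self, pc, compute):
        """Return cached rules for pc, calling compute(pc) only on a miss."""
        i = pc % len(self.slots)  # hypothetical slot function
        slot = self.slots[i]
        if slot is not None and slot[0] == pc:
            self.hits += 1
            return slot[1]
        self.misses += 1
        rules = compute(pc)
        self.slots[i] = (pc, rules)  # evict whatever occupied the slot
        return rules


calls = []


def run_cfi(pc):
    """Stand-in for the expensive step: parsing and evaluating CFI instructions."""
    calls.append(pc)
    return {"cfa": ("rsp", 16), "ra": ("offset", -8)}


cache = RuleCache(slots=8)
for pc in (0x401000, 0x401000, 0x401008, 0x401000):
    cache.lookup(pc, run_cfi)
```

The second lookup of `0x401000` is a hit. `0x401008` maps to the same slot and evicts it, so the final lookup recomputes; a fixed buffer trades occasional conflict misses like this for having no allocation on the unwind path.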
Our usage of `ucontext_t` in the standard library was kind of
problematic. We unnecessarily mimicked libc-specific structures, and our
`getcontext` implementation was overkill for our use case of stack
tracing.
This commit introduces a new namespace, `std.debug.cpu_context`, which
contains "context" types for various architectures (currently x86,
x86_64, ARM, and AArch64) containing the general-purpose CPU registers,
i.e. the ones needed in practice for stack unwinding. Each implementation has
a function `current` which populates the structure using inline
assembly. The structure is user-overrideable, though that should only be
necessary if the standard library does not have an implementation for
the *architecture*: that is to say, none of this is OS-dependent.
Of course, in POSIX signal handlers, we get a `ucontext_t` from the
kernel. The function `std.debug.cpu_context.fromPosixSignalContext`
converts this to a `std.debug.cpu_context.Native` with a big ol' target
switch.
This functionality is not exposed from `std.c` or `std.posix`, and
neither are `ucontext_t`, `mcontext_t`, or `getcontext`. The rationale
is that these types and functions do not conform to a specific ABI, and
in fact tend to get updated over time based on CPU features and
extensions; in addition, different libcs use different structures which
are "partially compatible" with the kernel structure. Overall, it's a
mess, but all we need is the kernel context, so we can just define a
kernel-compatible structure as long as we don't claim C compatibility by
putting it in `std.c` or `std.posix`.
This change resulted in a few nice `std.debug` simplifications, but
nothing too noteworthy. However, the main benefit of this change is that
DWARF unwinding---sometimes necessary for collecting stack traces
reliably---now requires far less target-specific integration.
Also fix a bug I noticed in `PageAllocator` (I found this due to a bug
in my distro's QEMU distribution; thanks, broken QEMU patch!) and I
think a couple of minor bugs in `std.debug`.
Resolves: #23801
Resolves: #23802
This abstraction isn't really tied to DWARF at all! Really, we're just
loading some information from an ELF file which is useful for debugging.
That *includes* DWARF, but it also includes other information. For
instance, the other change here:
Now, if DWARF information is missing, `debug.SelfInfo.ElfModule` will
name symbols by finding a matching symtab entry. We actually already do
this on Mach-O, so it makes obvious sense to do the same on ELF! This
change is what motivated the restructuring to begin with.
The symtab work is derived from #22077.
Co-authored-by: geemili <opensource@geemili.xyz>
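The symtab fallback amounts to finding the symbol whose address range contains a given address. A minimal Python sketch of that lookup follows; `Symbol`, `SymbolIndex`, and the handling of zero-sized symbols are assumptions for illustration and do not mirror the `ElfModule` code.

```python
import bisect
from typing import NamedTuple, Optional


class Symbol(NamedTuple):
    addr: int
    size: int
    name: str


class SymbolIndex:
    """Sorted view of a symbol table for address-to-name lookups."""

    def __init__(self, symbols):
        self.symbols = sorted(symbols, key=lambda s: s.addr)
        self.starts = [s.addr for s in self.symbols]

    def name_for(self, addr: int) -> Optional[str]:
        # Last symbol starting at or before addr.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i < 0:
            return None
        sym = self.symbols[i]
        if sym.addr <= addr < sym.addr + sym.size:
            return sym.name
        # Assumed policy: a zero-sized symbol only matches its exact address.
        if sym.size == 0 and addr == sym.addr:
            return sym.name
        return None


index = SymbolIndex([
    Symbol(0x1040, 0x20, "helper"),
    Symbol(0x1000, 0x40, "main"),
])
names = [index.name_for(a) for a in (0x0FFF, 0x1010, 0x1040, 0x105F, 0x1060)]
```

Addresses below the first symbol or past the end of the last one resolve to `None`, which is where a caller would fall back to printing the raw address.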
Turns out this isn't technically specific to that target at all; other
targets just don't emit mid-function 'ret' instructions as much, so
certain CFI instruction patterns were only seen on aarch64.
Thanks to Jacob for finding the bug <3
The big endian RISC-V effort is mostly driven by MIPS (the company) which is
pivoting to RISC-V, and presumably needs a big endian variant to fill the niche
that big endian MIPS (the ISA) did.
GCC already supports these targets, but LLVM support will only appear in 22;
this commit just adds the necessary target knowledge and checks on our end.
This API is based around the unsound idea that a process can perform
checked virtual memory loads to prevent crashing. This depends on
OS-specific APIs that may be unavailable, disabled, or impossible due to
virtualization.
It also makes collecting stack traces ridiculously slow, which is a
problem for users of DebugAllocator - in other words, everybody, all the
time. It also makes strace go from being superbly clean to being awful.
The compiler actually doesn't need any functional changes for this: Sema
does reification based on the tag indices of `std.builtin.Type` already!
So, no zig1.wasm update is necessary.
This change is necessary to disallow name clashes between fields and
decls on a type, which is a prerequisite of #9938.
There are two concepts here: one for whether DWARF supports unwinding on
that target, and another for whether the Zig standard library
implements it yet.
...which have a ucontext_t but not a PC register. The current stack
unwinding implementation does not yet support this architecture.
Also fix name of `std.debug.SelfInfo.openSelf` to remove redundancy.
Also removed this hook into root providing an "openSelfDebugInfo"
function. Sorry, this debugging code is not of sufficient quality to
offer a plugin API right now.
After this commit:
`std.debug.SelfInfo` is a cross-platform abstraction for the current
executable's own debug information, with a goal of minimal code bloat
and compilation speed penalty.
`std.debug.Dwarf` does not assume the current executable is itself the
thing being debugged; however, it does assume the debug info has the
same CPU architecture and OS as the current executable. It is planned to
remove this limitation.