DWARF 5 moves around some fields and adds a few new ones that can't be
parsed or ignored by our current DWARF 4 parser. This isn't a complete
implementation of DWARF 5, but this is enough to make stack traces
mostly work. Line numbers from C++ don't show up, but I know the info
is there. I think the answer is to iterate through .debug_line_str in
getLineNumberInfo, but I didn't want to fall into an even deeper rabbit
hole tonight.
Compilers will sometimes `#define linux 1` if the operating system in
use is Linux. This clashes with the code produced by the C backend when
processing the stdlib, e.g. std.Target.Os.VersionRange [^1] which is a
struct containing a field named `linux`.
The output of the C backend doesn't rely on this macro being defined,
and other code also shouldn't rely on it -- e.g. quoting from the GCC
documentation [^2]:
"""
The C standard requires that all system-specific macros be part of
the reserved namespace. All names which begin with two underscores, or
an underscore and a capital letter, are reserved for the compiler and
library to use as they wish. However, historically system-specific
macros have had names with no special prefix; for instance, it is
common to find unix defined on Unix systems.
[...]
We are slowly phasing out all predefined macros which are outside the
reserved namespace. You should never use them in new programs, and we
encourage you to correct older code to use the parallel macros
whenever you find it. We don’t recommend you use the system-specific
macros that are in the reserved namespace, either. It is better in the
long run to check specifically for features you need
"""
[^1]: 8c32d989c9/lib/std/target.zig (L224)
[^2]: https://gcc.gnu.org/onlinedocs/cpp/System-specific-Predefined-Macros.html#System-specific-Predefined-Macros
This also surfaces the fact that clz, ctz and popCount didn't actually
support 128 bit integers, despite what was claimed by
226fcd7c709ec664c5d883042cf7beb3026f66cb. This was partially hidden by
the fact that the test code for popCount only exercised 128 bit integers
in a comptime context. This commit duplicates that test case for runtime
ints too.
This parameter is only currently needed by zig_byte_swap() and
zig_bit_reverse(). This commit adds an option to airBuiltinCall() to
allow emitting the signedness information only when needed, removing
this unused parameter from the other builtins.
This folds the airCountZeroes() code from
226fcd7c709ec664c5d883042cf7beb3026f66cb back into airBuiltinCall(),
since most of these builtins happen to require the same arguments and
can be unified under a common function signature.
This required adjusting `Type.nameAlloc` to be used with a
general-purpose allocator and added `Type.nameAllocArena` for the arena
use case (avoids allocation sometimes).
This fixes 2 entrypoints within the self-hosted wasm linker that would be called
for the llvm backend, whereas we should simply call into the llvm backend to perform such action.
i.e. not allocate a decl index when we have an llvm object, and when flushing a module,
we should be calling it on llvm's object, rather than have the wasm linker perform the operation.
Also, this fixes the wasm intrinsics for wasm.memory.size and wasm.memory.grow.
Lastly, this commit ensures that when an extern function is being resolved, we tell LLVM how
to import such function.
Like decl code generation, also unify the wasm backend and the wasm linker to call into
the general purpose `codegen.zig` to generate the code for a function.
This also unifies the wasm backend to use `generateSymbol` when lowering a constant
that cannot be lowered to an immediate value.
As both decls and constants are now refactored, the old `genTypedValue` is removed.
To unify the wasm backend with the other backends, we will now call `generateSymbol` to
lower a Decl into bytes. This means we also have to change some function signatures
to comply with the linker interface.
Since the general purpose generateSymbol is less featureful than wasm's, some tests are
temporarily disabled.
* AIR: use pl_op instead of ty_pl for wasm_memory_size. No need to
store the type because the type is always `u32`.
* AstGen: use coerced_ty for `@wasmMemorySize` and `@wasmMemoryGrow`
and do the coercions in Sema.
* Sema: use more accurate source locations for errors.
* Provide more information in the compiler error message.
* Codegen: use liveness data to avoid lowering unused
`@wasmMemorySize`.
* LLVM backend: add implementation
- I wasn't able to test it because we are hitting a linker error for
`-target wasm32-wasi -fLLVM`.
* C backend: use `zig_unimplemented()` instead of silently doing wrong
behavior for these builtins.
* behavior tests: branch only on stage2_arch for inclusion of the
wasm.zig file. We would change it to `builtin.cpu.arch` but that is
causing a compiler crash on some backends.
This implements the wasm builtins by lowering to builtins that are supported by c-compilers.
In this case: Clang.
This also simplifies the `AIR` instruction as it now uses the payload field of `ty_pl` and `pl_op`
directly to store the index argument rather than storing it inside Extra. This saves us 4 bytes
per builtin call.
Symbols that have globals used to have their lookup key be the symbol name.
This key is now the offset into the string table.
Imports have both the module name (library name) and name (of the symbol), those strings are now
also being interned. This can save us up to 24bytes per import which have both their module name and name de-duplicated.
Module names are almost entirely the same for all imports, providing us with a big chance of saving us 12 bytes at least.
Just like imports, exports can also have a seperate name than the internal symbol name. Rather than storing the slice,
we now store the offset of this string instead.
For all symbols read from object files as well as generated from Zig code
will now be interned and have their offset into the string table saved on the `Symbol` instead.
Besides interning, local symbols now also use a decl's fully qualified name.
When a decl/symbol is extern/to-be-imported, the name of the decl itself will be used for symbol resolving.
Similarly for symbols that will be exported, will have their 'export name' set.
This is preliminary work for string interning in the wasm linker.
Using an arena would defeat the purpose of de-duplicating strings as we wouldn't be able to free memory
of duplicated strings.
This change also means we can simplify wasm binary parsing, by creating a general purpose parser that
parses the binary into its sections, but untyped. Doing this, allows us to re-use the base of that, for
object file, but also debug info parsing.
Unless the pointer is a pointer to a function, if the pointee type
has zero-bits, we need to return `MCValue.none` as the `Decl` has
not been lowered to memory, and therefore, any GOT reference will be
wrong.
* fix alignment issues for consts with natural ABI alignment not
matching that of the `ldr` instruction in `aarch64` - solved by
preceeding the `ldr` with an additional `add` instruction to form
the full address before dereferencing the pointer.
* redo selection of segment/section for decls and consts based on
combined type and value
Rather than ping ponging between codegen and the linker to generate the symbols/atoms
for a local constant and its relocations. We now create all neccesary objects within the linker.
This simplifies the code as we can now simply call `lowerUnnamedConst` from anywhere in codegen,
allowing us to further improve lowering constants into .rodata so we do not have to sacrifice
lowering certain types such as decl_ref's where its type is a slice.
Otherwise, we risk collisions in the global symbol table. This is
also an opportunity to generalise and rewrite the symbol table
abstraction.
Also, improve the logs for the symbol table.
Prior to this change, the routine would assume it is called first,
before any symbol was created, thus precluding an option that in the
incremental setting, we might have already pulled a suitably defined
and exported symbol that could collide and/or be replaced by the
symbol synthesised by the linker.
We now correctly implement exporting decls. This means it is possible to export
a decl with a different name than the decl that is doing the export.
This also sets the symbols with the correct flags, so when we emit a relocatable
object file, a linker can correctly resolve symbols and/or export the symbol to the host environment.
This commit also includes fixes to ensure relocations have the correct offset to how other
linkers will expect the offset, rather than what we use internally.
Other linkers accept the offset, relative to the section.
Internally we use an offset relative to the atom.
When generating a relocatable object file, we now emit a custom "reloc.CODE" and "reloc.DATA" section
which will contain the relocations for each section.
Using a new symbol location -> Atom mapping, we can now easily find the corresponding `Atom` from a symbol.
This can be used to construct the symbol table, as well as easier access to a target atom when performing
a relocation for a data symbol.
When creating a relocatable object file, we do no longer perform the following actions:
- Merge data segments
- Calculate stack size
- Relocations
We now also make the stack pointer symbol `undefined` for this use case as well as add the symbol
as an import.
When creating a relocatable object file, emit the symbol table.
We do this by iterating over all atoms, and finding the corresponding
symbols of those. This provides us all the meta information such as size, and offset as well.
This data is required for defined data symbols.
When we emit an object file, the "Names" section does not have to be emitted, as all symbol names
are already in the symbol table, so the names section is redundant.
- Correctly get discard symbol by first checking if it was discarded or not.
- Remove imports if extern symbols were resolved by an object file.
- Correctly relocate data symbols by ensuring the atom is from the correct file.
- Fix the `Names` section by using the resolved symbols, rather than the ones defined in Zig code.
We now correctly allocate and create atoms for symbols from other object files.
Imports are now also resolved and appended when required.
Besides those changes, we now duplicate all symbol names, so we can correctly
generate unique names for unnamed constants.
TODO: String interning
This implements the merging of all sections, to generate a valid wasm binary where all symbols
have been resolved and their respective sections have been merged into the final binary.