When using the LLVM backend, array copies were lowered as calls to
`llvm.memcpy.*` builtin which could cause recursive calls to memcpy to
be generated (observed with `-target x86_64-linux -mcpu x86_64+avx512vl
--debug-rt`).
By instead performing these small fixed-size copies with integers or
vectors the LLVM backend does not generate calls to the `llvm.memcpy`
builtin, and so (with `-fno-builtin`) recursive calls to memcpy will
not be generated by LLVM.
The assertions and (test build) runtime safety have been removed as they
may cause (mutually) recursive calls to memcpy in debug builds since the
panic handler generates calls to llvm.memcpy.
The new memcpy function aims to be more generic than the previous
implementation which was adapted from an implementation optimized for
x86_64 avx2 machines. Even on x86_64 avx2 machines this implementation
should be generally be faster due to fewer branches in the small length
cases and generating less machine code.
Note that the new memcpy function no longer acts as a memmove.
When we're compiling compiler_rt for any WebAssembly target, we do
not want to expose all the compiler-rt functions to the host runtime.
By setting the visibility of all exports to `hidden`, we allow the
linker to resolve the symbols during linktime, while not expose the
functions to the host runtime. This also means the linker can
properly garbage collect any compiler-rt function that does not get
resolved. The symbol visibility for all target remains the same as
before: `default`.
This moves functions that LLVM generates calls to,
to the compiler_rt implementation itself, rather than c.zig.
This is a prerequisite for native backends to link with compiler-rt.
This also allows native backends to generate calls to `memcpy` and the like.