The new memcpy function aims to be more generic than the previous
implementation, which was adapted from an implementation optimized for
x86_64 avx2 machines. Even on x86_64 avx2 machines, the new implementation
should generally be faster because it has fewer branches in the small-length
cases and generates less machine code.
Note that the new memcpy function no longer acts as a memmove.
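
To illustrate what losing the memmove behavior means for callers, here is a minimal C sketch. The helper names `shift_right` and `copy_disjoint` are hypothetical and are not part of compiler-rt. The point is only that overlapping copies must go through `memmove`, while `memcpy` is reserved for buffers known to be disjoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move the first `len` bytes of `buf` right by `shift`
 * positions. Source and destination overlap, so memmove is required. */
static void shift_right(char *buf, size_t len, size_t shift)
{
    memmove(buf + shift, buf, len);
}

/* Hypothetical helper: copy between buffers that must not overlap.
 * The assert documents the precondition that memcpy relies on. */
static void copy_disjoint(char *dst, const char *src, size_t len)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    assert(d + len <= s || s + len <= d);
    memcpy(dst, src, len);
}
```

Code that previously relied on memcpy tolerating overlap would need to switch to the `memmove` form.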
When compiling compiler_rt for any WebAssembly target, we do
not want to expose all the compiler-rt functions to the host runtime.
By setting the visibility of all exports to `hidden`, we allow the
linker to resolve the symbols at link time without exposing the
functions to the host runtime. This also means the linker can
properly garbage collect any compiler-rt function that does not get
resolved. The symbol visibility for all other targets remains the same as
before: `default`.
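
The same idea can be sketched in C using the GCC/Clang `visibility` attribute. This is only an illustration, not how compiler_rt itself is written: the macro `RT_EXPORT` and the function `rt_example_popcount32` are hypothetical names, and the sketch assumes a compiler that defines `__wasm__` for WebAssembly targets, as Clang does.

```c
#include <stdint.h>

/* Hypothetical macro: hidden visibility on WebAssembly targets,
 * default visibility everywhere else. */
#if defined(__wasm__)
#define RT_EXPORT __attribute__((visibility("hidden")))
#else
#define RT_EXPORT __attribute__((visibility("default")))
#endif

/* Hypothetical runtime helper standing in for a compiler-rt routine.
 * Counts the set bits in a 32-bit value. */
RT_EXPORT uint32_t rt_example_popcount32(uint32_t x)
{
    uint32_t count = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

With hidden visibility, other objects in the same link can still call the function, but it does not appear among the module's exports and can be discarded if nothing references it.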
This change moves the functions that LLVM generates calls to out of
c.zig and into the compiler_rt implementation itself.
This is a prerequisite for native backends to link with compiler-rt.
This also allows native backends to generate calls to `memcpy` and the like.
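
As a rough illustration of why a backend needs these symbols available at link time, the C sketch below contains no explicit `memcpy` call, yet a compiler may lower the struct assignment to one, depending on the target and optimization level. The `Packet` type and the `packet_clone` function are hypothetical.

```c
#include <stddef.h>

/* Hypothetical type, large enough that a compiler may prefer a library
 * call over inline loads and stores when copying it. */
struct Packet {
    unsigned char payload[256];
    size_t len;
};

/* No memcpy appears in the source, but the aggregate assignment below
 * may be emitted as a call to memcpy by the code generator. */
void packet_clone(struct Packet *dst, const struct Packet *src)
{
    *dst = *src;
}
```

If the runtime library that provides `memcpy` is not linked in, a call generated this way would fail to resolve.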