Previously we incorrectly assumed all memset's to have its element
abi-size be 1 byte. This would set the region of memory incorrectly.
We now have a more efficient loop, as well as support any element
type by re-using the `store` function for each element and moving
the pointer by 1 element.
When the element is comptime-known, we can check if it has a repeated
byte representation. In this case, `@memset` can be lowered with the
LLVM intrinsic rather than with a loop.