This change fixes some division-by-zero bugs introduced by the optimized
ring buffer read/write functions in d8c067966.
There are edge cases where decompression can use a zero-length ring
buffer, because the ring buffer is sized to exactly the window size
specified by a Zstandard frame, and this can be zero. Switching
away from loops to mem copies means that we need to ensure ring buffers
do not have length zero when attempting to read from or write to them.
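
A minimal sketch of the kind of guard described above, assuming Zig 0.11 or
later builtin syntax (`@memcpy(dest, src)`). The `Ring` type and its
`writeSlice` method are hypothetical stand-ins, not the actual ring buffer
API touched by this change; the point is only that the modulo used to wrap
the index must never see a length of zero.

```zig
const std = @import("std");

// Hypothetical ring buffer for illustration; not the std.RingBuffer API.
const Ring = struct {
    data: []u8,
    write_index: usize = 0,

    // Copies `bytes` into the ring using at most two @memcpy calls.
    // The caller must ensure bytes.len <= data.len.
    fn writeSlice(self: *Ring, bytes: []const u8) void {
        std.debug.assert(bytes.len <= self.data.len);
        // Guard: with data.len == 0 the modulo below would divide by zero.
        if (self.data.len == 0) return;
        const start = self.write_index % self.data.len;
        const first_len = @min(bytes.len, self.data.len - start);
        @memcpy(self.data[start..][0..first_len], bytes[0..first_len]);
        // Whatever did not fit before the end wraps to the front.
        @memcpy(self.data[0 .. bytes.len - first_len], bytes[first_len..]);
        self.write_index = (start + bytes.len) % self.data.len;
    }
};

test "writing to a zero-length ring is a no-op" {
    var ring = Ring{ .data = &[_]u8{} };
    ring.writeSlice("");
    try std.testing.expectEqual(@as(usize, 0), ring.write_index);
}
```

Returning early keeps the copy path free of per-byte checks while still
making the zero-window case well defined.
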
This reverts a change introduced in #17400 that caused a bug when
decompressing an RLE block into a ring buffer.
RLE blocks contain only a single byte of data to copy into the output,
so attempting to copy a slice causes buffer overruns and incorrect
decompression.
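
To illustrate the difference, here is a rough sketch of RLE expansion into a
flat destination slice, assuming Zig 0.11 or later (`@memset(dest, value)`).
The function name, parameters and error names are hypothetical, and the real
code writes into a ring buffer rather than a slice. The key point is that
only `src[0]` is ever read; copying `src[0..regenerated_size]` would read
past the single byte the block provides.

```zig
const std = @import("std");

// Hypothetical helper: expand an RLE block into `dest`.
// `src` is the block content, which for an RLE block is one byte.
fn decodeRleBlock(
    dest: []u8,
    src: []const u8,
    regenerated_size: usize,
) error{ MalformedBlock, DestTooSmall }!usize {
    if (src.len < 1) return error.MalformedBlock;
    if (dest.len < regenerated_size) return error.DestTooSmall;
    // Repeat the single source byte; never slice `src` by the output size.
    @memset(dest[0..regenerated_size], src[0]);
    return regenerated_size;
}

test "RLE block repeats one byte" {
    var out: [5]u8 = undefined;
    const n = try decodeRleBlock(&out, &[_]u8{0xAB}, 5);
    try std.testing.expectEqual(@as(usize, 5), n);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0xAB, 0xAB, 0xAB, 0xAB, 0xAB }, &out);
}
```
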
Use inline to vastly simplify the exposed API. This allows a
comptime-known endian parameter to be propagated, making extra functions
for a specific endianness completely unnecessary.
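
As a sketch of the idea, not the actual API: in an `inline fn`, an argument
that is comptime-known at the call site stays comptime-known in the body, so
one function with an ordinary endian parameter can replace separate
little-endian and big-endian variants. `Endian` and `readU16` below are
hypothetical names chosen for illustration; `Endian` is a local stand-in so
the example does not depend on a particular standard library version.

```zig
const std = @import("std");

// Local stand-in for the standard library's endianness enum.
const Endian = enum { little, big };

// Because the function is `inline`, a comptime-known `endian` at the call
// site lets the compiler resolve this switch at compile time.
inline fn readU16(bytes: *const [2]u8, endian: Endian) u16 {
    return switch (endian) {
        .little => @as(u16, bytes[0]) | (@as(u16, bytes[1]) << 8),
        .big => (@as(u16, bytes[0]) << 8) | @as(u16, bytes[1]),
    };
}

test "one function covers both byte orders" {
    const bytes = [_]u8{ 0x12, 0x34 };
    try std.testing.expectEqual(@as(u16, 0x3412), readU16(&bytes, .little));
    try std.testing.expectEqual(@as(u16, 0x1234), readU16(&bytes, .big));
}
```
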
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:
* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
Previously `executeSequenceRingBuffer()` would not verify the offset
against the number of bytes already decoded, so it would happily copy
garbage bytes rather than return an error before the window was filled.
To fix this, a new `written_count` is added to the decode state that
tracks the total number of bytes decoded.
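
The check can be sketched roughly as follows. `checkOffset` and the error
name `OffsetTooLarge` are hypothetical, and `written_count` is passed in as a
plain argument rather than read from the real decode state; the sketch only
shows that a match offset reaching back further than the bytes decoded so
far is rejected instead of being copied.

```zig
const std = @import("std");

// Hypothetical validation step run before copying a match.
// `written_count` is the total number of bytes decoded so far.
fn checkOffset(written_count: usize, offset: usize) error{OffsetTooLarge}!void {
    // An offset larger than what has been written would read bytes that
    // were never decoded, even if the window itself is large enough.
    if (offset > written_count) return error.OffsetTooLarge;
}

test "offset beyond decoded bytes is rejected" {
    try std.testing.expectError(error.OffsetTooLarge, checkOffset(3, 4));
    try checkOffset(3, 3);
}
```
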