This change fixes some division-by-zero bugs introduced by the optimized
ring buffer read/write functions in d8c067966.
There are edge cases where decompression can use a length zero ring
buffer as the size of the ring buffer used is exactly the the window
size specified by a Zstandard frame, and this can be zero. Switching
away from loops to mem copies means that we need to ensure ring buffers
do not have length zero ring when attempting to read/write from them.
`send_zc` tries to avoid making intermediate copies of data. Zerocopy execution
is not guaranteed and may fall back to copying.
The flags field of the first struct io_uring_cqe may likely contain
IORING_CQE_F_MORE , which means that there will be a second completion event /
notification for the request, with the user_data field set to the same value.
The user must not modify the data buffer until the notification is posted. The
first cqe follows the usual rules and so its res field will contain the number
of bytes sent or a negative error code. The notification's res field will be set
to zero and the flags field will contain IORING_CQE_F_NOTIF. The two step model
is needed because the kernel may hold on to buffers for a long time, e.g.
waiting for a TCP ACK, and having a separate cqe for request completions allows
userspace to push more data without extra delays. Note, notifications are only
responsible for controlling the lifetime of the buffers, and as such don't mean
anything about whether the data has atually been sent out or received by the
other end. Even errored requests may generate a notification, and the user must
check for IORING_CQE_F_MORE rather than relying on the result.
Available since kernel 6.0.
References:
https://man7.org/linux/man-pages/man3/io_uring_prep_send_zc.3.htmlhttps://man7.org/linux/man-pages/man2/io_uring_enter.2.html
Boundary symbols have a special name prefix:
* section$start$segname$sectname
* section$stop$segname$sectname
* segment$start$segname
* segment$stop$segname
and will resolve to either start or end of the respective
section/segment if found.
If not found, we return an error stating we couldn't find the
requested section/segment rather than silently failing and resolving
the address to 0 which seems to be the case with Apple's ld64.
There are two optimizations here, which work together to avoid a
pathological case.
The first optimization is that AstGen now records the result type of an
array multiplication expression where possible. This type is not used
according to the language specification, but instead as an optimization.
In the expression '.{x} ** 1000', if we know that the result must be an
array, then it is much more efficient to coerce the LHS to an array with
length 1 before doing the multiplication. Otherwise, we end up with a
1000-element tuple which we must coerce to an array by individually
extracting each field.
Secondly, the previous logic would repeatedly extract element/field
values from the LHS when initializing the result. This is unnecessary:
each element must only be extracted once, and the result reused.
These changes together give huge improvements to compiler performance on
a pathological case: AIR instructions go from 65551 to 15, and total AIR
bytes go from 1.86MiB to 264.57KiB. Codegen time spent on this function
(in a debug compiler build) goes from minutes to essentially zero.
Resolves: #17586
b3462b7 caused a regression in a third-party project, since it forced
resolution of field initializers for any field call 'foo.bar()', despite
this only being necessary when 'bar' is a comptime field.
See https://github.com/ziglang/zig/pull/17692#issuecomment-1802096734.
Previously the symbol tag field would remain `undefined` until it
was set during `flush`. However, the symbol's tag would be observed
earlier than where it was being set. We now set it to the explicit
tag `undefined` so this can be caught during debug. The symbol tag of
a decl will now also be set right after `updateDecl` and `updateFunc`.
Likewise, we now also set the `name` field during atom creation for
decls, as well as set the other fields to the max(u32) to ensure we
get a compiler crash during debug to ensure any misses will be caught.