77 Commits

Author SHA1 Message Date
Ryan Liptak
68b87918df Fix handling of Windows (WTF-16) and WASI (UTF-8) paths
Windows paths now use WTF-16 <-> WTF-8 conversion everywhere, which is lossless. Previously, conversion of ill-formed UTF-16 paths would either fail or invoke illegal behavior.

WASI paths must be valid UTF-8, and the relevant function calls have been updated to handle the possibility of failure due to paths not being encoded/encodable as valid UTF-8.

Closes #18694
Closes #1774
Closes #2565
2024-02-24 14:05:24 -08:00
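
The lossless WTF-16 <-> WTF-8 round trip described in the commit above can be sketched outside of Zig. This is a minimal Python illustration, not the `std.unicode` implementation: it assumes Python's `surrogatepass` error handler as a stand-in for WTF-8, and the helper names `wtf16le_to_wtf8`, `wtf8_to_wtf16le`, and `is_valid_utf8` are hypothetical.

```python
def wtf16le_to_wtf8(units):
    """Convert 16-bit code units (possibly ill-formed UTF-16) to WTF-8-style bytes."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    # surrogatepass keeps unpaired surrogates instead of raising an error
    text = raw.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")


def wtf8_to_wtf16le(data):
    """Convert WTF-8-style bytes back to a list of 16-bit code units."""
    text = data.decode("utf-8", errors="surrogatepass")
    raw = text.encode("utf-16-le", errors="surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_valid_utf8(data):
    """Strict check, analogous to the UTF-8 validity requirement on WASI paths."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 'a', an unpaired high surrogate, 'b': ill-formed as UTF-16
units = [0x0061, 0xD800, 0x0062]
encoded = wtf16le_to_wtf8(units)
print(encoded, wtf8_to_wtf16le(encoded) == units, is_valid_utf8(encoded))
```

The unpaired surrogate survives the round trip, while the strict check rejects the same bytes. That is the distinction the commit draws between Windows paths (lossless conversion) and WASI paths (conversion may fail).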
Ryan Liptak
f6b6b8a4ae Add std.unicode.fmtUtf8 that can handle ill-formed UTF-8
Ill-formed UTF-8 byte sequences are replaced by the replacement character (U+FFFD) according to "U+FFFD Substitution of Maximal Subparts" from Chapter 3 of the Unicode standard, and as specified by https://encoding.spec.whatwg.org/#utf-8-decoder
2024-02-24 14:04:59 -08:00
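
As a rough illustration of "U+FFFD Substitution of Maximal Subparts", the sketch below uses Python's built-in decoder rather than `std.unicode.fmtUtf8`. It assumes that CPython's `errors="replace"` handler follows the same practice; the function name `utf8_lossy` is hypothetical, and the sample bytes are the well-known example from Chapter 3 of the Unicode standard.

```python
def utf8_lossy(data: bytes) -> str:
    """Decode bytes as UTF-8, replacing each ill-formed maximal subpart with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# Truncated sequences (F1 80 80, E1 80, C2) and stray continuation bytes (80, BF)
sample = bytes.fromhex("61 F1 80 80 E1 80 C2 62 80 63 80 BF 64")
print([f"U+{ord(c):04X}" for c in utf8_lossy(sample)])
```

Each truncated multi-byte sequence collapses into a single replacement character, while each stray continuation byte gets its own.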
Ryan Liptak
4ee1309a8d std.unicode: Refactor and add WTF-16/WTF-8 functions
Renamed functions for consistent `Le` capitalization and conventions:

- utf16leToUtf8Alloc -> utf16LeToUtf8Alloc
- utf16leToUtf8AllocZ -> utf16LeToUtf8AllocZ
- utf16leToUtf8 -> utf16LeToUtf8
- utf8ToUtf16LeWithNull -> utf8ToUtf16LeAllocZ
- fmtUtf16le -> fmtUtf16Le

New UTF related functions:

- utf16LeToUtf8ArrayList
- utf8ToUtf16LeArrayList
- utf8ToUtf16LeAlloc
- isSurrogateCodepoint

(the ArrayList functions are mostly to allow the Alloc and AllocZ to share an implementation)

New WTF related functions/structs:

- wtf8Encode
- wtf8Decode
- wtf8ValidateSlice
- Wtf8View
- Wtf8Iterator
- wtf16LeToWtf8ArrayList
- wtf16LeToWtf8Alloc
- wtf16LeToWtf8AllocZ
- wtf16LeToWtf8
- wtf8ToWtf16LeArrayList
- wtf8ToWtf16LeAlloc
- wtf8ToWtf16LeAllocZ
- wtf8ToWtf16Le
- wtf8ToUtf8Lossy
- wtf8ToUtf8LossyAlloc
- wtf8ToUtf8LossyAllocZ
- Wtf16LeIterator
2024-02-24 14:04:58 -08:00
vinnichase
279607cae5
Fix fmt UTF-8 characters as fill (#18533)
Co-authored-by: Jacob Young <jacobly0@users.noreply.github.com>
2024-01-13 22:47:03 -05:00
Andrew Kelley
6a32d58876
Merge pull request #18318 from castholm/simd-segfault
Rename `simd.suggestVectorSize` to clarify intent and fix related segfault
2024-01-09 17:13:58 -08:00
davideger
e426ae43ae
Updated Utf8View example to format the single codepoint UTF-8 slice with {s} (#18288) 2024-01-01 18:47:27 -05:00
Carl Åstholm
59ac0d1eed Deprecate suggestVectorSize in favor of suggestVectorLength
The function returns the vector length, not the byte size of the vector or the bit size of individual elements. This distinction is very important and some usages of this function in the stdlib operated under these incorrect assumptions.
2024-01-01 16:18:57 +01:00
Meghan Denny
6a12fd62c1 std: make std.unicode.initComptime() a comptime-known function
resolved a TODO :)
2023-12-08 15:59:17 +02:00
Ryan Liptak
15a6b27957 std.unicode: Disable utf8 -> utf16 ASCII fast path on mips
Fixes a compile error when the target is mips, since std.simd.interlace does not work correctly on mips and raises a compile error if it is used.
2023-11-21 13:51:03 +02:00
mlugg
51595d6b75
lib: correct unnecessary uses of 'var' 2023-11-19 09:55:07 +00:00
Andrew Kelley
3fc6fc6812 std.builtin.Endian: make the tags lower case
Let's take this breaking change opportunity to fix the style of this
enum.
2023-10-31 21:37:35 -04:00
Jacob Young
d890e81761 mem: fix ub in writeInt
Use inline to vastly simplify the exposed API. This allows a
comptime-known endian parameter to be propagated, making extra functions
for a specific endianness completely unnecessary.
2023-10-31 21:37:35 -04:00
Ryan Liptak
13c8ec9db0 std.unicode: Add ASCII fast path to UTF-16 -> UTF-8 conversion functions 2023-10-31 02:23:35 -07:00
Ryan Liptak
03117c5290 std.unicode: Add ASCII fast path to UTF-8 -> UTF-16 conversion functions 2023-10-31 02:23:33 -07:00
Jacob Young
fe93332ba2 x86_64: implement enough to pass unicode tests
* implement vector comparison
* implement reduce for bool vectors
* fix `@memcpy` bug
* enable passing std tests
2023-10-23 22:42:18 -04:00
Jacob Young
27fe945a00 Revert "Revert "Merge pull request #17637 from jacobly0/x86_64-test-std""
This reverts commit 6f0198cadbe29294f2bf3153a27beebd64377566.
2023-10-22 15:46:43 -04:00
Andrew Kelley
6f0198cadb Revert "Merge pull request #17637 from jacobly0/x86_64-test-std"
This reverts commit 0c99ba1eab63865592bb084feb271cd4e4b0357e, reversing
changes made to 5f92b070bf284f1493b1b5d433dd3adde2f46727.

This caused a CI failure when it landed in master branch due to a
128-bit `@byteSwap` in std.mem.
2023-10-22 12:16:35 -07:00
Jacob Young
ccc9ebf0bd std: slightly improve codegen of std.unicode.utf8ValidateSlice 2023-10-22 12:07:23 -04:00
Jacob Young
2e6e39a700 x86_64: fix bugs and disable erroring tests 2023-10-21 10:55:41 -04:00
Veikka Tuominen
c919e9a280 std.simd: return comptime_int from suggestVectorSize 2023-10-13 16:58:05 +03:00
Karl Seguin
d68f39b541
std.unicode.utf8ValidateSlice: optimize implementation (#17329)
Originally inspired by Go's `utf8.Valid` function. Includes some test cases from Go's test suite.

Further optimized to be faster in all tested cases (short/long ascii/UTF8), in all release modes.

Takes advantage of SIMD for the ASCII fast path.
2023-10-06 23:49:21 -04:00
Ryan Liptak
a155e35850
std.json: Fix decoding of UTF-16 surrogate pairs (#16830)
* std.unicode: Add more UTF-16 decoding functions

This mostly makes parts of Utf16LeIterator reusable

* std.json: Fix decoding of UTF-16 surrogate pairs

Before this commit, there were 524,288 codepoints that would get decoded improperly. After this commit, there are 0.

Fixes #16828
2023-08-15 09:11:59 -04:00
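
For context, the standard way to combine a UTF-16 surrogate pair into a codepoint can be sketched as follows. This is a generic Python illustration of the arithmetic, not the `std.json` or `std.unicode` code, and the log does not show what the original bug was; the function name `decode_surrogate_pair` is hypothetical.

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate into one supplementary codepoint."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("expected a high surrogate (0xD800..0xDBFF)")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("expected a low surrogate (0xDC00..0xDFFF)")
    # Each surrogate contributes 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# A JSON string such as "\ud83d\ude00" carries this pair.
print(hex(decode_surrogate_pair(0xD83D, 0xDE00)))
```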
mlugg
f26dda2117 all: migrate code to new cast builtin syntax
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:

* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
2023-06-24 16:56:39 -07:00
Eric Joldasov
d884d7050e
all: replace comptime try with try comptime
Signed-off-by: Eric Joldasov <bratishkaerik@getgoogleoff.me>
2023-06-13 23:46:58 +06:00
dweiller
bd3360e03d convert s[start..start+len] to s[start..][0..len] 2023-05-07 15:55:21 +10:00
mlugg
ccf670c2b0 Zir: implement explicit block_comptime instruction
Resolves: #7056
2023-04-12 12:06:19 -04:00
Andrew Kelley
50eb7983cd remove most conditional compilation based on stage1
There are still a few occurrences of "stage1" in the standard library
and self-hosted compiler source, however, these instances need a bit
more careful inspection to ensure no breakage.
2022-12-06 20:38:54 -07:00
Andrew Kelley
ceb0a632cf std.mem.Allocator: allow shrink to fail
closes #13535
2022-11-29 23:30:38 -07:00
Jan Philipp Hafer
cf744cf04f
add suggestions by ifreund
also remove 2 redundant and commented-out tests
2022-05-17 18:56:06 +02:00
Jan Philipp Hafer
405f4286f7
std.unicode: add utf16 byte length and codepoints counting routines
* comptime and runtime tests are based on tests for counting utf8 code points
2022-05-17 18:54:29 +02:00
r00ster91
62d717e2ff Add std.unicode.replacement_character 2022-04-15 11:20:11 +03:00
r00ster
c4aac28a42 Reuse code in Utf8Iterator.nextCodepoint 2022-04-12 05:34:12 -04:00
PhaseMage
8a97807d68
Full response file (*.rsp) support
I hit the "quotes in an RSP file" issue when trying to compile gRPC using
"zig cc". As a fun exercise, I decided to see if I could fix it myself.
I'm fully open to this code being flat-out rejected. Or I can take feedback
to fix it up.

This modifies (and renames) _ArgIteratorWindows_ in process.zig such that
it works with arbitrary strings (or the contents of an RSP file).

In main.zig, this new _ArgIteratorGeneral_ is used to address the "TODO"
listed in _ClangArgIterator_.

This change closes #4833.

**Pros:**

- It has the nice attribute of handling "RSP file" arguments in the same way it
  handles "cmd_line" arguments.
- High Performance, minimal allocations
- Fixed bug in previous _ArgIteratorWindows_, where final trailing backslashes
  in a command line were entirely dropped
- Added a test case for the above bug
- Harmonized the _ArgIteratorXxxx._initWithAllocator()_ and _next()_ interface
  across Windows/Posix/Wasi (Moved Windows errors to _initWithAllocator()_
  rather than _next()_)
- Likely perf benefit on Windows by doing _utf16leToUtf8AllocZ()_ only once
  for the entire cmd_line

**Cons:**

- Breaking Change in std library on Windows: Call
  _ArgIterator.initWithAllocator()_ instead of _ArgIterator.init()_
- PhaseMage is new with contributions to Zig, might need a lot of hand-holding
- PhaseMage is a Windows person, non-Windows stuff will need to be double-checked

**Testing Done:**

- Wrote a few new test cases in process.zig
- zig.exe build test -Dskip-release (no new failures seen)
- zig cc now builds gRPC without error
2022-01-30 21:27:52 +02:00
Lee Cannon
85de022c56
allocgate: std Allocator interface refactor 2021-11-30 23:32:47 +00:00
Andrew Kelley
902df103c6 std lib API deprecations for the upcoming 0.9.0 release
See #3811
2021-11-30 00:13:07 -07:00
Ryan Liptak
e97feb96e4 Replace ArrayList.init/ensureTotalCapacity pairs with initCapacity
Because ArrayList.initCapacity uses 'precise' capacity allocation, this should save memory on average, and definitely will save memory in cases where ArrayList is used where a regular allocated slice could also have been used.
2021-11-04 14:54:25 -04:00
Andrew Kelley
6115cf2240 migrate from std.Target.current to @import("builtin").target
closes #9388
closes #9321
2021-10-04 23:48:55 -07:00
Ryan Liptak
59f5053bed Update all ensureCapacity calls to the relevant non-deprecated version 2021-09-19 13:52:56 +02:00
Ryan Liptak
db940a2c81 std.unicode: cleanup allocations on error in allocating functions
Fixes leaks when `utf16leToUtf8Alloc`/`utf16leToUtf8AllocZ`/`utf8ToUtf16LeWithNull` return an error and adds relevant test cases
2021-09-16 11:43:07 +02:00
Andrew Kelley
d29871977f remove redundant license headers from zig standard library
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.

Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
2021-08-24 12:25:09 -07:00
Jonathan Marler
08e5daa7d5 Add std.unicode.fmtUtf16le 2021-07-15 20:37:16 +03:00
Andrew Kelley
125b85d737 move "unreachable code" error from stage1 to stage2
* AstGen: implement "unreachable code" error for blocks. This works at
  the statement level.
* stage1: remove the "unreachable code" error implementation, which
  means removing the `is_gen` field from IrInstSrc. This is one small
  step towards a smaller memory footprint for stage1. The benefits
  won't be realized until a future commit because this flag took
  advantage of padding.

There may be a regression here with "union has no associated enum"
error, and there is a regression with the following code:

```zig
const a = noreturn;
```

A future commit will address these regressions.
2021-07-02 13:26:50 -07:00
Jacob G-W
9fffffb07b fix code broken from previous commit 2021-06-21 17:03:03 -07:00
Andrew Kelley
5619ce2406 Merge remote-tracking branch 'origin/master' into stage2-whole-file-astgen
Conflicts:
 * doc/langref.html.in
 * lib/std/enums.zig
 * lib/std/fmt.zig
 * lib/std/hash/auto_hash.zig
 * lib/std/math.zig
 * lib/std/mem.zig
 * lib/std/meta.zig
 * test/behavior/alignof.zig
 * test/behavior/bitcast.zig
 * test/behavior/bugs/1421.zig
 * test/behavior/cast.zig
 * test/behavior/ptrcast.zig
 * test/behavior/type_info.zig
 * test/behavior/vector.zig

Master branch added `try` to a bunch of testing function calls, and some
lines also had changed how to refer to the native architecture and other
`@import("builtin")` stuff.
2021-05-08 14:45:21 -07:00
Veikka Tuominen
fd77f2cfed std: update usage of std.testing 2021-05-08 15:15:30 +03:00
Andrew Kelley
df24ce52b1 Merge remote-tracking branch 'origin/master' into stage2-whole-file-astgen
In particular I wanted to take advantage of the new hex float parsing
code.
2021-04-28 14:57:38 -07:00
Andrew Kelley
429cd2b5dd std: change @import("builtin") to std.builtin 2021-04-15 19:06:39 -07:00
xackus
a5007d819a std.meta: add isError 2021-04-11 16:26:29 +02:00
Frank Denis
6c2e0c2046 Year++ 2020-12-31 15:45:24 -08:00
Rageoholic
0369b65082
Switch to using unicode when parsing the command line on windows (#7241)
* Switch to using unicode when parsing the command line on windows

* Apply changes by LemonBoy and *hopefully* fix tests on MIPS

Co-authored-by: LemonBoy <LemonBoy@users.noreply.github.com>

* Fix up next and skip

* Move comment to more relevant place

Co-authored-by: LemonBoy <LemonBoy@users.noreply.github.com>
2020-11-30 13:47:01 -05:00