89 Commits

Author SHA1 Message Date
Jacob Young
509be7cf1f x86_64: fix std test failures 2023-11-03 23:18:21 -04:00
Jacob Young
fe93332ba2 x86_64: implement enough to pass unicode tests
* implement vector comparison
 * implement reduce for bool vectors
 * fix `@memcpy` bug
 * enable passing std tests
2023-10-23 22:42:18 -04:00
Jacob Young
27fe945a00 Revert "Revert "Merge pull request #17637 from jacobly0/x86_64-test-std""
This reverts commit 6f0198cadbe29294f2bf3153a27beebd64377566.
2023-10-22 15:46:43 -04:00
Andrew Kelley
6f0198cadb Revert "Merge pull request #17637 from jacobly0/x86_64-test-std"
This reverts commit 0c99ba1eab63865592bb084feb271cd4e4b0357e, reversing
changes made to 5f92b070bf284f1493b1b5d433dd3adde2f46727.

This caused a CI failure when it landed in master branch due to a
128-bit `@byteSwap` in std.mem.
2023-10-22 12:16:35 -07:00
Jacob Young
32e85d44eb x86_64: disable failing tests, enable test-std testing 2023-10-21 10:55:41 -04:00
Jacob Young
2e6e39a700 x86_64: fix bugs and disable erroring tests 2023-10-21 10:55:41 -04:00
mlugg
f26dda2117 all: migrate code to new cast builtin syntax
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:

* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
2023-06-24 16:56:39 -07:00
Tom Read Cutting
346ec15c50
Correctly handle carriage return characters according to the spec (#12661)
* Scan from line start when finding tag in tokenizer

This resolves a crash that can occur for invalid bytes like carriage
returns that are valid characters when not parsed from within literals.

There are potentially other edge cases this could resolve as well, as
the calling code for this function didn't account for any potential
'pending_invalid_tokens' that could be queued up by the tokenizer from
within another state.

* Fix carriage return crash in multiline string

Follow the guidance of #38:

> However CR directly before NL is interpreted as only a newline and not part of the multiline string. zig fmt will delete the CR.

Zig fmt already had code for deleting carriage returns, but would still
crash - now it no longer does so. Carriage returns encountered before
line-feeds are now appropriately removed on program compilation as well.

* Only accept carriage returns before line feeds

Previous commit was much less strict about this, this more closely
matches the desired spec of only allow CR characters in a CRLF pair, but
not otherwise.

* Fix CR being rejected when used as whitespace

Missed this comment from ziglang/zig-spec#83:

> CR used as whitespace, whether directly preceding NL or stray, is still unambiguously whitespace. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt.

* Add tests for carriage return handling
2023-02-19 14:14:03 +02:00
Techatrix
c63be507cf don't tokenize an invalid string literal 2023-02-11 14:25:25 +02:00
Veikka Tuominen
841b38aae8 tokenizer: detect null as non-first byte of a line comment
Line comments do not produce actual tokens so they need special
handling for null bytes.

Closes #14346
2023-01-17 20:39:19 +02:00
r00ster91
6b7d9b34e8 api(std.ascii): remove deprecated decls 2022-12-09 21:57:17 +01:00
Veikka Tuominen
6039554b26 tokenizer: detect null bytes before EOF
Closes #13811
2022-12-08 00:16:30 +02:00
Veikka Tuominen
349d78a443 validate number literals in AstGen 2022-09-13 20:26:04 -04:00
r00ster91
83909651ea test: simplify testTokenize
What this does is already done by `expectEqual`.
Now the trace seems to be shorter and more concise so the errors should be easier to read now.
2022-08-16 00:20:19 +02:00
r00ster91
5490688d65 refactor: use std.ascii functions 2022-08-16 00:20:19 +02:00
r00ster91
e3b3eab840 test(names): some renamings 2022-08-16 00:20:19 +02:00
r00ster91
f07cba10a3 test(names): remove unnecessary "tokenizer - " prefix 2022-08-16 00:20:19 +02:00
zooster
8fd20a5eb0 fix: disallow newline in char literal 2022-08-10 16:13:56 -04:00
Ali Chraghi
a4df443f96 Update Tokenizer Dump Function
fix missed `loc` field
2022-02-20 17:47:42 -05:00
Veikka Tuominen
9c36cf92f0 parser: make some errors point to end of previous token
For some errors if the found token is not on the same line as
the previous token, point to the end of the previous token.
This usually results in more helpful errors.
2022-02-17 14:23:35 +02:00
Andrew Kelley
902df103c6 std lib API deprecations for the upcoming 0.9.0 release
See #3811
2021-11-30 00:13:07 -07:00
Travis Staloch
4870595352 sat-arithmetic: add additional tokenizer tests 2021-09-28 17:03:43 -07:00
Travis Staloch
dcbc52ec85 sat-arithmetic: correctly tokenize <<|, <<|=
- set state rather than result.tag in tokenizer.zig
- add test to tokenizer.zig for <<, <<|, <<|=
2021-09-28 17:03:43 -07:00
Travis Staloch
29f41896ed sat-arithmetic: add operator support
- adds initial support for the operators +|, -|, *|, <<|, +|=, -|=, *|=, <<|=
- uses operators in addition to builtins in behavior test
- adds binOpExt() and assignBinOpExt() to AstGen.zig. these need to be audited
2021-09-28 17:02:43 -07:00
Ryan Liptak
3b09262c12 tokenizer: Fix index-out-of-bounds on unfinished unicode escapes before EOF 2021-09-22 14:33:33 -04:00
Andrew Kelley
1ad905c71e
Merge pull request #9649 from Snektron/address-space
Address Spaces
2021-09-20 20:37:04 -04:00
Ryan Liptak
2a728f6e5f tokenizer: Fix index-out-of-bounds on string_literal_backslash right before EOF 2021-09-20 20:16:14 -04:00
Robin Voetter
ccc7f9987d Address spaces: addrspace(A) parsing
The grammar for function prototypes, (global) variable declarations, and
pointer types now accepts an optional addrspace(A) modifier.
2021-09-20 02:29:03 +02:00
Andrew Kelley
05cf44933d stage2: delete keywords true, false, undefined, null
The grammar does not need these as keywords; they are merely primitives
provided by the language the same as `void`, `u32`, etc.
2021-08-28 12:10:55 -07:00
Andrew Kelley
d29871977f remove redundant license headers from zig standard library
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.

Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
2021-08-24 12:25:09 -07:00
Andrew Kelley
ec63411905 Revert "Skip over CRs at the end of multiline literals"
This reverts commit 9de452f9a69d5590743a194bc2d0817d26d66a0b.

No CRs allowed in multiline string literals - this is intentional.
2021-07-07 18:00:04 -07:00
Daniele Cocca
9de452f9a6 Skip over CRs at the end of multiline literals
Fixes #9257.
This is needed when tokenizing input containing DOS line endings, i.e.
the CRLF sequence.
2021-07-07 20:03:19 +03:00
Andrew Kelley
c5c23db627 tokenizer: clean up invalid token error
It now displays the byte with proper printability handling. This makes
the relevant compile error test case no longer a regression in quality
from stage1 to stage2.
2021-07-02 13:28:31 -07:00
Andrew Kelley
7a2e0d9810 AstGen: cleanups to pass more compile error test cases 2021-07-02 13:28:29 -07:00
Andrew Kelley
24c432608f stage2: improve compile errors from tokenizer
In order to not regress the quality of compile errors, some improvements
had to be made.

 * std.zig.parseCharLiteral is improved to return more detailed parse
   failure information.
 * tokenizer is improved to handle null bytes in the middle of strings,
   character literals, and line comments.
 * validating how many unicode escape digits in string literals is moved
   to std.zig.parseStringLiteral rather than handled in the tokenizer.
 * when a tokenizer error occurs, if the reported token is the 'invalid'
   tag, an error note is added to point to the invalid byte location.
   Further improvements would be:
   - Mention the expected set of allowed bytes at this location.
   - Display the invalid byte (if printable, print it, otherwise
     escape-print it).
2021-07-02 13:27:35 -07:00
Andrew Kelley
3f680abbe2 stage2: tokenizer: require null terminated source
By requiring the source file to be null-terminated, we avoid extra
branching while simplifying the logic at the same time.

Running ast-check on a large zig source file (udivmodti4_test.zig),
master branch compared to this commit:
 * 4% faster wall clock
 * 7% fewer cache misses
 * 1% fewer branches
2021-07-02 13:27:35 -07:00
Jacob G-W
641ecc260f std, src, doc, test: remove unused variables 2021-06-21 17:03:03 -07:00
Dmitry Matveyev
00982f75e9
stage2: Remove special double ampersand parsing case (#9114)
* Remove parser error on double ampersand

* Add failing test for double ampersand case

* Add error when encountering double ampersand in AstGen

"Bit and" operator should not make sense when one of its operands
is an address.

* Check that 2 ampersands are adjacent to each other in source string

* Remove cases of unused variables in tests
2021-06-20 21:04:14 +03:00
Isaac Freund
608bc1cbd5
stage2: disallow 1.e9 and 0x1.p9 as float literals
Instead require `1e9` and `0x1p9`, disallowing the trailing dot.

This change to the grammar is consistent with forbidding `1.` and `0x1.`
as float literals and ensures there is only one way to do things here.
2021-05-31 19:51:11 +00:00
Isaac Freund
ec10595b65 stage2: disallow trailing dot on float literals
This disallows e.g. `1.` or `0x1.` as a float literal, which is
consistent with the grammar.
2021-05-27 21:00:44 -04:00
Matthew Borkowski
f750618846 stop tokenizer from recognizing lone @ or @ followed by a digit as a builtin 2021-05-26 04:13:04 -04:00
Andrew Kelley
44de884980 tokenizer: fix crash on multiline string with only 1 backslash
Closes #8904
2021-05-25 18:13:10 -07:00
Veikka Tuominen
fd77f2cfed std: update usage of std.testing 2021-05-08 15:15:30 +03:00
Andrew Kelley
8e6c2b7a47 Merge remote-tracking branch 'origin/master' into ast-memory-layout 2021-02-24 15:08:23 -07:00
Josh Wolfe
8b9434871e
Avoid concept of a "Unicode character" in documentation and error messages (#8059) 2021-02-24 08:26:13 -05:00
Isaac Freund
f3ee10b454
zig fmt: fix comments ending with EOF after decls
Achieve this by reducing the amount of special casing to handle EOF so
that the already correct logic for normal comments does not need to be
duplicated.
2021-02-22 18:32:37 +01:00
Veikka Tuominen
e2289961c6
snake_case Token.Tag 2021-02-12 02:12:00 +02:00
Andrew Kelley
272a0ab359 zig fmt: implement "line comment followed by top-level comptime" 2021-02-01 20:11:55 -07:00
Andrew Kelley
20554d32c0 zig fmt: start reworking with new memory layout
* start implementation of ast.Tree.firstToken and lastToken
 * clarify some ast.Node doc comments
 * reimplement renderToken
2021-02-01 17:23:49 -07:00
Andrew Kelley
bf8fafc37d stage2: tokenizer does not emit line comments anymore
only std.zig.render cares about these, and it can find them in the
original source easily enough.
2021-01-31 21:57:48 -07:00