mirror/zig - zig - Bouvais Git

mirror of https://github.com/ziglang/zig.git synced 2025-12-08 07:13:08 +00:00

Author	SHA1	Message	Date
Andrew Kelley	d789f1e5cf	fuzzer: write inputs to shared memory before running breaking change to the fuzz testing API; it now passes a type-safe context parameter to the fuzz function. libfuzzer is reworked to select inputs from the entire corpus. I tested that it's roughly as good as it was before in that it can find the panics in the simple examples, as well as achieve decent coverage on the tokenizer fuzz test. however I think the next step here will be figuring out why so many points of interest are missing from the tokenizer in both Debug and ReleaseSafe modes. does not quite close #20803 yet since there are some more important things to be done, such as opening the previous corpus, continuing fuzzing after finding bugs, storing the length of the inputs, etc.	2025-02-11 13:39:20 -08:00
Igor Stojković	0676c04681	tokenizer: fix 0 byte following invalid (#21482) closes #21481	2024-09-23 13:06:30 -07:00
Andrew Kelley	892ce7ef52	rework fuzzing API The previous API used `std.testing.fuzzInput(.{})` however that has the problem that users call it multiple times incorrectly, and there might be work happening to obtain the corpus which should not be included in coverage analysis, and which must not slow down iteration speed. This commit restructures it so that the main loop lives in libfuzzer and directly calls the "test one" function. In this commit I was a little too aggressive because I made the test runner export `fuzzer_one` for this purpose. This was motivated by performance, but it causes "exported symbol collision: fuzzer_one" to occur when more than one fuzz test is provided. There are three ways to solve this: 1. libfuzzer needs to be passed a function pointer instead. Possible performance downside. 2. build runner needs to build a different process per fuzz test. Potentially wasteful and unclear how to isolate them. 3. test runner needs to perform a relocation at runtime to point the function call to the relevant unit test. Portability issues and dubious performance gains.	2024-09-11 13:41:29 -07:00
Eric Petersen	36b89101df	tokenizer: use labeled switch statements	2024-09-10 16:09:37 -07:00
Ian Johnson	9007534551	std.zig.tokenizer: simplify line-based tokens Closes #21358 Closes #21360 This commit modifies the `multiline_string_literal_line`, `doc_comment`, and `container_doc_comment` tokens to no longer include the line ending as part of the token. This makes it easier to handle line endings (which may be LF, CRLF, or in edge cases possibly nonexistent) consistently. In the two issues linked above, Autodoc was already assuming this for doc comments, and yielding incorrect results when handling files with CRLF line endings (both in Markdown parsing and source rendering). Applying the same simplification for multiline string literals also brings `zig fmt` into conformance with https://github.com/ziglang/zig-spec/issues/38 regarding formatting of multiline strings with CRLF line endings: the spec says that `zig fmt` should remove the CR from such line endings, but this was not previously the case.	2024-09-10 13:34:33 +03:00
Andrew Kelley	e0ffac4e3c	introduce a web interface for fuzzing * new .zig-cache subdirectory: 'v' - stores coverage information with filename of hash of PCs that want coverage. This hash is a hex encoding of the 64-bit coverage ID. * build runner * fixed bug in file system inputs when a compile step has an overridden zig_lib_dir field set. * set some std lib options optimized for the build runner - no side channel mitigations - no Transport Layer Security - no crypto fork safety * add a --port CLI arg for choosing the port the fuzzing web interface listens on. it defaults to choosing a random open port. * introduce a web server, and serve a basic single page application - shares wasm code with autodocs - assets are created live on request, for convenient development experience. main.wasm is properly cached if nothing changes. - sources.tar comes from file system inputs (introduced with the `--watch` feature) * receives coverage ID from test runner and sends it on a thread-safe queue to the WebServer. * test runner - takes a zig cache directory argument now, for where to put coverage information. - sends coverage ID to parent process * fuzzer - puts its logs (in debug mode) in .zig-cache/tmp/libfuzzer.log - computes coverage_id and makes it available with `fuzzer_coverage_id` exported function. - the memory-mapped coverage file is now namespaced by the coverage id in hex encoding, in `.zig-cache/v` * tokenizer - add a fuzz test to check that several properties are upheld	2024-08-07 00:48:32 -07:00
Andrew Kelley	c2b8afcac9	tokenizer: tabs and carriage returns spec conformance	2024-07-31 16:57:42 -07:00
Andrew Kelley	377e8579f9	std.zig.tokenizer: simplify I pointed a fuzzer at the tokenizer and it crashed immediately. Upon inspection, I was dissatisfied with the implementation. This commit removes several mechanisms: * Removes the "invalid byte" compile error note. * Dramatically simplifies tokenizer recovery by making recovery always occur at newlines, and never otherwise. * Removes UTF-8 validation. * Moves some character validation logic to `std.zig.parseCharLiteral`. Removing UTF-8 validation is a regression of #663, however, the existing implementation was already buggy. When adding this functionality back, it must be fuzz-tested while checking the property that it matches an independent Unicode validation implementation on the same file. While we're at it, fuzzing should check the other properties of that proposal, such as no ASCII control characters existing inside the source code. Other changes included in this commit: * Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This function has an awkward API that is too easy to misuse. * Make `utf8Decode2` and friends use arrays as parameters, eliminating a runtime assertion in favor of using the type system. After this commit, the crash found by fuzzing, which was "\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea" no longer causes a crash. However, I did not feel the need to add this test case because the simplified logic eradicates most crashes of this nature.	2024-07-31 16:57:42 -07:00
gooncreeper	c50f300387	Tokenizer bug fixes and improvements Fixes many error messages corresponding to invalid bytes displaying the wrong byte. Additionally improves handling of UTF-8 in some places.	2024-07-15 11:31:19 +03:00
Michael Bradshaw	02b3d5b58a	Rename isASCII to isAscii	2024-07-02 16:31:15 +02:00
Travis Staloch	8af59d1f98	ComptimeStringMap: return a regular struct and optimize this patch renames ComptimeStringMap to StaticStringMap, makes it accept only a single type parameter, and return a known struct type instead of an anonymous struct. initial motivation for these changes was to reduce the 'very long type names' issue described here https://github.com/ziglang/zig/pull/19682. this breaks the previous API. users will now need to write: `const map = std.StaticStringMap(T).initComptime(kvs_list);` * move `kvs_list` param from type param to an `initComptime()` param * new public methods * `keys()`, `values()` helpers * `init(allocator)`, `deinit(allocator)` for runtime data * `getLongestPrefix(str)`, `getLongestPrefixIndex(str)` - i'm not sure these belong but have left in for now in case they are deemed useful * performance notes: * i posted some benchmarking results here: https://github.com/travisstaloch/comptime-string-map-revised/issues/1 * i noticed a speedup reducing the size of the struct from 48 to 32 bytes and thus use u32s instead of usize for all length fields * i noticed speedup storing KVs as a struct of arrays * latest benchmark shows these wall_time improvements for debug/safe/small/fast builds: -6.6% / -10.2% / -19.1% / -8.9%. full output in link above.	2024-04-22 15:31:41 -07:00
Jacob Young	509be7cf1f	x86_64: fix std test failures	2023-11-03 23:18:21 -04:00
Jacob Young	fe93332ba2	x86_64: implement enough to pass unicode tests * implement vector comparison * implement reduce for bool vectors * fix `@memcpy` bug * enable passing std tests	2023-10-23 22:42:18 -04:00
Jacob Young	27fe945a00	Revert "Revert "Merge pull request #17637 from jacobly0/x86_64-test-std"" This reverts commit 6f0198cadbe29294f2bf3153a27beebd64377566.	2023-10-22 15:46:43 -04:00
Andrew Kelley	6f0198cadb	Revert "Merge pull request #17637 from jacobly0/x86_64-test-std" This reverts commit 0c99ba1eab63865592bb084feb271cd4e4b0357e, reversing changes made to 5f92b070bf284f1493b1b5d433dd3adde2f46727. This caused a CI failure when it landed in master branch due to a 128-bit `@byteSwap` in std.mem.	2023-10-22 12:16:35 -07:00
Jacob Young	32e85d44eb	x86_64: disable failing tests, enable test-std testing	2023-10-21 10:55:41 -04:00
Jacob Young	2e6e39a700	x86_64: fix bugs and disable erroring tests	2023-10-21 10:55:41 -04:00
mlugg	f26dda2117	all: migrate code to new cast builtin syntax Most of this migration was performed automatically with `zig fmt`. There were a few exceptions which I had to manually fix: * `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten * `@truncate`'s fixup is incorrect for vectors * Test cases are not formatted, and their error locations change	2023-06-24 16:56:39 -07:00
Tom Read Cutting	346ec15c50	Correctly handle carriage return characters according to the spec (#12661) * Scan from line start when finding tag in tokenizer This resolves a crash that can occur for invalid bytes like carriage returns that are valid characters when not parsed from within literals. There are potentially other edge cases this could resolve as well, as the calling code for this function didn't account for any potential 'pending_invalid_tokens' that could be queued up by the tokenizer from within another state. * Fix carriage return crash in multiline string Follow the guidance of #38: > However CR directly before NL is interpreted as only a newline and not part of the multiline string. zig fmt will delete the CR. Zig fmt already had code for deleting carriage returns, but would still crash - now it no longer does so. Carriage returns encountered before line-feeds are now appropriately removed on program compilation as well. * Only accept carriage returns before line feeds Previous commit was much less strict about this, this more closely matches the desired spec of only allow CR characters in a CRLF pair, but not otherwise. * Fix CR being rejected when used as whitespace Missed this comment from ziglang/zig-spec#83: > CR used as whitespace, whether directly preceding NL or stray, is still unambiguously whitespace. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt. * Add tests for carriage return handling	2023-02-19 14:14:03 +02:00
Techatrix	c63be507cf	don't tokenize an invalid string literal	2023-02-11 14:25:25 +02:00
Veikka Tuominen	841b38aae8	tokenizer: detect null as non-first byte of a line comment Line comments do not produce actual tokens so they need special handling for null bytes. Closes #14346	2023-01-17 20:39:19 +02:00
r00ster91	6b7d9b34e8	api(std.ascii): remove deprecated decls	2022-12-09 21:57:17 +01:00
Veikka Tuominen	6039554b26	tokenizer: detect null bytes before EOF Closes #13811	2022-12-08 00:16:30 +02:00
Veikka Tuominen	349d78a443	validate number literals in AstGen	2022-09-13 20:26:04 -04:00
r00ster91	83909651ea	test: simplify testTokenize What this does is already done by `expectEqual`. Now the trace seems to be shorter and more concise so the errors should be easier to read now.	2022-08-16 00:20:19 +02:00
r00ster91	5490688d65	refactor: use std.ascii functions	2022-08-16 00:20:19 +02:00
r00ster91	e3b3eab840	test(names): some renamings	2022-08-16 00:20:19 +02:00
r00ster91	f07cba10a3	test(names): remove unnecessary "tokenizer - " prefix	2022-08-16 00:20:19 +02:00
zooster	8fd20a5eb0	fix: disallow newline in char literal	2022-08-10 16:13:56 -04:00
Ali Chraghi	a4df443f96	Update Tokenizer Dump Function fix missed `loc` field	2022-02-20 17:47:42 -05:00
Veikka Tuominen	9c36cf92f0	parser: make some errors point to end of previous token For some errors if the found token is not on the same line as the previous token, point to the end of the previous token. This usually results in more helpful errors.	2022-02-17 14:23:35 +02:00
Andrew Kelley	902df103c6	std lib API deprecations for the upcoming 0.9.0 release See #3811	2021-11-30 00:13:07 -07:00
Travis Staloch	4870595352	sat-arithmetic: add additional tokenizer tests	2021-09-28 17:03:43 -07:00
Travis Staloch	dcbc52ec85	sat-arithmetic: correctly tokenize <<|, <<|= - set state rather than result.tag in tokenizer.zig - add test to tokenizer.zig for <<, <<|, <<|=	2021-09-28 17:03:43 -07:00
Travis Staloch	29f41896ed	sat-arithmetic: add operator support - adds initial support for the operators +|, -|, *|, <<|, +|=, -|=, *|=, <<|= - uses operators in addition to builtins in behavior test - adds binOpExt() and assignBinOpExt() to AstGen.zig. these need to be audited	2021-09-28 17:02:43 -07:00
Ryan Liptak	3b09262c12	tokenizer: Fix index-out-of-bounds on unfinished unicode escapes before EOF	2021-09-22 14:33:33 -04:00
Andrew Kelley	1ad905c71e	Merge pull request #9649 from Snektron/address-space Address Spaces	2021-09-20 20:37:04 -04:00
Ryan Liptak	2a728f6e5f	tokenizer: Fix index-out-of-bounds on string_literal_backslash right before EOF	2021-09-20 20:16:14 -04:00
Robin Voetter	ccc7f9987d	Address spaces: addrspace(A) parsing The grammar for function prototypes, (global) variable declarations, and pointer types now accepts an optional addrspace(A) modifier.	2021-09-20 02:29:03 +02:00
Andrew Kelley	05cf44933d	stage2: delete keywords `true`, `false`, `undefined`, `null` The grammar does not need these as keywords; they are merely primitives provided by the language the same as `void`, `u32`, etc.	2021-08-28 12:10:55 -07:00
Andrew Kelley	d29871977f	remove redundant license headers from zig standard library We already have a LICENSE file that covers the Zig Standard Library. We no longer need to remind everyone that the license is MIT in every single file. Previously this was introduced to clarify the situation for a fork of Zig that made Zig's LICENSE file harder to find, and replaced it with their own license that required annual payments to their company. However that fork now appears to be dead. So there is no need to reinforce the copyright notice in every single file.	2021-08-24 12:25:09 -07:00
Andrew Kelley	ec63411905	Revert "Skip over CRs at the end of multiline literals" This reverts commit 9de452f9a69d5590743a194bc2d0817d26d66a0b. No CRs allowed in multiline string literals - this is intentional.	2021-07-07 18:00:04 -07:00
Daniele Cocca	9de452f9a6	Skip over CRs at the end of multiline literals Fixes #9257. This is needed when tokenizing input containing DOS line endings, i.e. the CRLF sequence.	2021-07-07 20:03:19 +03:00
Andrew Kelley	c5c23db627	tokenizer: clean up invalid token error It now displays the byte with proper printability handling. This makes the relevant compile error test case no longer a regression in quality from stage1 to stage2.	2021-07-02 13:28:31 -07:00
Andrew Kelley	7a2e0d9810	AstGen: cleanups to pass more compile error test cases	2021-07-02 13:28:29 -07:00
Andrew Kelley	24c432608f	stage2: improve compile errors from tokenizer In order to not regress the quality of compile errors, some improvements had to be made. * std.zig.parseCharLiteral is improved to return more detailed parse failure information. * tokenizer is improved to handle null bytes in the middle of strings, character literals, and line comments. * validating how many unicode escape digits in string literals is moved to std.zig.parseStringLiteral rather than handled in the tokenizer. * when a tokenizer error occurs, if the reported token is the 'invalid' tag, an error note is added to point to the invalid byte location. Further improvements would be: - Mention the expected set of allowed bytes at this location. - Display the invalid byte (if printable, print it, otherwise escape-print it).	2021-07-02 13:27:35 -07:00
Andrew Kelley	3f680abbe2	stage2: tokenizer: require null terminated source By requiring the source file to be null-terminated, we avoid extra branching while simplifying the logic at the same time. Running ast-check on a large zig source file (udivmodti4_test.zig), master branch compared to this commit: * 4% faster wall clock * 7% fewer cache misses * 1% fewer branches	2021-07-02 13:27:35 -07:00
Jacob G-W	641ecc260f	std, src, doc, test: remove unused variables	2021-06-21 17:03:03 -07:00
Dmitry Matveyev	00982f75e9	stage2: Remove special double ampersand parsing case (#9114) * Remove parser error on double ampersand * Add failing test for double ampersand case * Add error when encountering double ampersand in AstGen "Bit and" operator should not make sense when one of its operands is an address. * Check that 2 ampersands are adjacent to each other in source string * Remove cases of unused variables in tests	2021-06-20 21:04:14 +03:00
Isaac Freund	608bc1cbd5	stage2: disallow `1.e9` and `0x1.p9` as float literals Instead require `1e9` and `0x1p9`, disallowing the trailing dot. This change to the grammar is consistent with forbidding `1.` and `0x1.` as float literals and ensures there is only one way to do things here.	2021-05-31 19:51:11 +00:00

100 Commits