mirror/zig - zig - Bouvais Git


Mirror of https://github.com/ziglang/zig.git, synced 2025-12-29 09:33:18 +00:00

Author	SHA1	Message	Date
Isaac Freund	a5cb4ab95e	parser: disallow ptr modifiers on array types	2021-03-12 00:18:30 +01:00
Isaac Freund	b988815bf0	parser: fix parsing/rendering of a[b.. :c] slicing The modification to the grammar in the comment is in line with the grammar in the zig-spec repo. Note: checking if the previous token is a colon is insufficient to tell if a block has a label, the identifier must be checked for as well. This can be seen in sentinel terminated slicing: `foo[0..1:{}]`	2021-03-08 01:37:28 +01:00
Vincent Rischmann	272ae0ca0d	fix parsing of assignment with 'inline for' and 'inline while'	2021-03-06 17:39:54 -08:00
Andrew Kelley	434fce2146	zig fmt: recovery: missing while rbrace Previously, this test case resulted in zig fmt entering an endless loop.	2021-03-04 20:54:09 -07:00
Isaac Freund	7b5b7bda87	parser: fix infinite loop on missing comma in param list	2021-03-01 16:09:57 -08:00
Andrew Kelley	9ada7638a5	zig fmt: function with labeled block as return type	2021-02-24 16:42:46 -07:00
Andrew Kelley	988f1c6a6f	zig fmt: fn proto end with anytype and comma also zig fmt: space after top level doc comment	2021-02-23 18:23:49 -07:00
Isaac Freund	5306b1a9ab	zig fmt: container doc comments	2021-02-23 18:32:47 +01:00
Veikka Tuominen	928790364a	zig fmt: correct Node.firstToken for .fn_decl, add error for missing container	2021-02-22 17:39:41 +02:00
Veikka Tuominen	67dac2936c	parser: warn on missing for loop payload, recover from invalid global error set access	2021-02-22 10:04:05 +02:00
Andrew Kelley	878e99d580	parser: fix recovery for missing semicolons	2021-02-21 18:04:23 -07:00
Andrew Kelley	79f1876367	parser: remove support for recovering from extra top level end curlies After #35 is implemented, we should be able to recover from this at any indentation level, reporting a parse error and yet also parsing all the decls even inside structs. Until then, I don't want to add any hacks to make this work.	2021-02-21 17:57:04 -07:00
Andrew Kelley	866f7dc7d6	parser: support more recovery test cases	2021-02-21 17:37:10 -07:00
Andrew Kelley	15603f403c	AST: use fn_proto not fn_decl for extern decls saves a few bytes per extern function declaration	2021-02-21 16:01:22 -07:00
Andrew Kelley	88d0e77b97	parse: implement error for invalid bit range and alignment	2021-02-21 00:18:20 -07:00
Andrew Kelley	8fee41b1d5	stage2: AST: clean up parse errors * struct instead of tagged union * delete dead code * simplify parser code * remove unnecessary metaprogramming	2021-02-19 18:04:52 -07:00
Isaac Freund	95b95ea33e	stage2: make same line doc comments a parse error Allowing same line doc comments causes some ambiguity as to how generated docs should represent the case in which both same line and preceding line doc comments are present: /// preceding line const foobar = 42; /// same line Furthermore disallowing these makes things simpler as there is now only one way to add a doc comment to a decl or struct field.	2021-02-19 22:59:27 +01:00
Andrew Kelley	c2b4d51749	astgen: update a handful of expression types to new mem layout break, continue, blocks, bit_not, negation, identifiers, string literals, integer literals, inline assembly also gave multiline string literals a different node tag from regular string literals, for code clarity and to avoid an unnecessary load from token_tags array.	2021-02-13 21:40:12 -07:00
Veikka Tuominen	bb22490fcc	snake_case Node.Tag	2021-02-12 02:12:43 +02:00
Veikka Tuominen	e2289961c6	snake_case Token.Tag	2021-02-12 02:12:00 +02:00
Isaac Freund	5df7fc36c6	zig fmt: implement Tree.lastToken() for struct init	2021-02-10 11:53:53 -08:00
Isaac Freund	928f6f48a6	zig fmt: implement Tree.lastToken() for array init	2021-02-10 11:53:53 -08:00
Isaac Freund	a524e57090	zig fmt: support bodyless function decls extern function declarations do not have a body, so allow setting the rhs for FnDecl to 0 to indicate this is the case.	2021-02-10 11:53:53 -08:00
Andrew Kelley	fa5fcdd734	zig fmt: fix regression with many container members	2021-02-09 22:42:00 -07:00
Andrew Kelley	36eee7bc6c	zig fmt: anytype, fn calls with one param, trailing commas and extra newlines between top level declarations	2021-02-09 22:26:21 -07:00
Andrew Kelley	39acc4c020	zig fmt: for loops	2021-02-09 20:08:40 -07:00
Andrew Kelley	1c79eea125	zig fmt: while loops	2021-02-09 17:23:57 -07:00
Andrew Kelley	b1d8a0a5a6	zig fmt: asm expressions	2021-02-08 22:03:23 -07:00
Isaac Freund	d869133a9f	zig fmt: implement switches	2021-02-08 15:41:31 -08:00
Isaac Freund	57cec38e61	std/zig/ast: fix Tree.lastToken() for blocks The fact that blocks may end in a semicolon but this semicolon is not counted by recursive lastToken() evaluation on the sub expression causes off-by-one errors for lastToken() on blocks currently. To fix this, introduce BlockSemicolon and BlockTwoSemicolon following the pattern used for trailing commas in e.g. builtin function arguments.	2021-02-07 14:51:37 -08:00
Isaac Freund	0e38362d24	zig fmt: split Slice and SliceSentinel This saves 4 whole bytes in the common case where there is no sentinel.	2021-02-07 14:51:37 -08:00
Isaac Freund	1d71b19c0d	zig fmt: implement error set decls	2021-02-07 14:51:37 -08:00
Isaac Freund	33915cb1ed	zig fmt: implement pointer types rename PtrType => PtrTypeBitRange, SliceType => PtrType This rename was done as the current SliceType is used for non-bitrange pointers as well as slices and because PtrTypeSentinel/PtrTypeAligned are also used for slices. Therefore using the same Ptr prefix for all these pointer/slice nodes is an improvement.	2021-02-06 21:29:45 -08:00
Andrew Kelley	d898945786	zig fmt: builtin call with trailing comma	2021-02-05 20:38:30 -07:00
Andrew Kelley	16a2562c3f	zig fmt: implement container decls	2021-02-05 15:47:18 -07:00
Isaac Freund	6f3b93e2e8	zig fmt: struct and anon array initialization	2021-02-05 10:51:45 -08:00
Andrew Kelley	7069459a76	zig fmt: implement struct init	2021-02-04 19:59:06 -07:00
Andrew Kelley	8e46d06650	zig fmt: implement fn protos and defers	2021-02-04 16:38:29 -07:00
Andrew Kelley	725adf8332	zig fmt: builtin calls and array access	2021-02-03 22:12:11 -07:00
Andrew Kelley	1a83b29bea	zig fmt: implement if, call, field access, assignment	2021-02-02 21:05:53 -07:00
Andrew Kelley	20554d32c0	zig fmt: start reworking with new memory layout * start implementation of ast.Tree.firstToken and lastToken * clarify some ast.Node doc comments * reimplement renderToken	2021-02-01 17:23:49 -07:00
Andrew Kelley	bf8fafc37d	stage2: tokenizer does not emit line comments anymore only std.zig.render cares about these, and it can find them in the original source easily enough.	2021-01-31 21:57:48 -07:00
Andrew Kelley	4dca99d3f6	stage2: rework AST memory layout This is a proof-of-concept of switching to a new memory layout for tokens and AST nodes. The goal is threefold: * smaller memory footprint * faster performance for tokenization and parsing * most importantly, a proof-of-concept that can be also applied to ZIR and TZIR to improve the entire compiler pipeline in this way. I had a few key insights here: * Underlying premise: using less memory will make things faster, because of fewer allocations and better cache utilization. Also using less memory is valuable in and of itself. * Using a Struct-Of-Arrays for tokens and AST nodes, saves the bytes of padding between the enum tag (which kind of token is it; which kind of AST node is it) and the next fields in the struct. It also improves cache coherence, since one can peek ahead in the tokens array without having to load the source locations of tokens. * Token memory can be conserved by only having the tag (1 byte) and byte offset (4 bytes) for a total of 5 bytes per token. It is not necessary to store the token ending byte offset because one can always re-tokenize later, but also most tokens the length can be trivially determined from the tag alone, and for ones where it doesn't, string literals for example, one must parse the string literal again later anyway in astgen, making it free to re-tokenize. * AST nodes do not actually need to store more than 1 token index because one can poke left and right in the tokens array very cheaply. So far we are left with one big problem though: how can we put AST nodes into an array, since different AST nodes are different sizes? This is where my key observation comes in: one can have a hash table for the extra data for the less common AST nodes! 
But it gets even better than that: I defined this data that is always present for every AST Node: * tag (1 byte) - which AST node is it * main_token (4 bytes, index into tokens array) - the tag determines which token this points to * struct{lhs: u32, rhs: u32} - enough to store 2 indexes to other AST nodes, the tag determines how to interpret this data You can see how a binary operation, such as `a * b` would fit into this structure perfectly. A unary operation, such as `a` would also fit, and leave `rhs` unused. So this is a total of 13 bytes per AST node. And again, we don't have to pay for the padding to round up to 16 because we store in struct-of-arrays format. I made a further observation: the only kind of data AST nodes need to store other than the main_token is indexes to sub-expressions. That's it. The only purpose of an AST is to bring a tree structure to a list of tokens. This observation means all the data that nodes store are only sets of u32 indexes to other nodes. The other tokens can be found later by the compiler, by poking around in the tokens array, which again is super fast because it is struct-of-arrays, so you often only need to look at the token tags array, which is an array of bytes, very cache friendly. So for nearly every kind of AST node, you can store it in 13 bytes. For the rarer AST nodes that have 3 or more indexes to other nodes to store, either the lhs or the rhs will be repurposed to be an index into an extra_data array which contains the extra AST node indexes. In other words, no hash table needed, it's just 1 big ArrayList with the extra data for AST Nodes. Final observation, no need to have a canonical tag for a given AST. For example: The expression `foo(bar)` is a function call. Function calls can have any number of parameters. However in this example, we can encode the function call into the AST with a tag called `FunctionCallOnlyOneParam`, and use lhs for the function expr and rhs for the only parameter expr. 
Meanwhile if the code was `foo(bar, baz)` then the AST node would have to be `FunctionCall` with lhs still being the function expr, but rhs being the index into `extra_data`. Then because the tag is `FunctionCall` it means `extra_data[rhs]` is the "start" and `extra_data[rhs+1]` is the "end". Now the range `extra_data[start..end]` describes the list of parameters to the function. Point being, you only have to pay for the extra bytes if the AST actually requires it. There's no limit to the number of different AST tag encodings. Preliminary results: * 15% improvement on cache-misses * 28% improvement on total instructions executed * 26% improvement on total CPU cycles * 22% improvement on wall clock time This is 1/4 items on the checklist before this can actually be merged: * [x] parser * [ ] render (zig fmt) * [ ] astgen * [ ] translate-c	2021-01-30 20:16:59 -07:00
LemonBoy	ac004e1bf1	stage1: Allow nameless test blocks Nameless blocks are never filtered, the test prefix is still applied.	2021-01-22 15:46:58 +01:00
Frank Denis	6c2e0c2046	Year++	2020-12-31 15:45:24 -08:00
Vexu	98d5bfbd4d	update grammar in langref	2020-11-22 21:30:09 +02:00
Vexu	a63fd34c50	return a valid node even if invalid deref was used	2020-10-29 19:20:15 +02:00
Travis	d7f9128b5d	add error message to zig side of tokenizing/parsing	2020-10-29 12:03:45 -05:00
Tadeo Kondrak	069fbb3c01	Add opaque type syntax	2020-10-06 22:08:24 -06:00
Andrew Kelley	4a69b11e74	add license header to all std lib files add SPDX license identifier copyright ownership is zig contributors	2020-08-20 16:07:04 -04:00
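The struct-of-arrays AST layout described in commit 4dca99d3f6 can be illustrated with a small Python sketch. This is not the Zig standard library's implementation. It uses the tag names `FunctionCallOnlyOneParam` and `FunctionCall` from the commit message. The `Tag.Identifier` tag, the `NodeList` class, and the helpers `add_call` and `call_params` are hypothetical. The sketch shows two points from the commit. First, each node costs 13 bytes (1-byte tag, 4-byte `main_token`, 4-byte `lhs`, 4-byte `rhs`), whereas a padded array-of-structs record costs 16. Second, a call with more than one parameter stores its argument list in `extra_data` through a (start, end) pair referenced by `rhs`.

```python
import ctypes
from array import array
from enum import IntEnum


class Tag(IntEnum):
    # Hypothetical, minimal tag set for illustration only.
    Identifier = 0
    FunctionCallOnlyOneParam = 1  # lhs = callee node, rhs = the single argument node
    FunctionCall = 2              # lhs = callee node, rhs = index of (start, end) in extra_data


class NodeAoS(ctypes.Structure):
    # Array-of-structs record for comparison: the 1-byte tag is followed by
    # padding so that the u32 fields stay 4-byte aligned (16 bytes total).
    _fields_ = [
        ("tag", ctypes.c_uint8),
        ("main_token", ctypes.c_uint32),
        ("lhs", ctypes.c_uint32),
        ("rhs", ctypes.c_uint32),
    ]


class NodeList:
    """Struct-of-arrays node storage: one packed array per field."""

    def __init__(self):
        self.tags = array("B")         # 1 byte per node
        self.main_tokens = array("I")  # token index; 4 bytes on common platforms
        self.lhs = array("I")          # child node index (meaning depends on tag)
        self.rhs = array("I")          # child node index or extra_data index
        self.extra_data = array("I")   # overflow storage for nodes with >2 children

    def add(self, tag, main_token, lhs, rhs):
        self.tags.append(int(tag))
        self.main_tokens.append(main_token)
        self.lhs.append(lhs)
        self.rhs.append(rhs)
        return len(self.tags) - 1

    def bytes_per_node(self):
        # No padding between fields, because each field lives in its own array.
        return sum(a.itemsize for a in (self.tags, self.main_tokens, self.lhs, self.rhs))


def add_call(nodes, main_token, callee, args):
    """Choose the compact encoding when the call has exactly one argument."""
    if len(args) == 1:
        return nodes.add(Tag.FunctionCallOnlyOneParam, main_token, callee, args[0])
    start = len(nodes.extra_data)
    nodes.extra_data.extend(args)        # argument node indexes
    end = len(nodes.extra_data)
    span = len(nodes.extra_data)
    nodes.extra_data.extend([start, end])  # extra_data[rhs] = start, extra_data[rhs + 1] = end
    return nodes.add(Tag.FunctionCall, main_token, callee, span)


def call_params(nodes, node):
    """Decode the argument node indexes of a call, whichever tag it uses."""
    tag = nodes.tags[node]
    if tag == Tag.FunctionCallOnlyOneParam:
        return [nodes.rhs[node]]
    if tag == Tag.FunctionCall:
        span = nodes.rhs[node]
        start, end = nodes.extra_data[span], nodes.extra_data[span + 1]
        return list(nodes.extra_data[start:end])
    raise ValueError("not a call node")


if __name__ == "__main__":
    nodes = NodeList()
    # Token indexes are illustrative: foo=0, "("=1, bar=2, ","=3, baz=4.
    foo = nodes.add(Tag.Identifier, 0, 0, 0)
    bar = nodes.add(Tag.Identifier, 2, 0, 0)
    baz = nodes.add(Tag.Identifier, 4, 0, 0)
    one = add_call(nodes, 1, foo, [bar])        # foo(bar)
    many = add_call(nodes, 1, foo, [bar, baz])  # foo(bar, baz)
    print(nodes.bytes_per_node(), ctypes.sizeof(NodeAoS))    # 13 16 on typical platforms
    print(call_params(nodes, one), call_params(nodes, many))  # [1] [1, 2]
```

The one-argument call leaves `extra_data` untouched. Only the two-argument call pays for four extra `u32` slots, which matches the commit's point that the extra bytes are spent only when the AST actually requires them.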


139 Commits