139 Commits

Author SHA1 Message Date
Isaac Freund
a5cb4ab95e parser: disallow ptr modifiers on array types 2021-03-12 00:18:30 +01:00
Isaac Freund
b988815bf0 parser: fix parsing/rendering of a[b.. :c] slicing
The modification to the grammar in the comment is in line with the
grammar in the zig-spec repo.

Note: checking if the previous token is a colon is insufficent to tell
if a block has a label, the identifier must be checked for as well. This
can be seen in sentinel terminated slicing: `foo[0..1:{}]`
2021-03-08 01:37:28 +01:00
Vincent Rischmann
272ae0ca0d fix parsing of assignment with 'inline for' and 'inline while' 2021-03-06 17:39:54 -08:00
Andrew Kelley
434fce2146 zig fmt: recovery: missing while rbrace
Previously, this test case resulted in zig fmt entering an endless loop.
2021-03-04 20:54:09 -07:00
Isaac Freund
7b5b7bda87 parser: fix infinite loop on missing comma in param list 2021-03-01 16:09:57 -08:00
Andrew Kelley
9ada7638a5 zig fmt: function with labeled block as return type 2021-02-24 16:42:46 -07:00
Andrew Kelley
988f1c6a6f zig fmt: fn proto end with anytype and comma
also

zig fmt: space after top level doc comment
2021-02-23 18:23:49 -07:00
Isaac Freund
5306b1a9ab
zig fmt: container doc comments 2021-02-23 18:32:47 +01:00
Veikka Tuominen
928790364a
zig fmt: correct Node.firstToken for .fn_decl, add error for missing container 2021-02-22 17:39:41 +02:00
Veikka Tuominen
67dac2936c
parser: warn on missing for loop payload, recover from invalid global error set access 2021-02-22 10:04:05 +02:00
Andrew Kelley
878e99d580 parser: fix recovery for missing semicolons 2021-02-21 18:04:23 -07:00
Andrew Kelley
79f1876367 parser: remove support for recovering from extra top level end curlies
After #35 is implemented,
we should be able to recover from this *at any indentation level*,
reporting a parse error and yet also parsing all the decls even
inside structs. Until then, I don't want to add any hacks to make
this work.
2021-02-21 17:57:04 -07:00
Andrew Kelley
866f7dc7d6 parser: support more recovery test cases 2021-02-21 17:37:10 -07:00
Andrew Kelley
15603f403c AST: use fn_proto not fn_decl for extern decls
saves a few bytes per extern function declaration
2021-02-21 16:01:22 -07:00
Andrew Kelley
88d0e77b97 parse: implement error for invalid bit range and alignment 2021-02-21 00:18:20 -07:00
Andrew Kelley
8fee41b1d5 stage2: AST: clean up parse errors
* struct instead of tagged union
 * delete dead code
 * simplify parser code
 * remove unnecessary metaprogramming
2021-02-19 18:04:52 -07:00
Isaac Freund
95b95ea33e
stage2: make same line doc comments a parse error
Allowing same line doc comments causes some ambiguity as to how
generated docs should represent the case in which both same line
and preceding line doc comments are present:

/// preceding line
const foobar = 42; /// same line

Furthermore disallowing these makes things simpler as there is now only
one way to add a doc comment to a decl or struct field.
2021-02-19 22:59:27 +01:00
Andrew Kelley
c2b4d51749 astgen: update a handful of expression types to new mem layout
break, continue, blocks, bit_not, negation, identifiers, string
literals, integer literals, inline assembly

also gave multiline string literals a different node tag from regular
string literals, for code clarity and to avoid an unnecessary load from
token_tags array.
2021-02-13 21:40:12 -07:00
Veikka Tuominen
bb22490fcc
snake_case Node.Tag 2021-02-12 02:12:43 +02:00
Veikka Tuominen
e2289961c6
snake_case Token.Tag 2021-02-12 02:12:00 +02:00
Isaac Freund
5df7fc36c6 zig fmt: implement Tree.lastToken() for struct init 2021-02-10 11:53:53 -08:00
Isaac Freund
928f6f48a6 zig fmt: implement Tree.lastToken() for array init 2021-02-10 11:53:53 -08:00
Isaac Freund
a524e57090 zig fmt: support bodyless function decls
extern function declarations do not have a body, so allow setting
the rhs for FnDecl to 0 to indicate this is the case.
2021-02-10 11:53:53 -08:00
Andrew Kelley
fa5fcdd734 zig fmt: fix regression with many container members 2021-02-09 22:42:00 -07:00
Andrew Kelley
36eee7bc6c zig fmt: anytype, fn calls with one param, trailing commas
and extra newlines between top level declarations
2021-02-09 22:26:21 -07:00
Andrew Kelley
39acc4c020 zig fmt: for loops 2021-02-09 20:08:40 -07:00
Andrew Kelley
1c79eea125 zig fmt: while loops 2021-02-09 17:23:57 -07:00
Andrew Kelley
b1d8a0a5a6 zig fmt: asm expressions 2021-02-08 22:03:23 -07:00
Isaac Freund
d869133a9f zig fmt: implement switches 2021-02-08 15:41:31 -08:00
Isaac Freund
57cec38e61 std/zig/ast: fix Tree.lastToken() for blocks
The fact that blocks may end in a semicolon but this semicolon is not
counted by recursive lastToken() evaluation on the sub expression causes
off-by-one errors for lastToken() on blocks currently.

To fix this, introduce BlockSemicolon and BlockTwoSemicolon following
the pattern used for trailing commas in e.g. builtin function arguments.
2021-02-07 14:51:37 -08:00
Isaac Freund
0e38362d24 zig fmt: split Slice and SliceSentinel
This saves 4 whole bytes in the common case where there is no sentinel.
2021-02-07 14:51:37 -08:00
Isaac Freund
1d71b19c0d zig fmt: implement error set decls 2021-02-07 14:51:37 -08:00
Isaac Freund
33915cb1ed zig fmt: implement pointer types
rename PtrType => PtrTypeBitRange, SliceType => PtrType

This rename was done as the current SliceType is used for non-bitrange
pointers as well as slices and because PtrTypeSentinel/PtrTypeAligned
are also used for slices. Therefore using the same Ptr prefix for all
these pointer/slice nodes is an improvement.
2021-02-06 21:29:45 -08:00
Andrew Kelley
d898945786 zig fmt: builtin call with trailing comma 2021-02-05 20:38:30 -07:00
Andrew Kelley
16a2562c3f zig fmt: implement container decls 2021-02-05 15:47:18 -07:00
Isaac Freund
6f3b93e2e8 zig fmt: struct and anon array initialization 2021-02-05 10:51:45 -08:00
Andrew Kelley
7069459a76 zig fmt: implement struct init 2021-02-04 19:59:06 -07:00
Andrew Kelley
8e46d06650 zig fmt: implement fn protos and defers 2021-02-04 16:38:29 -07:00
Andrew Kelley
725adf8332 zig fmt: builtin calls and array access 2021-02-03 22:12:11 -07:00
Andrew Kelley
1a83b29bea zig fmt: implement if, call, field access, assignment 2021-02-02 21:05:53 -07:00
Andrew Kelley
20554d32c0 zig fmt: start reworking with new memory layout
* start implementation of ast.Tree.firstToken and lastToken
 * clarify some ast.Node doc comments
 * reimplement renderToken
2021-02-01 17:23:49 -07:00
Andrew Kelley
bf8fafc37d stage2: tokenizer does not emit line comments anymore
only std.zig.render cares about these, and it can find them in the
original source easily enough.
2021-01-31 21:57:48 -07:00
Andrew Kelley
4dca99d3f6 stage2: rework AST memory layout
This is a proof-of-concept of switching to a new memory layout for
tokens and AST nodes. The goal is threefold:

 * smaller memory footprint
 * faster performance for tokenization and parsing
 * most importantly, a proof-of-concept that can be also applied to ZIR
   and TZIR to improve the entire compiler pipeline in this way.

I had a few key insights here:

 * Underlying premise: using less memory will make things faster, because
   of fewer allocations and better cache utilization. Also using less
   memory is valuable in and of itself.
 * Using a Struct-Of-Arrays for tokens and AST nodes, saves the bytes of
   padding between the enum tag (which kind of token is it; which kind
   of AST node is it) and the next fields in the struct. It also improves
   cache coherence, since one can peek ahead in the tokens array without
   having to load the source locations of tokens.
 * Token memory can be conserved by only having the tag (1 byte) and byte
   offset (4 bytes) for a total of 5 bytes per token. It is not necessary
   to store the token ending byte offset because one can always re-tokenize
   later, but also most tokens the length can be trivially determined from
   the tag alone, and for ones where it doesn't, string literals for
   example, one must parse the string literal again later anyway in
   astgen, making it free to re-tokenize.
 * AST nodes do not actually need to store more than 1 token index because
   one can poke left and right in the tokens array very cheaply.

So far we are left with one big problem though: how can we put AST nodes
into an array, since different AST nodes are different sizes?

This is where my key observation comes in: one can have a hash table for
the extra data for the less common AST nodes! But it gets even better than
that:

I defined this data that is always present for every AST Node:

 * tag (1 byte)
   - which AST node is it
 * main_token (4 bytes, index into tokens array)
   - the tag determines which token this points to
 * struct{lhs: u32, rhs: u32}
   - enough to store 2 indexes to other AST nodes, the tag determines
     how to interpret this data

You can see how a binary operation, such as `a * b` would fit into this
structure perfectly. A unary operation, such as `*a` would also fit,
and leave `rhs` unused. So this is a total of 13 bytes per AST node.
And again, we don't have to pay for the padding to round up to 16 because
we store in struct-of-arrays format.

I made a further observation: the only kind of data AST nodes need to
store other than the main_token is indexes to sub-expressions. That's it.
The only purpose of an AST is to bring a tree structure to a list of tokens.
This observation means all the data that nodes store are only sets of u32
indexes to other nodes. The other tokens can be found later by the compiler,
by poking around in the tokens array, which again is super fast because it
is struct-of-arrays, so you often only need to look at the token tags array,
which is an array of bytes, very cache friendly.

So for nearly every kind of AST node, you can store it in 13 bytes. For the
rarer AST nodes that have 3 or more indexes to other nodes to store, either
the lhs or the rhs will be repurposed to be an index into an extra_data array
which contains the extra AST node indexes. In other words, no hash table needed,
it's just 1 big ArrayList with the extra data for AST Nodes.

Final observation, no need to have a canonical tag for a given AST. For example:
The expression `foo(bar)` is a function call. Function calls can have any
number of parameters. However in this example, we can encode the function
call into the AST with a tag called `FunctionCallOnlyOneParam`, and use lhs
for the function expr and rhs for the only parameter expr. Meanwhile if the
code was `foo(bar, baz)` then the AST node would have to be `FunctionCall`
with lhs still being the function expr, but rhs being the index into
`extra_data`. Then because the tag is `FunctionCall` it means
`extra_data[rhs]` is the "start" and `extra_data[rhs+1]` is the "end".
Now the range `extra_data[start..end]` describes the list of parameters
to the function.

Point being, you only have to pay for the extra bytes if the AST actually
requires it. There's no limit to the number of different AST tag encodings.

Preliminary results:

 * 15% improvement on cache-misses
 * 28% improvement on total instructions executed
 * 26% improvement on total CPU cycles
 * 22% improvement on wall clock time

This is 1/4 items on the checklist before this can actually be merged:

 * [x] parser
 * [ ] render (zig fmt)
 * [ ] astgen
 * [ ] translate-c
2021-01-30 20:16:59 -07:00
LemonBoy
ac004e1bf1 stage1: Allow nameless test blocks
Nameless blocks are never filtered, the test prefix is still applied.
2021-01-22 15:46:58 +01:00
Frank Denis
6c2e0c2046 Year++ 2020-12-31 15:45:24 -08:00
Vexu
98d5bfbd4d
update grammar in langref 2020-11-22 21:30:09 +02:00
Vexu
a63fd34c50
return a valid node even if invalid deref was used 2020-10-29 19:20:15 +02:00
Travis
d7f9128b5d add error message to zig side of tokenizing/parsing 2020-10-29 12:03:45 -05:00
Tadeo Kondrak
069fbb3c01
Add opaque type syntax 2020-10-06 22:08:24 -06:00
Andrew Kelley
4a69b11e74 add license header to all std lib files
add SPDX license identifier
copyright ownership is zig contributors
2020-08-20 16:07:04 -04:00