These AST nodes now have a flags field and then a bunch of optional
trailing objects. The end result is lower memory usage and consequently
better performance. This is part of an ongoing effort to reduce the
amount of memory parsed ASTs take up.
Running `zig fmt` on the std lib:
* cache-misses: 2,554,321 => 2,534,745
* instructions: 3,293,220,119 => 3,302,479,874
* peak memory: 74.0 MiB => 73.0 MiB
Holding the entire std lib AST in memory at the same time:
93.9 MiB => 88.5 MiB
I'm not sure why I disabled them when landing extended Wasm/WASI
support, but they pass the parser tests just fine now, so I'm gonna
go ahead and re-enable them.
To prevent cache misses, token ids go in their own array, and the
start/end offsets go in a different one.
perf measurement before:
2,667,914 cache-misses:u
2,139,139,935 instructions:u
894,167,331 cycles:u
perf measurement after:
1,757,723 cache-misses:u
2,069,932,298 instructions:u
858,105,570 cycles:u
std.ast uses a singly linked list for lists of things. This is a
breaking change to the self-hosted parser API.
std.ast.Tree has been separated into a private "Parser" type which
represents in-progress parsing, and std.ast.Tree which has only
"output" data. This means cleaner, but breaking, API for parse results.
Specifically, `tokens` and `errors` are no longer SegmentedList but a
slice.
The way to iterate over AST nodes has necessarily changed since lists of
nodes are now singly linked lists rather than SegmentedList.
From these changes, I observe the following on the
self-hosted-parser benchmark from ziglang/gotta-go-fast:
throughput: 45.6 MiB/s => 55.6 MiB/s
maxrss: 359 KB => 342 KB
This commit breaks the build; more updates are necessary to fix API
usage of the self-hosted parser.
* Make the tokenizer spit out an Invalid token on the first invalid
character found in the number literal.
* More parsing and tokenizer tests for number literals
* fix invalid switch statement in ir.zig