mirror/zig - zig - Bouvais Git

mirror of https://github.com/ziglang/zig.git synced 2025-12-06 14:23:09 +00:00

Author	SHA1	Message	Date
Isaac Freund	d869133a9f	zig fmt: implement switches	2021-02-08 15:41:31 -08:00
Isaac Freund	57cec38e61	std/zig/ast: fix Tree.lastToken() for blocks The fact that blocks may end in a semicolon but this semicolon is not counted by recursive lastToken() evaluation on the sub expression causes off-by-one errors for lastToken() on blocks currently. To fix this, introduce BlockSemicolon and BlockTwoSemicolon following the pattern used for trailing commas in e.g. builtin function arguments.	2021-02-07 14:51:37 -08:00
Isaac Freund	0e38362d24	zig fmt: split Slice and SliceSentinel This saves 4 whole bytes in the common case where there is no sentinel.	2021-02-07 14:51:37 -08:00
Isaac Freund	1d71b19c0d	zig fmt: implement error set decls	2021-02-07 14:51:37 -08:00
Isaac Freund	33915cb1ed	zig fmt: implement pointer types rename PtrType => PtrTypeBitRange, SliceType => PtrType This rename was done as the current SliceType is used for non-bitrange pointers as well as slices and because PtrTypeSentinel/PtrTypeAligned are also used for slices. Therefore using the same Ptr prefix for all these pointer/slice nodes is an improvement.	2021-02-06 21:29:45 -08:00
Andrew Kelley	d898945786	zig fmt: builtin call with trailing comma	2021-02-05 20:38:30 -07:00
Andrew Kelley	16a2562c3f	zig fmt: implement container decls	2021-02-05 15:47:18 -07:00
Isaac Freund	6f3b93e2e8	zig fmt: struct and anon array initialization	2021-02-05 10:51:45 -08:00
Andrew Kelley	7069459a76	zig fmt: implement struct init	2021-02-04 19:59:06 -07:00
Andrew Kelley	8e46d06650	zig fmt: implement fn protos and defers	2021-02-04 16:38:29 -07:00
Andrew Kelley	725adf8332	zig fmt: builtin calls and array access	2021-02-03 22:12:11 -07:00
Andrew Kelley	1a83b29bea	zig fmt: implement if, call, field access, assignment	2021-02-02 21:05:53 -07:00
Andrew Kelley	20554d32c0	zig fmt: start reworking with new memory layout * start implementation of ast.Tree.firstToken and lastToken * clarify some ast.Node doc comments * reimplement renderToken	2021-02-01 17:23:49 -07:00
Andrew Kelley	bf8fafc37d	stage2: tokenizer does not emit line comments anymore only std.zig.render cares about these, and it can find them in the original source easily enough.	2021-01-31 21:57:48 -07:00
Andrew Kelley	4dca99d3f6	stage2: rework AST memory layout This is a proof-of-concept of switching to a new memory layout for tokens and AST nodes. The goal is threefold: * smaller memory footprint * faster performance for tokenization and parsing * most importantly, a proof-of-concept that can be also applied to ZIR and TZIR to improve the entire compiler pipeline in this way. I had a few key insights here: * Underlying premise: using less memory will make things faster, because of fewer allocations and better cache utilization. Also using less memory is valuable in and of itself. * Using a Struct-Of-Arrays for tokens and AST nodes, saves the bytes of padding between the enum tag (which kind of token is it; which kind of AST node is it) and the next fields in the struct. It also improves cache coherence, since one can peek ahead in the tokens array without having to load the source locations of tokens. * Token memory can be conserved by only having the tag (1 byte) and byte offset (4 bytes) for a total of 5 bytes per token. It is not necessary to store the token ending byte offset because one can always re-tokenize later, but also most tokens the length can be trivially determined from the tag alone, and for ones where it doesn't, string literals for example, one must parse the string literal again later anyway in astgen, making it free to re-tokenize. * AST nodes do not actually need to store more than 1 token index because one can poke left and right in the tokens array very cheaply. So far we are left with one big problem though: how can we put AST nodes into an array, since different AST nodes are different sizes? This is where my key observation comes in: one can have a hash table for the extra data for the less common AST nodes! 
But it gets even better than that: I defined this data that is always present for every AST Node: * tag (1 byte) - which AST node is it * main_token (4 bytes, index into tokens array) - the tag determines which token this points to * struct{lhs: u32, rhs: u32} - enough to store 2 indexes to other AST nodes, the tag determines how to interpret this data You can see how a binary operation, such as `a * b` would fit into this structure perfectly. A unary operation, such as `*a` would also fit, and leave `rhs` unused. So this is a total of 13 bytes per AST node. And again, we don't have to pay for the padding to round up to 16 because we store in struct-of-arrays format. I made a further observation: the only kind of data AST nodes need to store other than the main_token is indexes to sub-expressions. That's it. The only purpose of an AST is to bring a tree structure to a list of tokens. This observation means all the data that nodes store are only sets of u32 indexes to other nodes. The other tokens can be found later by the compiler, by poking around in the tokens array, which again is super fast because it is struct-of-arrays, so you often only need to look at the token tags array, which is an array of bytes, very cache friendly. So for nearly every kind of AST node, you can store it in 13 bytes. For the rarer AST nodes that have 3 or more indexes to other nodes to store, either the lhs or the rhs will be repurposed to be an index into an extra_data array which contains the extra AST node indexes. In other words, no hash table needed, it's just 1 big ArrayList with the extra data for AST Nodes. Final observation, no need to have a canonical tag for a given AST. For example: The expression `foo(bar)` is a function call. Function calls can have any number of parameters. However in this example, we can encode the function call into the AST with a tag called `FunctionCallOnlyOneParam`, and use lhs for the function expr and rhs for the only parameter expr. 
Meanwhile if the code was `foo(bar, baz)` then the AST node would have to be `FunctionCall` with lhs still being the function expr, but rhs being the index into `extra_data`. Then because the tag is `FunctionCall` it means `extra_data[rhs]` is the "start" and `extra_data[rhs+1]` is the "end". Now the range `extra_data[start..end]` describes the list of parameters to the function. Point being, you only have to pay for the extra bytes if the AST actually requires it. There's no limit to the number of different AST tag encodings. Preliminary results: 15% improvement on cache-misses * 28% improvement on total instructions executed * 26% improvement on total CPU cycles * 22% improvement on wall clock time This is 1/4 items on the checklist before this can actually be merged: * [x] parser * [ ] render (zig fmt) * [ ] astgen * [ ] translate-c	2021-01-30 20:16:59 -07:00
LemonBoy	ac004e1bf1	stage1: Allow nameless test blocks Nameless blocks are never filtered, the test prefix is still applied.	2021-01-22 15:46:58 +01:00
Frank Denis	6c2e0c2046	Year++	2020-12-31 15:45:24 -08:00
Vexu	98d5bfbd4d	update grammar in langref	2020-11-22 21:30:09 +02:00
Vexu	a63fd34c50	return a valid node even if invalid deref was used	2020-10-29 19:20:15 +02:00
Travis	d7f9128b5d	add error message to zig side of tokenizing/parsing	2020-10-29 12:03:45 -05:00
Tadeo Kondrak	069fbb3c01	Add opaque type syntax	2020-10-06 22:08:24 -06:00
Andrew Kelley	4a69b11e74	add license header to all std lib files add SPDX license identifier copyright ownership is zig contributors	2020-08-20 16:07:04 -04:00
Andrew Kelley	9a5a1013a8	std.zig.ast: extract out Node.LabeledBlock from Node.Block This is part of an ongoing effort to reduce the size of the in-memory AST. This enum flattening pattern is widespread throughout the self-hosted compiler. This is an API-breaking change for consumers of the self-hosted parser.	2020-08-14 22:50:00 -04:00
Vexu	f962315363	fix missing parser error for missing comma before eof Closes #5952	2020-07-30 13:10:55 +03:00
Andrew Kelley	aac6e8c418	self-hosted: AST flattening, astgen improvements, result locations, and more * AST: flatten ControlFlowExpression into Continue, Break, and Return. * AST: unify identifiers and literals into the same AST type: OneToken * AST: ControlFlowExpression uses TrailerFlags to optimize storage space. * astgen: support `var` as well as `const` locals, and support explicitly typed locals. Corresponding Module and codegen code is not implemented yet. * astgen: support result locations. * ZIR: add the following instructions (see the corresponding doc comments for explanations of semantics): - alloc - alloc_inferred - bitcast_result_ptr - coerce_result_block_ptr - coerce_result_ptr - coerce_to_ptr_elem - ensure_result_used - ensure_result_non_error - ret_ptr - ret_type - store - param_type * the skeleton structure for result locations is set up. It's looking pretty clean so far. * add compile error for unused result and compile error for discarding errors. * astgen: split builtin calls up to implemented manually, and implement `@as`, `@bitCast` (and others) with respect to result locations. * add CLI support for hex and raw object formats. They are not supported by the self-hosted compiler yet, and emit errors. * rename `--c` CLI to `-ofmt=[objectformat]` which can be any of the object formats. Only ELF and C are supported so far. Also added missing help to the help text. * Remove hard tabs from C backend test cases. Shame on you Noam, you are grounded, you should know better, etc. Bad boy. * Delete C backend code and test case that relied on comptime_int incorrectly making it all the way to codegen.	2020-07-23 23:05:26 -07:00
Andrew Kelley	7a1a924788	stage2: AST: (breaking) flatten out suffix operations	2020-07-21 10:52:24 -07:00
Andrew Kelley	1ac28eed83	stage2 AST: rename OptionalUnwrap to OrElse preparing to flatten suffix operations AST	2020-07-21 10:46:47 -07:00
Andrew Kelley	af12596e8d	stage2: breaking AST memory layout modifications InfixOp is flattened out so that each operator is an independent AST node tag. The two kinds of structs are now Catch and SimpleInfixOp. Beginning implementation of supporting codegen for const locals.	2020-07-15 19:39:18 -07:00
Andrew Kelley	f119092273	stage2: breaking AST memory layout modifications ast.Node.Id => ast.Node.Tag, matching recent style conventions. Now multiple different AST node tags can map to the same AST node data structures. In this commit, simple prefix operators now all map to SimplePrefixOp. `ast.Node.castTag` is now preferred over `ast.Node.cast`. Upcoming: InfixOp flattened out.	2020-07-15 18:15:59 -07:00
Andrew Kelley	804b51b179	stage2: VarDecl and FnProto take advantage of TrailerFlags API These AST nodes now have a flags field and then a bunch of optional trailing objects. The end result is lower memory usage and consequently better performance. This is part of an ongoing effort to reduce the amount of memory parsed ASTs take up. Running `zig fmt` on the std lib: * cache-misses: 2,554,321 => 2,534,745 * instructions: 3,293,220,119 => 3,302,479,874 * peak memory: 74.0 MiB => 73.0 MiB Holding the entire std lib AST in memory at the same time: 93.9 MiB => 88.5 MiB	2020-07-15 02:07:30 -07:00
Andrew Kelley	14cef9dd3d	stage2 parser: split out PrefixOp into separate AST Nodes This is part of a larger effort to improve the memory layout of AST nodes of the self-hosted parser to reduce wasted memory. Reduction of wasted memory also translates to improved performance because of fewer memory allocations, and fewer cache misses. Compared to master, when running `zig fmt` on the std lib: * cache-misses: 801,829 => 768,624 * instructions: 3,234,877,167 => 3,232,075,022 * peak memory: 81480 KB => 75964 KB	2020-07-13 20:13:51 -07:00
Vexu	3e095d8ef3	use 'anytype' in translate-c	2020-07-11 22:04:38 +03:00
Vexu	e85fe13e44	run zig fmt on std lib and self hosted	2020-07-11 20:41:19 +03:00
Vexu	c2fb4bfff3	add 'anytype' to self-hosted parser	2020-07-11 17:41:16 +03:00
Andrew Kelley	6938245fcc	Merge remote-tracking branch 'origin/master' into zig-ast-to-zir	2020-06-22 23:22:17 -04:00
Andrew Kelley	da549a72e1	zig fmt	2020-06-20 18:39:15 -04:00
Andrew Kelley	81f766eecd	self-hosted parser: make a function pointer comptime	2020-06-18 17:12:56 -04:00
Vexu	e7207bc267	add workaround for #5599	2020-06-14 20:13:02 +03:00
Vexu	e07b467c7c	fix missing compile error on while/for missing block	2020-05-25 23:25:06 +03:00
Andrew Kelley	69ff89fd12	stage2 parser: heuristics to pre-allocate token arrays throughput: 72.2 MiB/s => 75.3 MiB/s I also tried the idea from the deleted comment in this commit and it made the throughput worse.	2020-05-25 15:12:23 -04:00
Andrew Kelley	dd05f2be80	run zig fmt on std lib	2020-05-24 10:04:09 -04:00
Andrew Kelley	8df0841d6a	stage2 parser: token ids in their own array To prevent cache misses, token ids go in their own array, and the start/end offsets go in a different one. perf measurement before: 2,667,914 cache-misses:u 2,139,139,935 instructions:u 894,167,331 cycles:u perf measurement after: 1,757,723 cache-misses:u 2,069,932,298 instructions:u 858,105,570 cycles:u	2020-05-22 12:34:12 -04:00
Andrew Kelley	295bca9b5f	stage2 parser: don't append doc comments to the list The DocComment AST node now only points to the first doc comment token. API users are expected to iterate over the following tokens directly. After this commit there are no more linked lists in use in the self-hosted AST API. Performance impact is negligible. Memory usage slightly reduced.	2020-05-22 00:28:59 -04:00
Andrew Kelley	8252c8b9d6	stage2 parser: different multiline string literal parsing strategy and using flat memory rather than singly linked list roughly equivalent performance, slightly reduced memory usage, better API.	2020-05-21 23:25:15 -04:00
Andrew Kelley	19de259936	stage2 parser: arrays and structs directly in memory after the node Slightly reduced memory usage. Roughly the same API and perf.	2020-05-21 22:52:45 -04:00
Andrew Kelley	9377af934f	stage2 parser: SwitchCase uses intrusive array instead of linkedlist no perf impact, but the API is better	2020-05-21 22:28:30 -04:00
Andrew Kelley	d37b81d43b	stage2 parser performance/API improvements * Extract Call ast node tag out of SuffixOp; parameters go in memory after Call. * Demote AsmInput and AsmOutput from AST nodes to structs inside the Asm node. * The following ast nodes get their sub-node lists directly following them in memory: - ErrorSetDecl - Switch - BuiltinCall * ast.Node.Asm gets slices for inputs, outputs, clobbers instead of singly linked lists Performance changes: throughput: 72.7 MiB/s => 74.0 MiB/s maxrss: 72 KB => 69 KB (nice)	2020-05-21 22:01:16 -04:00
Andrew Kelley	32ecb416f3	fix compile errors when setting NodeIndex/TokenIndex to u32	2020-05-21 00:30:08 -04:00
Andrew Kelley	d57d9448aa	stage2 parsing: rework block statements AST memory layout block statements are now directly following the Block AST node rather than a singly linked list. This had negligible impact on performance: throughput: 72.3 MiB/s => 72.7 MiB/s however it greatly improves the API since the statements are laid out in a flat array in memory.	2020-05-20 23:47:04 -04:00
Andrew Kelley	688aa114e4	Revert "stage2 parser: elide memcpy of large initialization lists" This reverts commit 84df1d4f3d0312553f5a3857ed67042319c20846. Not worth the complexity! Always memcpy initialization lists into the arena.	2020-05-20 22:42:43 -04:00
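
The node layout described in commit 4dca99d3f6 (a 1-byte tag, a 4-byte `main_token`, and an `lhs`/`rhs` pair, stored as a struct of arrays with an `extra_data` overflow list) can be sketched roughly as follows. This is an illustrative Python model, not the Zig implementation: the `Tree` class, the `TAG_*` constants, and the helper names are hypothetical, and `array("I")` is assumed to be 4 bytes wide, which holds on common platforms but is not guaranteed.

```python
from array import array

# Hypothetical tag values for illustration; the real tag set is much larger.
TAG_IDENTIFIER = 0
TAG_CALL_ONE = 1  # call with exactly one argument: lhs = callee, rhs = argument
TAG_CALL = 2      # general call: lhs = callee, rhs = index into extra_data

# 1-byte tag + 4-byte main_token + two 4-byte indexes, as described in the commit.
BYTES_PER_NODE = 1 + 4 + 4 + 4


class Tree:
    """Struct-of-arrays node storage: one parallel array per field."""

    def __init__(self):
        self.tags = array("B")         # one unsigned byte per node
        self.main_tokens = array("I")  # unsigned int (4 bytes on common platforms)
        self.lhs = array("I")
        self.rhs = array("I")
        self.extra_data = array("I")   # shared overflow storage for node indexes

    def add_node(self, tag, main_token, lhs=0, rhs=0):
        self.tags.append(tag)
        self.main_tokens.append(main_token)
        self.lhs.append(lhs)
        self.rhs.append(rhs)
        return len(self.tags) - 1  # a node's index is its position in the arrays

    def add_call(self, main_token, callee, args):
        if len(args) == 1:
            # Common case: both indexes fit in lhs/rhs, so no extra bytes are spent.
            return self.add_node(TAG_CALL_ONE, main_token, callee, args[0])
        # General case: extra_data[rhs] holds "start", extra_data[rhs + 1] holds
        # "end", and extra_data[start:end] lists the argument nodes.
        pair = len(self.extra_data)
        start = pair + 2
        self.extra_data.extend([start, start + len(args)])
        self.extra_data.extend(args)
        return self.add_node(TAG_CALL, main_token, callee, pair)

    def call_args(self, node):
        tag = self.tags[node]
        if tag == TAG_CALL_ONE:
            return [self.rhs[node]]
        if tag == TAG_CALL:
            pair = self.rhs[node]
            start, end = self.extra_data[pair], self.extra_data[pair + 1]
            return list(self.extra_data[start:end])
        raise ValueError("node is not a call")


# Token indexes for `foo(bar, baz)`: foo=0, (=1, bar=2, ,=3, baz=4
tree = Tree()
foo = tree.add_node(TAG_IDENTIFIER, 0)
bar = tree.add_node(TAG_IDENTIFIER, 2)
baz = tree.add_node(TAG_IDENTIFIER, 4)
one_arg = tree.add_call(1, foo, [bar])        # models `foo(bar)`
two_args = tree.add_call(1, foo, [bar, baz])  # models `foo(bar, baz)`
```

In this sketch only the two-argument call touches `extra_data`, which mirrors the commit's point that the extra bytes are paid for only when a node actually needs them.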

211 Commits