13173 Commits

Author SHA1 Message Date
Veikka Tuominen
75acfcf0ea
stage2: reimplement switch 2021-02-01 15:45:11 +02:00
Veikka Tuominen
3ec5c9a3bc
stage2 cbe: implement not and some bitwise ops 2021-02-01 08:48:24 +02:00
Veikka Tuominen
106520329e
stage2 cbe: implement switchbr 2021-02-01 08:48:22 +02:00
Veikka Tuominen
258f3ec5ec
stage2 cbe: block results 2021-02-01 08:47:25 +02:00
Veikka Tuominen
bdfe3aeab8
stage2 cbe: condbr and breaks 2021-02-01 08:47:25 +02:00
Veikka Tuominen
6ca0ff90b6
stage2 cbe: use AutoIndentingStream 2021-02-01 08:47:25 +02:00
Veikka Tuominen
81c512f35b
stage2 cbe: loop instruction 2021-02-01 08:47:25 +02:00
Andrew Kelley
bf8fafc37d stage2: tokenizer does not emit line comments anymore
only std.zig.render cares about these, and it can find them in the
original source easily enough.
2021-01-31 21:57:48 -07:00
Andrew Kelley
bf76501b5d
Merge pull request #7847 from ziglang/astgen-rl-rework
stage2: rework astgen result locations
2021-01-31 20:15:08 -08:00
Andrew Kelley
0f5eda973e stage2: delete astgen for switch expressions
The astgen for switch expressions did not respect the ZIR rules of only
referencing instructions that are in scope:

  %14 = block_comptime_flat({
    %15 = block_comptime_flat({
      %16 = const(TypedValue{ .ty = comptime_int, .val = 1})
    })
    %17 = block_comptime_flat({
      %18 = const(TypedValue{ .ty = comptime_int, .val = 2})
    })
  })
  %19 = block({
    %20 = ref(%5)
    %21 = deref(%20)
    %22 = switchbr(%20, [%15, %17], {
      %15 => {
        %23 = const(TypedValue{ .ty = comptime_int, .val = 1})
        %24 = store(%10, %23)
        %25 = const(TypedValue{ .ty = void, .val = {}})
        %26 = break("label_19", %25)
      },
      %17 => {
        %27 = const(TypedValue{ .ty = comptime_int, .val = 2})
        %28 = store(%10, %27)
        %29 = const(TypedValue{ .ty = void, .val = {}})
        %30 = break("label_19", %29)
      }
    }, {
      %31 = unreachable_safe()
    }, special_prong=else)
  })

In this snippet you can see that the comptime expr referenced %15 and
%17, which are not in scope. There was also no test coverage for runtime
switch expressions.
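For orientation, a source pattern of roughly this shape (a hypothetical
reconstruction, not taken from the deleted code or tests) would lower to a
switchbr like the one above:

```zig
// Hypothetical source: prongs produce the comptime_int constants 1 and 2,
// which get stored through the result pointer, with else => unreachable.
fn pick(x: u32) u32 {
    const r: u32 = switch (x) {
        1 => 1,
        2 => 2,
        else => unreachable,
    };
    return r;
}
```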

Switch expressions will have to be re-introduced to follow these rules
and with some test coverage. There is some usable code being deleted in
this commit; it will be useful to reference when re-implementing switch
later.

A few more improvements to do while we're at it:
 * only use .ref result loc on switch target if any prongs obtain the
   payload with |*syntax|
   - this improvement should be done to if, while, and for as well.
   - this will remove the needless ref/deref instructions above
 * remove switchbr and add switch_block, which is both a block and a
   switch branch.
   - similarly we should remove loop and add loop_block.

This commit introduces a "force_comptime" flag into the GenZIR
scope. The main purpose of this will be to choose the "comptime"
variants of certain key zir instructions, such as function calls and
branches. We will be moving away from using the block_comptime_flat
ZIR instruction, and eventually deleting it.

This commit also contains miscellaneous fixes to this branch that bring
it to the state of passing all the tests.
2021-01-31 21:09:22 -07:00
Andrew Kelley
de85c4ac42 astgen: rework for loops 2021-01-31 21:09:22 -07:00
Andrew Kelley
9f4ff80108 astgen: rework while 2021-01-31 21:09:22 -07:00
Andrew Kelley
e9e6cc2171 astgen: rework orelse/catch 2021-01-31 21:09:22 -07:00
Andrew Kelley
6c8985fcee astgen: rework labeled blocks 2021-01-31 21:09:22 -07:00
Andrew Kelley
588171c30b sema: after block gets peer type resolved, insert type coercions
on the break instruction operands. This involves a new TZIR instruction,
br_block_flat, which represents a break instruction where the operand is
the result of a flat block. See the doc comments on the instructions for
more details.

How it works: when adding break instructions in semantic analysis, the
underlying allocation is slightly padded so that it is the size of a
br_block_flat instruction, which allows the break instruction to later
be converted without removing instructions inside the parent body. The
extra type coercion instructions go into the body of the br_block_flat,
and backends are responsible for dispatching the instruction correctly
(it should map to the same function calls for related instructions).
2021-01-31 21:09:22 -07:00
Andrew Kelley
06bb360dd2 astgen: respect a const local's type annotation 2021-01-31 21:09:22 -07:00
Andrew Kelley
2f992e1bb3 astgen: const locals that end up being rvalues do not alloc
Local variable declarations now detect whether the initialization
expression consumes the result location as a pointer. If it does, the
local is emitted as a LocalPtr. Otherwise it is emitted as a LocalVal.

This results in clean, straightforward ZIR code for semantic analysis.
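For intuition, a hand-written illustration (my own example with hedged guesses
about which initializers consume the result pointer, not taken from the
commit's tests):

```zig
const Point = struct { x: u32, y: u32 };

fn demo(c: u32) u32 {
    // A plain arithmetic initializer yields a value, so this const presumably
    // lowers as a LocalVal with no alloc.
    const a = c + 1;

    // An aggregate initializer is the kind of expression that writes through a
    // result pointer, so this presumably lowers as a LocalPtr.
    const p = Point{ .x = a, .y = 2 };

    return p.x + p.y;
}
```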
2021-01-31 21:09:22 -07:00
Andrew Kelley
093cbeb018 astgen: @as with block_ptr result location 2021-01-31 21:09:22 -07:00
Andrew Kelley
b7452fe35f stage2: rework astgen result locations
Motivating test case:

```zig
export fn _start() noreturn {
    var x: u64 = 1;
    var y: u32 = 2;
    var thing: u32 = 1;
    const result = if (thing == 1) x else y;
    exit();
}
```

The main idea here is for astgen to output ideal ZIR depending on
whether or not the sub-expressions of a block consume the result
location. Here, neither `x` nor `y` consume the result location of the
conditional expression block, and so the ZIR should communicate the
result of the condbr using break instructions, not with the result
location pointer.

With this commit, this is accomplished:

```
  %22 = alloc_inferred()
  %23 = block({
    %24 = const(TypedValue{ .ty = type, .val = bool})
    %25 = deref(%18)
    %26 = const(TypedValue{ .ty = comptime_int, .val = 1})
    %27 = cmp_eq(%25, %26)
    %28 = as(%24, %27)
    %29 = condbr(%28, {
      %30 = deref(%4)
      < there is no longer a store instruction here >
      %31 = break("label_23", %30)
    }, {
      %32 = deref(%11)
      < there is no longer a store instruction here >
      %33 = break("label_23", %32)
    })
  })
  %34 = store_to_inferred_ptr(%22, %23) <-- the store is only here
  %35 = resolve_inferred_alloc(%22)
```

However if the result location gets consumed, the break instructions
change to break_void, and the result value is communicated only by the
stores, not by the break instructions.

Implementation:

 * The GenZIR scope that conditional branches use now has an optional
   result location pointer field and a count of how many times the
   result location ended up being an rvalue (not consumed).
 * When rvalue() is called on a result location for a block, it
   increments this counter. After generating the branches of a block,
   astgen for the conditional branch checks this count; if it is 2,
   the store_to_block_ptr instructions are elided and rvalue() is
   called with the block result (which accounts for peer type
   resolution on the break operands).
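A rough sketch of that counting scheme (field and function names are my own
placeholders, not the compiler's actual identifiers):

```zig
const GenZir = struct {
    // Optional result-location pointer (a ZIR instruction index) for the block.
    rl_ptr: ?u32 = null,
    // Incremented each time a branch treats the result location as an rvalue.
    rvalue_count: u32 = 0,
};

// Called when a branch produces a plain value instead of storing through rl_ptr.
fn rvalue(scope: *GenZir, operand: u32) u32 {
    scope.rvalue_count += 1;
    return operand;
}

// After generating both branches of the condbr: if neither branch consumed the
// result location, the store_to_block_ptr instructions can be elided and the
// block result is handed to the parent's rvalue(), so peer type resolution
// happens on the break operands.
fn shouldElideStores(scope: *const GenZir) bool {
    return scope.rvalue_count == 2;
}
```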

astgen has many functions disabled until they can be reworked with these
new semantics. That will be done before merging the branch.

There are some new rules for astgen to follow regarding result locations
and what you are allowed/required to do depending on which one is passed
to expr(). See the updated doc comments of ResultLoc for details.

I also changed naming conventions of stuff in this commit, sorry about
that.
2021-01-31 21:09:22 -07:00
daurnimator
e0a04e7f67
allow more complex comptime fields in std.json 2021-02-01 01:01:50 +11:00
daurnimator
f88bb56ee5
std.json union handling should bubble up AllocationRequired 2021-02-01 01:00:15 +11:00
daurnimator
33c0a01b08
std.json support for comptime fields
Closes #6231
2021-01-31 23:41:32 +11:00
Veikka Tuominen
fdc875ed00
Merge pull request #7750 from tadeokondrak/6609-tagtype-tag
Remove @TagType; std.meta.TagType -> std.meta.Tag
2021-01-31 12:37:12 +02:00
Andrew Kelley
4dca99d3f6 stage2: rework AST memory layout
This is a proof-of-concept of switching to a new memory layout for
tokens and AST nodes. The goal is threefold:

 * smaller memory footprint
 * faster performance for tokenization and parsing
 * most importantly, a proof-of-concept that can be also applied to ZIR
   and TZIR to improve the entire compiler pipeline in this way.

I had a few key insights here:

 * Underlying premise: using less memory will make things faster, because
   of fewer allocations and better cache utilization. Also using less
   memory is valuable in and of itself.
 * Using a Struct-Of-Arrays for tokens and AST nodes saves the bytes of
   padding between the enum tag (which kind of token is it; which kind
   of AST node is it) and the next fields in the struct. It also improves
   cache coherence, since one can peek ahead in the tokens array without
   having to load the source locations of tokens.
 * Token memory can be conserved by storing only the tag (1 byte) and byte
   offset (4 bytes), for a total of 5 bytes per token. It is not necessary
   to store the ending byte offset because one can always re-tokenize
   later; moreover, for most tokens the length can be determined from the
   tag alone, and for the ones where it cannot, string literals for
   example, the string must be parsed again in astgen anyway, making it
   free to re-tokenize (a sketch follows this list).
 * AST nodes do not actually need to store more than 1 token index because
   one can poke left and right in the tokens array very cheaply.
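A minimal sketch of the 5-byte token encoding described above (type and field
names are assumptions, not the actual std.zig definitions):

```zig
const TokenTag = enum(u8) { identifier, number_literal, string_literal, l_paren, r_paren };

// Two parallel arrays, so the 1-byte tag and the 4-byte offset carry no padding.
const TokenList = struct {
    tags: []TokenTag, // 1 byte per token
    starts: []u32, // 4 bytes per token: byte offset of the token's first character
};

// Many token lengths follow directly from the tag; the rest (identifiers,
// literals) are simply re-tokenized or re-parsed in astgen when needed.
fn fixedLength(tag: TokenTag) ?u32 {
    return switch (tag) {
        .l_paren, .r_paren => 1,
        .identifier, .number_literal, .string_literal => null,
    };
}
```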

So far we are left with one big problem though: how can we put AST nodes
into an array, since different AST nodes are different sizes?

This is where my key observation comes in: one can have a hash table for
the extra data for the less common AST nodes! But it gets even better than
that:

I defined this data that is always present for every AST Node:

 * tag (1 byte)
   - which AST node is it
 * main_token (4 bytes, index into tokens array)
   - the tag determines which token this points to
 * struct{lhs: u32, rhs: u32}
   - enough to store 2 indexes to other AST nodes, the tag determines
     how to interpret this data

You can see how a binary operation, such as `a * b` would fit into this
structure perfectly. A unary operation, such as `*a` would also fit,
and leave `rhs` unused. So this is a total of 13 bytes per AST node.
And again, we don't have to pay for the padding to round up to 16 because
we store in struct-of-arrays format.

I made a further observation: the only kind of data AST nodes need to
store other than the main_token is indexes to sub-expressions. That's it.
The only purpose of an AST is to bring a tree structure to a list of tokens.
This observation means all the data that nodes store are only sets of u32
indexes to other nodes. The other tokens can be found later by the compiler,
by poking around in the tokens array, which again is super fast because it
is struct-of-arrays, so you often only need to look at the token tags array,
which is an array of bytes, very cache friendly.

So for nearly every kind of AST node, you can store it in 13 bytes. For the
rarer AST nodes that have 3 or more indexes to other nodes to store, either
the lhs or the rhs will be repurposed to be an index into an extra_data array
which contains the extra AST node indexes. In other words, no hash table needed,
it's just 1 big ArrayList with the extra data for AST Nodes.

Final observation: there is no need for a canonical tag for a given AST construct. For example:
The expression `foo(bar)` is a function call. Function calls can have any
number of parameters. However in this example, we can encode the function
call into the AST with a tag called `FunctionCallOnlyOneParam`, and use lhs
for the function expr and rhs for the only parameter expr. Meanwhile if the
code was `foo(bar, baz)` then the AST node would have to be `FunctionCall`
with lhs still being the function expr, but rhs being the index into
`extra_data`. Then because the tag is `FunctionCall` it means
`extra_data[rhs]` is the "start" and `extra_data[rhs+1]` is the "end".
Now the range `extra_data[start..end]` describes the list of parameters
to the function.
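A sketch of that encoding (a hand-written approximation with made-up tag names,
not the actual std.zig.ast layout):

```zig
const NodeTag = enum(u8) { mul, call_one_param, call };

const NodeData = struct { lhs: u32, rhs: u32 };

const Tree = struct {
    // Struct-of-arrays: 1 + 4 + 8 = 13 bytes per node, with no padding paid.
    tags: []NodeTag,
    main_tokens: []u32,
    datas: []NodeData,
    // Overflow storage for the rarer nodes that need more than two indexes.
    extra_data: []u32,

    // `foo(bar)` uses .call_one_param: lhs = callee node, rhs = the only argument.
    // `foo(bar, baz)` uses .call: lhs = callee node, rhs = index into extra_data,
    // where extra_data[rhs] is "start" and extra_data[rhs + 1] is "end".
    fn callParams(tree: Tree, node: usize) []const u32 {
        const data = tree.datas[node];
        const start = tree.extra_data[data.rhs];
        const end = tree.extra_data[data.rhs + 1];
        return tree.extra_data[start..end];
    }
};
```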

Point being, you only have to pay for the extra bytes if the AST actually
requires it. There's no limit to the number of different AST tag encodings.

Preliminary results:

 * 15% improvement on cache-misses
 * 28% improvement on total instructions executed
 * 26% improvement on total CPU cycles
 * 22% improvement on wall clock time

This is 1/4 items on the checklist before this can actually be merged:

 * [x] parser
 * [ ] render (zig fmt)
 * [ ] astgen
 * [ ] translate-c
2021-01-30 20:16:59 -07:00
Andrew Kelley
766b315b38 std.GeneralPurposeAllocator: logging improvements
It now uses the log scope "gpa" instead of "std".

Additionally, there is a new config option `verbose_log` which enables
info log messages for every allocation. Can be useful when debugging.
This option is off by default.
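A small usage sketch (assuming today's std.heap API for obtaining the allocator
handle; in the code of this era it was `&gpa.allocator`):

```zig
const std = @import("std");

pub fn main() !void {
    // Enable the new per-allocation info logging, scoped to "gpa" (off by default).
    var gpa = std.heap.GeneralPurposeAllocator(.{ .verbose_log = true }){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const buf = try allocator.alloc(u8, 16);
    defer allocator.free(buf);
}
```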
2021-01-30 20:15:26 -07:00
Andrew Kelley
0808d98e10 add std.MultiArrayList
Also known as "Struct-Of-Arrays" or "SOA". The purpose of this data
structure is to provide an API similar to ArrayList, but instead of
storing the elements as whole structs, each field of the struct gets
its own array, all with the same length and capacity.

Having this abstraction means we can put the field arrays in the same
allocation, avoiding overhead with the allocator. It also saves a tiny
bit of overhead by not storing redundant capacity and length fields,
since all of the field arrays share the same values.

This is an alternate implementation to #7854.
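A usage sketch, assuming the std.MultiArrayList API as it exists today
(`append`, `items`, `deinit`); exact signatures at the time of this commit may
have differed:

```zig
const std = @import("std");

const Monster = struct {
    hp: u32,
    tag: enum(u8) { slime, dragon },
};

test "MultiArrayList sketch" {
    const gpa = std.testing.allocator;

    // One allocation backs all of the field arrays.
    var list = std.MultiArrayList(Monster){};
    defer list.deinit(gpa);

    try list.append(gpa, .{ .hp = 10, .tag = .slime });
    try list.append(gpa, .{ .hp = 500, .tag = .dragon });

    // Fields live in parallel arrays; scanning only the tags touches 1 byte per element.
    try std.testing.expectEqual(@as(usize, 2), list.items(.tag).len);
}
```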
2021-01-30 20:12:13 -07:00
Tadeo Kondrak
0b5f3c2ef9
Replace @TagType uses, mostly with std.meta.Tag 2021-01-30 22:26:44 +02:00
rgreenblatt
78d2f2b819 FromWriteFileStep for all LibExeObjStep types 2021-01-30 17:50:41 +02:00
Tadeo Kondrak
1637d8ac80
remove @TagType 2021-01-30 13:19:58 +02:00
Tadeo Kondrak
b7767eb834
std.meta: rename TagPayloadType to TagPayload 2021-01-30 13:19:52 +02:00
Tadeo Kondrak
68ec54f386
std.meta: rename TagType to Tag 2021-01-30 13:19:52 +02:00
Dmitry Atamanov
290efc0747
Improve error messages in std.fmt (#7898) 2021-01-30 13:12:44 +02:00
Michael Dusan
f9b85c6e50 stage1: add error for slice.len incr beyond bounds
comptime direct slice.len increment dodges bounds checking but
we can emit an error for it, at least in the simple case.

- promote original assert to compile-error
- add test case

closes #7810
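A guess at the shape of the now-rejected code (a hypothetical reconstruction
from the description above, not the actual test case):

```zig
comptime {
    var array = [_]u8{ 1, 2, 3 };
    var slice: []u8 = &array;
    // Directly growing .len past the underlying array previously dodged bounds
    // checking; with this change the simple case is a compile error.
    slice.len += 1;
}
```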
2021-01-30 11:19:25 +02:00
Martin Wickham
3d4eeafb47 Fill out more cases for std.meta.sizeof 2021-01-30 11:13:20 +02:00
Asherah Connor
e8740a90b9 complete {Z} deprecation in std.fmt.formatIntValue
formatZigEscapes doesn't exist any more.
2021-01-29 20:46:39 +02:00
root
236db6232f Fix integer overflow when calling joinZ with empty slices 2021-01-27 12:01:18 +02:00
Evan Haas
1ed8c54cd3 translate-c: add wide string literal support
Adds support for wide, UTF-16, and UTF-32 string literals. If used to initialize
an incomplete array, the same logic as narrow strings is used. Otherwise they
are translated as global "anonymous" arrays of the relevant underlying char type.
A dot is used in the name to ensure the generated names do not conflict with any
other names in the translated program.

For example:

```c
void my_fn() {
    const uint32_t *foo = U"foo";
}
```

becomes:
```zig
const @"zig.UTF32_string_2" = [4]c_uint{
    '\u{66}',
    '\u{6f}',
    '\u{6f}',
    0,
};
pub export fn my_fn() void {
    var foo: [*c]const u32 = &@"zig.UTF32_string_2";
}
```
2021-01-26 21:13:06 -08:00
Luuk de Gram
cc46c1b902
Add tests, fix locals that are created in blocks like loops, and handle all breaks correctly 2021-01-26 19:47:15 +01:00
Jakub Konka
79730e6f5c macho: add arm64 relocation type enum 2021-01-26 08:11:31 +01:00
Joran Dirk Greef
881ecdc72f Add MAX_RW_COUNT limit to std.os.pread()
Fixes: https://github.com/ziglang/zig/issues/7805
2021-01-25 10:41:38 -08:00
Koakuma
09450419d3 Fix f128 NaN check on big-endian hosts
On big-endian hosts, zig_f128_isNaN() takes the high and low halves
from the wrong element, resulting in buggy NaN detection behavior.
This fixes it.
2021-01-25 10:40:23 -08:00
Timon Kruiper
e23bc1f76a render: fix bug when rendering struct initializer with length 1
This crashed the compiler when running translate-c. See the added test.
2021-01-25 10:40:00 -08:00
Andrew Kelley
4ca1f4ec2e
Merge pull request #7846 from LemonBoy/filtertest
stage1: don't filter test blocks with empty label
2021-01-25 10:39:11 -08:00
Evan Haas
57b2176e28 translate-c: Improve array support
1. For incomplete arrays with initializer list (`int x[] = {1};`) use the
initializer size as the array size.

2. For arrays initialized with a string literal translate it as an array
of character literals instead of `[*c]const u8`

3. Don't crash if an empty initializer is used for an incomplete array.

4. Add a test for multi-character character constants
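Hand-written approximations of what items 1 and 2 translate to (illustrative
only, not literal translate-c output):

```zig
// int x[] = {1};    -- size taken from the initializer list
pub export var x: [1]c_int = [1]c_int{1};

// char s[] = "hi";  -- an array of character values rather than [*c]const u8
pub export var s: [3]u8 = [3]u8{ 'h', 'i', 0 };
```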

Additionally lay some groundwork for supporting wide string literals.

fixes #4831 #7832 #7842
2021-01-25 10:37:23 -08:00
Joran Dirk Greef
68a040aec7 linux: add fallocate() to io_uring 2021-01-25 10:34:20 -08:00
Timon Kruiper
9238d12537 windows: make sure to handle PATH_NOT_FOUND when deleting files
Fixes #7879
2021-01-25 10:33:08 -08:00
Andrew Kelley
0cfa39304b zig cc: recognize more coff linker options
Related: #7874
2021-01-24 14:30:28 -07:00
Andrew Kelley
b56e916fa1 Merge branch 'FireFox317-deadlock-windows-fix'
Merges #7861
2021-01-24 12:22:51 -07:00
Andrew Kelley
2b321c25ce std.Progress: call refreshWithHeldLock as appropriate 2021-01-24 12:22:17 -07:00
Timon Kruiper
4f7d76f19c fix windows bug in Progress.zig
This bug caused the compiler to deadlock when multiple C objects
were built in parallel.

Thanks @kprotty for finding this bug!
2021-01-24 12:20:51 -07:00