22 Commits

Author SHA1 Message Date
Andrew Kelley
bd2154da3d stage2: the code is compiling again
(with a lot of things commented out)
2021-03-18 22:48:28 -07:00
Andrew Kelley
66245ac834 stage2: Module and Sema are compiling again
Next up is reworking the seam between the LazySrcLoc emitted by Sema
and the byte offsets currently expected by codegen.

And then the big one: updating astgen.zig to use the new memory layout.
2021-03-17 22:54:56 -07:00
Andrew Kelley
38b3d4b00a stage2: work through some compile errors in Module and Sema 2021-03-17 00:56:08 -07:00
Andrew Kelley
aef3e534f5 stage2: *WIP*: rework ZIR memory layout; overhaul source locations
The memory layout for ZIR instructions is completely reworked. See
zir.zig for those changes. Some new types:

 * `zir.Code`: a "finished" set of ZIR instructions. Instead of allocating
   each instruction independently, there is now a Tag and 8 bytes of
   data available for all ZIR instructions. Small instructions fit
   within these 8 bytes; larger ones use 4 bytes for an index into
   `extra`. There is also `string_bytes` so that we can have 4 byte
   references to strings. `zir.Inst.Tag` describes how to interpret
   those 8 bytes of data.
   - This is shared by all `Block` scopes.

 * `Module.WipZirCode`: represents an in-progress `zir.Code`. In this
   structure, the arrays are mutable, and get resized as we add/delete
   things. There is extra state to keep track of things. This struct is
   stored on the stack. Once it is finished, it produces an immutable
   `zir.Code`, which will remain on the heap for the duration of a
   function's existence.
   - This is shared by all `GenZir` scopes.

 * `Sema`: represents in-progress semantic analysis of a `zir.Code`.
   This data is stored on the stack and is shared among all `Block`
   scopes. It is now the main "self" argument to everything in the file
   that was previously named `zir_sema.zig`.
   Additionally, I moved some logic that was in `Module` into here.

`Module.Fn` now stores its parameter names inside the `zir.Code`,
instead of inside ZIR instructions. When the TZIR memory layout
reworking time comes, codegen will be able to reference this data
directly instead of duplicating it.

astgen.zig is (so far) almost entirely untouched, but nearly all of it
will need to be reworked to adhere to this new memory layout structure.

I have no benchmarks to report yet, as I am still working through
compile errors and fixing various things that I broke in this branch.

Overhaul of Source Locations:

Previously we used `usize` everywhere to mean byte offset, but sometimes
also mean other stuff. This was error prone and also made us do
unnecessary work, and store unnecessary bytes in memory.

Now there are more types involved into source locations, and more ways
to describe a source location.

 * AllErrors.Message: embrace the assumption that files always have less
   than 2 << 32 bytes.
 * SrcLoc gets more complicated, to model more complicated source
   locations.
 * Introduce LazySrcLoc, which can model interesting source locations
   with very little stored state. Useful for avoiding doing unnecessary
   work when no compile errors occur.

Also, previously, we had `src: usize` on every ZIR instruction. This is
no longer the case. Each instruction now determines whether it even cares
about source location, and if so, how that source location is stored.
This requires more careful work inside `Sema`, but it results in fewer
bytes stored on the heap, without compromising accuracy and power of
compile error messages.

Miscellaneous:

 * std.zig: string literals have more helpful result values for
   reporting errors. There is now a lower level API and a higher level
   API.
   - side note: I noticed that the string literal logic needs some love.
     There is some unnecessarily hacky code there.
 * cut & pasted some TZIR logic that was in zir.zig to ir.zig. This
   probably broke stuff and needs to get fixed.
 * Removed type/Enum.zig, type/Union.zig, and type/Struct.zig. I don't
   think this quite how this code will be organized. Need some more
   careful planning about how to implement structs, unions, enums. They
   need to be independent Decls, just like a top level function.
2021-03-16 00:03:22 -07:00
joachimschmidt557
bdb917006c stage2 tzir: Add wrapping integer arithmetic instructions 2021-03-11 14:31:21 -05:00
joachimschmidt557
345ac53836
stage2 ARM: Implement basic integer multiplication 2021-03-02 00:14:56 +01:00
g-w1
153c97ac9e improve stage2 to allow catch at comptime:
* add error_union value tag.
* add analyzeIsErr
* add Value.isError
* add TZIR wrap_errunion_payload and wrap_errunion_err for
  wrapping from T -> E!T and E -> E!T
* add anlyzeInstUnwrapErrCode and analyzeInstUnwrapErr
* add analyzeInstEnsureErrPayloadVoid:
* add wrapErrorUnion
* add comptime error comparison for tests
* tests!
2021-02-25 16:41:16 -08:00
Andrew Kelley
7630a5c566 stage2: more progress towards Module/astgen building with new mem layout 2021-02-12 23:47:17 -07:00
Veikka Tuominen
106520329e
stage2 cbe: implement switchbr 2021-02-01 08:48:22 +02:00
Andrew Kelley
6c8985fcee astgen: rework labeled blocks 2021-01-31 21:09:22 -07:00
Andrew Kelley
588171c30b sema: after block gets peer type resolved, insert type coercions
on the break instruction operands. This involves a new TZIR instruction,
br_block_flat, which represents a break instruction where the operand is
the result of a flat block. See the doc comments on the instructions for
more details.

How it works: when adding break instructions in semantic analysis, the
underlying allocation is slightly padded so that it is the size of a
br_block_flat instruction, which allows the break instruction to later
be converted without removing instructions inside the parent body. The
extra type coercion instructions go into the body of the br_block_flat,
and backends are responsible for dispatching the instruction correctly
(it should map to the same function calls for related instructions).
2021-01-31 21:09:22 -07:00
Andrew Kelley
b7452fe35f stage2: rework astgen result locations
Motivating test case:

```zig
export fn _start() noreturn {
    var x: u64 = 1;
    var y: u32 = 2;
    var thing: u32 = 1;
    const result = if (thing == 1) x else y;
    exit();
}
```

The main idea here is for astgen to output ideal ZIR depending on
whether or not the sub-expressions of a block consume the result
location. Here, neither `x` nor `y` consume the result location of the
conditional expression block, and so the ZIR should communicate the
result of the condbr using break instructions, not with the result
location pointer.

With this commit, this is accomplished:

```
  %22 = alloc_inferred()
  %23 = block({
    %24 = const(TypedValue{ .ty = type, .val = bool})
    %25 = deref(%18)
    %26 = const(TypedValue{ .ty = comptime_int, .val = 1})
    %27 = cmp_eq(%25, %26)
    %28 = as(%24, %27)
    %29 = condbr(%28, {
      %30 = deref(%4)
      < there is no longer a store instruction here >
      %31 = break("label_23", %30)
    }, {
      %32 = deref(%11)
      < there is no longer a store instruction here >
      %33 = break("label_23", %32)
    })
  })
  %34 = store_to_inferred_ptr(%22, %23) <-- the store is only here
  %35 = resolve_inferred_alloc(%22)
```

However if the result location gets consumed, the break instructions
change to break_void, and the result value is communicated only by the
stores, not by the break instructions.

Implementation:

 * The GenZIR scope that conditional branches uses now has an optional
   result location pointer field and a count of how many times the
   result location ended up being an rvalue (not consumed).
 * When rvalue() is called on a result location for a block, it
   increments this counter. After generating the branches of a block,
   astgen for the conditional branch checks this count and if it is 2
   then the store_to_block_ptr instructions are elided and it calls
   rvalue() using the block result (which will account for peer type
   resolution on the break operands).

astgen has many functions disabled until they can be reworked with these
new semantics. That will be done before merging the branch.

There are some new rules for astgen to follow regarding result locations
and what you are allowed/required to do depending on which one is passed
to expr(). See the updated doc comments of ResultLoc for details.

I also changed naming conventions of stuff in this commit, sorry about
that.
2021-01-31 21:09:22 -07:00
Andrew Kelley
ecc246efa2 stage2: rework ZIR/TZIR for optionals and error unions
* fix wrong pointer const-ness when unwrapping optionals
 * allow grouped expressions and orelse as lvalues
 * ZIR for unwrapping optionals: no redundant deref
   - add notes to please don't use rlWrapPtr, this function should be
     deleted
 * catch and orelse: better ZIR for non-lvalue: no redundant deref;
   operate entirely on values. lvalue case still works properly.
   - properly propagate the result location into the target expression
 * Test harness: better output when tests fail due to compile errors.
 * TZIR: add instruction variants. These allow fewer TZIR instructions to
   be emitted from zir_sema. See the commit diff for per-instruction
   documentation.
   - is_null
   - is_non_null
   - is_null_ptr
   - is_non_null_ptr
   - is_err
   - is_err_ptr
   - optional_payload
   - optional_payload_ptr
 * TZIR: removed old naming convention instructions:
   - isnonnull
   - isnull
   - iserr
   - unwrap_optional
 * ZIR: add instruction variants. These allow fewer ZIR instructions to
   be emitted from astgen. See the commit diff for per-instruction
   documentation.
   - is_non_null
   - is_null
   - is_non_null_ptr
   - is_null_ptr
   - is_err
   - is_err_ptr
   - optional_payload_safe
   - optional_payload_unsafe
   - optional_payload_safe_ptr
   - optional_payload_unsafe_ptr
   - err_union_payload_safe
   - err_union_payload_unsafe
   - err_union_payload_safe_ptr
   - err_union_payload_unsafe_ptr
   - err_union_code
   - err_union_code_ptr
 * ZIR: removed old naming convention instructions:
   - isnonnull
   - isnull
   - iserr
   - unwrap_optional_safe
   - unwrap_optional_unsafe
   - unwrap_err_safe
   - unwrap_err_unsafe
   - unwrap_err_code
2021-01-18 19:29:18 -07:00
daurnimator
9c97a07f18 std: have std.meta.fieldInfo take an enum rather than a string 2021-01-01 15:48:46 -07:00
Andrew Kelley
a46d24af1c stage2: inferred local variables
This patch introduces the following new things:

Types:
 - inferred_alloc
   - This is a special value that tracks a set of types that have been stored
     to an inferred allocation. It does not support most of the normal type queries.
     However it does respond to `isConstPtr`, `ptrSize`, `zigTypeTag`, etc.
   - The payload for this type simply points to the corresponding Value
     payload.

Values:
 - inferred_alloc
   - This is a special value that tracks a set of types that have been stored
     to an inferred allocation. It does not support any of the normal value queries.

ZIR instructions:
 - store_to_inferred_ptr,
   - Same as `store` but the type of the value being stored will be used to infer
     the pointer type.
 - resolve_inferred_alloc
   - Each `store_to_inferred_ptr` puts the type of the stored value into a set,
     and then `resolve_inferred_alloc` triggers peer type resolution on the set.
     The operand is a `alloc_inferred` or `alloc_inferred_mut` instruction, which
     is the allocation that needs to have its type inferred.

Changes to the C backend:
 * Implements the bitcast instruction. If the source and dest types
   are both pointers, uses a cast, otherwise uses memcpy.
 * Tests are run with -Wno-declaration-after-statement. Someday we can
   conform to this but not today.

In ZIR form it looks like this:

```zir
fn_body main { // unanalyzed
  %0 = dbg_stmt()
=>%1 = alloc_inferred()
  %2 = declval_in_module(Decl(add))
  %3 = deref(%2)
  %4 = param_type(%3, 0)
  %5 = const(TypedValue{ .ty = comptime_int, .val = 1})
  %6 = as(%4, %5)
  %7 = param_type(%3, 1)
  %8 = const(TypedValue{ .ty = comptime_int, .val = 2})
  %9 = as(%7, %8)
  %10 = call(%3, [%6, %9], modifier=auto)
=>%11 = store_to_inferred_ptr(%1, %10)
=>%12 = resolve_inferred_alloc(%1)
  %13 = dbg_stmt()
  %14 = ret_type()
  %15 = const(TypedValue{ .ty = comptime_int, .val = 3})
  %16 = sub(%10, %15)
  %17 = as(%14, %16)
  %18 = return(%17)
} // fn_body main
```

I have not played around with very many test cases yet. Some interesting
ones that I want to look at before merging:

```zig
var x = blk: {
  var y = foo();
  y.a = 1;
  break :blk y;
};
```

In the above test case, x and y are supposed to alias.

```zig
var x = if (bar()) blk: {
  var y = foo();
  y.a = 1;
  break :blk y;
} else blk: {
  var z = baz();
  z.b = 1;
  break :blk z;
};
```

In the above test case, x, y, and z are supposed to alias.

I also haven't tested with `var` instead of `const` yet.
2020-12-31 01:54:02 -07:00
joachimschmidt557
82236a5029
stage2 ARM: implement basic binary bitwise operations 2020-12-21 19:24:21 +01:00
Vexu
769d5a9c43
stage2: switch comptime execution 2020-10-30 15:58:13 +02:00
Vexu
4155d2ae24
stage2: switch ranges and multi item prongs 2020-10-30 15:58:13 +02:00
Vexu
7db17a2d89
stage2: redesign switchbr
Switchbr now only  handles single item prongs.
Ranges and multi item prongs are checked with
condbrs after the switchbr.
2020-10-30 15:58:12 +02:00
Vexu
2020ca640e
stage2: switch emit zir 2020-10-30 15:58:12 +02:00
Vexu
11998d2972
stage2: basic switch analysis 2020-10-30 15:58:12 +02:00
Andrew Kelley
528832bd3a rename src-self-hosted/ to src/ 2020-09-21 18:38:55 -07:00