13568 Commits

Author SHA1 Message Date
daurnimator
f88bb56ee5
std.json union handling should bubble up AllocationRequired 2021-02-01 01:00:15 +11:00
daurnimator
33c0a01b08
std.json support for comptime fields
Closes #6231
2021-01-31 23:41:32 +11:00
Veikka Tuominen
fdc875ed00
Merge pull request #7750 from tadeokondrak/6609-tagtype-tag
Remove @TagType; std.meta.TagType -> std.meta.Tag
2021-01-31 12:37:12 +02:00
Andrew Kelley
4dca99d3f6 stage2: rework AST memory layout
This is a proof-of-concept of switching to a new memory layout for
tokens and AST nodes. The goal is threefold:

 * smaller memory footprint
 * faster performance for tokenization and parsing
 * most importantly, a proof-of-concept that can be also applied to ZIR
   and TZIR to improve the entire compiler pipeline in this way.

I had a few key insights here:

 * Underlying premise: using less memory will make things faster, because
   of fewer allocations and better cache utilization. Also using less
   memory is valuable in and of itself.
 * Using a Struct-Of-Arrays for tokens and AST nodes, saves the bytes of
   padding between the enum tag (which kind of token is it; which kind
   of AST node is it) and the next fields in the struct. It also improves
   cache coherence, since one can peek ahead in the tokens array without
   having to load the source locations of tokens.
 * Token memory can be conserved by only having the tag (1 byte) and byte
   offset (4 bytes) for a total of 5 bytes per token. It is not necessary
   to store the token ending byte offset because one can always re-tokenize
   later, but also most tokens the length can be trivially determined from
   the tag alone, and for ones where it doesn't, string literals for
   example, one must parse the string literal again later anyway in
   astgen, making it free to re-tokenize.
 * AST nodes do not actually need to store more than 1 token index because
   one can poke left and right in the tokens array very cheaply.

So far we are left with one big problem though: how can we put AST nodes
into an array, since different AST nodes are different sizes?

This is where my key observation comes in: one can have a hash table for
the extra data for the less common AST nodes! But it gets even better than
that:

I defined this data that is always present for every AST Node:

 * tag (1 byte)
   - which AST node is it
 * main_token (4 bytes, index into tokens array)
   - the tag determines which token this points to
 * struct{lhs: u32, rhs: u32}
   - enough to store 2 indexes to other AST nodes, the tag determines
     how to interpret this data

You can see how a binary operation, such as `a * b` would fit into this
structure perfectly. A unary operation, such as `*a` would also fit,
and leave `rhs` unused. So this is a total of 13 bytes per AST node.
And again, we don't have to pay for the padding to round up to 16 because
we store in struct-of-arrays format.

I made a further observation: the only kind of data AST nodes need to
store other than the main_token is indexes to sub-expressions. That's it.
The only purpose of an AST is to bring a tree structure to a list of tokens.
This observation means all the data that nodes store are only sets of u32
indexes to other nodes. The other tokens can be found later by the compiler,
by poking around in the tokens array, which again is super fast because it
is struct-of-arrays, so you often only need to look at the token tags array,
which is an array of bytes, very cache friendly.

So for nearly every kind of AST node, you can store it in 13 bytes. For the
rarer AST nodes that have 3 or more indexes to other nodes to store, either
the lhs or the rhs will be repurposed to be an index into an extra_data array
which contains the extra AST node indexes. In other words, no hash table needed,
it's just 1 big ArrayList with the extra data for AST Nodes.

Final observation, no need to have a canonical tag for a given AST. For example:
The expression `foo(bar)` is a function call. Function calls can have any
number of parameters. However in this example, we can encode the function
call into the AST with a tag called `FunctionCallOnlyOneParam`, and use lhs
for the function expr and rhs for the only parameter expr. Meanwhile if the
code was `foo(bar, baz)` then the AST node would have to be `FunctionCall`
with lhs still being the function expr, but rhs being the index into
`extra_data`. Then because the tag is `FunctionCall` it means
`extra_data[rhs]` is the "start" and `extra_data[rhs+1]` is the "end".
Now the range `extra_data[start..end]` describes the list of parameters
to the function.

Point being, you only have to pay for the extra bytes if the AST actually
requires it. There's no limit to the number of different AST tag encodings.

Preliminary results:

 * 15% improvement on cache-misses
 * 28% improvement on total instructions executed
 * 26% improvement on total CPU cycles
 * 22% improvement on wall clock time

This is 1/4 items on the checklist before this can actually be merged:

 * [x] parser
 * [ ] render (zig fmt)
 * [ ] astgen
 * [ ] translate-c
2021-01-30 20:16:59 -07:00
Andrew Kelley
766b315b38 std.GeneralPurposeAllocator: logging improvements
It now uses the log scope "gpa" instead of "std".

Additionally, there is a new config option `verbose_log` which enables
info log messages for every allocation. Can be useful when debugging.
This option is off by default.
2021-01-30 20:15:26 -07:00
Andrew Kelley
0808d98e10 add std.MultiArrayList
Also known as "Struct-Of-Arrays" or "SOA". The purpose of this data
structure is to provide a similar API to ArrayList but instead of
the element type being a struct, the fields of the struct are in N
different arrays, all with the same length and capacity.

Having this abstraction means we can put them in the same allocation,
avoiding overhead with the allocator. It also saves a tiny bit of
overhead from the redundant capacity and length fields, since each
struct element shares the same value.

This is an alternate implementation to #7854.
2021-01-30 20:12:13 -07:00
Tadeo Kondrak
0b5f3c2ef9
Replace @TagType uses, mostly with std.meta.Tag 2021-01-30 22:26:44 +02:00
rgreenblatt
78d2f2b819 FromWriteFileStep for all LibExeObjStep types 2021-01-30 17:50:41 +02:00
Tadeo Kondrak
1637d8ac80
remove @TagType 2021-01-30 13:19:58 +02:00
Tadeo Kondrak
b7767eb834
std.meta: rename TagPayloadType to TagPayload 2021-01-30 13:19:52 +02:00
Tadeo Kondrak
68ec54f386
std.meta: rename TagType to Tag 2021-01-30 13:19:52 +02:00
Dmitry Atamanov
290efc0747
Improve error messages in std.fmt (#7898) 2021-01-30 13:12:44 +02:00
Michael Dusan
f9b85c6e50 stage1: add error for slice.len incr beyond bounds
comptime direct slice.len increment dodges bounds checking but
we can emit an error for it, at least in the simple case.

- promote original assert to compile-error
- add test case

closes #7810
2021-01-30 11:19:25 +02:00
Martin Wickham
3d4eeafb47 Fill out more cases for std.meta.sizeof 2021-01-30 11:13:20 +02:00
Asherah Connor
e8740a90b9 complete {Z} deprecation in std.fmt.formatIntValue
formatZigEscapes doesn't exist any more.
2021-01-29 20:46:39 +02:00
root
236db6232f Fix interger overflow when calling joinZ with empty slices 2021-01-27 12:01:18 +02:00
Evan Haas
1ed8c54cd3 translate-c: add wide string literal support
Adds support for wide, UTF-16, and UTF-32 string literals. If used to initialize
an incomplete array, the same logic as narrow strings is used. Otherwise they
are translated as global "anonymous" arrays of the relevant underlying char type.
A dot is used in the name to ensure the generated names do not conflict with any
other names in the translated program.

For example:

```c
void my_fn() {
    const uint32_t *foo = U"foo";
}
```

becomes:
```zig
const @"zig.UTF32_string_2" = [4]c_uint{
    '\u{66}',
    '\u{6f}',
    '\u{6f}',
    0,
};
pub export fn my_fn() void {
    var foo: [*c]const u32 = &@"zig.UTF32_string_2";
}
```
2021-01-26 21:13:06 -08:00
Luuk de Gram
cc46c1b902
Add tests, fix locals that are created in blocks like loops, and handle all breaks correctly 2021-01-26 19:47:15 +01:00
Jakub Konka
79730e6f5c macho: add arm64 relocation type enum 2021-01-26 08:11:31 +01:00
Joran Dirk Greef
881ecdc72f Add MAX_RW_COUNT limit to std.os.pread()
Fixes: https://github.com/ziglang/zig/issues/7805
2021-01-25 10:41:38 -08:00
Koakuma
09450419d3 Fix f128 NaN check on big-endian hosts
On big-endian hosts, zig_f128_isNaN() takes the high and low halves
from the wrong element, resulting in buggy NaN detection behavior.
This fixes it.
2021-01-25 10:40:23 -08:00
Timon Kruiper
e23bc1f76a render: fix bug when rendering struct initializer with length 1
This crashed the compiler when running translate-c. See the added test.
2021-01-25 10:40:00 -08:00
Andrew Kelley
4ca1f4ec2e
Merge pull request #7846 from LemonBoy/filtertest
stage1: don't filter test blocks with empty label
2021-01-25 10:39:11 -08:00
Evan Haas
57b2176e28 translate-c: Improve array support
1. For incomplete arrays with initializer list (`int x[] = {1};`) use the
initializer size as the array size.

2. For arrays initialized with a string literal translate it as an array
of character literals instead of `[*c]const u8`

3. Don't crash if an empty initializer is used for an incomplete array.

4. Add a test for multi-character character constants

Additionally lay some groundwork for supporting wide string literals.

fixes #4831 #7832 #7842
2021-01-25 10:37:23 -08:00
Joran Dirk Greef
68a040aec7 linux: add fallocate() to io_uring 2021-01-25 10:34:20 -08:00
Timon Kruiper
9238d12537 windows: make sure to handle PATH_NOT_FOUND when deleting files
Fixes #7879
2021-01-25 10:33:08 -08:00
Andrew Kelley
0cfa39304b zig cc: recognize more coff linker options
Related: #7874
2021-01-24 14:30:28 -07:00
Andrew Kelley
b56e916fa1 Merge branch 'FireFox317-deadlock-windows-fix'
Merges #7861
2021-01-24 12:22:51 -07:00
Andrew Kelley
2b321c25ce std.Progress: call refreshWithHeldLock as appropriate 2021-01-24 12:22:17 -07:00
Timon Kruiper
4f7d76f19c fix windows bug in Progress.zig
This bug caused the compiler to deadlock when multiple c objects
were build in parallel.

Thanks @kprotty for finding this bug!
2021-01-24 12:20:51 -07:00
Andrew Kelley
15278b7f4b
Merge pull request #7856 from ziglang/lto
add LTO support
2021-01-24 11:09:48 -08:00
Luuk de Gram
a0d81caec9
Nested conditions and loops support 2021-01-24 14:38:35 +01:00
Luuk de Gram
ccef167e9d
Define wasm constants
Update link.Wasm.zig to use std.wasm for its constants

Make opcodes u8 and non-exhaustive

Update test and rename 'spec' to 'wasm'
2021-01-24 10:54:51 +01:00
Andrew Kelley
0d4b6ac741 add LTO support
The CLI gains -flto and -fno-lto options to override the default.
However, the cool thing about this is that the defaults are great! In
general when you use build-exe in release mode, Zig will enable LTO if
it would work and it would help.

zig cc supports detecting and honoring the -flto and -fno-lto flags as
well. The linkWithLld functions are improved to all be the same with
regards to copying the artifact instead of trying to pass single objects
through LLD with -r. There is possibly a future improvement here as
well; see the respective TODOs.

stage1 is updated to support outputting LLVM bitcode instead of machine
code when lto is enabled. This allows LLVM to optimize across the Zig and
C/C++ code boundary.

closes #2845
2021-01-23 18:18:07 -07:00
Andrew Kelley
ab4f3aee3d stage2: wasm arch does not support -mred-zone flags 2021-01-22 23:35:32 -07:00
Andrew Kelley
3647784d05 stage2: add missing frexpl.c to mingw c source file list 2021-01-22 23:35:13 -07:00
LemonBoy
134f5fd3d6 std: Update test "" to test where it makes sense 2021-01-22 15:46:58 +01:00
LemonBoy
ac004e1bf1 stage1: Allow nameless test blocks
Nameless blocks are never filtered, the test prefix is still applied.
2021-01-22 15:46:58 +01:00
Jakub Konka
843d91e75d Bring back stack trace printing on ARM Darwin
This temporary patch fixes a segfault caused by miscompilation
by the LLD when generating stubs for initialization of thread local
storage. We effectively bypass TLS in the default panic handler
so that no segfault is generated and the stack trace is correctly
reported back to the user.

Note that, this is linked directly to a bigger issue with LLD
ziglang/zig#7527 and when resolved, we only need to remove the
`comptime` code path introduced with this patch to use the default
panic handler that relies on TLS.

Co-authored-by: Andrew Kelley <andrew@ziglang.org>
2021-01-21 23:20:42 +01:00
LemonBoy
fc5ae1c409 stage1: don't filter test blocks with empty label
The common pattern of including a file containing all the tests in a
empty-label test block breaks down when using --test-filter.
2021-01-21 09:48:57 +01:00
Evan Haas
bea791b639 translate-c: fix variadic function calls
1702b413 introduced a bug with variadic function calls - trying to access the
paramType of non-existent parameters.
2021-01-20 22:26:18 -08:00
Jakub Konka
58344e0017
Merge pull request #7829 from kubkon/macho-safer
stage2 macho: make int casts fallible where necessary
2021-01-20 17:28:31 +01:00
Andrew Kelley
8098b3f84c stage2: implement TZIR printing for call instruction 2021-01-19 21:09:46 -07:00
Rafael Ristovski
41e6aa78bb zig cc: Support reading input from stdin
This fixes #6271, which allows using `zig cc` with meson.
2021-01-19 17:23:44 -08:00
Andrew Kelley
072d1e088c stage2: fix anonymous Decl ty/val wrong arena
string literals and error set types were allocating the ty/val fields of
the anonymous Decl into the owner Decl's arena, rather than the
new anonymous Decl's arena as intended. This caused use of undefined
value later on in the pipeline.
2021-01-19 16:25:55 -07:00
Andrew Kelley
1af31baf0b stage2: -Dlog enables all logging, log scopes can be set at runtime
Previously you had to recompile if you wanted to change the log scopes
that get printed. Now, log scopes can be set at runtime, and -Dlog
controls whether all logging is available at runtime.

Purpose here is a nicer development experience. Most likely stage2
developers will always want -Dlog enabled and then pass --debug-log
scopes when debugging particular issues.
2021-01-19 15:49:08 -07:00
Jakub Konka
a26ab9afee Backport Elf changes from d5d0619 2021-01-19 22:54:34 +01:00
Jakub Konka
0e56d4cc02 stage2: converge x86_64 and aarch64 tests on macOS 2021-01-19 22:39:49 +01:00
Jakub Konka
5d4401ceec macho: fix overflowing u64 range 2021-01-19 22:39:49 +01:00
Jakub Konka
e726868b02 macho: reuse existing names from the string table 2021-01-19 22:39:49 +01:00