Fixes #2379 = Overlong (non-shortest) sequences UTF-8's unique encoding scheme allows for some Unicode codepoints to be represented in multiple ways. For any of these characters, the spec forbids all but the shortest form. These disallowed longer sequences are called "overlong". As an interesting side effect of this rule, the bytes C0 and C1 never appear in valid UTF-8. = Codepoint range UTF-8 disallows representation of codepoints beyond U+10FFFF, which is the highest character which can be encoded in UTF-16. Because a 4-byte sequence is capable of resulting in such characters, they must be explicitly rejected. This rule also has an interesting side effect, which is that bytes F5 to FF never appear. = References Detecting an overlong version of a codepoint could get gnarly, but luckily The Unicode Consortium did the hard work by creating this handy table of valid byte sequences: https://unicode.org/versions/corrigendum1.html I thought this mapped nicely to the parser's state machine, so I rearranged the relevant states to make use of it.
A general-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
Resources
- Introduction
- Download & Documentation
- Community
- Contributing
- Frequently Asked Questions
- Community Projects
Building from Source
Note that you can download a binary of master branch.
Stage 1: Build Zig from C++ Source Code
Dependencies
POSIX
- cmake >= 2.8.5
- gcc >= 5.0.0 or clang >= 3.6.0
- LLVM, Clang, LLD development libraries == 9.x, compiled with the same gcc or clang version above
- Use the system package manager, or build from source.
Windows
- cmake >= 3.15.3
- Microsoft Visual Studio. Supported versions:
- 2015 (version 14)
- 2017 (version 15.8)
- 2019 (version 16)
- LLVM, Clang, LLD development libraries == 9.x
- Use the pre-built binaries or build from source.
Instructions
POSIX
mkdir build
cd build
cmake ..
make install
MacOS
brew install cmake llvm@9
brew outdated llvm@9 || brew upgrade llvm@9
mkdir build
cd build
cmake .. -DCMAKE_PREFIX_PATH=$(brew --prefix llvm)
make install
Windows
See https://github.com/ziglang/zig/wiki/Building-Zig-on-Windows
Stage 2: Build Self-Hosted Zig from Zig Source Code
Note: Stage 2 compiler is not complete. Beta users of Zig should use the Stage 1 compiler for now.
Dependencies are the same as Stage 1, except now you can use stage 1 to compile Zig code.
bin/zig build --prefix $(pwd)/stage2
This produces ./stage2/bin/zig which can be used for testing and development.
Once it is feature complete, it will be used to build stage 3 - the final compiler
binary.
Stage 3: Rebuild Self-Hosted Zig Using the Self-Hosted Compiler
Note: Stage 2 compiler is not yet able to build Stage 3. Building Stage 3 is not yet supported.
Once the self-hosted compiler can build itself, this will be the actual compiler binary that we will install to the system. Until then, users should use stage 1.
Debug / Development Build
./stage2/bin/zig build --prefix $(pwd)/stage3
Release / Install Build
./stage2/bin/zig build install -Drelease