hryx 2933a8241a json: disallow overlong and out-of-range UTF-8
Fixes #2379

= Overlong (non-shortest) sequences

UTF-8's unique encoding scheme allows for some Unicode codepoints
to be represented in multiple ways. For any of these characters,
the spec forbids all but the shortest form. These disallowed longer
sequences are called "overlong". As an interesting side effect of
this rule, the bytes C0 and C1 never appear in valid UTF-8.
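
To make the overlong problem concrete, here is a minimal Python sketch (not the parser's Zig code). The helper `naive_decode_2byte` is hypothetical and exists only for illustration: it unpacks a 2-byte sequence without any validity checks, which shows how a second, longer spelling of an ASCII character arises and why C0 and C1 can never be valid lead bytes.

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Unpack 110xxxxx 10yyyyyy into a codepoint, with no validity checks."""
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def strict_decoder_accepts(data: bytes) -> bool:
    """Ask Python's built-in strict UTF-8 decoder whether data is valid."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# '/' is U+002F; its shortest encoding is the single byte 2F.
# The 2-byte sequence C0 AF unpacks to the same codepoint: an overlong form.
slash_overlong = naive_decode_2byte(0xC0, 0xAF)

# The largest value a C0 or C1 lead byte can produce is U+007F, which always
# fits in one byte, so every sequence starting with C0 or C1 is overlong.
highest_with_c1 = naive_decode_2byte(0xC1, 0xBF)

# C2 is the first lead byte that produces a value needing two bytes.
lowest_with_c2 = naive_decode_2byte(0xC2, 0x80)
```

A conforming decoder has to reject `C0 AF` even though the bit pattern unpacks cleanly; `strict_decoder_accepts(b"\xc0\xaf")` reports that Python's own decoder does so.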

= Codepoint range

UTF-8 disallows representation of codepoints beyond U+10FFFF,
which is the highest codepoint that can be encoded in UTF-16.
Because a 4-byte sequence has room for values up to U+1FFFFF,
the out-of-range ones must be explicitly rejected. This rule also has
an interesting side effect, which is that bytes F5 to FF never appear.
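
The arithmetic behind the range rule can be checked with a short Python sketch (illustrative only, not the parser's Zig code). The helper `naive_decode_4byte` and the constant `MAX_CODEPOINT` are hypothetical names introduced here; the helper unpacks the 21 payload bits of a 4-byte sequence without validating anything.

```python
MAX_CODEPOINT = 0x10FFFF  # highest codepoint UTF-8 is allowed to encode


def naive_decode_4byte(b0: int, b1: int, b2: int, b3: int) -> int:
    """Unpack 11110www 10xxxxxx 10yyyyyy 10zzzzzz with no validity checks."""
    return (
        ((b0 & 0x07) << 18)
        | ((b1 & 0x3F) << 12)
        | ((b2 & 0x3F) << 6)
        | (b3 & 0x3F)
    )


# F4 8F BF BF is the last valid 4-byte sequence.
last_valid = naive_decode_4byte(0xF4, 0x8F, 0xBF, 0xBF)

# One step further (F4 90 80 80) already exceeds the limit, so the second
# byte after an F4 lead must be restricted to 80..8F.
first_invalid = naive_decode_4byte(0xF4, 0x90, 0x80, 0x80)

# Even the smallest sequence with an F5 lead byte is out of range, which is
# why F5 (and F6, F7) can never start a valid sequence.
lowest_with_f5 = naive_decode_4byte(0xF5, 0x80, 0x80, 0x80)

# The 4-byte bit layout itself tops out at 21 bits.
layout_maximum = naive_decode_4byte(0xF7, 0xBF, 0xBF, 0xBF)
```

Bytes F8 to FF do not match any 1- to 4-byte lead pattern at all, so together with F5 to F7 they cover the whole range that never appears.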

= References

Detecting an overlong version of a codepoint could get gnarly, but
luckily the Unicode Consortium did the hard work by creating this
handy table of valid byte sequences:

https://unicode.org/versions/corrigendum1.html

I thought this mapped nicely to the parser's state machine, so I
rearranged the relevant states to make use of it.
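
As a rough illustration of how such a table drives validation, here is a small Python sketch. It is not the parser's actual state machine. The names `VALID_SEQUENCES`, `CONT`, and `is_valid_utf8` are hypothetical, and the byte ranges are my reading of the well-formed byte sequence table in the current Unicode Standard, which may differ in detail from the linked corrigendum (for example, the `ED 80..9F` row also excludes the surrogate range). Check the ranges against the table before relying on them.

```python
CONT = (0x80, 0xBF)  # ordinary continuation byte range

# Each row: allowed range for the lead byte, then for each following byte.
VALID_SEQUENCES = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), CONT),
    ((0xE0, 0xE0), (0xA0, 0xBF), CONT),   # A0.. rejects overlong 3-byte forms
    ((0xE1, 0xEC), CONT, CONT),
    ((0xED, 0xED), (0x80, 0x9F), CONT),
    ((0xEE, 0xEF), CONT, CONT),
    ((0xF0, 0xF0), (0x90, 0xBF), CONT, CONT),  # 90.. rejects overlong 4-byte forms
    ((0xF1, 0xF3), CONT, CONT, CONT),
    ((0xF4, 0xF4), (0x80, 0x8F), CONT, CONT),  # ..8F rejects values > U+10FFFF
]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data consists only of sequences listed in the table."""
    i = 0
    while i < len(data):
        # Lead byte ranges are disjoint, so at most one row can match.
        for row in VALID_SEQUENCES:
            lo, hi = row[0]
            if lo <= data[i] <= hi:
                break
        else:
            return False  # C0, C1, F5..FF, or a stray continuation byte
        seq = data[i:i + len(row)]
        if len(seq) < len(row):
            return False  # input ends in the middle of a sequence
        for byte, (lo, hi) in zip(seq[1:], row[1:]):
            if not lo <= byte <= hi:
                return False
        i += len(row)
    return True
```

Each row corresponds naturally to a path through a state machine: the lead byte selects the row, and each later byte is checked against that row's next range. No codepoint is ever computed, so overlong and out-of-range input is rejected purely by byte ranges.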
2020-01-07 12:07:44 -05:00

ZIG

A general-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

Building from Source

Note that you can download a prebuilt binary of the master branch instead of building from source.

Stage 1: Build Zig from C++ Source Code

Dependencies

POSIX
  • cmake >= 2.8.5
  • gcc >= 5.0.0 or clang >= 3.6.0
  • LLVM, Clang, LLD development libraries == 9.x, compiled with the same gcc or clang version above
Windows
  • cmake >= 3.15.3
  • Microsoft Visual Studio. Supported versions:
    • 2015 (version 14)
    • 2017 (version 15.8)
    • 2019 (version 16)
  • LLVM, Clang, LLD development libraries == 9.x

Instructions

POSIX

mkdir build
cd build
cmake ..
make install

macOS

brew install cmake llvm@9
brew outdated llvm@9 || brew upgrade llvm@9
mkdir build
cd build
cmake .. -DCMAKE_PREFIX_PATH=$(brew --prefix llvm@9)
make install

Windows

See https://github.com/ziglang/zig/wiki/Building-Zig-on-Windows

Stage 2: Build Self-Hosted Zig from Zig Source Code

Note: The Stage 2 compiler is not yet complete. Beta users of Zig should use the Stage 1 compiler for now.

Dependencies are the same as Stage 1, except now you can use stage 1 to compile Zig code.

bin/zig build --prefix $(pwd)/stage2

This produces ./stage2/bin/zig, which can be used for testing and development. Once it is feature complete, it will be used to build stage 3, the final compiler binary.

Stage 3: Rebuild Self-Hosted Zig Using the Self-Hosted Compiler

Note: The Stage 2 compiler is not yet able to build Stage 3, so building Stage 3 is not yet supported.

Once the self-hosted compiler can build itself, this will be the actual compiler binary that we will install to the system. Until then, users should use stage 1.

Debug / Development Build

./stage2/bin/zig build --prefix $(pwd)/stage3

Release / Install Build

./stage2/bin/zig build install -Drelease