Igor Anić d645114f7e add deflate implemented from first principles
Zig deflate compression/decompression implementation. It supports compression and decompression of gzip, zlib and raw deflate format.

Fixes #18062.

This PR replaces current compress/gzip and compress/zlib packages. Deflate package is renamed to flate. Flate is common name for deflate/inflate where deflate is compression and inflate decompression.

There are breaking change. Methods signatures are changed because of removal of the allocator, and I also unified API for all three namespaces (flate, gzip, zlib).

Currently I put old packages under v1 namespace they are still available as compress/v1/gzip, compress/v1/zlib, compress/v1/deflate. Idea is to give users of the current API little time to postpone analyzing what they had to change. Although that rises question when it is safe to remove that v1 namespace.

Here is current API in the compress package:

```Zig
// deflate
    fn compressor(allocator, writer, options) !Compressor(@TypeOf(writer))
    fn Compressor(comptime WriterType) type

    fn decompressor(allocator, reader, null) !Decompressor(@TypeOf(reader))
    fn Decompressor(comptime ReaderType: type) type

// gzip
    fn compress(allocator, writer, options) !Compress(@TypeOf(writer))
    fn Compress(comptime WriterType: type) type

    fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// zlib
    fn compressStream(allocator, writer, options) !CompressStream(@TypeOf(writer))
    fn CompressStream(comptime WriterType: type) type

    fn decompressStream(allocator, reader) !DecompressStream(@TypeOf(reader))
    fn DecompressStream(comptime ReaderType: type) type

// xz
   fn decompress(allocator: Allocator, reader: anytype) !Decompress(@TypeOf(reader))
   fn Decompress(comptime ReaderType: type) type

// lzma
    fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// lzma2
    fn decompress(allocator, reader, writer !void

// zstandard:
    fn DecompressStream(ReaderType, options) type
    fn decompressStream(allocator, reader) DecompressStream(@TypeOf(reader), .{})
    struct decompress
```

The proposed naming convention:
 - Compressor/Decompressor for functions which return type, like Reader/Writer/GeneralPurposeAllocator
 - compressor/compressor for functions which are initializers for that type, like reader/writer/allocator
 - compress/decompress for one shot operations, accepts reader/writer pair, like read/write/alloc

```Zig
/// Compress from reader and write compressed data to the writer.
fn compress(reader: anytype, writer: anytype, options: Options) !void

/// Create Compressor which outputs the writer.
fn compressor(writer: anytype, options: Options) !Compressor(@TypeOf(writer))

/// Compressor type
fn Compressor(comptime WriterType: type) type

/// Decompress from reader and write plain data to the writer.
fn decompress(reader: anytype, writer: anytype) !void

/// Create Decompressor which reads from reader.
fn decompressor(reader: anytype) Decompressor(@TypeOf(reader)

/// Decompressor type
fn Decompressor(comptime ReaderType: type) type

```

Comparing this implementation with the one we currently have in Zig's standard library (std).
Std is roughly 1.2-1.4 times slower in decompression, and 1.1-1.2 times slower in compression. Compressed sizes are pretty much same in both cases.
More resutls in [this](https://github.com/ianic/flate) repo.

This library uses static allocations for all structures, doesn't require allocator. That makes sense especially for deflate where all structures, internal buffers are allocated to the full size. Little less for inflate where we std version uses less memory by not preallocating to theoretical max size array which are usually not fully used.

For deflate this library allocates 395K while std 779K.
For inflate this library allocates 74.5K while std around 36K.

Inflate difference is because we here use 64K history instead of 32K in std.

If merged existing usage of compress gzip/zlib/deflate need some changes. Here is example with necessary changes in comments:

```Zig

const std = @import("std");

// To get this file:
// wget -nc -O war_and_peace.txt https://www.gutenberg.org/ebooks/2600.txt.utf-8
const data = @embedFile("war_and_peace.txt");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    try oldDeflate(allocator);
    try new(std.compress.flate, allocator);

    try oldZlib(allocator);
    try new(std.compress.zlib, allocator);

    try oldGzip(allocator);
    try new(std.compress.gzip, allocator);
}

pub fn new(comptime pkg: type, allocator: std.mem.Allocator) !void {
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    var cmp = try pkg.compressor(buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.finish();

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    var dcp = pkg.decompressor(fbs.reader());

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldDeflate(allocator: std.mem.Allocator) !void {
    const deflate = std.compress.v1.deflate;

    // Compressor
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();
    // Remove allocator
    // Rename deflate -> flate
    var cmp = try deflate.compressor(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.close(); // Rename to finish
    cmp.deinit(); // Remove

    // Decompressor
    var fbs = std.io.fixedBufferStream(buf.items);
    // Remove allocator and last param
    // Rename deflate -> flate
    // Remove try
    var dcp = try deflate.decompressor(allocator, fbs.reader(), null);
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldZlib(allocator: std.mem.Allocator) !void {
    const zlib = std.compress.v1.zlib;

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    // Rename compressStream => compressor
    // Remove allocator
    var cmp = try zlib.compressStream(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.finish();
    cmp.deinit(); // Remove

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    // decompressStream => decompressor
    // Remove allocator
    // Remove try
    var dcp = try zlib.decompressStream(allocator, fbs.reader());
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldGzip(allocator: std.mem.Allocator) !void {
    const gzip = std.compress.v1.gzip;

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    // Rename compress => compressor
    // Remove allocator
    var cmp = try gzip.compress(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.close(); // Rename to finisho
    cmp.deinit(); // Remove

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    // Rename decompress => decompressor
    // Remove allocator
    // Remove try
    var dcp = try gzip.decompress(allocator, fbs.reader());
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

```
2024-02-14 18:28:20 +01:00
2024-02-11 13:38:56 -07:00
2024-02-09 23:11:46 +02:00
2024-02-05 11:55:14 +03:30
2024-02-09 18:49:54 +02:00
2023-08-04 11:01:18 -07:00

ZIG

A general-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

https://ziglang.org/

Documentation

If you are looking at this README file in a source tree, please refer to the Release Notes, Language Reference, or Standard Library Documentation corresponding to the version of Zig that you are using by following the appropriate link on the download page.

Otherwise, you're looking at a release of Zig, and you can find documentation here:

  • doc/langref.html
  • doc/std/index.html

Installation

A Zig installation is composed of two things:

  1. The Zig executable
  2. The lib/ directory

At runtime, the executable searches up the file system for the lib/ directory, relative to itself:

  • lib/
  • lib/zig/
  • ../lib/
  • ../lib/zig/
  • (and so on)

In other words, you can unpack a release of Zig anywhere, and then begin using it immediately. There is no need to install it globally, although this mechanism supports that use case too (i.e. /usr/bin/zig and /usr/lib/zig/).

Building from Source

Ensure you have the required dependencies:

  • CMake >= 3.5
  • System C/C++ Toolchain
  • LLVM, Clang, LLD development libraries == 17.x

Then it is the standard CMake build process:

mkdir build
cd build
cmake ..
make install

For more options, tips, and troubleshooting, please see the Building Zig From Source page on the wiki.

Building from Source without LLVM

In this case, the only system dependency is a C compiler.

cc -o bootstrap bootstrap.c
./bootstrap

This produces a zig2 executable in the current working directory. This is a "stage2" build of the compiler, without LLVM extensions, and is therefore lacking these features:

However, a compiler built this way does provide a C backend, which may be useful for creating system packages of Zig projects using the system C toolchain. In such case, LLVM is not needed!

Contributing

Donate monthly.

Zig is Free and Open Source Software. We welcome bug reports and patches from everyone. However, keep in mind that Zig governance is BDFN (Benevolent Dictator For Now) which means that Andrew Kelley has final say on the design and implementation of everything.

One of the best ways you can contribute to Zig is to start using it for an open-source personal project.

This leads to discovering bugs and helps flesh out use cases, which lead to further design iterations of Zig. Importantly, each issue found this way comes with real world motivations, making it straightforward to explain the reasoning behind proposals and feature requests.

You will be taken much more seriously on the issue tracker if you have a personal project that uses Zig.

The issue label Contributor Friendly exists to help you find issues that are limited in scope and/or knowledge of Zig internals.

Please note that issues labeled Proposal but do not also have the Accepted label are still under consideration, and efforts to implement such a proposal have a high risk of being wasted. If you are interested in a proposal which is still under consideration, please express your interest in the issue tracker, providing extra insights and considerations that others have not yet expressed. The most highly regarded argument in such a discussion is a real world use case.

For more tips, please see the Contributing page on the wiki.

Community

The Zig community is decentralized. Anyone is free to start and maintain their own space for Zig users to gather. There is no concept of "official" or "unofficial". Each gathering place has its own moderators and rules. Users are encouraged to be aware of the social structures of the spaces they inhabit, and work purposefully to facilitate spaces that align with their values.

Please see the Community wiki page for a public listing of social spaces.

Description
General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
Readme MIT 711 MiB
Languages
Zig 98.3%
C 1.1%
C++ 0.2%
Python 0.1%