This is a minimal, self-contained Zig library designed to simplify running compute shaders using WebGPU. It abstracts away much of the boilerplate required for GPU device initialization, memory management, and pipeline execution.

Core Modules

The library exports five primary components (GpuArena and GpuAllocator are described together):

GpuDevice: Initializes the WebGPU instance, adapter, device, and queue. It is configured to prioritize high performance and automatically requests the ShaderF16 feature if the adapter supports it. By default, it enforces a 2 GB VRAM limit.
GpuArena / GpuAllocator: A memory management layer that tracks allocated VRAM bytes to prevent exceeding the device budget. The arena automatically destroys and releases all tracked WebGPU buffers when deinitialized.
GpuBuffer: Wraps native WebGPU buffers. It automatically rounds buffer sizes up to the next multiple of 4 bytes. It provides a .load() method for CPU-to-GPU data transfers (handling data whose length is not a multiple of 4 bytes as well as aligned data) and a .read() method that uses a staging buffer to map GPU data back to the CPU.
GpuProcess: Compiles WGSL source code into a compute pipeline. When running a process, it automatically splits the work into manageable chunks (up to 1 GB at a time) and dispatches workgroups of size 256.
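
The size arithmetic described above can be sketched as follows. This is a minimal illustration, not the library's internals: the constant and function names (`buffer_alignment`, `alignedSize`, `workgroupCount`, `chunkCount`) are hypothetical, and the sketch assumes that the 1 GB chunk limit is measured in bytes and that each shader invocation handles one element.

```zig
const std = @import("std");

// Assumed constants mirroring the figures quoted above; the library's
// actual names and internals may differ.
const buffer_alignment: usize = 4;
const workgroup_size: usize = 256;
const max_chunk_bytes: usize = 1 << 30; // 1 GB

/// Round a byte count up to the next multiple of 4.
fn alignedSize(bytes: usize) usize {
    return std.mem.alignForward(usize, bytes, buffer_alignment);
}

/// Number of workgroups of size 256 needed to cover `elements` invocations.
fn workgroupCount(elements: usize) usize {
    return (elements + workgroup_size - 1) / workgroup_size;
}

/// Number of chunks of at most 1 GB needed to cover `bytes`.
fn chunkCount(bytes: usize) usize {
    return (bytes + max_chunk_bytes - 1) / max_chunk_bytes;
}

pub fn main() void {
    // 15 f16 values occupy 30 bytes, which rounds up to 32.
    std.debug.print("aligned: {d}\n", .{alignedSize(15 * @sizeOf(f16))});
    // 16 elements fit in a single workgroup.
    std.debug.print("workgroups: {d}\n", .{workgroupCount(16)});
    // 32 bytes fit in a single chunk.
    std.debug.print("chunks: {d}\n", .{chunkCount(32)});
}
```

Rounding up rather than down means a buffer is never smaller than the data it is asked to hold, at the cost of up to 3 bytes of padding per buffer.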

Quick Start Example

Below is a complete example demonstrating how to initialize the GPU, load data, run a compute shader, and read the results back to the CPU. It assumes a WGSL compute shader at shaders/add.wgsl, which is not shown here:


const std = @import("std");
const gpu = @import("gpu");
const GpuDevice = gpu.GpuDevice;
const GpuArena = gpu.GpuArena;
const GpuBuffer = gpu.GpuBuffer;
const GpuProcess = gpu.GpuProcess;

pub fn main(init: std.process.Init) !void {
    const allocator = init.gpa;

    // 1. Open GPU Device
    const device = try GpuDevice.init(.{});
    defer device.deinit();

    // 2. Create a GPU Arena to manage VRAM
    var grena = GpuArena.init(allocator, device);
    defer grena.deinit();
    const gloc = grena.gpuAllocator();

    // 3. Load the WGSL compute pipeline
    const add_process = try GpuProcess.init(device, @embedFile("shaders/add.wgsl"));
    defer add_process.deinit();

    // 4. Setup CPU data
    const len: usize = 16;
    const data_a = try allocator.alloc(f16, len);
    defer allocator.free(data_a);
    const data_b = try allocator.alloc(f16, len);
    defer allocator.free(data_b);

    for (0..len) |i| {
        data_a[i] = @floatFromInt(i);
        data_b[i] = @floatFromInt(len - 1 - i);
    }

    // 5. Initialize raw GPU Buffers
    // We pass the EnumSet inline using `.initMany` since the Enum itself isn't exported
    const byte_size = len * @sizeOf(f16);
    const buf_a = try GpuBuffer.init(gloc, byte_size, .initMany(&.{ .Storage, .CopyDst, .CopySrc }));
    const buf_b = try GpuBuffer.init(gloc, byte_size, .initMany(&.{ .Storage, .CopyDst, .CopySrc }));
    const buf_out = try GpuBuffer.init(gloc, byte_size, .initMany(&.{ .Storage, .CopyDst, .CopySrc }));

    // Note: The buffers are safely tied to the GpuArena which will automatically
    // release them at the end. You can also manually call buf_x.deinit() if desired.

    // 6. Transfer data from CPU slices to GPU Buffers
    try buf_a.load(f16, data_a);
    try buf_b.load(f16, data_b);

    // 7. Dispatch the Compute Process
    // We pass the data type (f16) to allow GpuProcess to calculate chunks correctly
    try add_process.run(gloc, f16, buf_a, buf_b, buf_out);

    // 8. Map and copy the resulting buffer back to the CPU
    const out = try buf_out.read(allocator, f16);
    defer allocator.free(out);

    std.debug.print("Result: {any}\n", .{out});
}
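
To sanity-check the GPU output, it can help to compute the same result on the CPU. The sketch below assumes that shaders/add.wgsl (not shown in this document) performs an elementwise addition, out[i] = a[i] + b[i]; the helper name `expectedSums` is hypothetical and not part of the library.

```zig
const std = @import("std");

/// CPU reference for the example inputs: a[i] = i, b[i] = len - 1 - i.
/// Assumes the shader computes out[i] = a[i] + b[i].
fn expectedSums(comptime len: usize) [len]f16 {
    var out: [len]f16 = undefined;
    for (0..len) |i| {
        const a: f16 = @floatFromInt(i);
        const b: f16 = @floatFromInt(len - 1 - i);
        out[i] = a + b;
    }
    return out;
}

pub fn main() void {
    // With len = 16, every element is i + (15 - i) = 15.
    const expected = expectedSums(16);
    std.debug.print("Expected: {any}\n", .{expected});
}
```

Under that assumption, every element of the result read back from buf_out should equal 15, which f16 represents exactly, so the comparison needs no tolerance.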

Dependencies

wgpu.h: The library relies on the WebGPU C API headers to bind to the system's native graphics backend.