ZipponData
ZipponData is a library developed in the context of ZipponDB.
The library intends to provide a simple way to store and parse data from a file as efficiently and quickly as possible.
There are 6 data types available in ZipponData:
Type | Zig type | Bytes in file |
---|---|---|
int | i32 | 4 |
float | f64 | 8 |
bool | bool | 1 |
str | []u8 | 4 + len |
uuid | [16]u8 | 16 |
unix | u64 | 8 |
Each type has an array equivalent.
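To make the "Bytes in file" column concrete, here is a minimal sketch that adds up the size of one row. `Kind`, `sizeInFile` and `rowSize` are hypothetical names written for this illustration and are not part of ZipponData; the sizes are copied from the table above, and the row layout is the one used in the Quickstart and Benchmark sections.

```zig
const std = @import("std");

// Hypothetical mirror of the six scalar types in the table above;
// this is not the library's own DType.
const Kind = enum { Int, Float, Bool, Str, UUID, Unix };

// Bytes one value occupies in the file, per the table.
// `str_len` only matters for .Str (4 bytes + string length).
fn sizeInFile(kind: Kind, str_len: usize) usize {
    return switch (kind) {
        .Int => 4,
        .Float => 8,
        .Bool => 1,
        .Str => 4 + str_len,
        .UUID => 16,
        .Unix => 8,
    };
}

// Sum of the sizes of every value in a row. For simplicity this sketch
// assumes every string in the row has the same length.
fn rowSize(kinds: []const Kind, str_len: usize) usize {
    var total: usize = 0;
    for (kinds) |kind| total += sizeInFile(kind, str_len);
    return total;
}

pub fn main() void {
    // Same layout as the Quickstart row: int, float, int, "Hello world", bool, unix.
    const kinds = [_]Kind{ .Int, .Float, .Int, .Str, .Bool, .Unix };
    std.debug.print("bytes per row: {d}\n", .{rowSize(&kinds, "Hello world".len)});
}
```

This comes to 4 + 8 + 4 + (4 + 11) + 1 + 8 = 40 bytes, which agrees with the 0.04 kB reported for a single row in the benchmark table.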
Quickstart
- Create a file with createFile
- Create some Data
- Create a DataWriter
- Write the data
- Create a schema
- Create an iterator with DataIterator
- Iterate over all values
- Delete the file with deleteFile
Here is an example of how to use it:
const std = @import("std");
// Module imported as described in "Importing the package"
const zid = @import("ZipponData");
const Data = zid.Data;
const DType = zid.DType;
const DataWriter = zid.DataWriter;
const DataIterator = zid.DataIterator;
const createFile = zid.createFile;
const deleteFile = zid.deleteFile;
pub fn main() !void {
    const allocator = std.heap.page_allocator;
    // 0. Make a temporary directory
    try std.fs.cwd().makeDir("tmp");
    const dir = try std.fs.cwd().openDir("tmp", .{});
    // 1. Create a file
    try createFile("test", dir);
    // 2. Create some Data
    const data = [_]Data{
        Data.initInt(1),
        Data.initFloat(3.14159),
        Data.initInt(-5),
        Data.initStr("Hello world"),
        Data.initBool(true),
        Data.initUnix(2021),
    };
    // 3. Create a DataWriter
    var dwriter = try DataWriter.init("test", dir);
    defer dwriter.deinit(); // This just closes the file
    // 4. Write some data
    try dwriter.write(&data);
    try dwriter.write(&data);
    try dwriter.write(&data);
    try dwriter.write(&data);
    try dwriter.write(&data);
    try dwriter.write(&data);
    try dwriter.flush(); // Don't forget to flush!
    // 5. Create a schema
    // A schema is how the iterator will parse the file.
    // If the schema is wrong, the iterator returns wrong/random data
    // and most likely an error while iterating in the while loop
    const schema = &[_]DType{
        .Int,
        .Float,
        .Int,
        .Str,
        .Bool,
        .Unix,
    };
    // 6. Create a DataIterator
    var iter = try DataIterator.init(allocator, "test", dir, schema);
    defer iter.deinit();
    // 7. Iterate over data
    while (try iter.next()) |row| {
        std.debug.print("Row: {any}\n", .{row});
    }
    // 8. Delete the file (optional)
    try deleteFile("test", dir);
    try std.fs.cwd().deleteDir("tmp");
}
Note: dir can be null, in which case the current working directory is used.
Array
All data types have an array equivalent. To write an array, you first need to encode it using allocEncodArray before writing it.
This uses an allocator, so you need to free what it returns.
When read, an array is just the raw bytes. To get the data itself, you need to create an ArrayIterator. Here is an example:
const std = @import("std");
// Module imported as described in "Importing the package"
const zid = @import("ZipponData");
const Data = zid.Data;
const DType = zid.DType;
const DataWriter = zid.DataWriter;
const DataIterator = zid.DataIterator;
const ArrayIterator = zid.ArrayIterator;
const allocEncodArray = zid.allocEncodArray;
const createFile = zid.createFile;
const deleteFile = zid.deleteFile;
pub fn main() !void {
    const allocator = std.heap.page_allocator;
    // 0. Make a temporary directory
    try std.fs.cwd().makeDir("array_tmp");
    const dir = try std.fs.cwd().openDir("array_tmp", .{});
    // 1. Create a file
    try createFile("test", dir);
    // 2. Create and encode some Data
    const int_array = [4]i32{ 32, 11, 15, 99 };
    const data = [_]Data{
        Data.initIntArray(try allocEncodArray.Int(allocator, &int_array)), // Encode
    };
    defer allocator.free(data[0].IntArray); // Don't forget to free it
    // 3. Create a DataWriter
    var dwriter = try DataWriter.init("test", dir);
    defer dwriter.deinit();
    // 4. Write some data
    try dwriter.write(&data);
    try dwriter.flush();
    // 5. Create a schema
    const schema = &[_]DType{
        .IntArray,
    };
    // 6. Create a DataIterator
    var iter = try DataIterator.init(allocator, "test", dir, schema);
    defer iter.deinit();
    // 7. Iterate over data
    var i: usize = 0;
    if (try iter.next()) |row| {
        // 8. Iterate over array
        var array_iter = ArrayIterator.init(&row[0]); // Sub array iterator
        while (array_iter.next()) |d| {
            try std.testing.expectEqual(int_array[i], d.Int);
            i += 1;
        }
    }
    try deleteFile("test", dir);
    try std.fs.cwd().deleteDir("array_tmp");
}
Benchmark
Done on an AMD Ryzen 7 7800X3D with a Samsung SSD 980 PRO 2TB (up to 7,000/5,100 MB/s read/write speed), on a single thread.
Rows | Write Time (ms) | Average Write Time (μs) | Read Time (ms) | Average Read Time (μs) | File Size (kB) |
---|---|---|---|---|---|
1 | 0.01 | 13.63 | 0.025 | 25.0 | 0.04 |
10 | 0.01 | 1.69 | 0.03 | 3.28 | 0.4 |
100 | 0.04 | 0.49 | 0.07 | 0.67 | 4.0 |
1_000 | 0.36 | 0.36 | 0.48 | 0.48 | 40 |
10_000 | 3.42 | 0.34 | 4.67 | 0.47 | 400 |
100_000 | 36.39 | 0.36 | 48.00 | 0.49 | 4_000 |
1_000_000 | 361.41 | 0.36 | 481.00 | 0.48 | 40_000 |
TODO: Update number to use Unix one. Benchmark on my laptop and maybe on some cloud VM.
Data used:
const schema = &[_]DType{
    .Int,
    .Float,
    .Int,
    .Str,
    .Bool,
    .Unix,
};
const data = &[_]Data{
    Data.initInt(1),
    Data.initFloat(3.14159),
    Data.initInt(-5),
    Data.initStr("Hello world"),
    Data.initBool(true),
    Data.initUnix(2021),
};
Note: You can check Benchmark.md in ZipponDB to see performance using multi-threading. I was able to parse 1_000_000 users in less than 100 ms.
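As a sanity check on the single-thread table, the averages and an approximate throughput can be recomputed from the 1_000_000-row line. This is a small arithmetic sketch, not part of the library; `avgMicros` and `throughputMBs` are hypothetical helper names, and kB is taken as 1000 bytes, as the file-size column implies.

```zig
const std = @import("std");

// Average time per row in microseconds, from a total in milliseconds.
fn avgMicros(total_ms: f64, rows: f64) f64 {
    return total_ms * 1000.0 / rows;
}

// Throughput in MB/s. With 1 MB = 1000 kB and 1 s = 1000 ms,
// kB per ms is numerically the same as MB per s.
fn throughputMBs(file_kb: f64, total_ms: f64) f64 {
    return file_kb / total_ms;
}

pub fn main() void {
    // Figures copied from the 1_000_000-row line of the table.
    const rows: f64 = 1_000_000;
    const file_kb: f64 = 40_000;
    const write_ms: f64 = 361.41;
    const read_ms: f64 = 481.00;
    std.debug.print("write: {d:.2} us/row, {d:.1} MB/s\n", .{ avgMicros(write_ms, rows), throughputMBs(file_kb, write_ms) });
    std.debug.print("read:  {d:.2} us/row, {d:.1} MB/s\n", .{ avgMicros(read_ms, rows), throughputMBs(file_kb, read_ms) });
}
```

The per-row averages come out at about 0.36 μs and 0.48 μs, matching the table, and the throughput is roughly 110 MB/s for writes and 83 MB/s for reads, well below the SSD's rated sequential speeds.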
Importing the package
Create a build.zig.zon next to build.zig if not already done.
Add this dependency in build.zig.zon:
.ZipponData = .{
    .url = "git+https://github.com/MrBounty/ZipponData",
    // the correct hash will be suggested by zig
},
Here is what the complete build.zig.zon looks like for my project ZipponDB:
.{
    .name = "ZipponDB",
    .version = "0.1.4",
    .dependencies = .{
        .ZipponData = .{
            .url = "git+https://github.com/MrBounty/ZipponData",
            // the correct hash will be suggested by zig
        },
    },
    .paths = .{
        "",
    },
}
And in build.zig you can import the module like this:
const zid = b.dependency("ZipponData", .{});
exe.root_module.addImport("ZipponData", zid.module("ZipponData"));
And you can now import it like std in your project:
const zid = @import("ZipponData");
try zid.createFile("Hello.zid", null);
What you can't do
You can't update files. You will need to implement that yourself. The easiest way (and the only one I know of) is to parse the entire file and write it into another.
Here is an example that evaluates every struct using a Filter and writes only the structs for which the filter is false (a filter can be something like age > 20: if the member age of the struct is > 20, it is true):
// Filter is ZipponDB's own type and is not part of ZipponData.
// allocator and schema are passed in explicitly here; in ZipponDB they
// come from the surrounding struct.
pub fn delete(
    allocator: std.mem.Allocator,
    file_name: []const u8,
    dir: std.fs.Dir,
    schema: []const zid.DType,
    filter: Filter,
) !void {
    // 1. Create the iterator of the current file
    var iter = try zid.DataIterator.init(allocator, file_name, dir, schema);
    defer iter.deinit();
    // 2. Create a new file
    const new_path_buff = try std.fmt.allocPrint(allocator, "{s}.new", .{file_name});
    defer allocator.free(new_path_buff);
    try zid.createFile(new_path_buff, dir);
    // 3. Create a writer of the new data
    var new_writer = try zid.DataWriter.init(new_path_buff, dir);
    defer new_writer.deinit();
    // 4. For all struct, evaluate and write to new file if false
    while (try iter.next()) |row| {
        if (!filter.evaluate(row)) {
            try new_writer.write(row);
        }
    }
    // 5. Flush, delete old file and rename new file to previous file
    try new_writer.flush();
    try dir.deleteFile(file_name);
    try dir.rename(new_path_buff, file_name);
}
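The Filter type used above belongs to ZipponDB and is not shown in this README. To illustrate the idea of a filter such as age > 20, here is a self-contained sketch with hypothetical stand-ins: `Value` plays the role of one field of a row, and `Filter` holds the position of the member and the threshold. None of these definitions are the real ZipponDB or ZipponData types.

```zig
const std = @import("std");

// Hypothetical stand-in for one value of a row (not zid.Data).
const Value = union(enum) { Int: i32, Bool: bool };

// Hypothetical filter of the form `member > threshold`.
const Filter = struct {
    member_index: usize, // position of the member (e.g. age) in the row
    threshold: i32,

    // True when row[member_index] is an Int greater than threshold.
    fn evaluate(self: Filter, row: []const Value) bool {
        return switch (row[self.member_index]) {
            .Int => |v| v > self.threshold,
            else => false,
        };
    }
};

pub fn main() void {
    // age > 20, with age stored in the first column.
    const filter = Filter{ .member_index = 0, .threshold = 20 };
    const ages = [_]i32{ 25, 18 };
    for (ages) |age| {
        const row = [_]Value{.{ .Int = age }};
        // Rows for which the filter is false are the ones the rewrite keeps.
        const kept = !filter.evaluate(&row);
        std.debug.print("age={d} kept={any}\n", .{ age, kept });
    }
}
```

With this filter, the row with age 25 matches and would be dropped by the rewrite, while the row with age 18 would be copied to the new file.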
Potential update
I don't plan to update this, but it will depend on whether my other projects need it.
- Functions to update files
- Add a header with the data types at the beginning of the file, so there is no need to make a schema and every write can be checked against the expected format
- More types
- Multi-threading