ZipponDB/Technical docs.md

4.8 KiB

Intro

TODO

Note: Code snipped do not necessary represent the actual codebase but are use to explain principle.

Tokenizers

All Tokenizer work similary and are based on the zig tokenizer.

The Tokenizer role is to take a buffer string and convert it into a list of Token. A token have an enum Tag that represent what the token is, for example = is the tag equal, and a Loc with a start and end usize that represent the emplacement in the buffer.

The Tokenizer itself have 2 methods: next that return the next Token. And TODO that return the slice of the buffer that represent the Token, using it's Loc.

This is how to use it:

const toker = Tokenizer.init(buff);
const token = toker.next();
std.debug.print("{s}", .{toker.xxxx(token)});

I usually use a Tokenizer in a loop until the Tag is end. And in each loop I take the next token and will use a switch on the Tag to do stuffs.

Here a simple example:

const toker = Tokenizer.init(buff);
var token = toker.next();
while (token.tag != .end) : (token = toker.next()) switch (token.tag) {
  .equal => std.debug.print("{s}", .{toker.xxxx(token)}),
  else => {},
}

All tokenizers

There is 4 differents tokenizer in ZipponDB, I know, that a lot. Here the list:

  • ZiQL: Tokenizer for the query language.
  • cli: Tokenizer the commands.
  • schema: Tokenizer for the schema file.
  • data: Tokenizer for csv file.

They all have different Tag and way to parse the array of bytes but overall are very similar. The only noticable difference is that some use a null terminated string (based on the zig tokenizer) and other not. Mostly because I need to use dupeZ to get a new null terminated array, not necessary.

Parser

Parser are the next step after the tokenizer. Its role is to take Token and do stuff or raise error. There is 3 Parser, the main one is for ZiQL, one for the schema and one for the cli. Note that the cli one is just the main function in main.zig and not it's own struct but overall do the same thing.

A Parser have a State and a Tokenizer as member and have a parse method. Similary to Tokenizer, it will enter a while loop. This loop will continue until the State is end.

Let's take as example the schema parser that need to parse this file:

User (name: str)

When I run the parse method, it will init the State as start. When in start, I check if the Token is a identifier (a variable name), if it is one I add it to the list of struct in the current schema, if not I raise an error pointing to this token. Here the idea for a parse method:

var state = .start;
var token = self.toker.next();
while (state != .end) : (token = self.toker.next()) switch (state) {
  .start => switch (token.tag) {
    .identifier => self.addStruct(token),
    else => printError("Error: Expected a struct name.", token),
  },
  else => {},
}

The issue here is obviously that we are in an infinite loop that just going to add struct or print error. I need to change the state based on the combinaison of the current state and token.tag. For that I usually use very implicite name for State. For example in this situation, after a struct name, I expect ( so I will call it something like expect_l_paren. Here the idea:

var state = .start;
var token = self.toker.next();
while (state != .end) : (token = self.toker.next()) switch (state) {
  .start => switch (token.tag) {
    .identifier => {
        self.addStruct(token);
        state = .expect_l_parent;
      },
    else => printError("Error: Expected a struct name.", token),
  },
  .expect_l_parent => switch (token.tag) {
    .l_paren => {},
    else => printError("Error: Expected (.", token),
  },
  else => {},
}

And that's basicly it, the entire Parser work like that. It is fairly easy to debug as I can print the state and token.tag at each iteration and follow the path of the Parser.

Note that the ZiQLParser use different methods for parsing:

  • parse: The main one that will then use the other.
  • parseFilter: This will populate an array of UUID based on what is between {}.
  • parseCondition: Create a Condition struct based on a part of what is between {}. E.g. name = 'Bob'.
  • parseAdditionalData: Populate the AdditionalData struct that represent what is between [].
  • parseNewData: Return a string map with key as member name and value as value of what is between (). E.g. (name = 'Bob') will return a map with one key name with the value Bob.
  • parseOption: Not done yet. Parse what is between ||

FileEngine

The FileEngine is that is managing files, everything that need to read or write into files is here.

I am not goind into too much detail here as I think this will change in the futur.