* update to the new cache hash API
* std.Target defaultVersionRange moves to std.Target.Os.Tag
* std.Target.Os gains getVersionRange which returns a tagged union
* start the process of splitting Module into Compilation and "zig
module".
- The parts of Module having to do with only compiling zig code are
extracted into ZigModule.zig.
- Next step is to rename Module to Compilation.
- After that rename ZigModule back to Module.
* implement proper cache hash usage when compiling C objects, and
properly manage the file lock of the build artifacts.
* make versions optional to match recent changes to master branch.
* proper cache hash integration for compiling zig code
* proper cache hash integration for linking even when not compiling zig
code.
* ELF LLD linking integrates with the caching system. A comment from
the source code:
Here we want to determine whether we can save time by not invoking LLD when the
output is unchanged. None of the linker options or the object files that are being
linked are in the hash that namespaces the directory we are outputting to. Therefore,
we must hash those now, and the resulting digest will form the "id" of the linking
job we are about to perform.
After a successful link, we store the id in the metadata of a symlink named "id.txt" in
the artifact directory. So, now, we check if this symlink exists, and if it matches
our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD.
* implement disable_c_depfile option
* add tracy to a few more functions
into smaller exposed components and expose all of them. This makes it
more flexible.
`*const Cache` is now passed in with an open manifest dir handle which
the caller is responsible for managing.
Expose some of the base64 stuff.
Extract the hash helper functions into `HashHelper` and add some more
methods such as addOptional and addListOfFiles.
Add `CacheHash.toOwnedLock` so that you can deinitialize everything
except the open file handle which represents the file system lock on the
build artifacts.
Use ArrayListUnmanaged, saving space per allocated CacheHash.
Avoid 1 memory allocation in hit() with a static buffer.
hit() returns a bool; caller code is responsible for calling final() in
either case. This is a simpler and easier to use API.
writeManifest() is no longer called from deinit() with errors ignored.
* std.cache_hash exposes Hasher type
* std.cache_hash makes hasher_init a global const
* std.cache_hash supports cloning so that clones can share the same
open manifest dir handle as well as fork from shared hasher state
* start to populate the cache_hash for stage2 builds
* remove a footgun from std.cache_hash add function
* get rid of std.Target.ObjectFormat.unknown
* rework stage2 logic for resolving output artifact names by adding
object_format as an optional parameter to std.zig.binNameAlloc
* support -Denable-llvm in stage2 tests
* Module supports the use case when there are no .zig files
* introduce c_object_table and failed_c_objects to Module
* propagate many new kinds of data from CLI into Module and into
linker.Options
* introduce -fLLVM, -fLLD, -fClang and their -fno- counterparts.
closes#6251.
- add logic for choosing when to use LLD or zig's self-hosted linker
* stub code for implementing invoking Clang to build C objects
* add -femit-h, -femit-h=foo, and -fno-emit-h CLI options
This makes the `cache_hash` hash function easier to replace.
BLAKE3 would be a natural fit for hashing large files, but:
- second preimage resistance is not necessary for the cache_hash use cases
- our BLAKE3 implementation is currently very slow
Switch to SipHash128, which gives us an immediate speed boost.
- This avoids having multiple `init()` functions for every combination
of optional parameters
- The API is consistent across all hash functions
- New options can be added later without breaking existing applications.
For example, this is going to come in handy if we implement parallelization
for BLAKE2 and BLAKE3.
- We don't have a mix of snake_case and camelCase functions any more, at
least in the public crypto API
Support for BLAKE2 salt and personalization (more commonly called context)
parameters have been implemented by the way to illustrate this.
Instead of having all primitives and constructions share the same namespace,
they are now organized by category and function family.
Types within the same category are expected to share the exact same API.
I have observed on Linux writing and reading the same file many times
without the mtime changing, despite the file system having nanosecond
granularity (and about 1 millisecond worth of nanoseconds passing between
modifications). I am calling this a Linux Kernel Bug and adding file
size to the cache hash manifest as a mitigation. As evidence, macOS does
not exhibit this behavior.
This means it is possible, on Linux, for a file to be added to the cache
hash, and, if it is updated with the same file size, same inode, within
about 1 millisecond, the cache system will give us a false positive,
saying it is unmodified. I don't see any way to improve this situation
without fixing the bug in the Linux kernel.
closes#6082
I empirically observed mtime not changing when rapidly writing the same
file name within the same millisecond of wall clock time, despite the
mtime field having nanosecond precision.
I believe this fixes the CI test failures.
* change miscellaneous things to more idiomatic zig style
* change the digest length to 24 bytes instead of 48. This is
still 70 more bits than UUIDs. For an analysis of probability of
collisions, see:
https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
* fix the API having the possibility of mismatched allocators
* fix some error paths to behave properly
* modify the guarantees about when file contents are loaded for input files
* pwrite instead of seek + write
* implement isProblematicTimestamp
* fix tests with regards to a working isProblematicTimestamp function.
this requires sleeping until the current timestamp becomes
unproblematic.
* introduce std.fs.File.INode, a cross platform type abstraction
so that cache hash implementation does not need to reach into std.os.
People using the API as intended would never trigger this assertion
anyway, but if someone has a non standard use case, I see no reason
to make the program panic.
If a user doesn't care that the manifest failed to be written, they can
simply ignore it. The program will still work; that particular cache
item will simply not be cached.
It checks whether the cache will respond correctly to inputs that don't
initially depend on filesystem state. In that case, we have to check
for the existence of a manifest file, instead of relying on reading the
list of entries to tell us if the cache is invalid.
Instead of releasing the manifest file when an error occurs, it is
only released when when `CacheHash.release` is called. This maps better
to what a zig user expects when they do `defer cache_hash.release()`.
A file handle is not the same thing as an inode index number.
Eventually the inode will be checked as well, but there needs to be
a way to get the inode in `std` first.