Highlights from Git 2.45

Git 2.45 is here with experimental support for reftables, and SHA-256 interoperability. Get our take on the latest here.

|
| 10 minutes

The open source Git project just released Git 2.45 with features and bug fixes from over 96 contributors, 38 of them new. We last caught up with you on the latest in Git back when 2.44 was released.

To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.

Preliminary reftable support

Git 2.45 introduces preliminary support for a new reference storage backend called “reftable,” promising faster lookups, reads, and writes for repositories with any number of references.

If you’re unfamiliar with our previous coverage of the new reftable format, don’t worry, this post will catch you up to speed (and then some!). But if you just want to play around with the new reference backend, you can initialize a new repository with --ref-format=reftable like so:

$ git init --ref-format=reftable /path/to/repo
Initialized empty Git repository in /path/to/repo/.git
$ cd /path/to/repo
$ git commit --allow-empty -m 'hello reftable!'
[main (root-commit) 2eb0810] hello reftable!
$ ls -1 .git/reftable/
0x000000000001-0x000000000002-565c6bf0.ref
tables.list
$ cat .git/reftable/tables.list
0x000000000001-0x000000000002-565c6bf0.ref

With that out of the way, let’s jump into the details. If you’re new to this series, or didn’t catch our initial coverage of the reftable feature, don’t worry, here’s a refresher. When we talk about references in Git, we’re referring to the branches and tags that make up your repository. In essence, a reference is nothing more than a name (like refs/heads/my-feature, or refs/tags/v1.0.0) and the object ID of the thing that reference points at.

Git has historically stored references in your repository in one of two ways: either “loose” as a file inside of $GIT_DIR/refs (like $GIT_DIR/refs/heads/my-feature) or “packed” as an entry inside of the file at $GIT_DIR/packed_refs.

For most repositories today, the existing reference backend works fine. For repositories with a truly gigantic number of references, however, the existing backend has some growing pains. For instance, storing a large number of references as “loose” can lead to directories with a large number of entries (slowing down lookups within that directory) and/or inode exhaustion. Likewise, storing all references in a single packed_refs file can become expensive to maintain, as even small reference updates require a significant I/O-cost to rewrite the entire packed_refs file on each update.

That’s where the reftable format comes in. Reftable is an entirely new format for storing Git references. Instead of storing loose references, or constantly updating a large packed_refs file, reftable implements a binary format for storing references that promises to achieve:

  • Near constant-time lookup for individual references, and near constant-time verification that a given object ID is referred to by at least one reference.
  • Efficient lookup of entire reference namespaces through prefix compression.
  • Atomic reference updates that scale with the size of the reference update, not the number of overall references.

The reftable format is incredibly detailed (curious readers can learn more about it in more detail by reading the original specification), but here’s a high-level overview. A repository can have any number of reftables (stored as *.ref files), each of which is organized into variable-sized blocks. Blocks can store information about a collection of references, refer to the contents of other blocks when storing references across a collection of blocks, and more.

The format is designed to both (a) take up a minimal amount of space (by storing reference names with prefix compression) and (b) support fast lookups, even when reading the .ref file(s) from a cold cache.

Most importantly, the reftable format supports multiple *.ref files, meaning that each reference update transaction can be processed individually without having to modify existing *.ref files. A separate compaction process describes how to “merge” a range of adjacent *.ref files together into a single *.ref file to maintain read performance.

The reftable format was originally designed by Shawn Pearce for use in JGit to better support the large number of references stored by Gerrit. Back in our Highlights from Git 2.35 post, we covered that an implementation of the reftable format had landed in Git. In that version, Git did not yet know how to use the new reftable code in conjunction with its existing reference backend system, meaning that you couldn’t yet create repositories that store references using reftable.

In Git 2.45, support for a reftable-powered storage backend has been integrated into Git’s generic reference backend system, meaning that you can play with reftable on your own repository by running:

$ git init --ref-format=reftable /path/to/repo

[source, source, source, source, source, source, source, source, source, source]

Preliminary support for SHA-1 and SHA-256 interoperability

Returning readers of this series will be familiar with our ongoing coverage of the Git project’s hash function transition. If you’re new around here, or need a refresher, don’t worry!

Git identifies objects (the blobs, trees, commits, and tags that make up your repository) by a hash of their contents. Since its inception, Git has used the SHA-1 hash function to hash and identify objects in a repository.

However, the SHA-1 function has known collision attacks (e.g., Shattered, and Shambles), meaning that a sufficiently motivated attacker can generate a colliding pair of SHA-1 inputs, which have the same SHA-1 hash despite containing different contents. (Many providers, like GitHub, use a SHA-1 implementation that detects and rejects inputs that contain the telltale signs of being part of a colliding pair attack. For more details, see our post, SHA-1 collision detection on GitHub.com).

Around this time, the Git project began discussing a plan to transition from SHA-1 to a more secure hash function that was not susceptible to the same chosen-prefix attacks. The project decided on SHA-256 as the successor to Git’s use of SHA-1 and work on supporting the new hash function began in earnest. In Git 2.29 (released in October 2020), Git gained experimental support for using SHA-256 instead of SHA-1 in specially-configured repositories. That feature was declared no longer experimental in Git 2.42 (released in August 2023).

One of the goals of the hash function transition was to introduce support for repositories to interoperate between SHA-1 and SHA-256, meaning that repositories could in theory use one hash function locally, while pushing to another repository that uses a different hash function.

Git 2.45 introduces experimental preliminary support for limited interoperability between SHA-1 and SHA-256. To do this, Git 2.45 introduces a new concept called the “compatibility” object format, and allows you to refer to objects by either their given hash, or their “compatibility” hash. An object’s compatibility hash is the hash of an object as it would have been written under the compatibility hash function.

To give you a better sense of how this new feature works, here’s a short demo. To start, we’ll initialize a repository in SHA-256 mode, and declare that SHA-1 is our compatibility hash function:

$ git init --object-format=sha256 /path/to/repo
Initialized empty Git repository in /path/to/repo/.git
$ cd /path/to/repo
$ git config extensions.compatObjectFormat sha1

Then, we can create a simple commit with a single file (README) whose contents are “Hello, world!”:

$ echo 'Hello, world!' >README
$ git add README
$ git commit -m "initial commit"
[main (root-commit) 74dcba4] initial commit
 Author: A U Thor <[email protected]>
 1 file changed, 1 insertion(+)
 create mode 100644 README

Now, we can ask Git to show us the contents of the commit object we just created with cat-file. As we’d expect, the hash of the commit object, as well as its root tree are computed using SHA-256:

$ git rev-parse HEAD | git cat-file --batch
74dcba4f8f941a65a44fdd92f0bd6a093ad78960710ac32dbd4c032df66fe5c6 commit 202
tree ace45d916e870ce0fadbb8fc579218d01361da4159d1e2b5949f176b1f743280
author A U Thor <[email protected]> 1713990043 -0400
committer C O Mitter <[email protected]> 1713990043 -0400

initial commit

But we can also tell git rev-parse to output any object IDs using the compatibility hash function, allowing us to ask for the SHA-1 object ID of that same commit object. When we print its contents out using cat-file, its root tree OID is a different value (starting with 7dd4941980 instead of ace45d916e), this time computed using SHA-1 instead of SHA-256:

$ git rev-parse --output-object-format=sha1 HEAD
2a4f4a2182686157a2dc887c46693c988c912533

$ git rev-parse --output-object-format=sha1 HEAD | git cat-file --batch
2a4f4a2182686157a2dc887c46693c988c912533 commit 178
tree 7dd49419807b37a3afd2f040891a64d69abb8df1
author A U Thor <[email protected]> 1713990043 -0400
committer C O Mitter <[email protected]> 1713990043 -0400

initial commit

Support for this new feature is still considered experimental, and many features may not work quite as you expect them to. There is still much work ahead for full interoperability between SHA-1 and SHA-256 repositories, but this release delivers an important first step towards full interoperability support.

[source]


  • If you’ve ever scripted around your repository, then you have no doubt used git rev-list to list commits or objects reachable from some set of inputs. rev-list can also come in handy when trying to diagnose repository corruption, including investigating missing objects.

    In the past, you might have used something like git rev-list --missing=print to gather a list of objects which are reachable from your inputs, but are missing from the local repository. But what if there are missing objects at the tips of your reachability query itself? For instance, if the tip of some branch or tag is corrupt, then you’re stuck:

    $ git rev-parse HEAD | tr 'a-f1-9' '1-9a-f' >.git/refs/heads/missing
    $ git rev-list --missing=print --all | grep '^?'
    fatal: bad object refs/heads/missing
    

    Here, Git won’t let you continue, since one of the inputs to the reachability query itself (refs/heads/missing, via --all) is missing. This can make debugging missing objects in the reachable parts of your history more difficult than necessary.

    But with Git 2.45, you can debug missing objects even when the tips of your reachability query are themselves missing, like so:

    $ git rev-list --missing=print --all | grep '^?'
    ?70678e7afeacdcba1242793c3d3d28916a2fd152
    

    [source]

  • One of Git’s lesser-known features are “reference logs,” or “reflogs” for short. These reference logs are extremely useful when asking questions about the history of some reference, such as: “what was main pointing at two weeks ago?” or “where was I before I started this rebase?”.

    Each reference has its own corresponding reflog, and you can use the git reflog command to see the reflog for the currently checked-out reference, or for an arbitrary reference by running git reflog refs/heads/some/branch.

    If you want to see what branches have corresponding reflogs, you could look at the contents of .git/logs like so:

    $ find .git/logs/refs/heads -type f | cut -d '/' -f 3-
    

    But what if you’re using reftable? In that case, the reflogs are stored in a binary format, leaving tools like find out of your reach.

    Git 2.45 introduced a new sub-command git reflog list to show which references have corresponding reflogs available to them, regardless of whether or not you are using reftable.

    [source]

  • If you’ve ever looked closely at Git’s diff output, you might have noticed the prefixes a/ and b/ used before file paths to indicate the before and after versions of each file, like so:

    $ git diff HEAD^ -- GIT-VERSION-GEN
    diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
    index dabd2b5b89..c92f98b3db 100755
    --- a/GIT-VERSION-GEN
    +++ b/GIT-VERSION-GEN
    @@ -1,7 +1,7 @@
    #!/bin/sh
    
    GVF=GIT-VERSION-FILE
    -DEF_VER=v2.45.0-rc0
    +DEF_VER=v2.45.0-rc1
    
    LF='
    '
    

    In Git 2.45, you can now configure alternative prefixes by setting the diff.srcPrefix and diff.dstPrefix configuration options. This can come in handy if you want to make clear which side is which (by setting them to something like “before” and “after,” respectively). Or if you’re viewing the output in your terminal, and your terminal supports hyperlinking to paths, you could change the prefix to ./ to allow you to click on filepaths within a diff output.

    [source]

  • When writing a commit message, Git will open your editor with a mostly blank file containing some instructions, like so:

    # Please enter the commit message for your changes. Lines starting
    # with '#' will be ignored, and an empty message aborts the commit.
    #
    # On branch main
    # Your branch is up to date with 'origin/main.
    

    Since 2013, Git has supported customizing the comment character to be something other than the default #. This can come in handy, for instance, if you’re trying to refer to a GitHub issue by its numeric shorthand (e.g. #12345). If you write #12345 at the beginning of a line in your commit message, Git will treat the entire line as a comment and ignore it.

    In Git 2.45, Git allows not just any single ASCII character, but any arbitrary multi-byte character or even an arbitrary string. Now, you can customize your commit message template by setting core.commentString (or core.commentChar, the two are synonyms for one another) to your heart’s content.

    [source]

  • Speaking of comments, git config learned a new option to help document your .gitconfig file. The .gitconfig file format allows for comments beginning with a # character, meaning that everything following that # until the next newline will be ignored.

    The git config command gained a new --comment option, which allows specifying an optional comment to leave at the end of the newly configured line, like so:

    $ git config --comment 'to show the merge base' merge.conflictStyle diff3
    $ tail -n 2 .git/config
    [merge]
    conflictStyle = diff3 # to show the merge base
    

    This can be helpful when tweaking some of Git’s more esoteric settings to try and remember why you picked a particular value.

    [source]

  • Sometimes when you are rebasing or cherry-picking a series of commits, one or more of those commits become “empty” (i.e., because they contain a subset of changes that have already landed on your branch).

    When rebasing, you can use the --empty option to specify how to handle these commits. --empty supports a few options: “drop” (to ignore those commits), “keep” (to keep empty commits), or “stop” which will halt the rebase and ask for your input on how to proceed.

    Despite its similarity to git rebase, git cherry-pick never had an equivalent option to --empty. That meant that if you were cherry-picking a long sequence of commits, some of which became empty, you’d have to type either git cherry-pick --skip (to drop the empty commit), or git commit --allow-empty (to keep the empty commit).

    In Git 2.45, git cherry-pick learned the same --empty option from git rebase, meaning that you can specify the behavior once at the beginning of your cherry-pick operation, instead of having to specify the same thing each time you encounter an empty commit.

    [source]

The rest of the iceberg

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.45, or any previous version in the Git repository.

Tags:

Written by

Related posts

Game Off 2024 theme announcement

GitHub’s annual month-long game jam, where creativity knows no limits! Throughout November, dive into your favorite game engines, libraries, and programming languages to bring your wildest game ideas to life. Whether you’re a seasoned dev or just getting started, it’s all about having fun and making something awesome!