Generating a version number from your git repo

5 Nov 2022 10:08 git

I’d like my application to display some kind of version number, ideally derived from the git tag.

This will allow me to be sure of which version is running in production (verifying that my upgrade script ran), and – if anyone reports a bug – will allow me to figure out which version they were using at the time.

It would be nice if this was some kind of version number. They’re easier to read; they allow package and dependency managers to figure out which version is newer.

git SHA

If you just want the git SHA, you can use git rev-parse HEAD:

git rev-parse HEAD
2ec9c07923ad96c3446159738b811f4dca9acba4

This displays the full SHA of the current state of your git work tree. You could, for example, display this discreetly in the corner of your web app, or in an HTML comment (where View Source can find it).

It’s a bit long, though, so you might want to shorten it:

git rev-parse --short HEAD
2ec9c07

For quite a long time, we used this at Electric Imp to identify the backend components: they were versioned 1.2ec9c07. They were packaged as .deb files, and the 1. is to keep the Debian packaging scripts happy.

We did spot, however, that the short SHA varies in length, depending on how many characters are required for it to be unique within the repository. This can sometimes break scripts that expect a specific length, and (more importantly) it’s only unique at that point in time. A later commit might cause a collision (with the short SHA; it’s unlikely to collide with the long one).

For this reason, you might choose to use a longer, but still short, SHA:

git rev-parse --short=10 HEAD
2ec9c07923

Using a tag

If you’re using tags to describe your versions:

$ git tag -a -m "v1.0.2" v1.0.2

…then you can use the git describe command to get the current tag:

$ git describe
v1.0.2

If there is no current tag, you’ll get something like the following, of which more later.

$ git describe
v1.0.2-2-g2ec9c07

To avoid that, you can use git describe --exact-match, which will fail if you’re not currently at an exact tag:

$ git describe --exact-match
fatal: no tag exactly matches '2ec9c07923ad96c3446159738b811f4dca9acba4'

This could be useful in your CI pipeline, to ensure that you have tagged what you’re planning on releasing. It doesn’t help if you’re doing local development, and you want some kind of version number.

If you have no tags yet, you’ll get the following:

$ git describe
fatal: No names found, cannot describe anything.

git describe

As mentioned above, if you’re not at a tag, you’ll get the following:

$ git describe
v1.0.2-2-g2ec9c07

This is in 4 parts:

  • The most-recent tag. In this example, it’s v1.0.2.
  • A count of commits since that tag. Here, 2.
  • A g, presumably for “git”.
  • A short SHA. Here it’s 2ec9c07.

The example above looks a bit like a SemVer, but that’s only because the most-recent tag was v1.0.2. If, instead, I’d tagged it as escrow-nov22, the result would have looked like escrow-nov22-2-g2ec9c07. To avoid that kind of thing, use --match:

git describe --match 'v*'

If you’re using something else for your versions, like release-, you can do this instead:

git describe --match 'release-*'

Don’t forget to use quotes, otherwise your shell is likely to treat this as a wildcard (glob) pattern, which would be bad.

As mentioned above, git will shorten the SHA to the point where it’s unambiguous. You can force it to a particular length:

git describe --abbrev=10 --match 'v*'

Gotta love that consistency between --short=10 and --abbrev=10.

If you’re at a tag, but you want the long form anyway, pass --long:

git describe --long --abbrev=10 --match 'v*'

Semantic Versioning

I’m going to ignore the “semantic” part of “semantic versioning” in this post, but it defines a particular format for version numbers, and various tools require version numbers to be in this format.

We can almost get away with the output from git describe, as long as we discard the v prefix, because SemVer allows for the hyphen:

$ git describe --long --abbrev=10 --match 'v*' | cut -b2-
1.0.2-2-g2ec9c07923

Unfortunately, SemVer defines the hyphen as meaning a pre-release, but the tag is from a previous point in git history, so this is … a post-release, I guess…?

Parsing git describe

We can do better if we parse the output from git describe by using a regex:

description=$(git describe --long --abbrev=10 --match 'v*')
re='^v([0-9]+)\.([0-9]+)\.([0-9]+)-([0-9]+)-g([0-9a-f]+)$'

if [[ $description =~ $re ]]; then
    ver_major="${BASH_REMATCH[1]}"
    ver_minor="${BASH_REMATCH[2]}"
    ver_patch="${BASH_REMATCH[3]}"
    commits="${BASH_REMATCH[4]}"
    sha="${BASH_REMATCH[5]}"
else
    echo "Error: '${description}' didn't match regex"
    exit 1
fi

echo "${ver_major}.${ver_minor}.${ver_patch}"

Unfortunately, SemVer values only allow for 3 version components – MAJOR.MINOR.PATCH; we’ve thrown away the commit count, meaning that this is no longer an unambiguous way to identify the exact commit we built.

Fortunately, SemVer allows for arbitrary metadata to be appended to the version number. You can follow the version number with a + and then add dot-separated fields. We can use this to put the SHA in the version number, meaning we can have both: a readable version number, a commit count, and an exact commit.

echo "${ver_major}.${ver_minor}.${ver_patch}+${commits}.${sha}"

This isn’t perfect: the commit count isn’t obvious, so a human might still confuse the two version numbers. We’ll come back to that once we’ve dealt with a different problem: local changes.

Developer builds

If the project is being built on the CI server, then this is enough. But we’re going to be building the code locally as well, and we’d like to be able to spot that.

One way to do that is to detect that we’re not on the CI server (Jenkins, e.g., sets specific environment variables), and to add some metadata to denote that: 1.0.2+2.2ec9c07923.dev or something.

Or we could put the user name and hostname in the version. Or any of a number of other schemes.

At Electric Imp, however, the impOS is built on someone’s workstation, because the CI system doesn’t have access to the code signing keys (they’re in a physical HSM which is kept in a safe).

More important than where it was built, then, is whether the build is repeatable: are there local modifications that aren’t pushed to the shared repo?

Yes, there are a lot more things to worry about for reproducible builds. They’re out of scope here. That’s why I’m using “repeatable”.

Local changes

We’d like to find out whether there are local modifications. We can do this with git diff-index HEAD:

if git diff-index --quiet HEAD; then
    # there are local modifications
fi

This will tell us whether we have any files in our working copy, or staged, that aren’t in the index.

Note that, according to some notes I found in a years-old draft of this post, you need to run git status to refresh the index first. I can’t find an definitive reference for this, however.

Is it pushed?

That’s not quite enough, though. We might have committed all of our changes locally, but until we’ve pushed them to origin (e.g. GitHub), our CI pipeline can’t find them and they don’t exist in any repeatable way.

We can find out whether our commit is in a remote branch with git branch --remote --contains HEAD.

Caveats

Note that this isn’t completely reliable, because it only checks the tracking branch, not the actual remote branch. That is: the commit could have been in the remote when you pulled it, but have been deleted from the remote afterwards.

You could fix that with git fetch origin or git remote update first, but that’s slow and requires a connection; we decided it was a fairly low risk.

It also assumes that all remotes are equally valid, so it could get confused if you’ve been pushing commits to your colleagues, but not to the repository that your CI pipeline is using.

You could check that the returned branches are all in the expected remote, but we decided not to worry about it.

Even if the branch is in the correct remote, it might not be a production/release branch. That is: if you’ve pushed the changes to a feature branch, or to a private branch, it’ll assume that’s OK. Again: we punted that.

I should be clear: at Electric Imp, in approximately 10 years of using the scripts that inspired this post, we never ran into any of the above problems.

Dirty

Combining the two sections above, we end up with something like this:

# refresh the index
git status >/dev/null 2>&1

# are there local/staged changes?
dirty=$(git diff-index --quiet HEAD || echo ".dirty")

# is the current commit pushed to a remote branch?
remote_branch=$(git branch -r --contains HEAD 2>/dev/null)
if [ "$remote_branch" == "" ]; then
    dirty=".dirty"
fi
echo "${ver_major}.${ver_minor}.${ver_patch}+${commits}.${sha}${dirty}"

This will result in a version number that looks something like this:

v1.0.2+2.2ec9c07923.dirty
The git-vsn script can be found at https://github.com/rlipscombe/git-vsn; this blog post refers to the v1.0.0 tag.

I wrote a follow-up post, which talks about overriding the behaviour, and how to deal with the commit count properly.