Git on Vercel

7 min read

Have you ever written a project that relies on its git history for some of its features? I have, and even if you haven’t, you may be using a library or a framework that does. This very site, for example, is built using Starlight, Astro’s documentation theme, which uses the git history to generate the “last edited” date on each page. I expanded on that feature to include the first published date of each page as well. That raises a problem, though.

Some background

Some projects have a very long history: thousands or even hundreds of thousands of commits. Vercel, like multiple other platforms, integrates with git platforms so that your site is rebuilt on Vercel’s own build system every time you push or merge changes. Loading all of that history on every build takes time, and it uses resources that Vercel does not charge you for but still has to pay for (they limit each build to 45 minutes, and that limit is the same even on the free tier, but they don’t charge you for the time).

One of the repositories that I work with has almost a million commits, and cloning it requires fetching, decompressing and computing the delta for over 4.6 billion git objects. Using the official git client, this takes about 14 minutes on my machine (gigabit internet and a 12 core 11th gen i7 CPU writing to a PCIe 4.0 NVMe SSD). Vercel’s build system is not as fast, and it takes about 30 minutes to clone the repository. That is 30 minutes that Vercel is paying for, and 30 minutes that I am waiting for the build to finish.

Faster cloning

For most projects, this is completely unnecessary. You don’t generally need the entire history of a project to build its site; you only need the latest version of the code. That is where the --depth flag comes in. It lets you specify how many commits you want to fetch from the remote repository, and git will fetch only those, with the git platform collapsing all the previous commits into a single snapshot and caching that result.
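
If you want to see the effect of --depth without touching a real project, here is a minimal sketch meant to be saved and run as a script. It builds a throwaway five-commit repository in a temporary directory and shallow-clones it; every path, file name and commit message in it is made up for the demo.

```shell
set -eu

# Throwaway sandbox: all names below are hypothetical.
work="$(mktemp -d)"
git init -q "$work/origin"
cd "$work/origin"
git config user.email "demo@example.com"
git config user.name "Demo"

# Five small commits stand in for a long history.
for i in 1 2 3 4 5; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# The file:// URL forces the regular transport; with a plain local
# path git ignores --depth and copies everything.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"

full_count="$(git -C "$work/origin" rev-list --count HEAD)"
shallow_count="$(git -C "$work/shallow" rev-list --count HEAD)"
echo "origin: $full_count commits, shallow clone: $shallow_count"
```

The shallow clone still has every file at its latest version, it just cannot see anything older than the commits it fetched.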

Cloning that same million-commit repository with a depth of 1 takes about 4 seconds on my machine and 14 seconds on Vercel’s build system. That is such a huge improvement and at such a low cost that Vercel does it by default for every project. In fact, they don’t even allow you to change it, and that is where it may become a problem.

When it becomes a problem

If your project, be it directly or through a dependency, relies on the git history, you may start seeing some weird behavior. The worst thing about this problem is how hard it is to spot. Nothing will break per se. Whenever you try to resolve a commit, it will still resolve to a valid commit, just not the one you expect.

Using Starlight as an example, the “last edited” date is generated by resolving the commit that last modified the file. If you have a file that was last edited more than 10 commits ago, it won’t resolve to the commit that actually last edited it. It will resolve to “10 commits ago”, since that is the “first commit” of the repo: a single commit that contains everything that happened before it. So the “last edited” date of a file that you haven’t edited in a while will be updated every time you push a new change, even if that change did not touch that file.
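
The mismatch is easy to reproduce in a sandbox. The sketch below is not Starlight’s actual implementation; it only assumes that the “last edited” lookup boils down to asking git for the latest commit that touched a file, which is what git log -1 -- <file> does. It uses a depth of 2 instead of 10 to keep the demo short, and all file names and commit messages are hypothetical.

```shell
set -eu

# Throwaway sandbox: all names below are hypothetical.
work="$(mktemp -d)"
git init -q "$work/origin"
cd "$work/origin"
git config user.email "demo@example.com"
git config user.name "Demo"

# old.md is written once and never touched again.
echo "old page" > old.md
git add old.md
git commit -q -m "add old.md"

# Three later commits only touch new.md.
for i in 1 2 3; do
  echo "revision $i" > new.md
  git add new.md
  git commit -q -m "edit new.md $i"
done

# Shallow clone that only keeps the two most recent commits.
git clone -q --depth 2 "file://$work/origin" "$work/shallow"

# Ask both repositories which commit last touched old.md.
real="$(git -C "$work/origin" log -1 --format=%s -- old.md)"
seen="$(git -C "$work/shallow" log -1 --format=%s -- old.md)"
echo "full history:    $real"
echo "shallow history: $seen"
```

In the full repository the answer is the commit that really added old.md. In the shallow clone the oldest commit that was fetched has no parent, so from git’s point of view it introduced every file, and old.md gets attributed to a commit that never touched it.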

This may confuse people, especially if someone tries to contribute to your project. Imagine you see a typo on a page that shows it was last edited a few days ago. You click on the “Edit this page” button and are directed to a file that was last edited years ago. What would you think? Would you think that the “last edited” date is wrong, or that edits are not made in the file you are changing? I would think the latter. This problem is called out in Starlight’s documentation, and they provide a way for you to manually set the “last edited” date per page, but that is not a solution; it is a workaround.

Retrieving the full history

First, what exactly does Vercel do? In their “Debugging Command Locally” section, they say you should use the command git clone --depth 10 to reproduce the same behavior locally. When you clone a repository with a custom depth, you can later fetch more commits (known as “deepening”) or even all the commits (known as “unshallowing”).

If you want to try that locally, you can use the command git fetch --depth=<number> with a larger number to deepen to that number of commits. Alternatively, you can run git fetch --deepen=<number> to deepen by that number of commits (added to however many you already have). And finally, you can run git fetch --unshallow to fetch all the commits that you don’t have yet.
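
Here is a minimal sketch of those three commands in action, meant to be run as a script against a throwaway six-commit repository (all names are hypothetical). It prints the number of commits reachable from HEAD after each step.

```shell
set -eu

# Throwaway sandbox: all names below are hypothetical.
work="$(mktemp -d)"
git init -q "$work/origin"
cd "$work/origin"
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3 4 5 6; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

git clone -q --depth 1 "file://$work/origin" "$work/shallow"
cd "$work/shallow"

count() { git rev-list --count HEAD; }

start="$(count)"               # only the tip commit

git fetch -q --depth=3         # deepen TO a total of 3 commits
after_depth="$(count)"

git fetch -q --deepen=2        # deepen BY 2 more commits
after_deepen="$(count)"

git fetch -q --unshallow       # fetch everything that is still missing
after_unshallow="$(count)"

echo "$start $after_depth $after_deepen $after_unshallow"
```

After the last step, git rev-parse --is-shallow-repository (available since Git 2.15) reports false, which is a handy way to check whether a checkout is still shallow.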

At least that is how you could get the full history if the repo was cloned using the command Vercel tells us to use for reproducing their environment. But that is not what they do. In fact, if you want to reproduce their environment, you should not run the git clone command at all.

To retrieve the full history up to the commit that triggered the build, you need to fetch from your repository explicitly. Going the long way around, you can do this:

Terminal window
git pull --unshallow <your repo git url> <commit that triggered the build>:master

This will pull from your remote repo and point the local master branch to the commit that triggered the build in that remote repo. The commit itself will have the exact same hash, because it is the same commit. However, by repositioning the local branch explicitly, git will point the head to the commit object that is part of the full history instead of the one attached to the shallow history.

If you want to build that exclusively from Vercel’s environment variables, this is the command:

Terminal window
git pull --unshallow \
"https://${PUBLIC_VERCEL_GIT_PROVIDER}.com/${PUBLIC_VERCEL_GIT_REPO_OWNER}/${PUBLIC_VERCEL_GIT_REPO_SLUG}.git" \
"${PUBLIC_VERCEL_GIT_COMMIT_SHA}:master"
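
One caveat if the same build script also runs on your own machine: git refuses --unshallow on a repository that already has its full history (“--unshallow on a complete repository does not make sense”). A minimal sketch of a guard, assuming Git 2.15 or newer for --is-shallow-repository; the helper name unshallow_if_needed and the example URL are made up, and the demo only exercises the “already complete” path on a throwaway repository.

```shell
set -eu

# Hypothetical helper: run the pull only when the checkout is shallow.
# $1 = repository URL, $2 = commit that triggered the build.
unshallow_if_needed() {
  if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
    git pull --unshallow "$1" "$2:master"
    echo "unshallowed"
  else
    echo "already complete"
  fi
}

# Demo on a throwaway, complete repository, so the pull is skipped
# and no network access happens. All names here are hypothetical.
work="$(mktemp -d)"
git init -q "$work/full"
cd "$work/full"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

result="$(unshallow_if_needed "https://example.com/owner/repo.git" "$(git rev-parse HEAD)")"
echo "$result"
```

On Vercel, the two arguments would be the URL and the commit SHA built from the same environment variables as in the command above.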