Git on Vercel
Have you ever written a project that relied on the git history for some of its features? I have, and even if you said no, you may be using a library or a framework that does. This very site, for example, is built using Starlight, Astro’s documentation theme, which uses the git history to generate the “last edited” date on each page. I expanded on that feature to include the first published date of each page as well. That raises a problem, though.
Some background
Some projects have a very long history, with thousands or even hundreds of thousands of commits. Vercel, like many other platforms, integrates with git hosting platforms so that your site is rebuilt on Vercel’s own build system every time you push or merge changes. Loading all of that history takes time on every build, and it uses resources that Vercel is not charging you for but still has to pay for (they limit the time spent on each build to 45 minutes, but they don’t charge you for it, and that is the same even on the free tier).
One of the repositories that I work with has almost a million commits, and cloning it requires fetching, decompressing and computing the delta for over 4.6 billion git objects. Using the official git client, this takes about 14 minutes on my machine (gigabit internet and a 12 core 11th gen i7 CPU writing to a PCIe 4.0 NVMe SSD). Vercel’s build system is not as fast, and it takes about 30 minutes to clone the repository. That is 30 minutes that Vercel is paying for, and 30 minutes that I am waiting for the build to finish.
Faster cloning
For most projects, this is completely unnecessary. You don’t generally need the entire history of a project to build its site; you only need the latest version of the code. And that is where the --depth flag comes in. It allows you to specify how many commits you want to fetch from the remote repository, and it will only fetch those, with the git platform collapsing all the previous commits into a single snapshot and caching that result.
Cloning that same million-commit repository with a depth of 1 takes about 4 seconds on my machine and 14 seconds on Vercel’s build system. That is such a huge improvement and at such a low cost that Vercel does it by default for every project. In fact, they don’t even allow you to change it, and that is where it may become a problem.
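To get a feel for what a shallow clone actually contains, here is a minimal sketch that builds a throwaway five-commit repository and clones it with a depth of 1. The directory names and commit messages are made up for illustration, and the file:// URL is there because git ignores --depth when cloning from a plain local path.

```shell
#!/bin/sh
set -eu

# Throwaway workspace; nothing here touches a real project.
work=$(mktemp -d)

# Build a small "remote" repository with five commits.
git init -q "$work/origin"
cd "$work/origin"
for i in 1 2 3 4 5; do
  echo "$i" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# Shallow clone: only the most recent commit is transferred.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"

full_count=$(git -C "$work/origin" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full: $full_count, shallow: $shallow_count"
```

The shallow copy reports a single commit even though the origin has five, which is exactly why it is so much cheaper to fetch.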
When it becomes a problem
If your project, be it directly or through a dependency, relies on the git history, you may start seeing some weird behavior. The worst thing about this problem is how hard it is to spot. Nothing will break per se: whenever you try to resolve a commit, it will still resolve to a valid commit, just not the one you expect.
Using Starlight as an example, the “last edited” date is generated by resolving the commit that last modified the file. If you have a file that was last edited more than 10 commits ago, it won’t resolve to the commit that actually last edited it. Instead, it will resolve to “10 commits ago”, since that is the “first commit” of the shallow repo: a single commit that contains everything that happened before it. So the “last edited” date of a file that you haven’t edited in a while will be updated every time you push a new change, even if that change did not touch that file.
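The effect is easy to reproduce on a small scale. The sketch below is only an illustration with made-up file names and a depth of 2 instead of 10: old.txt is committed once, then three later commits touch only new.txt. Asking git for the last commit that touched old.txt gives a different answer in the shallow clone.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"

# Small helper so the demo does not depend on a global git identity.
commit() {
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

# old.txt is written once and never touched again.
echo "stable" > old.txt
git add old.txt
commit "add old.txt"

# Three later commits only change new.txt.
for i in 1 2 3; do
  echo "$i" > new.txt
  git add new.txt
  commit "edit new.txt $i"
done

# Keep only the two most recent commits.
git clone -q --depth 2 "file://$work/origin" "$work/shallow"

# Which commit last touched old.txt, according to each copy?
full_subject=$(git -C "$work/origin" log -1 --format=%s -- old.txt)
shallow_subject=$(git -C "$work/shallow" log -1 --format=%s -- old.txt)
echo "full history:    $full_subject"
echo "shallow history: $shallow_subject"
```

With the full history the answer is the commit that added the file. In the shallow clone, git blames the oldest commit it has, because from its point of view that commit introduced everything.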
This may confuse people, especially if someone tries to contribute to your project. Imagine you see a typo on a page that shows it was last edited a few days ago. You click on the “Edit this page” button and are directed to a file that was last edited years ago. What would you think? Would you think that the “last edited” date is wrong, or that edits are not made on the file you are changing? I would think the latter. This problem is called out in Starlight’s documentation, and they provide a way for you to manually set the “last edited” date per page, but that is not a solution; it is a workaround.
Retrieving the full history
First, what exactly does Vercel do? In their “Debugging Command Locally” section, they say you should use the command git clone --depth 10 to reproduce the same behavior locally. When you clone a repository with a custom depth, you can later fetch more commits (known as “deepening”) or even all the commits (known as “unshallowing”).
If you want to try that locally, you can use the command git fetch --depth=<number> with a larger number to deepen to that number of commits. Alternatively, you can run git fetch --deepen=<number> to deepen by that number of commits (added to however many you already have). And finally, you can run git fetch --unshallow to fetch all the commits that you don’t have yet.
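Here is a small sketch of those three commands run against a throwaway six-commit repository (the paths and the commit count are arbitrary choices for the demo), counting the commits reachable from HEAD after each step.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
for i in 1 2 3 4 5 6; do
  echo "$i" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# Start from the shallowest possible clone.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"
cd "$work/shallow"

# Number of commits reachable from HEAD in the local copy.
count() { git rev-list --count HEAD; }

after_clone=$(count)

git fetch -q --deepen=2      # two more commits on top of what we have
after_deepen=$(count)

git fetch -q --depth=4       # exactly four commits from the tip
after_depth=$(count)

git fetch -q --unshallow     # everything that is still missing
after_unshallow=$(count)
is_shallow=$(git rev-parse --is-shallow-repository)

echo "$after_clone -> $after_deepen -> $after_depth -> $after_unshallow (shallow: $is_shallow)"
```

The count grows with each fetch until the local copy has the same six commits as the origin and git no longer considers it shallow.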
At least that is how you could get the full history if the repo was cloned using the command Vercel tells us to use for reproducing their environment. But that is not what they do. In fact, if you want to reproduce their environment, you should not run the git clone command at all.
To retrieve the full history up to the commit that triggered the build, you need to fetch from your repository explicitly. Going the long way around, you can do this:
This will pull from your remote repo and point the local master branch to the commit that triggered the build in that remote repo. The commit itself will have the exact same hash; they are the same commit. However, by repositioning the local branch explicitly, git will point the head to the commit object that is part of the full history instead of the git object attached to the shallow history.
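To make the idea concrete, here is a minimal local sketch of that approach, not necessarily the exact commands intended above. It assumes a throwaway repository standing in for your remote and uses a depth-1 clone as a stand-in for the build checkout (which, as noted, Vercel does not create with git clone). The branch name is detected instead of hard-coded to master, and build_sha plays the role of the commit that triggered the build.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
for i in 1 2 3 4 5; do
  echo "$i" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# The branch and commit that "triggered the build" in this demo.
branch=$(git symbolic-ref --short HEAD)
build_sha=$(git rev-parse HEAD)

# Stand-in for the shallow checkout found on the build machine.
git clone -q --depth 1 "file://$work/origin" "$work/build"
cd "$work/build"
before=$(git rev-list --count HEAD)

# Fetch the complete history from the repository URL explicitly...
git fetch -q --unshallow "file://$work/origin" "$branch"
# ...then reposition the local branch on the commit that triggered the build.
git checkout -q -B "$branch" "$build_sha"

after=$(git rev-list --count HEAD)
head_sha=$(git rev-parse HEAD)
echo "commits before: $before, after: $after"
```

The hash of HEAD is unchanged by all of this, but the history behind it now goes all the way back to the first commit.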
If you want to build that exclusively from Vercel’s environment variables, this is the command:
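As a rough sketch of what that could look like, assuming a public repository hosted on GitHub: the script below builds the repository URL from Vercel’s system environment variables (VERCEL_GIT_REPO_OWNER, VERCEL_GIT_REPO_SLUG, VERCEL_GIT_COMMIT_REF and VERCEL_GIT_COMMIT_SHA). The fallback values are placeholders so the script can be dry-run outside Vercel, and the GitHub URL shape is an assumption that would need adjusting for other providers or for private repositories.

```shell
#!/bin/sh
set -eu

# Vercel system environment variables; the defaults are placeholders
# that only exist so this sketch can be dry-run locally.
owner=${VERCEL_GIT_REPO_OWNER:-example-owner}
slug=${VERCEL_GIT_REPO_SLUG:-example-repo}
ref=${VERCEL_GIT_COMMIT_REF:-main}
sha=${VERCEL_GIT_COMMIT_SHA:-HEAD}

# Assumption: a public GitHub repository reachable over HTTPS.
repo_url="https://github.com/$owner/$slug.git"

if [ "${VERCEL:-}" = "1" ]; then
  # On Vercel: fetch the full history, then reposition the branch.
  git fetch --unshallow "$repo_url" "$ref"
  git checkout -B "$ref" "$sha"
else
  # Anywhere else: only print what would be run.
  echo "git fetch --unshallow $repo_url $ref"
  echo "git checkout -B $ref $sha"
fi
```

Keep in mind that unshallowing a very large repository brings back the slow fetch that the shallow checkout was avoiding, so it is worth measuring the build time before relying on this.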