Reading git range-diff output
The git range-diff command compares two commit ranges. You can use this to check how a branch changed before and after a rebase, or to find differences between a commit on your main branch and a corresponding backport on a long-term-support branch.
If you’ve ever tried to use git diff to figure out the changes introduced by resolving merge conflicts, you might understand the pain point this helps address: git diff shows you all the changes between two commits, so you get all the changes introduced by the merge as well as all the conflict resolutions. git range-diff factors these changes out and makes it a lot easier to see how a branch changed over time. (It is an endless pet peeve of mine that GitHub does not give any visibility into how a PR changes when you push commits to it. As a result, many developers who have only used version control through GitHub have no idea that this sort of tool is even possible, despite it being trivial to construct.)
I’ve started to use git range-diff more lately, but I’ve been somewhat baffled by the output format, which the man page considers unworthy of explanation. Last night I read some of the Git source code to figure out how git range-diff works, and here’s what I found.
Diffs of diffs
git range-diff
runs two git diff
operations and then diff
s the
results. The output syntax is not very special — it’s literally just a
diff of diffs. The first column of diff markers shows you how the diffs
changed, and the second column shows you the diff itself.
Let’s walk through a set of changes to learn how to read diffs of diffs. Suppose we have a shell script which finds YAML files:
find \
-path '*.yaml'
You notice that it’s not picking up files with the .yml extension, so you add another clause:
find \
-path '*.yaml' \
-o -path '*.yml'
Meanwhile, your colleague runs ShellCheck on the script, notices that some implementations of find have no default path, and adds one to fix the script on macOS and other BSDs:
find \
. \
-path '*.yaml'
You merge in your colleague’s changes, producing this result:
find \
. \
-path '*.yaml' \
-o -path '*.yml'
Then, you run git range-diff main BEFORE_MERGE AFTER_MERGE, perhaps using git reflog to find the commit hashes before and after merging in the changes from main. Here’s the diff output:
## find-yaml.sh ##
@@
- #!/usr/bin/env bash
  find \
+   . \
 -  -path '*.yaml'
 +  -path '*.yaml' \
 +  -o -path '*.yml'
First, note the - #!/usr/bin/env bash line. Due to the extra argument to find added in main, the changes in the AFTER_MERGE commit start one line later in the file, so the diff’s context starts one line later as well. This shows up as the context line being removed, even though it wasn’t changed on either side!
This is a rather confusing result of git range-diff’s extremely naïve implementation: it makes no effort to distinguish between context lines and actual meaningful elements of the diff. This is the sort of obvious UX pitfall that made me assume that surely git range-diff would do something slightly smarter than literally diffing two diffs, but I don’t know why I expected that. It’s Git, after all.
Similarly, the next + line shows us that the . argument is present in the AFTER_MERGE commit, but that it wasn’t added by the AFTER_MERGE commit. Unlike git diff, which shows you all changes between two commits, git range-diff will only show you changes from previous commits when they’re present as context in the diffs of the commits you actually care about.
The rest of the changes are present in both versions of the diff.
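To convince yourself that this really is a diff of diffs, the same shape can be rebuilt by hand with plain diff and no Git at all. This is only a sketch: the four file names are made up, the script is assumed to be indented by two spaces, and diff is run with two lines of context (rather than the usual three) purely so that a file this short pushes the shebang out of the second diff.

```shell
work=$(mktemp -d)
cd "$work"

# Four versions of the script: main and your branch, before and after your
# colleague's change landed. File names are hypothetical.
printf '%s\n' '#!/usr/bin/env bash' 'find \' "  -path '*.yaml'" > main_before.sh
printf '%s\n' '#!/usr/bin/env bash' 'find \' "  -path '*.yaml' \\" "  -o -path '*.yml'" > mine_before.sh
printf '%s\n' '#!/usr/bin/env bash' 'find \' '  . \' "  -path '*.yaml'" > main_after.sh
printf '%s\n' '#!/usr/bin/env bash' 'find \' '  . \' "  -path '*.yaml' \\" "  -o -path '*.yml'" > mine_after.sh

# Unified diff with the two file-header lines dropped and the hunk line
# numbers blanked, so only the content of each diff gets compared.
plain_diff() {
  diff -U 2 "$1" "$2" | sed -e '1,2d' -e 's/^@@ .* @@.*$/@@/'
}

plain_diff main_before.sh mine_before.sh > before.diff
plain_diff main_after.sh mine_after.sh > after.diff

# The diff of diffs: the first column is the outer marker (how the diff
# changed), the second column is the inner marker (the diff itself).
meta=$(diff -u before.diff after.diff | sed '1,2d')
printf '%s\n' "$meta"
```

The result should contain the same - #!/usr/bin/env bash and + . \ lines discussed above, each with its marker in the first column, followed by the three lines that both diffs share.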
To help make it easier to distinguish between a diff marker in the first column and one in the second column, git range-diff applies “dual coloring”, which colors the background (instead of the text) of the first column of diff markers.
Suppose you want to make your intent clearer, so you go and add a comment to explain your change:
# Make sure to find `.yaml` and `.yml` files!
find \
. \
-path '*.yaml' \
-o -path '*.yml'
You amend your commit and run the range-diff again. Now, we see this line in the output:
++# Make sure to find `.yaml` and `.yml` files!
This is showing us that we’ve added a line in the AFTER_MERGE commit only. If you’re comparing a commit before and after fixing merge conflicts, this could indicate a new change that was added unintentionally, so pay attention to these!
Exercise for the reader: What do --, -+, and +- lines indicate?
Diffs of logs
git diff only deals with the contents of files, but git range-diff deals with ranges of commits, so those need to be compared as well!
Here’s how Git shows a series of commits being compared:
1: d17004e2f4 = 1: 92954fddce rts: Tighten up invariants of PACK
2: 42f1801df4 < -: ---------- testsuite: Fix badly escaped literals
-: ---------- > 2: cdfd86e951 testsuite: Fix badly escaped literals
3: 4d7afaaa7f = 3: 2dbf88daed rts/Interpreter: Improve documentation of TEST*_P instructions
4: 5f7c2d3e99 ! 4: e646db18c4 rts: Annotate BCOs with their Name
From left to right, we have:
- The position of the left-hand side’s commit in the sequence being compared. For example, 1: indicates this is the first commit in the range of commits on the LHS, and -: indicates that no corresponding commit was found on the left-hand side.
- The (abbreviated) commit hash of the LHS commit, like d17004e2f4.
- A marker indicating if the commits are identical (=), only present on the LHS (<), only present on the RHS (>), or different (!).
- The position of the RHS’s commit in the sequence being compared.
- The (abbreviated) commit hash of the RHS commit, like 92954fddce. Note that this can be different than the LHS commit hash even if the diffs are identical, because the parent commits can be different.
- The subject line of the commit message. If the message differs between the LHS and RHS, the RHS’s commit message is used here and a diff between the two commit messages is shown.
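As a way to check your reading of those six fields, here is a small sketch that splits summary lines on whitespace and spells out the marker. The summarize function is a hypothetical helper, not part of Git, and it only understands summary lines like the ones in the listing above (it skips anything whose third field is not one of the four markers).

```shell
# Hypothetical helper: turn range-diff summary lines on stdin into sentences.
summarize() {
  # Fields, left to right: LHS position, LHS hash, marker, RHS position,
  # RHS hash, subject. The positions are read but not printed.
  while read -r lpos lhash marker rpos rhash subject; do
    case "$marker" in
      '=') meaning='same diff on both sides' ;;
      '!') meaning='diff changed' ;;
      '<') meaning='only on the left-hand side' ;;
      '>') meaning='only on the right-hand side' ;;
      *)   continue ;;   # not a summary line
    esac
    printf '%s: %s (%s -> %s)\n' "$meaning" "$subject" "$lhash" "$rhash"
  done
}

summary=$(summarize <<'EOF'
1: d17004e2f4 = 1: 92954fddce rts: Tighten up invariants of PACK
2: 42f1801df4 < -: ---------- testsuite: Fix badly escaped literals
-: ---------- > 2: cdfd86e951 testsuite: Fix badly escaped literals
3: 4d7afaaa7f = 3: 2dbf88daed rts/Interpreter: Improve documentation of TEST*_P instructions
4: 5f7c2d3e99 ! 4: e646db18c4 rts: Annotate BCOs with their Name
EOF
)
printf '%s\n' "$summary"
```

Relying on whitespace splitting keeps the sketch short; it works here because the subject is the last field and soaks up the rest of the line.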
Note that in the middle of the log, we have two commits that Git can’t find a match for in the other side of the sequence:
2: 42f1801df4 < -: ---------- testsuite: Fix badly escaped literals
-: ---------- > 2: cdfd86e951 testsuite: Fix badly escaped literals
Similar to git diff’s rename detection, git range-diff sets a threshold beyond which large changes between diffs will be considered entirely different commits. We can adjust this threshold by setting --creation-factor=90 (a percentage where higher numbers are more forgiving of large changes; the default is 60) to force the comparison of the two diffs to be shown.
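Here is a sketch for experimenting with the flag on the find-yaml.sh example from earlier. It assumes Git 2.19 or newer, a two-space indent in the script, and hypothetical branch names (before-merge, after-merge); the resolved commit is written by hand rather than produced by an actual merge or rebase. Whether the default setting pairs the two commits depends on Git’s cost heuristics, so the sketch makes no promise about which marker each run prints.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the initial branch "main"
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write find-yaml.sh, one argument per line after the shebang.
write_script() {
  printf '%s\n' '#!/usr/bin/env bash' "$@" > find-yaml.sh
}

write_script 'find \' "  -path '*.yaml'"
git add find-yaml.sh
git commit -q -m "Add find-yaml.sh"

# Your change, on top of the old main.
git checkout -q -b before-merge
write_script 'find \' "  -path '*.yaml' \\" "  -o -path '*.yml'"
git commit -q -a -m "Also match .yml files"

# Your colleague's change lands on main.
git checkout -q main
write_script 'find \' '  . \' "  -path '*.yaml'"
git commit -q -a -m "Pass an explicit path to find"

# Your change again, this time on top of the new main.
git checkout -q -b after-merge
write_script 'find \' '  . \' "  -path '*.yaml' \\" "  -o -path '*.yml'"
git commit -q -a -m "Also match .yml files"

default_out=$(git range-diff --no-color main before-merge after-merge)
lenient_out=$(git range-diff --no-color --creation-factor=90 main before-merge after-merge)

printf 'default creation factor:\n%s\n\n' "$default_out"
printf 'creation factor 90:\n%s\n' "$lenient_out"
```

If the two runs disagree, the first will show a < and > pair where the second shows a single ! line followed by the diff of diffs.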
Parting thoughts
I felt pretty silly once I realized that the git range-diff output is literally just a diff of diffs. In fact, when reading the Git source code to figure out how it was formatting the output, Jade discovered that git range-diff actually runs git log --patch to get the diffs it feeds into the meta-diff. Perhaps the authors assumed the implementation was so obvious that the output needed no explanation!
Even if you don’t know quite how the git range-diff output works, it’s still a useful tool for comparing ranges because it filters out a lot of the noise, and if you know the before/after of both ranges it’s usually pretty easy to reconstruct the context.
If you haven’t used git range-diff before, give it a try! I’ve been finding it very handy to have in my toolkit lately and wanted to help make it more accessible for more people. Happy rebasing!