Reading git range-diff output

The git range-diff command compares two commit ranges. You can use this to check how a branch changed before and after a rebase, or to find differences between a commit on your main branch and a corresponding backport on a long-term-support branch.

If you’ve ever tried to use git diff to figure out the changes introduced by resolving merge conflicts, you might understand the pain point this helps address: git diff shows you all the changes between two commits, so you get all the changes introduced by the merge as well as all the conflict resolutions. git range-diff factors the merged-in changes out and makes it a lot easier to see how a branch changed over time. (It is an endless pet peeve of mine that GitHub does not give any visibility into how a PR changes when you push commits to it. As a result, many developers who have only used version control through GitHub have no idea that this sort of tool is even possible, even though it is trivial to construct.)

I’ve started to use git range-diff more lately, but I’ve been somewhat baffled by the output format, which the man page considers unworthy of explanation. Last night I read some of the Git source code to figure out how git range-diff works — here’s what I found.

Diffs of diffs

Conceptually, git range-diff runs two git diff operations and then diffs the results. The output syntax is not very special; it’s literally just a diff of diffs. Each line therefore starts with two columns of diff markers: the first column shows you how the diffs changed, and the second column shows you the diff itself.

Let’s walk through a set of changes to learn how to read diffs of diffs. Suppose we have a shell script which finds YAML files:

find \
    -path '*.yaml'

You notice that it’s not picking up files with the .yml extension, so you add another clause:

find \
    -path '*.yaml' \
    -o -path '*.yml'

Meanwhile, your colleague runs ShellCheck on the script, notices that some implementations of find have no default path, and adds one to fix the script on macOS and other BSDs:

find \
    . \
    -path '*.yaml'

You merge in your colleague’s changes, producing this result:

find \
    . \
    -path '*.yaml' \
    -o -path '*.yml'

Then, you run git range-diff main BEFORE_MERGE AFTER_MERGE, perhaps using git reflog to find the commit hashes before and after merging in the changes from main. Here’s the diff output:

  ## find-yaml.sh ##
 @@
- #!/usr/bin/env bash

  find \
+     . \
 -    -path '*.yaml'
 +    -path '*.yaml' \
 +    -o -path '*.yml'

First, note the - #!/usr/bin/env bash line, which is the script’s shebang (left out of the listings above). Git shows three lines of context before each change. Due to the extra argument to find added in main, the changes in the AFTER_MERGE commit start one line later in the file, so the diff’s context starts one line later as well. This shows up as the context line being removed, even though it wasn’t changed on either side!

This is a rather confusing result of git range-diff’s extremely naïve implementation — it makes no effort to distinguish between context lines and actual meaningful elements of the diff. This is the sort of obvious UX pitfall that made me assume that surely git range-diff would do something slightly smarter than literally diffing two diffs, but I don’t know why I expected that. It’s Git, after all.

Similarly, the next + line shows us that the . argument is present in the AFTER_MERGE commit, but that it wasn’t added by the AFTER_MERGE commit: its second column is blank, so the line is only context in the new diff. Unlike git diff, which shows you all changes between two commits, git range-diff will only show you changes from previous commits when they’re present as context in the diffs of the commits you actually care about.

The rest of the lines are the same in both versions of the diff, so their first column is blank.
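
To see where each marker comes from, the output above can be approximated by hand. This is only a sketch under some assumptions: it uses plain diff -u rather than Git, the file names (base.sh, before.sh, main.sh, after.sh) are made up for illustration, and the sed clean-up merely imitates how the output above omits file headers and hunk line numbers. It is not how Git implements the command.

```shell
#!/bin/sh
# Sketch: rebuild a "diff of diffs" with plain diff(1) instead of Git.
# All file names here are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# The script before anyone touched it.
cat > base.sh <<'EOF'
#!/usr/bin/env bash

find \
    -path '*.yaml'
EOF

# Your change, made on top of the old base.
cat > before.sh <<'EOF'
#!/usr/bin/env bash

find \
    -path '*.yaml' \
    -o -path '*.yml'
EOF

# main after your colleague's fix.
cat > main.sh <<'EOF'
#!/usr/bin/env bash

find \
    . \
    -path '*.yaml'
EOF

# Your change after merging in main.
cat > after.sh <<'EOF'
#!/usr/bin/env bash

find \
    . \
    -path '*.yaml' \
    -o -path '*.yml'
EOF

# diff exits with status 1 when the files differ, hence the "|| true".
# sed drops the ---/+++ file headers and the line numbers in "@@" lines.
make_patch() {
    { diff -u "$1" "$2" || true; } | sed -e '1,2d' -e 's/^@@ .*/@@/'
}

make_patch base.sh before.sh > before.patch
make_patch main.sh after.sh > after.patch

# Diff the two patches, keeping every line as context (-U 20), and drop
# the outer diff's own three header lines.
meta_diff=$({ diff -U 20 before.patch after.patch || true; } | sed -e '1,3d')
printf '%s\n' "$meta_diff"
```

The printed result has the same two columns of markers as the git range-diff output: the shebang line carries a - in the first column because it dropped out of the context, and the . line carries a + because it is new context.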

To help make it easier to distinguish between a diff marker in the first column and one in the second column, git range-diff applies “dual coloring”, which colors the background (instead of the text) of the first column of diff markers.

Suppose you want to make your intent clearer, so you go and add a comment to explain your change:

# Make sure to find `.yaml` and `.yml` files!
find \
    . \
    -path '*.yaml' \
    -o -path '*.yml'

You amend your commit and run the range-diff again. Now, we see this line in the output:

++# Make sure to find `.yaml` and `.yml` files!

The first + tells us that this line appears only in the AFTER_MERGE version of the diff, and the second + tells us that the diff adds it to the file. If you’re comparing a commit before and after fixing merge conflicts, this could indicate a new change that was added unintentionally, so pay attention to these!

Exercise for the reader: What do --, -+, and +- lines indicate?

Diffs of logs

git diff only deals with the contents of files, but git range-diff deals with ranges of commits, so those need to be compared as well!

Here’s how Git shows a series of commits being compared:

 1:  d17004e2f4 =  1:  92954fddce rts: Tighten up invariants of PACK
 2:  42f1801df4 <  -:  ---------- testsuite: Fix badly escaped literals
 -:  ---------- >  2:  cdfd86e951 testsuite: Fix badly escaped literals
 3:  4d7afaaa7f =  3:  2dbf88daed rts/Interpreter: Improve documentation of TEST*_P instructions
 4:  5f7c2d3e99 !  4:  e646db18c4 rts: Annotate BCOs with their Name

From left to right, we have:

  1. The position of the left-hand side’s commit in the sequence being compared. For example, 1: indicates this is the first commit in the range of commits on the LHS, and -: indicates that no corresponding commit was found on the left-hand side.

  2. The (abbreviated) commit hash of the LHS commit, like d17004e2f4.

  3. A marker indicating if the commits are identical (=), only present on the LHS (<), only present on the RHS (>), or different (!).

  4. The position of the RHS’s commit in the sequence being compared.

  5. The (abbreviated) commit hash of the RHS commit, like 92954fddce. Note that this can be different from the LHS commit hash even if the diffs are identical, because the parent commits can be different.

  6. The subject line of the commit message. If the message differs between the LHS and RHS, the RHS’s commit message is used here and a diff between the two commit messages is shown.
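
For a long range, it can help to tally these markers before reading any of the diffs. Here is a minimal sketch, assuming the summary lines keep the six-field shape described above; summarize_range_diff is a hypothetical helper written for this post, not something Git provides.

```shell
#!/bin/sh
# Sketch: count range-diff summary lines by their marker.
# summarize_range_diff is a hypothetical helper, not part of Git.
summarize_range_diff() {
    awk '
        # Summary lines have the shape:
        #   <pos>: <hash> <marker> <pos>: <hash> <subject>
        # where <pos> is a number, or "-" when there is no counterpart.
        $1 ~ /^([0-9]+|-):$/ && $4 ~ /^([0-9]+|-):$/ {
            if      ($3 == "=") count["unchanged"]++
            else if ($3 == "!") count["changed"]++
            else if ($3 == "<") count["only-in-old"]++
            else if ($3 == ">") count["only-in-new"]++
        }
        END {
            printf "unchanged=%d changed=%d only-in-old=%d only-in-new=%d\n",
                count["unchanged"], count["changed"],
                count["only-in-old"], count["only-in-new"]
        }
    '
}

# The five summary lines from the example above.
sample=' 1:  d17004e2f4 =  1:  92954fddce rts: Tighten up invariants of PACK
 2:  42f1801df4 <  -:  ---------- testsuite: Fix badly escaped literals
 -:  ---------- >  2:  cdfd86e951 testsuite: Fix badly escaped literals
 3:  4d7afaaa7f =  3:  2dbf88daed rts/Interpreter: Improve documentation of TEST*_P instructions
 4:  5f7c2d3e99 !  4:  e646db18c4 rts: Annotate BCOs with their Name'

summary=$(printf '%s\n' "$sample" | summarize_range_diff)
printf '%s\n' "$summary"
```

In a real repository, the same function could read the output of git range-diff main BEFORE_MERGE AFTER_MERGE through a pipe. The indented diff-of-diffs lines should not normally match the pattern, but treat the counts as a rough guide rather than a guarantee.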

Note that in the middle of the log, we have two commits that Git can’t find a match for in the other side of the sequence:

 2:  42f1801df4 <  -:  ---------- testsuite: Fix badly escaped literals
 -:  ---------- >  2:  cdfd86e951 testsuite: Fix badly escaped literals

Similar to git diff’s rename detection, git range-diff sets a threshold beyond which large changes between diffs will be considered entirely different commits. We can adjust this threshold with --creation-factor, a percentage (60 by default) where higher numbers are more forgiving of large changes. Passing something like --creation-factor=90 makes Git more willing to pair the two commits up and show the comparison of their diffs.

Parting thoughts

I felt pretty silly once I realized that the git range-diff output is literally just a diff of diffs. In fact, when reading the Git source code to figure out how it was formatting the output, Jade discovered that git range-diff actually runs git log --patch to get the diffs it feeds into the meta-diff. Perhaps the authors assumed the implementation was so obvious that the output needed no explanation!

Even if you don’t know quite how the git range-diff output works, it’s still a useful tool for comparing ranges because it filters out a lot of the noise, and if you know the before/after of both ranges it’s usually pretty easy to reconstruct the context.

If you haven’t used git range-diff before, give it a try! I’ve been finding it very handy to have in my toolkit lately and wanted to help make it more accessible for more people. Happy rebasing!