Versioning Jupyter Notebooks With Git

There is a range of approaches to versioning Jupyter notebooks with git (e.g. here, here, and here) that remove any output before the notebooks are added to git. However, they typically rely on adding a script to your executable path that a git filter then invokes to strip the output.

Update: jupytext can store notebooks as markdown and is recommended for versioning notebooks.

Fortunately, Jupyter’s own nbconvert can achieve the same task. Using it

avoids adding scripts to your executable path, and
ensures that removing the output is always compatible with the Jupyter and Python versions you are using.

Here’s how to set it up: First, open your ~/.gitconfig and add the following lines.

[filter "jupyter_clear_output"]
    clean = "jupyter nbconvert --stdin --stdout --log-level=ERROR \
             --to notebook --ClearOutputPreprocessor.enabled=True"
    smudge = cat
    required = true

These lines define a git filter called jupyter_clear_output, which applies the clean command when changes are staged and the smudge command when files are checked out. The smudge command is trivial: it just reproduces its input. The clean command invokes nbconvert, reading from stdin, writing to stdout, converting to the notebook file format, and clearing all output. Setting required = true makes git treat a failing filter as an error instead of silently passing the content through unfiltered.
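To make the effect of the clean step concrete, here is a minimal sketch in Python of what clearing output amounts to for a notebook in the version 4 JSON format. It is not nbconvert's implementation: the function name clear_outputs is hypothetical, only the standard json module is used, and any cell metadata that nbconvert may also tidy up is left untouched.

```python
import json


def clear_outputs(notebook_text: str) -> str:
    """Return the notebook JSON with outputs and execution counts removed.

    Hypothetical helper for illustration; assumes the nbformat 4 layout
    with a top-level "cells" list.
    """
    nb = json.loads(notebook_text)
    for cell in nb.get("cells", []):
        # Only code cells carry outputs and an execution counter.
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"


# A tiny made-up notebook with one executed code cell.
example = json.dumps({
    "cells": [{
        "cell_type": "code",
        "source": ["1 + 1"],
        "outputs": [{"output_type": "execute_result"}],
        "execution_count": 7,
        "metadata": {},
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})
cleaned = json.loads(clear_outputs(example))
print(cleaned["cells"][0]["outputs"])  # prints []
```

The source of each cell is preserved, so git only ever sees changes to the code and text you wrote, not to the results of running it.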

The final step is to register the .ipynb extension with the jupyter_clear_output filter. If you would like to enable the filter on a per-repository basis, add a .gitattributes file with the following content to your repository.

*.ipynb    filter=jupyter_clear_output

If you want to enable the filter globally, add the line above to ~/.gitattributes and let git know about the attributes file by adding the following line to your ~/.gitconfig.

[core]
    attributesfile = ~/.gitattributes
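Once the filter is registered, it can be useful to check that committed notebooks really are free of output. The following is a rough sketch under the same assumptions as the rest of this post (notebooks in the version 4 JSON format); the function name cells_with_output is hypothetical and uses only the Python standard library.

```python
import json
from typing import List


def cells_with_output(notebook_text: str) -> List[int]:
    """Return the indices of code cells that still carry output.

    Hypothetical helper for illustration; a cell counts as "dirty" if it
    has a non-empty outputs list or a non-null execution count.
    """
    nb = json.loads(notebook_text)
    dirty = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            dirty.append(index)
    return dirty
```

For example, the text printed by git show HEAD:analysis.ipynb (where analysis.ipynb stands in for one of your own notebooks) could be passed to this function; an empty list suggests the clean filter was applied when the file was staged.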