Versioning Jupyter Notebooks With Git
There is a range of approaches to versioning Jupyter notebooks with git (e.g. here, here, and here) that remove any output before the notebooks are added to git. However, they typically rely on adding a script to your executable path that a git filter can invoke to strip the output. Fortunately, Jupyter’s own nbconvert can do the same job, which
- avoids adding scripts to your executable path, and
- ensures that removing the output is always compatible with the Jupyter and Python versions you are using.
Here’s how to set it up. First, open your ~/.gitconfig and add the following lines.
[filter "jupyter_clear_output"]
clean = "jupyter nbconvert --stdin --stdout --log-level=ERROR \
--to notebook --ClearOutputPreprocessor.enabled=True"
smudge = cat
required = true
The lines define a git filter called jupyter_clear_output, which applies the clean command when changes are staged and the smudge command when files are checked out. The smudge filter is trivial: cat just reproduces its input. The clean filter invokes nbconvert, reading the notebook from stdin, writing to stdout, converting to the notebook file format, and clearing all output. The flag required = true ensures that the filter does not fail silently: if the command fails, git reports an error instead of quietly storing the notebook unfiltered.
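To make the clean step concrete, here is a rough Python sketch of the transformation the filter applies: git pipes the notebook's JSON in on stdin, and whatever comes out on stdout is what gets staged. This is an illustration only, not nbconvert's implementation; the function names `clear_outputs` and `clean_stream` are hypothetical, and nbconvert's ClearOutputPreprocessor may do slightly more (for example, tidying output-related cell metadata).

```python
import json
from typing import IO, Any, Dict


def clear_outputs(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: strip outputs and execution counts from code cells (in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed text, plots, tracebacks
            cell["execution_count"] = None  # drop the [n] counter shown in the UI
    return notebook


def clean_stream(src: IO[str], dst: IO[str]) -> None:
    """Hypothetical helper mimicking a git clean filter: notebook JSON in, cleaned JSON out."""
    notebook = json.load(src)
    # One key per line and a stable key order keep git diffs small and readable.
    json.dump(clear_outputs(notebook), dst, indent=1, sort_keys=True, ensure_ascii=False)
    dst.write("\n")
```

Calling `clean_stream(sys.stdin, sys.stdout)` from a script would behave like a simplified clean filter, but that is exactly the kind of extra script on your path that the nbconvert approach avoids; the sketch is only meant to show what happens to the notebook on its way into the index.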
The final step is to register the .ipynb extension with the jupyter_clear_output filter. To enable the filter on a per-repository basis, add a .gitattributes file with the following content to your repository.
*.ipynb filter=jupyter_clear_output
To enable the filter globally, add the line above to ~/.gitattributes and let git know about the attributes file by adding the following lines to your ~/.gitconfig.
[core]
attributesfile = ~/.gitattributes