nbprocess tutorial

A step by step guide

nbprocess is a system for exploratory programming. In practice, programming in this way can feel very different to the kind of programming many of you will be familiar with, since we’ve mainly been taught coding techniques that are (at least implicitly) tied to the underlying tools we have access to. I’ve found that programming in a “notebook first” way can make me 2-3x more productive than I was before (when I used vscode, Visual Studio, vim, PyCharm, and similar tools).

In this tutorial, I’ll try to get you up and running with the basics of the nbprocess system as quickly and easily as possible. You can also watch this video in which I take you through the tutorial, step by step.

Set up Your Jupyter Server

Jupyter Environment

To complete this tutorial, you’ll need a Jupyter Notebook Server configured on your machine. If you have not installed Jupyter before, you may find the Anaconda Individual Edition the simplest to install.

If you already have experience with Jupyter, please note that everything in this tutorial must be run using the same kernel.

Install `nbprocess`

No matter how you installed Jupyter, you’ll need to manually install nbprocess. You can install nbprocess with pip or conda from a terminal window:

pip install nbprocess

conda install -c fastai nbprocess

Jupyter notebook has a terminal window available, so we’ll use that:

1. Start jupyter notebook.
2. From the “New” dropdown on the right side, choose Terminal.
3. Enter “python -m pip install nbprocess” (or use conda as specified above).

When the command completes, you’re ready to go.

Set up Repo

Create New Project

To create your new project repo, use the cli command nbprocess_new to create a new nbprocess project from an existing GitHub repo that you have cloned locally. To create a new GitHub repo locally, we recommend the gh cli tool, which allows you to create a new repo with the command gh repo create.

Alternatively, you can create a new empty GitHub repository using this link, and follow the instructions on GitHub to clone the repository locally before running the command nbprocess_new.

GitHub pages

The nbprocess system uses quarto for documentation. You can host your site for free on Github Pages without any additional setup, so this is the approach we recommend (but it’s not required; any static site hosting will work fine).

After you set up your repo and push to GitHub following the steps below, GitHub Pages will automatically be built and enabled for you using continuous integration (CI). We will discuss how CI works later in this tutorial; for most people this should work by default.

NOTE: Don’t expect your Pages to build & deploy properly yet; we still have some setup to do first!

Previewing Documents Locally

It is often desirable to preview the documentation locally before having it built and rendered by GitHub Pages. This requires you to run Quarto locally. You can run the command nbprocess_preview from the root of your repo to preview the documentation locally.

Edit settings.ini

Next, edit the settings.ini file in your cloned repo. This file contains all the information needed to package your library. By default (you can change this in settings.ini), the root of the repo contains your notebooks, the docs folder contains your auto-generated docs, and a folder with a name you select contains your auto-generated modules.

You’ll see these commented out lines in settings.ini. Uncomment them, and set each value as needed.

# lib_name = your_project_name
# repo_name = name of github repo
# user = your_github_username
# description = A description of your project
# keywords = some keywords
# author = Your Name
# author_email = your_email@example.com
# copyright = Your Name or Company Name
# branch = The default branch of your GitHub repo (usually either master or main)

We’ll see some other settings we can change later.
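To give a feel for how these key/value pairs look from Python, here is a minimal sketch that parses settings.ini-style text with the standard library's configparser. The [DEFAULT] section header and the sample values are assumptions made for this example (check your generated file for its actual layout), and this is not how nbprocess itself reads the file.

```python
from configparser import ConfigParser

# Sample text standing in for settings.ini. The [DEFAULT] header and all
# values below are assumptions for illustration only.
SAMPLE = """
[DEFAULT]
lib_name = hello_lib
user = your_github_username
branch = main
"""

def read_settings(text):
    "Parse settings.ini-style text into a plain dict"
    parser = ConfigParser(delimiters=('=',))
    parser.read_string(text)
    return dict(parser['DEFAULT'])

settings = read_settings(SAMPLE)
print(settings['lib_name'], settings['branch'])
```

A quick check like this can catch a line that was left commented out, since a missing key raises a KeyError.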

Install git hooks to avoid and handle conflicts

Jupyter Notebooks can cause challenges with git conflicts, but life becomes much easier when you use nbprocess. As a first step, run nbprocess_install_hooks in the terminal from your project folder. This will set up hooks which will remove metadata from your notebooks when you commit, greatly reducing the chance you have a conflict.

But if you do get a conflict later, simply run nbprocess_clean filename.ipynb. This will replace any conflicts in cell outputs with your version, and if there are conflicts in input cells, then both cells will be included in the merged file, along with standard conflict markers (e.g. =======). Then you can open the notebook in Jupyter and choose which version to keep.
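To see why removing metadata reduces conflicts, consider what changes in a notebook file each time you run it: execution counts and per-cell metadata. The sketch below strips those fields from a notebook held as a Python dict. The function name strip_notebook and the exact set of fields removed are assumptions for illustration; the hooks installed by nbprocess may clean more or less than this.

```python
import json

def strip_notebook(nb):
    "Return a copy of notebook dict `nb` without execution counts or cell metadata"
    nb = json.loads(json.dumps(nb))  # simple deep copy of JSON-compatible data
    for cell in nb.get('cells', []):
        cell['metadata'] = {}
        if cell.get('cell_type') == 'code':
            cell['execution_count'] = None
            for out in cell.get('outputs', []):
                if 'execution_count' in out: out['execution_count'] = None
    return nb

# A tiny hand-written notebook with one executed code cell
nb = {'cells': [{'cell_type': 'code', 'execution_count': 7,
                 'metadata': {'collapsed': True},
                 'source': ['1+1'],
                 'outputs': [{'output_type': 'execute_result', 'execution_count': 7,
                              'data': {'text/plain': ['2']}, 'metadata': {}}]}],
      'metadata': {}, 'nbformat': 4, 'nbformat_minor': 5}

clean = strip_notebook(nb)
print(clean['cells'][0]['execution_count'])
```

Two people who run the same notebook end up with identical files after this kind of cleaning, so git has nothing to merge.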

Start the Documentation Server

You can call nbprocess_preview from the root of the repo to start the documentation server so you can see how your docs will render as you edit your notebooks. This is optional, but often useful especially if you are writing docs.

Edit 00_core.ipynb

Now, run jupyter notebook, and click 00_core.ipynb (you don’t have to start your notebook names with a number like we do here; but we find it helpful to show the order you’ve created your project in). You’ll see something that looks a bit like this:

#|default_exp core

module name here

API details.

Let’s explain what these special cells mean.

Module name and summary

The markdown cell uses special syntax to define the title and summary of your module. Feel free to replace “module name here” with a title and “API details.” with a summary for your module.

Add a function

Let’s add a function to this notebook, e.g.:

#|export
def say_hello(to):
    "Say hello to somebody"
    return f'Hello {to}!'

Notice how it includes #|export at the top - this means it will be included in your module, and documentation. The documentation will look like this:

say_hello

 say_hello (to)

Say hello to somebody
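Conceptually, exporting means collecting the code cells marked with #|export and writing them to a module. The following is a rough sketch of that idea, not the nbprocess implementation: the function exported_source is hypothetical, and cell sources are simplified to plain strings (real notebook files store them as lists of lines).

```python
def exported_source(cells):
    "Join the source of code cells whose first line is the `#|export` directive"
    chunks = []
    for cell in cells:
        if cell['cell_type'] != 'code': continue
        lines = cell['source'].splitlines()
        if lines and lines[0].strip() == '#|export':
            chunks.append('\n'.join(lines[1:]))  # drop the directive line itself
    return '\n\n'.join(chunks)

cells = [
    {'cell_type': 'markdown', 'source': '# module name here'},
    {'cell_type': 'code', 'source': '#|export\ndef say_hello(to):\n    return f"Hello {to}!"'},
    {'cell_type': 'code', 'source': 'say_hello("Isaac")'},
]

module_text = exported_source(cells)
namespace = {}
exec(module_text, namespace)  # run the generated module text
print(namespace['say_hello']('Isaac'))
```

Note that the example cell without #|export is left out of the module text; it stays in the notebook as documentation.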

Add examples and tests

It’s a good idea to give an example of your function in action. Just include regular code cells, and they’ll appear (with output) in the docs, e.g.:

say_hello("Isaac")

Examples can output plots, images, etc, and they’ll all appear in your docs, e.g.:

from IPython.display import display,SVG

display(SVG('<svg height="100"><circle cx="50" cy="50" r="40"/></svg>'))

You can also include tests:

assert say_hello("Hamel")=="Hello Hamel!"
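A bare assert gives little information when it fails. One option is a small helper that reports both values; the sketch below uses a hypothetical helper named check_eq, which is not part of nbprocess and is defined here only for illustration.

```python
def say_hello(to):
    "Say hello to somebody"
    return f'Hello {to}!'

def check_eq(a, b):
    "Hypothetical helper: assert equality and show both values on failure"
    assert a == b, f'{a!r} != {b!r}'

check_eq(say_hello("Hamel"), "Hello Hamel!")
check_eq(say_hello(""), "Hello !")  # edge case: empty name
```

Because these are ordinary code cells, they run as tests and also appear in the docs as usage examples.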

You should also add markdown headings as you create your notebook; one benefit of this is that a table of contents will be created in the documentation automatically.

Edit index.ipynb

Now you’re ready to create your documentation home page and README.md file; these are both generated automatically from index.ipynb. So click on that to open it now.

You’ll see that there’s already a line there to import your library - change it to use the name you selected in settings.ini. Then, add information about how to use your module, including some examples. Remember, these examples should be actual notebook code cells with real outputs.

Build lib + test

Now you can create your python module. To do so, just run nbprocess_prepare from the terminal at the root of your project folder. nbprocess_prepare bundles the following commands together for you to test your code and build the library. While running nbprocess_prepare is convenient, you have the flexibility to run these separate pieces individually:

- nbprocess_export: Builds the .py modules and library from the jupyter notebook
- nbprocess_test: Tests all your notebooks
- nbprocess_clean: Cleans your notebooks to get rid of extraneous output for Github
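If you want to drive the separate pieces from your own script, something like the following sketch could work. It only uses the three command names listed above; the order, the stop-at-first-failure behavior, and the run_steps and dry_run names are assumptions for this example, not a description of how nbprocess_prepare is implemented.

```python
import subprocess
from types import SimpleNamespace

# The three commands that the text above says nbprocess_prepare bundles.
STEPS = ['nbprocess_export', 'nbprocess_test', 'nbprocess_clean']

def run_steps(steps=STEPS, runner=subprocess.run):
    "Run each command in order, stopping at the first one that fails"
    done = []
    for cmd in steps:
        result = runner([cmd])
        if result.returncode != 0:
            raise RuntimeError(f'{cmd} failed with exit code {result.returncode}')
        done.append(cmd)
    return done

def dry_run(cmd):
    "Stand-in runner that prints the command instead of executing it"
    print('would run:', ' '.join(cmd))
    return SimpleNamespace(returncode=0)

run_steps(runner=dry_run)
```

Calling run_steps() with the default runner executes the real commands, so it needs nbprocess installed and should be run from the root of your project.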

Sometimes you may want to ensure you have the latest version of your python library and quarto installed. You can run nbprocess_install to do an editable install of your local python library as well as fetch and install the latest version of Quarto.

Preview The docs

If you have not already, you should view your docs in fully rendered form to catch any mistakes. You can preview your documentation site with the command nbprocess_preview. Note that your docs will build automatically in CI (discussed below).

Commit to Github

You can now check-in the generated files with git add, git commit and git push. (You can use git status to check which files have been generated.) The following command should be sufficient:

git add -A; git commit -m'check in files'; git push

Wait a minute or two for Github to process your commit, and then head over to the Github website to look at your results.

Continuous Integration (CI)

Back in your project’s GitHub main page, click where it says 1 commit (or 2 commits or whatever). Hopefully, you’ll see a green checkmark next to your latest commit. That means that your documentation site built correctly, and your module’s tests all passed! This is checked for you using continuous integration (CI) with GitHub Actions. This does the following:

check the notebooks have been cleaned of needless metadata to avoid merge conflicts (with nbprocess_clean)
run the tests in your notebooks (with nbprocess_test)

The template contains a basic CI that performs the two checks above. Edit the file .github/workflows/test.yaml to your liking and comment out the parts you don’t want.

If you have a red cross, that means something failed. Click on the cross, then click Details, and you’ll be able to see what failed.

Automatically Building Docs

CI will automatically build docs and deploy them for you. You can see the code for this in .github/workflows/deploy.yaml, but you normally don’t have to worry about this unless you need to customize something.

Your organization may have disabled GitHub Pages by default. If so, you can enable GitHub Pages by clicking on the Settings tab in your repo, then clicking Pages in the left sidebar. Set “Source” to the gh-pages branch and the / (root) folder. Once you’ve saved, refresh that page and GitHub will show a link to your new website. Copy that URL, go back to your main repo page, click “edit” next to the description, and paste the URL into the “website” section. While you’re there, go ahead and put in your project description too.

Docs URL

To see the URL for your docs site, you can go to the Settings tab on your GitHub repo, click Pages on the left hand side, and your URL will be displayed there. If you need to customize the domain name, see this article.

View docs and readme

Once everything is passing, have a look at your readme in Github. You’ll see that your index.ipynb file has been converted to a readme automatically.

Next, go to your documentation site (e.g. by clicking on the link next to the description that you created earlier). You should see that your index notebook has also been used here.

Congratulations, the basics are now all in place! Let’s continue and use some more advanced functionality.

Add a class

Create a class in 00_core.ipynb as follows:

#|export
class HelloSayer:
    "Say hello to `to` using `say_hello`"
    def __init__(self, to): self.to = to
        
    def say(self):
        "Do the saying"
        return say_hello(self.to)

This will automatically appear in the docs like this:

HelloSayer

 HelloSayer (to)

Say hello to to using say_hello

Document with show_doc

However, methods aren’t automatically documented. To add method docs, use show_doc:

show_doc(HelloSayer.say)

HelloSayer.say

 HelloSayer.say ()

Do the saying

And add some examples and/or tests:

o = HelloSayer("Alexis")
o.say()

Add links with backticks

Notice above there is a link from our new class documentation to our function. That’s because we used backticks in the docstring:

    "Say hello to `to` using `say_hello`"

These are automatically converted to hyperlinks wherever possible. For instance, here are hyperlinks to HelloSayer and say_hello created using backticks.
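The idea behind this conversion can be sketched with a regular expression: find each backticked name and, if it is a known symbol, wrap it in a link. The lookup table and URL format below are hypothetical and chosen only for this example; nbprocess builds and uses its own index of your exported symbols.

```python
import re

def link_backticks(text, known):
    "Replace `name` with a markdown link when `name` is a key of `known`"
    def _link(m):
        name = m.group(1)
        # Unknown names (such as the parameter `to`) are left untouched
        return f'[`{name}`]({known[name]})' if name in known else m.group(0)
    return re.sub(r'`([^`]+)`', _link, text)

# Hypothetical lookup table from symbol name to docs location
known = {'say_hello': 'core.html#say_hello'}
print(link_backticks("Say hello to `to` using `say_hello`", known))
```

This is why "wherever possible" matters: only names that resolve to something documented become links.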

Set up autoreload

Since you’ll often be updating your modules from one notebook, and using them in another, it’s helpful if your notebook automatically reads in the new modules as soon as the python file changes. To make this happen, just add these lines to the top of your notebook:

%load_ext autoreload
%autoreload 2

Add in-notebook export cell

It’s helpful to be able to export all your modules directly from a notebook, rather than going to the terminal to do it. All nbprocess commands are available directly from a notebook in Python. Add these lines to any cell and run it to export your modules (I normally make this the last cell of my notebooks).

from nbprocess.doclinks import nbprocess_export
nbprocess_export()

Run tests in parallel

Before you push to github or make a release, you might want to run all your tests. nbprocess can run all your notebooks in parallel to check for errors. Just run nbprocess_test in a terminal.

Code Execution & Skipping Cells

If you want to prevent code from getting executed when rendering or testing docs, use the comment #|eval: false in a code cell.

See the quarto docs for more execution options.

View docs locally

If you want to look at your docs locally before you push to Github, you can do so by running nbprocess_preview.

Set up prerequisites

If your module requires other modules as dependencies, you can add those prerequisites to your settings.ini in the requirements section. The requirements should be separated by a space and if the module requires at least or at most a specific version of the requirement this may be specified here, too.

For example, if your module requires the fastcore module of at least version 1.0.5, the torchvision module at a version below 0.7, and any version of matplotlib, then the prerequisites would look like this:

requirements = fastcore>=1.0.5 torchvision<0.7 matplotlib
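To make the format concrete, here is a small sketch that splits such a value into package names and version specifiers. The function parse_requirements is hypothetical and deliberately simple; it is not what nbprocess or pip use internally, and it does not validate the specifiers.

```python
import re

def parse_requirements(value):
    "Split a space-separated requirements string into (name, specifier) pairs"
    pairs = []
    for item in value.split():
        # Name is the leading run of letters, digits, '_', '.' or '-';
        # whatever follows (possibly nothing) is the version specifier.
        m = re.match(r'^([A-Za-z0-9_.\-]+)(.*)$', item)
        if m: pairs.append((m.group(1), m.group(2)))
    return pairs

print(parse_requirements('fastcore>=1.0.5 torchvision<0.7 matplotlib'))
# → [('fastcore', '>=1.0.5'), ('torchvision', '<0.7'), ('matplotlib', '')]
```

Because whitespace separates the requirements, a specifier must not contain spaces (write fastcore>=1.0.5, not fastcore >= 1.0.5).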

In addition to requirements you can specify dependencies with other keywords that have different scopes. Below is a list of all possible dependency keywords:

requirements: Passed to both pip and conda setup
pip_requirements: Passed to pip setup only
conda_requirements: Passed to conda setup only
dev_requirements: Passed to pip setup as a development requirement

For more information about the format of dependencies, see the pypi and conda docs on creating specifications in setup.py and meta.yaml, respectively.

Set up console scripts

Behind the scenes, nbprocess uses the standard package setuptools for handling installation of modules. One very useful feature of setuptools is that it can automatically create cross-platform console scripts. nbprocess surfaces this functionality; to use it, use the same format as setuptools, with whitespace between each script definition (if you have more than one).

console_scripts = nbprocess_export=nbprocess.cli:nbprocess_export
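Each definition has the form name=module:function, where name is the command that will be created and module:function is the Python callable it runs. The sketch below shows how such a string can be resolved to a callable. The function resolve_script is hypothetical, and the example uses a standard-library function as a stand-in so it runs without your project installed.

```python
from importlib import import_module

def resolve_script(spec):
    "Turn 'name=module:function' into (name, callable)"
    name, target = spec.split('=', 1)
    module_name, func_name = target.split(':', 1)
    func = getattr(import_module(module_name.strip()), func_name.strip())
    return name.strip(), func

# Standard-library stand-in: a command called "base" that runs os.path.basename
name, func = resolve_script('base=os.path:basename')
print(name, func('/tmp/demo.txt'))
```

If the module or function name is misspelled, this lookup fails, which is a quick way to validate a console_scripts entry before installing.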

Test with editable install

To test and use your modules in other projects, and use your console scripts (if you have any), the easiest approach is to use an editable install. To do this, cd to the root of your repo in the terminal, and type:

pip install -e .

(Note that the trailing period is important.) Your module changes will be automatically picked up without reinstalling. If you add any additional console scripts, you will need to run this command again. After doing an editable install you can run nbprocess_test to run all of the tests in your notebooks.

Upload to pypi

If you want people to be able to install your project by just typing pip install your-project then you need to upload it to pypi. The good news is, we’ve already created a fully pypi compliant installer for your project! So all you need to do is register at pypi (click “Register” on pypi) if you haven’t previously done so, and then create a file called ~/.pypirc with your login details. It should have these contents:

[pypi]
username = your_pypi_username
password = your_pypi_password

You will also need twine, so run this once:

pip install twine

To upload your project to pypi, just type nbprocess_pypi in your project root directory. Once it’s complete, a link to your project on pypi will be printed.

Upload to pypi and conda

The command nbprocess_release from the root of your nbprocess repo will bump the version of your module and upload your project to both conda and pypi.

Install collapsible headings and toc2

There are two jupyter notebook extensions that I highly recommend when working with projects like this. They are:

Collapsible headings: This lets you fold and unfold each section in your notebook, based on its markdown headings. You can also hit left to go to the start of a section, and right to go to the end.
TOC2: This adds a table of contents to your notebooks, which you can navigate either with the Navigate menu item it adds to your notebooks, or the TOC sidebar it adds. These can be modified and/or hidden using its settings.

Math equation support

nbprocess supports equations (using Quarto). You can include math in your notebook’s documentation using the following methods.

Using $$, e.g.:


$$\sum_{i=1}^{k+1}i$$

Which is rendered as:

\[\sum_{i=1}^{k+1}i\]

Using $, e.g.:

This version is displayed inline: $\sum_{i=1}^{k+1}i$ . You can include text before and after.

Which is rendered as:

This version is displayed inline: $\sum_{i=1}^{k+1}i$ . You can include text before and after.

For more information, see the Quarto Docs

Controlling Cell and Output Visibility

To control what is displayed or hidden in the docs from a notebook, you will want to use one or more directives. Directives are special comments, preceded by #|, that do some kind of pre- or post-processing of notebook data before the docs are rendered. Some of these directives are part of Quarto, but others are ones that we have added to nbprocess. A walk-through of the most common ones is below:
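As a rough mental model, a tool can find directives by scanning the comment lines at the top of a cell. The sketch below does this for the #| prefix used in the code examples in this tutorial. The function parse_directives is hypothetical, it assumes directives sit at the top of the cell, and it does not handle inline directives that appear at the end of a code line.

```python
import re

def parse_directives(source):
    "Return {name: argument string} for the leading `#|` comment lines of a cell"
    found = {}
    for line in source.splitlines():
        m = re.match(r'\s*#\|\s*([\w-]+):?\s*(.*)$', line)
        if not m: break  # stop at the first line that is not a directive
        found[m.group(1)] = m.group(2).strip()
    return found

print(parse_directives('#|eval: false\nprint("not run")'))
print(parse_directives('#|hide\nx = 1'))
```

Both the Quarto style (name: value) and the bare nbprocess style (name, optionally followed by arguments) fit this pattern.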

`#|hide`

When you use this directive, you will not see the cell input or output.

`#|echo: false`

This makes sure that only the output of a code cell is shown, not its input.

`#|hide_line`

You can use this to hide a specific line in your code. For example:

def _secret(): ...

for i in range(3):
    _secret() #|hide_line
    print(i)

becomes this:

def _secret(): ...

for i in range(3):
    print(i)
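The transformation above amounts to dropping every line that carries the marker. A minimal sketch of that behavior, using a hypothetical function drop_hidden_lines rather than the actual nbprocess processor:

```python
def drop_hidden_lines(source):
    "Remove every line that ends with the `#|hide_line` comment"
    kept = [l for l in source.splitlines() if not l.rstrip().endswith('#|hide_line')]
    return '\n'.join(kept)

src = '''def _secret(): ...

for i in range(3):
    _secret() #|hide_line
    print(i)'''
print(drop_hidden_lines(src))
```

Only the displayed source changes; in the notebook itself the hidden line still runs.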

`#|filter_stream`

This allows you to filter out lines containing specific keywords in cell outputs. For example:


#|filter_stream FutureWarning MultiIndex
print('\n'.join(['A line', 'Foobar baz FutureWarning blah', 
                 'zig zagMultiIndex zoom', 'Another line.']))
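The effect on the output can be sketched as a plain substring filter. The function below is a hypothetical stand-in written for this example (the real directive is handled by nbprocess when the docs are built), and it assumes that a line is dropped when it contains any of the keywords.

```python
def filter_stream(text, keywords):
    "Drop output lines that contain any of `keywords`"
    return '\n'.join(l for l in text.splitlines()
                     if not any(k in l for k in keywords))

out = '\n'.join(['A line', 'Foobar baz FutureWarning blah',
                 'zig zagMultiIndex zoom', 'Another line.'])
print(filter_stream(out, ['FutureWarning', 'MultiIndex']))
```

Under that assumption, only "A line" and "Another line." remain, since the match is on substrings and "zagMultiIndex" still contains the keyword.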