Filters

Overview

If the base features of Pandoc and Quarto don’t do exactly what you need, you can very likely create a Pandoc Filter that bridges the gap. It’s also possible to create filters that preprocess the contents of Jupyter Notebooks (.ipynb) before they are rendered by Quarto and Pandoc. We’ll describe each of these filters in turn below.

Pandoc Filters

Pandoc consists of a set of readers and writers. When converting a document from one format to another, text is parsed by a reader into pandoc’s intermediate representation of the document—an “abstract syntax tree” or AST—which is then converted by the writer into the target format. The pandoc AST format is defined in the module Text.Pandoc.Definition in the pandoc-types package.

A “filter” is a program that modifies the AST, between the reader and the writer.

INPUT --reader--> AST --filter--> AST --writer--> OUTPUT

Pandoc’s built-in citation processing is implemented as a filter, as are many of Quarto’s extensions (e.g. cross-references, figure layout, etc.). Some other examples include:

Filter Description
spellcheck Checks the spelling of words in the body of the document (omitting metadata).
pandoc-quotes Replaces non-typographic quotation marks with typographic ones for languages other than American English

Using Filters

Add one or more filters to document rendering using the filters option. For example:

filters:
   - spellcheck.lua

By default, user filters are run after Quarto’s built-in filters. For some filters you’ll want to modify this behavior For example, here we arrange to run pandoc-quotes.lua before Quarto’s filters and spellcheck.lua after:

filters:
  - pandoc-quotes.lua
  - quarto
  - spellcheck.lua

Writing Filters

You can write Pandoc filters using Lua (via Pandoc’s built-in Lua interpreter) or using any other language using a JSON representation of the Pandoc AST piped to/from an external process.

We strongly recommend using Lua Filters, which have the following advantages:

  • No external dependencies

  • High performance (no serialization or process execution overhead)

  • Access to Pandoc’s library of Lua helper functions.

See the documentation on Writing Lua Filters for additional details.

If you want to write a JSON filter, see the documentation on Writing JSON filters.

Notebook Filters

Whenever you use Jupyter for embedded computations a notebook (.ipynb) is part of the rendering pipeline. This is naturally the case for rendering .ipynb files directly, however it is also the case for .qmd files that have embedded Julia or Python cells (in which case a temporary .ipynb is constructed). This notebook is ultimately converted into markdown and then rendered by Pandoc to the specified output format(s).

You may wish to do some pre-processing on the notebook prior to its conversion to markdown. This can be accomplished by specifying one or or more ipynb-filters. These filters are passed the JSON representation of the notebook on stdin and should write a transformed JSON representation to stdout.

For example, this notebook filter uses the nbformat package to read a notebook, prepend a comment to the source of each code cell, and then write it back to stdout:

import sys
import nbformat

# read notebook from stdin
nb = nbformat.reads(sys.stdin.read(), as_version = 4)

# prepend a comment to the source of each cell
for index, cell in enumerate(nb.cells):
  if cell.cell_type == 'code':
     cell.source = "# comment\n" + cell.source
  
# write notebook to stdout 
nbformat.write(nb, sys.stdout)

You can arrange for this filter to be run using the ipynb-filters option (specified at either the document or project level):

---
ipynb-filters:
  - filter.py
---

Note that the current working directory for the filter will be set to the location of the input notebook.