Bibliographies in Markdown

Using Pandoc to generate citations when you are using Zotero

Jan 27, 2025

When I started writing the Agile Codeline Patterns, I decided to write in markdown. Working in text kept my workflow straightforward: my end goal is to publish on LeanPub, and I wanted an easy way to make drafts available on my website. (My site uses Hugo for static site generation.) Generating a bibliography was an unexpected challenge. Many solutions worked with Word or RTF documents, but a simple solution for markdown or plain text was hard to find. After much searching I found a process that works, and I hope this write-up makes the problem less challenging for someone else.

The Requirements

Based on adding citation markers in a Markdown document, I wanted to be able to:

Generate a bibliography in whatever citation style I want.
Export pages to a format I could use to publish chapters on my Hugo site. This meant generating markdown output with YAML metadata.
Minimize the cost, and the number of tools.
Simplicity: once I had a process, I wanted to make it as automatic as possible.

I looked at various reference managers, both paid and free, and none seemed to have a good “scan markdown” mechanism.

This post made me think I could make this work with simple scripting.

Tool Chain

These are the tools I used. The interesting parts happen after exporting a Markdown file with images from the editor.

Ulysses for editing. Any editor that lets you edit markdown works.
Zotero for references with the Better BibTeX Plugin (installation instructions)
Pandoc for inserting references
(optionally) YQ for parsing the document header. I use the document title to generate an output directory for the Hugo site.

Document Prep

Since the source document is text, the process is simply:

Insert a citation key in Better BibTeX format. It will look like `@cohnAgileEstimatingPlanning2006`
Run a script to generate a bibliography.
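
Before running anything, it can help to see which keys a document actually cites. The following is a minimal sketch and not part of the original workflow: the function name `list_citekeys` and the sample text are hypothetical, and the pattern assumes keys contain only letters, digits, `_`, and `-`, so it can also pick up things like the domain part of an email address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct citation key found in a markdown file.
# Assumption: keys consist only of letters, digits, "_" and "-".
list_citekeys() {
  grep -oE '@[[:alnum:]_-]+' "$1" | sed 's/^@//' | sort -u
}

# Demo on a throwaway file (the sample sentences are made up for illustration).
sample=$(mktemp)
printf '%s\n' \
  'Estimates improve with practice @cohnAgileEstimatingPlanning2006.' \
  'See also [@cohnAgileEstimatingPlanning2006, p. 12].' > "$sample"
list_citekeys "$sample"
rm -f "$sample"
```

Each key is printed once even if it is cited several times, which makes the list easy to compare against the Zotero export.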

The Basic Parts

An export of the Zotero library in Better BibTeX format
A directory containing a markdown file and any referenced image files
A script that runs Pandoc to insert the reference
A Customized output template

The Template

To ensure that the YAML metadata is in the correct format, you’ll want to customize the standard markdown_mmd template:

Get a copy of the default template: run `pandoc -D markdown_mmd > output_template.md`
Edit the template file to add YAML delimiters around the $titleblock$ variable:

$if(titleblock)$
---
$titleblock$
---

$endif$
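
The template edit can also be scripted. This is a sketch under one assumption: that the default template contains a line consisting solely of `$titleblock$`. The function name `wrap_titleblock` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a pandoc template on stdin and wrap the line that
# consists solely of $titleblock$ in YAML delimiters. All other lines pass
# through unchanged. Running it twice would add a second pair of delimiters.
wrap_titleblock() {
  awk '$0 == "$titleblock$" { print "---"; print; print "---"; next } { print }'
}

# Demo on a made-up template fragment.
printf '%s\n' '$if(titleblock)$' '$titleblock$' '' '$endif$' | wrap_titleblock
```

With pandoc installed, the same function could be used as `pandoc -D markdown_mmd | wrap_titleblock > output_template.md`.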

The Pandoc command line that does the processing is:

PANDOC_OPTS="-s --from markdown --to markdown_mmd+yaml_metadata_block --template output_template.md -M date:$(date +"%Y-%m-%d") --bibliography library.bib --citeproc --csl=ieee.csl --metadata link-citations=true"
pandoc $PANDOC_OPTS "input/doc/in.md" --output="output/doc/index.md"

Key parts:

-s (standalone) ensures that YAML metadata shows up in the output.
--to markdown_mmd+yaml_metadata_block generates a markdown document with references (plain markdown output does not support references), including the YAML metadata block.
-M date:$(date +"%Y-%m-%d") is optional and updates the date header on the output file.
--csl=ieee.csl specifies the citation style. You can download other styles from the Zotero Style Repository.
--bibliography library.bib points to an export of your Zotero library.
--citeproc generates the bibliography.
--metadata link-citations=true generates the hyperlinks between the references and the bibliography. This is optional.
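
Packing the options into a single string relies on the shell splitting it back into words. As an alternative sketch (not what the script in this post does), the options described above can be held in a bash array, which keeps each argument intact even if a path later contains a space. The variable name `pandoc_opts` is my own.

```shell
#!/usr/bin/env bash
# The options described above, one array element per argument.
pandoc_opts=(
  -s
  --from markdown
  --to markdown_mmd+yaml_metadata_block
  --template output_template.md
  -M "date:$(date +%Y-%m-%d)"
  --bibliography library.bib
  --citeproc
  --csl=ieee.csl
  --metadata link-citations=true
)

# Print the full command for inspection instead of running it.
printf 'pandoc'
printf ' %q' "${pandoc_opts[@]}" "input/doc/in.md" "--output=output/doc/index.md"
printf '\n'
```

To actually run it, replace the three `printf` lines with `pandoc "${pandoc_opts[@]}" "input/doc/in.md" --output="output/doc/index.md"`.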

Process

This process assumes that you have a directory structure that looks like this:

library.bib  input/  output/

In practice, the input and output can be anywhere.

In Zotero, export the library in Better BibTeX format.
Place the input file, with any images, in a subdirectory of input.
Run the pandoc command. You will also want to copy any image files from the input directory.

At this point, you should have a new markdown file with citation markers replaced with references and a bibliography at the end of the document.
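
The image copy step is easy to get wrong when a document has no images, because an unmatched `*.png` glob is passed to `cp` literally. A minimal sketch of a safer copy, assuming the images are PNG files; the function name `copy_images` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy PNG images from one directory to another,
# doing nothing (and still succeeding) when there are none.
copy_images() {
  local src=$1 dest=$2
  local images=()
  shopt -s nullglob            # an unmatched glob expands to nothing
  images=("$src"/*.png)
  shopt -u nullglob            # note: this also turns it off for the caller
  mkdir -p "$dest"
  if [ "${#images[@]}" -gt 0 ]; then
    cp "${images[@]}" "$dest"/
  fi
}
```

A call such as `copy_images input/doc output/doc` would sit right after the pandoc command.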

The Whole Script

I wanted to process a few files at once and base the output directory on the YAML title header value. This is the script I ended up with:

#!/usr/bin/env bash

INPUT_DIR=inbox
OUTPUT_DIR=outbox


# update date header field
OUTFILE_OPTS="-s --from markdown --to markdown_mmd --template output_template.md -M date:$(date +"%Y-%m-%d")"

# options related to citation style and sources
CITE_OPTS="--bibliography library.bib --citeproc --csl=ieee.csl --metadata link-citations=true"

# combined options
PANDOC_OPTS="$OUTFILE_OPTS $CITE_OPTS"


# Set the directory containing the subdirectories
main_directory=$INPUT_DIR  

# Check if the main directory exists and is a directory
if [ ! -d "$main_directory" ]; then
  echo "Error: '$main_directory' is not a directory or does not exist."
  exit 1
fi

# Loop through each item in the main directory
for item in "$main_directory"/*; do
  # Check if the item is a directory
  if [ -d "$item" ]; then
    subdirectory="$item"
    echo "Processing directory: $subdirectory"

    # Run pandoc for each subdirectory
    # this assumes 1 .md file per exported document. It can be named anything that ends in .md

    find "$subdirectory" -name "*.md" -print0 | while IFS= read -r -d $'\0' md_file; do

        echo "Processing markdown file: $md_file"

        # get the outdir based on the title attribute. replace spaces with - and change to lower case
        doc_web_dir=$(yq --front-matter="extract" ".title" "$md_file" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')

        # Create output directory if it doesn't exist
        output_dir="$OUTPUT_DIR/$doc_web_dir"
        mkdir -p "$output_dir"

        in_dir=$(dirname "$md_file")
        # move the input file to a known name since we want an index file
        # (skip the move if a previous run already renamed it)
        in_file="$in_dir/infile.md"
        [ "$md_file" = "$in_file" ] || mv "$md_file" "$in_file"
        # Construct output filename
        output_file="$output_dir/index.md"

        # Run pandoc ($PANDOC_OPTS is left unquoted on purpose so it splits into separate options)
        pandoc $PANDOC_OPTS "$in_file" --output="$output_file"

        # copy other files to outbox
        cp "$in_dir"/*.png "$output_dir"

    done

    echo "Finished processing directory: $subdirectory"
  fi
done

echo "Finished processing all directories."

exit 0
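
The script derives the output directory from the title by replacing spaces and lowercasing. A title containing characters such as `:` or `/` would still end up in the path, so here is a sketch of a slightly stricter version. The extra stripping step is my addition, not part of the script above, and the function name `slugify` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a document title into a directory-friendly slug.
# Steps: lowercase, spaces to "-", then drop everything that is not a letter,
# digit, or "-" (this last step is an addition to the original pipeline).
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr ' ' '-' \
    | tr -cd '[:alnum:]-'
}

slugify 'Bibliographies in Markdown'   # prints bibliographies-in-markdown
printf '\n'
```

In the script, this would replace the two `tr` calls after `yq`, for example `doc_web_dir=$(slugify "$(yq --front-matter="extract" ".title" "$md_file")")`.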

Next Steps and Wrap-up

Once I found the right tool set, this was easier than I expected. My mistake was spending too long looking at reference manager support for processing text files. I could use a Makefile to further automate the process so that I only update target files whose sources have changed.
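
Short of writing a Makefile, the same "only rebuild what changed" idea can be approximated in the script with a timestamp comparison. This is a sketch of that check only, not a replacement for make; the function name `needs_rebuild` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the target is missing or older than the
# source, which is the same rule make applies to a single dependency.
needs_rebuild() {
  local src=$1 target=$2
  [ ! -e "$target" ] || [ "$src" -nt "$target" ]
}
```

Inside the loop it could guard the pandoc call, for example `if needs_rebuild "$in_file" "$output_file"; then pandoc $PANDOC_OPTS "$in_file" --output="$output_file"; fi`. Changes to `library.bib` or the template would not be detected by this check.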

Perhaps this can be a starting point for your work if you have a similar problem.

Accidental Simplicity
