When I started writing the Agile Codeline Patterns, I decided to write in Markdown. Working in text kept my workflow straightforward: my end goal is to publish on LeanPub, and I wanted an easy way to make drafts available on my website. (My site uses Hugo for static site generation.) Generating a bibliography was an unexpected challenge. Many solutions worked with Word or RTF documents, but a simple solution for Markdown or plain text was hard to find. After much searching I found a process that works, and I hope this write-up makes the task less challenging for someone else.
The Requirements
Starting from a Markdown document with citation markers, I wanted to be able to:
Generate a bibliography in whatever citation style I want.
Export pages to a format I could use to publish chapters on my Hugo site. This meant generating markdown output having YAML Metadata.
Minimize the cost, and the number of tools.
Simplicity: once I had a process, I wanted to make it as automatic as possible.
I looked at various reference managers, both paid and free, and none seemed to have a good “scan markdown” mechanism.
This post made me think I could make this work with simple scripting.
Tool Chain
These are the tools I used. The interesting parts happen after exporting a Markdown file with images from the editor.
Ulysses for editing. Any editor that lets you edit markdown works.
Zotero for references with the Better BibTeX Plugin (installation instructions)
Pandoc for inserting references
(optionally) YQ for parsing the document header. I use the document title to generate an output directory for the Hugo site.
Document Prep
Since the source document is text, the process is simply:
Insert a citation key in Better BibTeX format. It will look like `@cohnAgileEstimatingPlanning2006`.
Run a script to generate a bibliography.
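For illustration, a minimal source document might look like the one this snippet writes. The title, file name, and sentence are made up; only the citation key format comes from the steps above. The `grep` at the end is a quick way to list the keys a document uses, assuming keys contain only letters, digits, and a few punctuation characters.

```shell
#!/usr/bin/env bash
# Write a tiny sample document (hypothetical content) with a YAML header
# and one Better BibTeX citation key.
sample_doc="${TMPDIR:-/tmp}/sample-chapter.md"
cat > "$sample_doc" <<'EOF'
---
title: Sample Chapter
---

Estimation is discussed in [@cohnAgileEstimatingPlanning2006].
EOF

# List the citation keys found in the file, one per line.
grep -o '@[A-Za-z0-9_:.-]*' "$sample_doc" | sort -u
```

Checking the listed keys against the Zotero export before running Pandoc can catch typos early.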
The Basic Parts
An export of the Zotero library in Better BibTeX format
A directory containing a markdown file and any referenced image files
A script that runs Pandoc to insert the reference
A Customized output template
The Template
To ensure that the YAML metadata is in the correct format, you’ll want to customize the standard markdown_mmd template:
Get a copy of the default template by running:
pandoc -D markdown_mmd > output_template.md
Edit the template file to add YAML delimiters around the $titleblock$ variable:
$if(titleblock)$
---
$titleblock$
---
$endif$
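The template edit can also be scripted. This is a minimal sketch, not part of my original process: `wrap_titleblock` is a hypothetical helper, and it assumes the template contains a line that is exactly `$titleblock$`. Running it twice would add the delimiters twice, so apply it only to a fresh copy of the default template.

```shell
#!/usr/bin/env bash
# wrap_titleblock TEMPLATE
# Prints TEMPLATE with '---' lines added around the $titleblock$ line.
# Assumption: the template has a line that is exactly "$titleblock$".
wrap_titleblock() {
    awk '
        $0 == "$titleblock$" { print "---"; print; print "---"; next }
        { print }
    ' "$1"
}

# Example use (writes to a temporary file first, then replaces the template):
#   wrap_titleblock output_template.md > output_template.tmp &&
#       mv output_template.tmp output_template.md
```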
The Pandoc command line that does the processing is:
PANDOC_OPTS="-s --from markdown --to markdown_mmd+yaml_metadata_block --template output_template.md -M date:$(date +"%Y-%m-%d") --citeproc --bibliography library.bib --csl=ieee.csl --metadata link-citations=true"
pandoc $PANDOC_OPTS "input/doc/in.md" --output="output/doc/index.md"
Key parts:
-s (standalone) ensures that the YAML metadata shows up in the output.
--to markdown_mmd+yaml_metadata_block generates a Markdown document with references (markdown does not support references), including the YAML metadata block.
-M date:$(date +"%Y-%m-%d") is optional and updates the date header on the output file.
--csl=ieee.csl specifies the format for citations. You can download other formats from the Zotero Style Repository.
--bibliography library.bib points to an export of your Zotero library.
--citeproc generates the bibliography.
--metadata link-citations=true generates the hyperlinks between the references and the bibliography. This is optional.
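One possible variation is to hold the options in a Bash array instead of a single string. This is a sketch rather than what I actually run: the array name `pandoc_opts` is my own, and the paths are the same placeholder paths used above. An array keeps each option as a separate word, which matters if a template or bibliography path ever contains spaces.

```shell
#!/usr/bin/env bash
# Build the pandoc options as an array so each option stays a separate word,
# even if a path later contains spaces.
pandoc_opts=(
    -s
    --from markdown
    --to markdown_mmd+yaml_metadata_block
    --template output_template.md
    -M "date:$(date +%Y-%m-%d)"
    --citeproc
    --bibliography library.bib
    --csl=ieee.csl
    --metadata link-citations=true
)

# Print the command that would run; drop "echo" to actually run it.
echo pandoc "${pandoc_opts[@]}" "input/doc/in.md" --output="output/doc/index.md"
```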
Process
This process assumes that you have a directory structure that looks like this:
library.bib
input/
output/
In practice, the input and output can be anywhere.
In Zotero, export the library in Better BibTeX format.
Place the input with any images in a subdirectory of input.
Run the pandoc command. You will also want to copy any image files from the input directory to the output directory.
At this point, you should have a new markdown file with citation markers replaced with references and a bibliography at the end of the document.
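Before running Pandoc, it can help to confirm that the expected layout is in place. The function below is a hypothetical pre-flight check, assuming the `library.bib`, `input/`, and `output/` names from the directory structure above; it is a sketch, not part of my original process.

```shell
#!/usr/bin/env bash
# check_layout [BASE_DIR]
# Returns 0 if BASE_DIR (default: current directory) contains library.bib and
# an input/ directory; creates output/ if it is missing.
check_layout() {
    local base="${1:-.}"
    if [ ! -f "$base/library.bib" ]; then
        echo "Missing $base/library.bib (export it from Zotero first)" >&2
        return 1
    fi
    if [ ! -d "$base/input" ]; then
        echo "Missing $base/input directory" >&2
        return 1
    fi
    mkdir -p "$base/output"
}

# Example use:
#   check_layout . && echo "Layout looks fine"
```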
The Whole Script
I wanted to process a few files at once and base the output directory on the YAML title header value. This is the script I ended up with:
#!/usr/bin/env bash
INPUT_DIR=inbox
OUTPUT_DIR=outbox
# update date header field
OUTFILE_OPTS="-s --from markdown --to markdown_mmd --template output_template.md -M date:$(date +"%Y-%m-%d")"
# options related to citation style and sources
CITE_OPTS="--bibliography library.bib --citeproc --csl=ieee.csl --metadata link-citations=true"
# combined options
PANDOC_OPTS="$OUTFILE_OPTS $CITE_OPTS"
# Set the directory containing the subdirectories
main_directory=$INPUT_DIR
# Check if the main directory exists and is a directory
if [ ! -d "$main_directory" ]; then
    echo "Error: '$main_directory' is not a directory or does not exist."
    exit 1
fi
# Loop through each item in the main directory
for item in "$main_directory"/*; do
    # Check if the item is a directory
    if [ -d "$item" ]; then
        subdirectory="$item"
        echo "Processing directory: $subdirectory"
        # Run pandoc for each subdirectory
        # this assumes 1 .md file per exported document. It can be named anything that ends in .md
        find "$subdirectory" -name "*.md" -print0 | while IFS= read -r -d $'\0' md_file; do
            echo "Processing markdown file: $md_file"
            # get the outdir based on the title attribute. replace spaces with - and change to lower case
            doc_web_dir=$(yq --front-matter="extract" ".title" "$md_file" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')
            # Create output directory if it doesn't exist
            output_dir="$OUTPUT_DIR/$doc_web_dir"
            mkdir -p "$output_dir"
            in_dir=$(dirname "$md_file")
            # move the input file to a known name since we want an index file
            # (paths are quoted so that exported file names containing spaces work)
            in_file="$in_dir/infile.md"
            mv "$md_file" "$in_file"
            # Construct output filename
            output_file="$output_dir/index.md"
            # Run pandoc ($PANDOC_OPTS is intentionally unquoted so it splits into separate options)
            pandoc $PANDOC_OPTS "$in_file" --output="$output_file"
            # copy other files to outbox
            cp "$in_dir"/*.png "$output_dir"
        done
        echo "Finished processing directory: $subdirectory"
    fi
done
echo "Finished processing all directories."
exit 0
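The script turns the title into a directory name by replacing spaces and lower-casing, which leaves characters such as colons or slashes in place. A slightly more defensive version could look like the sketch below. `slugify` is a hypothetical helper, and the title in the example is made up.

```shell
#!/usr/bin/env bash
# slugify TITLE
# Lower-cases TITLE, turns every run of non-alphanumeric characters into a
# single "-", and trims "-" from both ends.
slugify() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Example with a hypothetical title:
slugify "Agile Codeline Patterns: A Draft"; echo
# prints: agile-codeline-patterns-a-draft
```

In the script, the `tr` pipeline after the `yq` call could be replaced by passing the extracted title through `slugify`. Note that this version drops non-ASCII letters, which may or may not be what you want for your titles.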
Next Steps and Wrap-up
Once I found the right tool set, this was easier than I expected; at first I had been too focused on looking for reference manager support for processing text files. I could use a Makefile to further automate the process so that I only update target files whose sources have changed.
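Short of a full Makefile, the same "only rebuild what changed" idea can be approximated in the shell by comparing file timestamps. This is a sketch under that assumption; `needs_rebuild` is a hypothetical helper, and it only compares one source file with one target, so a changed `library.bib` would need its own check.

```shell
#!/usr/bin/env bash
# needs_rebuild SOURCE TARGET
# Succeeds (returns 0) when TARGET is missing or older than SOURCE.
needs_rebuild() {
    local src="$1" target="$2"
    [ ! -e "$target" ] || [ "$src" -nt "$target" ]
}

# Example use inside the processing loop:
#   if needs_rebuild "$in_file" "$output_file"; then
#       pandoc $PANDOC_OPTS "$in_file" --output="$output_file"
#   fi
```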
Perhaps this can be a starting point for your work if you have a similar problem.