YAML metadata in Markdown

When I generate my blog posts from Markdown documents, I usually need to put the metadata somewhere. Since there is no database, one place where they can be put is right in the document.

Metadata are data about the data. In the context of blog posts, metadata are usually the category, tags, slug, time or title. Unfortunately, Markdown does not natively support a metadata syntax. It is also important to note that this information should not be rendered in the final document, but it should be possible to process it during parsing.

Markdown comments

The easiest way to accomplish this is to use a Markdown comment, for example to denote the category like this:

[//]: # "programming"

However, this approach is not ideal. Since there are multiple ways to represent a comment in a Markdown document, the parser has to understand every comment syntax in use. This is not much of a problem when only a single person is writing the blog, but it could become a problem in the future, should newcomers join the team.

The second problem with the comment approach is that the metadata have to be parsed further. If only simple data are used, for instance just a category as in the example above, a very naïve parser should not take more than a few lines.

The moment we want to insert multiple differing data structures there, it may pay off to simply delegate this task to a dedicated parser. Consider the following example:

[//]: # "programming"
[//]: # "yaml, markdown, metadata"

It is still relatively trivial to create a parser:

Get everything in the first line after the colon as a category
Get everything in the second line after the colon and split it by the comma as tags
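
The two steps above can be sketched in a few lines of JavaScript. This is only a minimal sketch, assuming the metadata always use the [//]: # "..." comment form and always appear in the same order; the function name parseCommentMetadata is made up for illustration, and it reads the quoted text that follows the colon.

```javascript
// Hypothetical helper: reads metadata from `[//]: # "..."` comment lines.
// Assumption: the first comment is the category, the second is a
// comma-separated list of tags.
function parseCommentMetadata(markdown) {
  const commentPattern = /^\[\/\/\]: # "(.*)"\s*$/

  // Keep only the quoted text of lines that look like metadata comments.
  const values = markdown
    .split("\n")
    .map(line => line.match(commentPattern))
    .filter(match => match !== null)
    .map(match => match[1])

  const [category, tags = ""] = values
  return {
    category,
    tags: tags
      .split(",")
      .map(tag => tag.trim())
      .filter(tag => tag !== ""),
  }
}

const example = [
  '[//]: # "programming"',
  '[//]: # "yaml, markdown, metadata"',
  "",
  "The post body starts here.",
].join("\n")

console.log(parseCommentMetadata(example))
```

Everything here depends on the position of each comment, which is exactly what makes the approach brittle.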

But it gets complicated quickly. Suppose the third line is the title. We need to remember to split only the tags by a comma, not the title:

[//]: # "programming"
[//]: # "yaml, markdown, metadata"
[//]: # "A simple, yet powerful parser"

If we wanted to make the order of the metadata irrelevant, we could complicate it further:

[//]: # "Title: A simple, yet powerful parser"
[//]: # "Category: programming"
[//]: # "Tags: yaml, markdown, metadata"

But now we need to make sure there are no typos. A misspelled key may look very similar to Title, but the parser would no longer recognize it. It could throw an exception if all the metadata were required and some were missing, so typos could still be managed quite well. But then, what if some metadata were optional? Or worse, what if we wanted custom metadata? How would we denote whether a value is a string, as is the case with the title, or an array, as is the case with the tags? What about security concerns such as XSS? What about testing all this? And so on.
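
To make the trade-off concrete, here is a rough sketch of such a keyed parser, assuming every key is required and every value type is hard-coded. The names CONVERTERS and parseKeyedMetadata are hypothetical, and the sketch deliberately leaves the questions about optional and custom metadata unanswered.

```javascript
// Hypothetical sketch: parses `[//]: # "Key: value"` comments in any order.
// Assumption: the set of keys is fixed and every key is required.
const CONVERTERS = new Map([
  ["title", value => value],
  ["category", value => value],
  ["tags", value => value.split(",").map(tag => tag.trim())],
])

function parseKeyedMetadata(markdown) {
  // The key is everything up to the first colon inside the quotes.
  const commentPattern = /^\[\/\/\]: # "([^:"]+):\s*(.*)"\s*$/
  const metadata = {}

  for (const line of markdown.split("\n")) {
    const match = line.match(commentPattern)
    if (match === null) continue

    const key = match[1].trim().toLowerCase()
    const convert = CONVERTERS.get(key)
    if (convert === undefined) {
      // A misspelled key ends up here instead of being silently ignored.
      throw new Error(`Unknown metadata key: ${match[1]}`)
    }
    metadata[key] = convert(match[2])
  }

  for (const key of CONVERTERS.keys()) {
    if (metadata[key] === undefined) {
      throw new Error(`Missing metadata key: ${key}`)
    }
  }
  return metadata
}
```

Even this small amount of code already fixes the type of each value in advance, and it would need its own tests and documentation.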

YAML Ain't Markup Language

There is a certain punch to recursive acronyms. One of the well-known ones is GNU, a recursive acronym for GNU's Not UNIX. As you can see, the acronym YAML is also recursive. YAML is not a markup language, as XML or HTML are; it is a data-serialization language.

Using a YAML parser rids us of the problems mentioned before. It is documented and tested, its upsides and downsides are known, and its security considerations are available to read. Using YAML in Markdown to denote metadata is not a new concept; it is known as Front Matter. In the Markdown blog space it is used, for instance, by Jekyll and Hugo, among others. Neither of these projects is written in JavaScript, however.

Rewriting the last example into YAML would look like this:

Title: "A simple, yet powerful parser"
Category: "programming"
Tags: ["yaml", "markdown", "metadata"]

To make it YAML Front Matter in a Markdown document, we need to surround it with --- lines:

---
title: "A simple, yet powerful parser"
category: "programming"
taxonomies:
  tags: ["markdown", "yaml", "metadata"]
---
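
To illustrate what the --- delimiters give us, here is a minimal sketch that only separates the Front Matter block from the rest of the document, assuming the block starts on the very first line. The helper name splitFrontMatter is hypothetical, and interpreting the YAML itself is still left to a real YAML parser.

```javascript
// Hypothetical helper: separates a leading `---` delimited block from the body.
// It only splits the text; it does not interpret the YAML.
function splitFrontMatter(markdown) {
  const lines = markdown.split(/\r?\n/)

  // Front matter must open on the first line of the document.
  if (lines[0].trimEnd() !== "---") {
    return { frontMatter: null, body: markdown }
  }

  // Find the closing delimiter after the opening one.
  const closingIndex = lines.findIndex(
    (line, index) => index > 0 && line.trimEnd() === "---"
  )
  if (closingIndex === -1) {
    return { frontMatter: null, body: markdown }
  }

  return {
    frontMatter: lines.slice(1, closingIndex).join("\n"),
    body: lines.slice(closingIndex + 1).join("\n"),
  }
}

const post = [
  "---",
  'title: "A simple, yet powerful parser"',
  "---",
  "",
  "The post body starts here.",
].join("\n")

const { frontMatter, body } = splitFrontMatter(post)
console.log(frontMatter)
console.log(body)
```

Because the delimiters are unambiguous, the metadata can be removed before rendering and handed to the YAML parser as one string.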

To parse this kind of document in JavaScript correctly and easily, I have chosen the remark-frontmatter package from the unified ecosystem and the js-yaml package. The entire code would look like this:

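const fs = require("fs")
// js-yaml 3.x API; 4.x removed safeLoad, and its load is safe by default
const yaml = require("js-yaml")
// require() works with the CommonJS releases (unified 9 or older)
const unified = require("unified")
const parse = require("remark-parse")
const stringify = require("remark-stringify")
const frontmatter = require("remark-frontmatter")
const select = require("unist-util-select").select

let tree

// processSync finishes the whole pipeline before returning,
// so the tree is guaranteed to be set when we query it below
unified()
  .use(parse)
  .use(stringify)
  // turn the leading --- block into a node of type "yaml"
  .use(frontmatter, ["yaml"])
  // capture the syntax tree without changing it
  .use(() => t => (tree = t))
  .processSync(fs.readFileSync("example.md"))

// the raw Front Matter text is stored in the node's value
const yamlNode = select("yaml", tree)
const parsedYaml = yaml.safeLoad(yamlNode.value)

module.exports = parsedYaml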

The sources are available in the repository