When I build my blog posts from Markdown documents, I usually need to store the metadata somewhere. Since there is no database, one place to put it is right in the document itself.
Metadata are data about data. In the context of blog posts, metadata usually include the category, tags, slug, date or title. Unfortunately, Markdown has no native syntax for metadata. It is also important to note that this information should not be rendered in the final document, but it should still be available for processing during parsing.
The easiest way to accomplish this is to use a Markdown comment, for example to denote the category like this:
```markdown
[//]: # "programming"
```
However, this approach is not ideal. Since there are multiple ways to write a comment in a Markdown document, the parser has to understand every comment syntax in use. This is not much of a problem when only one person writes the blog, but it could become a problem in the future, should newcomers join the team.
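To illustrate what "understanding every comment syntax" can mean in practice, here is a small sketch. The helper `extractComments` is hypothetical, and the four idioms it recognizes are common ones chosen as an assumption; the list is not exhaustive, and every new idiom a writer adopts means another pattern to maintain.

```javascript
// Hypothetical helper: collect the text of several common "comment" idioms
// used in Markdown documents. The set of idioms is an assumption, not a standard.
const COMMENT_PATTERNS = [
  /^\[[^\]]*\]:\s*#\s*"([^"]*)"\s*$/,    // [//]: # "text"
  /^\[[^\]]*\]:\s*#\s*\(([^)]*)\)\s*$/,  // [//]: # (text)
  /^\[[^\]]*\]:\s*<>\s*\(([^)]*)\)\s*$/, // [comment]: <> (text)
  /^<!--\s*(.*?)\s*-->$/,                // <!-- text -->
]

function extractComments(markdown) {
  const comments = []
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim()
    for (const pattern of COMMENT_PATTERNS) {
      const match = trimmed.match(pattern)
      if (match) {
        comments.push(match[1])
        break // a line counts as at most one comment
      }
    }
  }
  return comments
}

const doc = ['[//]: # "programming"', "[comment]: <> (yaml)", "<!-- draft -->", "# Heading"].join("\n")
console.log(extractComments(doc))
```

Forgetting a single pattern here silently drops metadata, which is exactly the maintenance risk described above.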
The second problem with the comment approach is that the metadata must be parsed further. If only simple data are used, for instance just a category as in the example above, a very naïve parser takes only a few lines and we can call it done.
The moment we want to store several different data structures there, it may pay off to delegate this task to a dedicated parser. Consider the following example:
```markdown
[//]: # "programming"
[//]: # "yaml, markdown, metadata"
```
It is still relatively trivial to write a parser for this:
- Take the quoted text on the first line as the category
- Take the quoted text on the second line and split it by commas to get the tags
But it gets complicated quickly. Suppose the third line holds the title. Now we need to remember to split only the tags by commas, not the title:
```markdown
[//]: # "programming"
[//]: # "yaml, markdown, metadata"
[//]: # "A simple, yet powerful parser"
```
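The positional rules above could be sketched in JavaScript like this. The function name `parsePositionalMetadata` is hypothetical, and the sketch assumes every metadata comment uses the `[//]: # "value"` form and that the order is always category, tags, title.

```javascript
// Matches `[//]: # "value"` and captures the quoted value
const POSITIONAL_COMMENT = /^\[\/\/\]:\s*#\s*"([^"]*)"\s*$/

// Hypothetical positional parser: the meaning of each value depends only on
// its position among the metadata comments (category, tags, title).
function parsePositionalMetadata(markdown) {
  const values = markdown
    .split("\n")
    .map(line => line.trim().match(POSITIONAL_COMMENT))
    .filter(match => match !== null)
    .map(match => match[1])

  const [category, rawTags = "", title] = values
  return {
    category,
    // Only the tags are split by commas; the title may contain commas itself
    tags: rawTags.split(",").map(tag => tag.trim()).filter(tag => tag !== ""),
    title,
  }
}

const example = [
  '[//]: # "programming"',
  '[//]: # "yaml, markdown, metadata"',
  '[//]: # "A simple, yet powerful parser"',
].join("\n")

console.log(parsePositionalMetadata(example))
```

Swapping two lines in the document would silently swap their meaning, which is why the order-independent variant is the natural next step.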
If we wanted the order of the metadata not to matter, we would have to complicate it further by prefixing each value with its key:
```markdown
[//]: # "Title: A simple, yet powerful parser"
[//]: # "Category: programming"
[//]: # "Tags: yaml, markdown, metadata"
```
But now we need to make sure there are no typos. A misspelled key may look similar to "Title" at a glance, yet it would confuse the parser. If all the metadata were required, the parser could throw an exception whenever one of them is missing, so typos could still be managed quite well. But then, what if some metadata were optional? Or worse, what if we wanted custom metadata? How would we denote whether a value is a string, as in the case of the title, or an array, as in the case of the tags? What about security concerns such as XSS? What about testing all of this? And so on.
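As a rough sketch of how much code even the first of these questions demands, here is an order-independent "Key: value" parser that rejects unknown keys (likely typos) and missing required keys. The function `parseKeyValueMetadata` and the `SCHEMA` object, including which keys are required and which are lists, are hypothetical choices made for illustration.

```javascript
// Hypothetical schema: allowed keys, whether they are required,
// and whether their values are comma-separated lists.
const SCHEMA = {
  Title: { required: true, list: false },
  Category: { required: true, list: false },
  Tags: { required: false, list: true },
}

// Matches `[//]: # "Key: value"` and captures the quoted text
const KEY_VALUE_COMMENT = /^\[\/\/\]:\s*#\s*"([^"]*)"\s*$/

const hasOwn = (object, key) => Object.prototype.hasOwnProperty.call(object, key)

function parseKeyValueMetadata(markdown) {
  const metadata = {}
  for (const line of markdown.split("\n")) {
    const match = line.trim().match(KEY_VALUE_COMMENT)
    if (!match) continue // not a metadata comment

    const text = match[1]
    const separator = text.indexOf(":")
    if (separator === -1) throw new Error(`Missing key in "${text}"`)

    const key = text.slice(0, separator).trim()
    const value = text.slice(separator + 1).trim()

    // An unknown key is most likely a typo, so fail loudly instead of guessing
    if (!hasOwn(SCHEMA, key)) throw new Error(`Unknown metadata key "${key}"`)

    metadata[key] = SCHEMA[key].list
      ? value.split(",").map(item => item.trim()).filter(item => item !== "")
      : value
  }

  // Every required key has to be present
  for (const [key, rule] of Object.entries(SCHEMA)) {
    if (rule.required && !hasOwn(metadata, key)) {
      throw new Error(`Missing required metadata "${key}"`)
    }
  }
  return metadata
}
```

Even this sketch only handles typos, required keys and a single list type; optional typed values, custom metadata, escaping and tests would all be extra code to write and maintain.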
YAML Ain't Markup Language
There is a certain punch to recursive acronyms. One of the best known is GNU, which stands for GNU's Not Unix. As the heading shows, YAML is a recursive acronym too. YAML is not a markup language like XML or HTML, precisely because it is a data-serialization language.
Rewriting the last example into YAML would look like this:
```yaml
Title: "A simple, yet powerful parser"
Category: "programming"
Tags: ["yaml", "markdown", "metadata"]
```
To turn it into YAML front matter in a Markdown document, we need to surround it with lines of three dashes (---) at the very top of the file:
```yaml
---
title: "A simple, yet powerful parser"
category: "programming"
taxonomies:
  tags: ["markdown", "yaml", "metadata"]
---
```
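Before any YAML can be parsed, the front matter has to be separated from the Markdown body. The hypothetical `splitFrontMatter` below is a minimal sketch of that step, assuming the block starts on the very first line with `---` and ends at the next line that contains only `---`; parsing the YAML text itself is left to a dedicated library.

```javascript
// Hypothetical helper: split a document into raw YAML front matter and body.
// Assumes the front matter opens on line 1 with `---` and closes at the next `---` line.
function splitFrontMatter(source) {
  const lines = source.split(/\r?\n/)
  if (lines[0] !== "---") return { frontMatter: null, body: source }

  const end = lines.indexOf("---", 1)
  if (end === -1) return { frontMatter: null, body: source } // unterminated block

  return {
    frontMatter: lines.slice(1, end).join("\n"),
    body: lines.slice(end + 1).join("\n"),
  }
}

const post = [
  "---",
  'title: "A simple, yet powerful parser"',
  'category: "programming"',
  "taxonomies:",
  '  tags: ["markdown", "yaml", "metadata"]',
  "---",
  "# Hello",
].join("\n")

const { frontMatter, body } = splitFrontMatter(post)
console.log(frontMatter) // the raw YAML text, without the --- lines
console.log(body)        // prints: # Hello
```

Plugins such as remark-frontmatter perform this step as part of Markdown parsing; the sketch only shows how little the delimiters ask of the parser compared with the comment-based approach.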
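```javascript
const fs = require("fs")
const yaml = require("js-yaml")
const unified = require("unified")
const parse = require("remark-parse")
const stringify = require("remark-stringify")
const frontmatter = require("remark-frontmatter")
const select = require("unist-util-select").select

// A tiny transformer plugin captures the syntax tree into `tree`
let tree
unified()
  .use(parse)
  .use(stringify)
  // Recognize a YAML block delimited by --- at the top of the file
  .use(frontmatter, ["yaml"])
  .use(() => t => (tree = t))
  // processSync runs the pipeline synchronously, so `tree` is guaranteed
  // to be set before it is used below
  .processSync(fs.readFileSync("example.md"))

// The front matter becomes a node of type "yaml" holding the raw YAML text
const yamlNode = select("yaml", tree)
// safeLoad exists in js-yaml 3.x; in js-yaml 4 use yaml.load, which is safe by default
const parsedYaml = yamlNode ? yaml.safeLoad(yamlNode.value) : {}

module.exports = parsedYaml
```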
The sources are available in the repository.