Use Pandoc to convert your resume to Markdown and HTML

Theo Carney
5 min read · Oct 30, 2020


An easy way to convert your resume to Markdown for displaying on GitHub Gist and to HTML for use on your portfolio website

Recently I was applying to Microsoft Leap and noticed that the application asked for a resume in Markdown on GitHub Gist. I knew it was technically possible to convert my resume to Markdown by hand (Markdown is designed to be human readable, and I already do this to format articles for posting on Dev.to), but I didn’t want to. It would take a long time, and I didn’t think I would be able to achieve an especially good-looking result.

More importantly, my programmer spidey-senses started to tingle. Can I do this more lazily? The thought occurred to me that trying to use programming to delegate this task to a computer rather than just doing it myself might actually end up taking longer than just biting the bullet and formatting it by hand. However, my eyes were feeling lazy, and my mind was feeling programmy. I’d just eaten pasta for dinner. So the game was on.

I Googled around and found a program called Pandoc that allows you to convert between different markup and document formats, including Markdown, HTML5, and MS Word docx (the full list is here).

Pandoc is a command-line tool, so there is no GUI (graphical user interface) to assist you. However, the Pandoc website makes it easy to get started. There are installation instructions here and beginner-friendly instructions here under Getting started, which explain 1) how to use the command line and, further down, 2) how to use Pandoc itself to convert text typed directly in the terminal as well as entire files.

Once you have Pandoc installed, using it is eerily and pleasantly reminiscent of invoking a typical JavaScript function, only we’re actually in the Terminal and (if you’re on a Mac) writing shell commands in Bash or zsh.

We navigate to the directory that contains the file we want to convert to another filetype. For me this looks like this:

/Users/tcarney/Development/pandoc-test

I keep the master version of my resume as a Google Doc, since Google Docs has a nice “Version history” feature under “File.” This gives me version control for an evolving document like a developer’s resume (new projects in, old projects out), so I don’t end up with a dozen different resume files floating around.

So I download my resume as a docx file and drag it into the pandoc-test directory:

/Users/tcarney/Development/pandoc-test/TheoCarney-Resume.docx

To convert it from docx to markdown, the command is straightforward:

pandoc TheoCarney-Resume.docx -f docx -t markdown -s -o TheoCarney-Resume.md

To break this down into JS terms, pandoc is our function, TheoCarney-Resume.docx is the argument (this is the document we are passing in and converting), and everything else is just options/parameters. -f stands for “from,” -t is “to,” so we are going from docx format to markdown format.

“The -s option says to create a ‘standalone’ file, with a header and footer, not just a fragment,” (Getting started) and the -o option stands for output and specifies the name of the file that Pandoc will write the new Markdown text to. If you omit the -o, Pandoc will instead print the new Markdown text directly in the terminal.

So if Pandoc were a JS function, I would think about it sort of like this:

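function pandoc(inputFile, { from, to, standalone, output }) {
  // from is -f, to is -t, standalone is -s, output is -o
  // Pretend the input is a list of elements that can each render themselves in the target format.
  const outputFile = inputFile.map(element => element.renderAs(to))
  return outputFile
}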

Obviously this is pseudocode so I’m fudging a lot of details. In reality, on another page of Pandoc documentation, I noticed that the inputFile argument can be passed in at the end, after the options, so the order is not so rigid.

Since Pandoc’s default conversion is from markdown to html, I also speculated that under the hood, Pandoc might convert the inputFile first to markdown as an intermediate step, and then convert it from that to the outputFileType. However, that made me curious enough to check the Pandoc docs, which say: “Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other.”

From this I gather that the intermediate representation of a document may be something specific to the Pandoc program, and not an independent, common file type like markdown or XML. Interesting.
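To make that idea concrete for myself, here is a minimal sketch of what "one intermediate representation, many output formats" could look like in JavaScript. Everything in it is an assumption for illustration: the node shapes, the `writers` object, and the `render` function are made up and far simpler than whatever Pandoc really uses internally.

```javascript
// A toy document tree. The node shapes are invented for this sketch
// and are much simpler than Pandoc's real intermediate representation.
const doc = [
  { type: "heading", level: 1, text: "Resume" },
  { type: "paragraph", text: "A short summary." },
  { type: "list", items: ["First item", "Second item"] },
];

// One "writer" per output format. Each writer knows how to print every node type.
// Note: this sketch does no escaping of special characters.
const writers = {
  markdown: {
    heading: (node) => `${"#".repeat(node.level)} ${node.text}`,
    paragraph: (node) => node.text,
    list: (node) => node.items.map((item) => `- ${item}`).join("\n"),
  },
  html: {
    heading: (node) => `<h${node.level}>${node.text}</h${node.level}>`,
    paragraph: (node) => `<p>${node.text}</p>`,
    list: (node) =>
      `<ul>${node.items.map((item) => `<li>${item}</li>`).join("")}</ul>`,
  },
};

// Render the same tree with whichever writer was asked for.
function render(tree, format) {
  const writer = writers[format];
  return tree.map((node) => writer[node.type](node)).join("\n\n");
}

console.log(render(doc, "markdown"));
console.log(render(doc, "html"));
```

The point of the design is that adding a new output format only means adding one more writer; the tree itself never has to know which formats exist.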

Another thing to note is that when I tried converting to HTML, I initially got an error prompting me to provide an additional title option to Pandoc, like so:

pandoc TheoCarney-Resume.docx -f docx -t html -s -o TheoCarney-Resume.html --metadata title="Theo Carney Resume"
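Since I keep comparing Pandoc to a JavaScript function, here is a rough sketch of how the two commands above could be assembled from Node. The `buildPandocArgs` helper is hypothetical (it is not part of Pandoc); it only builds the argument list from the same flags used in this article, and the lines that would actually run Pandoc are left commented out because they assume `pandoc` is installed and on your PATH.

```javascript
// Hypothetical helper (not part of Pandoc): build the argument list for one conversion.
function buildPandocArgs({ input, from, to, output, title }) {
  const args = [input, "-f", from, "-t", to, "-s", "-o", output];
  // Standalone HTML wants a title, so pass one as metadata when it is provided.
  if (title) {
    args.push("--metadata", `title=${title}`);
  }
  return args;
}

const markdownArgs = buildPandocArgs({
  input: "TheoCarney-Resume.docx",
  from: "docx",
  to: "markdown",
  output: "TheoCarney-Resume.md",
});

const htmlArgs = buildPandocArgs({
  input: "TheoCarney-Resume.docx",
  from: "docx",
  to: "html",
  output: "TheoCarney-Resume.html",
  title: "Theo Carney Resume",
});

console.log(["pandoc", ...markdownArgs].join(" "));
console.log(["pandoc", ...htmlArgs].join(" "));

// To actually run a conversion (assumes pandoc is installed and on your PATH):
// const { execFileSync } = require("node:child_process");
// execFileSync("pandoc", htmlArgs);
```

Passing the arguments as an array means the title does not need shell quoting, because no shell is involved in splitting the command.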

Lastly, I just want to highlight something which I learned and really got excited about in the process of doing this all. If you read the Pandoc docs you’ll notice that not all file types are equal before Pandoc. For instance, you can’t convert from PDF, only to it, using Pandoc.

While this may not be entirely related, when watching a Computerphile video a few weeks back while I was writing a post about HTML, I remember Professor David Brailsford talking about how critics of PDF complain that the way a PDF stores its actual data is essentially spaghetti code and awful to look at. The response to this by PDF fans, which Prof. Brailsford seemed to agree with, was something to the tune of “yes, but PDFs look really nice and are convenient for a lot of other reasons, so that’s alright.”

Thinking about this, it made me really appreciate that with Pandoc, docx files are a perfect starting point for converting to essentially anything. I wondered, how is this possible? Docx files had previously annoyed me, since sometimes they can be difficult to open, depending on the computer/OS you’re using.

I thought of the mystical tree data structure, with its oftentimes insane complexity to the human eye, but perfectly logical, parsable, unambiguous structure to the computer. This data structure had already captured my imagination while learning about HTML.

Sure enough, I learned that the x in docx stands for XML: a docx file isn’t really a single file, but a zip archive of XML files in disguise!

docx is almost as funny as Eric Andre

From this I inferred that what Pandoc is doing isn’t magic; instead it is just taking in an unambiguous tree of data with a certain formatting style, changing that formatting style, and outputting a new tree. There is no guesswork involved, just regular, stylistic changes. That might be an oversimplification, but for me, that was a big aha moment.

If you want to check out for yourself how docx files can be unzipped, in a Mac terminal you can just run this command with any docx file:

unzip any-file.docx -d hey-docx-your-fly-is-unzipped

Unzipped files and subdirectories will be output into a new directory:

hey-docx-your-fly-is-unzipped
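If you would rather peek at the disguise from code, here is a small sketch in Node. It relies on the fact that a ZIP archive with at least one entry starts with the four bytes 0x50 0x4B 0x03 0x04 (the letters "PK" followed by two marker bytes). The `looksLikeZip` helper is a name I made up for this example, and it only inspects the first four bytes rather than validating the whole archive.

```javascript
// ZIP archives with at least one entry begin with these four bytes ("PK" + 0x03 0x04).
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Hypothetical helper: true when the given Buffer starts like a ZIP archive.
// It only checks the first four bytes; it does not validate the rest of the file.
function looksLikeZip(bytes) {
  return bytes.length >= 4 && bytes.subarray(0, 4).equals(ZIP_SIGNATURE);
}

// To check a real file (any-file.docx is the placeholder name used above):
// const fs = require("node:fs");
// console.log(looksLikeZip(fs.readFileSync("any-file.docx")));
```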
