jelford's blog

More styling

To make things not horrible, let's do a bit of styling:

Resize the page according to screen size

We're not going to do anything very clever here - I just hate trying to read pages that span the whole width of my screen with text. It's not that easy to read (I still line-wrap at 80 characters in vim).

We can add the following to style.css:

body {
    ...
    margin-left: auto;
    margin-right: auto;
    max-width: 55em;
}

And we're done with widths. max-width won't have any effect on smaller displays, but when we open the page on a widescreen monitor, the text will stay within a reasonably sized column in the middle of the viewport.

Prettify the code

pandoc supports code highlighting, to some extent, using the following syntax (docs):

~~~ { .css }
...
~~~

That generates HTML like:

<tbody><tr class="sourceCode"><td class="lineNumbers"><pre>1
2
3
4
5
</pre></td><td class="sourceCode"><pre><code class="sourceCode css">pre code <span class="kw">{</span>
    <span class="kw">display:</span> <span class="dt">block</span><span class="kw">;</span>
    <span class="kw">background-color:</span> <span class="dt">#EEEEEE</span><span class="kw">;</span>
    <span class="kw">overflow-x:</span> <span class="dt">auto</span><span class="kw">;</span>
<span class="kw">}</span></code></pre></td></tr></tbody>

Unfortunately, this isn't actually super easy to style up in CSS - you're stuck with table-based layout (if you enable line numbers), and we'll have to do something about actually having some meaningful CSS to make the highlighting look good. Okay, we'll try something else (not writing our own generic lexer) - enter highlight.js.

Now, I do want to use the feature of pandoc where we can conveniently add CSS class names to our fenced code blocks, so we won't turn the extension off entirely - but I'll add the --no-highlight flag to my pandoc make tasks. That leaves the highlighting itself up to highlight.js. We want something like (pseudocode):

var code = document.querySelectorAll('pre.sourcecode');
for (let block of code) {
    hljs.do_highlighting_please(...);
}

Since we're already thinking about doing our own initialization, we might as well take the time to put this work off the main thread; there's no need to block the whole browser while we apply styling. Luckily the highlight.js instructions come with a recipe for putting the work onto a worker thread:

// in the main script:
addEventListener('load', function() {
  var code = document.querySelector('#code');
  var worker = new Worker('worker.js');
  worker.onmessage = function(event) { code.innerHTML = event.data; }
  worker.postMessage(code.textContent);
})

// in worker.js:
onmessage = function(event) {
  importScripts('<path>/highlight.pack.js');
  var result = self.hljs.highlightAuto(event.data);
  postMessage(result.value);
}

That won't quite do for us; we've got more than one code block - so we need to adapt what's going on in the main script to handle that. We could just change

  var code = document.querySelector('#code');
  ...

to

  var code = document.querySelectorAll('pre.sourcecode');
  for (let block of code) {
    ...
  }

but it's not necessarily fine to just spawn an unbounded number of worker threads (one per code block). According to MDN, each worker spawns a real OS-level thread. Potentially, that's very expensive, and we don't really need loads of work going on in parallel - we're just trying to do some syntax highlighting!

Let's adapt the code so that we send all our requests to a single worker, and when we get the results back, they have a key that lets us put them in the correct code block:

// in js/blog-highlight.js
document.addEventListener('DOMContentLoaded', function() {
    /* no .sourcecode anymore; turns out all our pres are code, so it's redundant */
    var code = document.querySelectorAll('pre');
    var worker = new Worker('js/highlight-worker.js');

    worker.onmessage = function(event) {
        // sourceid is the index we sent with the request, so it finds the original block
        let target = code[event.data.sourceid].lastChild;
        target.innerHTML = event.data.content;
        target.classList.add('hljs');
    };

    // Can't pass the actual nodes to the workers, so use the list index as a kind of key
    for (let i=0; i<code.length; ++i) {
        worker.postMessage({
            content: code[i].lastChild.textContent, 
            sourceid: i
        });
    }
});


// in js/highlight-worker.js
importScripts('/js/highlight.pack.js');

onmessage = function(event) {
    // Highlight the text we were sent, and echo the sourceid back so the page knows where it goes
    var result = self.hljs.highlightAuto(event.data.content);
    postMessage({sourceid: event.data.sourceid, content: result.value});
}
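The page script itself is JavaScript, but the shape of the fix (one worker, every message tagged with a key, results slotted back by that key) doesn't depend on the language. Here's a Python analogue using threading and queue, purely to illustrate the pattern. `highlight` is a hypothetical stand-in for hljs.highlightAuto and just upper-cases its input:

```python
import queue
import threading


def highlight(text):
    # Hypothetical stand-in for hljs.highlightAuto: just upper-cases the text
    return text.upper()


def highlight_all(blocks):
    """Send every block to ONE worker thread, tagging each job with its index."""
    jobs = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work
                break
            results.put({'sourceid': job['sourceid'],
                         'content': highlight(job['content'])})

    t = threading.Thread(target=worker)
    t.start()
    for i, text in enumerate(blocks):
        jobs.put({'content': text, 'sourceid': i})
    jobs.put(None)
    t.join()

    # Replies can be consumed in any order; sourceid puts each one back in place
    output = list(blocks)
    while not results.empty():
        msg = results.get()
        output[msg['sourceid']] = msg['content']
    return output
```

Because each reply carries its `sourceid`, it doesn't matter what order the results come back in.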

We'll have to add js/blog-highlight.js to both the front page, and the individual page templates. Once we've done that, there's just one more thing bothering me; how does it look for people without javascript? (hint: not good).

We can do something about that, though: let's put together a very simple code pane style for clients without JavaScript. In our HTML it'll look something like this:

<head>
    ...
    <noscript>
        <link rel="stylesheet" type="text/css" href="/styles/noscript.css">
    </noscript>
    ...
</head>

and in /styles/noscript.css:

pre code {
    background-color: #F0F0F0;
    display: block;
    padding: 5px;
    border-radius: 5px;
}

At that point, arguably, we're done prettifying the code.

Adding a title

We already took the time to get pandoc to be aware of the page title (that's what generates our <h1> tags at the top of each post). Let's use the same capability to add <title> tags to our blog post <head>s.

In blog/pandoc_html_template.html.template:

<head>
    ...
    <title>$title$</title>
    ...
</head>

We'd better also add a similar thing in the top-level page, but that'll say something like:

    output_file.write('<title>jelford\'s blog</title>')

That's all we need to get a decent title up at the top of the screen.

One final thing I didn't mention at the start: I'm going to add a nav bar at the top of the individual blog entries. That makes it easier to get back to the main page from inside a blog post:

<body>
    <header>
        <nav>
            <ul>
                <li><a href="/blog.html">blog</a></li>
                <li><a href="/">home</a></li>
                <li><a href="https://github.com/jelford">github</a></li>
            </ul>
        </nav>
    </header>
    <article>
        ...

Bringing things together as a blog

So far we've taken a simple approach to gluing everything together, but it'd be good to have a few of the niceties that make things seem like more than just... a wall of text on a page.

I can think of a couple of nice features that would make things look better:

I still want to actually write all the posts in simple markdown, but to get e.g. permalinks, we're going to need the blog posts to make sense as standalone HTML pages, and then when we bring them together in the front page, we'll need to pull out just the content. First thing: we'll have pandoc translate individual blog entries into standalone HTML pages:

blog/%.html : blog/%.md 
    pandoc --email-obfuscation=javascript --self-contained --css=style.css --standalone $< -f markdown -t html5 -o $@

The --standalone argument instructs pandoc to make a full HTML file (complete with a <head>, <meta> tags, and so on), while the --self-contained flag instructs it to inline external resources such as the CSS. We don't necessarily want that in there permanently, but if you try to build without it, the stylesheet link ends up pointing at the wrong place: --css=style.css is written into the page as-is, style.css sits in the root, and pandoc has no way to know that the output document will live inside a folder, so the relative path only works from the current working directory.
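To see the path problem concretely: a stylesheet in the site root, linked from a page inside blog/, has to be referenced as ../style.css, because browsers resolve relative links against the page's own directory. A quick check with os.path.relpath (the helper name is made up for illustration):

```python
import os.path


def stylesheet_href(output_path, stylesheet='style.css'):
    """Relative href a page at output_path needs to reach a stylesheet in the site root."""
    page_dir = os.path.dirname(output_path) or '.'
    return os.path.relpath(stylesheet, start=page_dir)
```

So stylesheet_href('blog/some_post.html') gives '../style.css', while the front page, blog.html, can keep using plain 'style.css'.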

That's enough to get us a reasonable output if we browse to the individual blog posts by name, but there are a few things left:

Let's address the first point first - it should be easy enough to add a link in the body of every blog post to its own path.

So far, we've just been using pandoc's default HTML template, plus a very simple stylesheet. The default HTML template has some useful bits in it; it gives us a <head> section with some reasonable <meta> tags, and if we take the time to inspect it, we'll see there's also some sort of shim to make things nicer for users with an older version of Internet Explorer.

It's time to replace it with our own template, which will be even simpler:

We do at least still want CSS to be passed in, so we'll base our new template on the default one. You can print the default by running:

pandoc -D html

There's some stuff in there we might want about authors, and it looks like it'll try to generate some headers, title, and so on for us. That's all very nice, but to begin with let's keep things super-simple and strip it right down:

<!DOCTYPE html>
<html lang="en">
<head>
        <meta charset="utf-8" />
$for(css)$
        <link rel="stylesheet" href="$css$" type="text/css" />
$endfor$
</head>
<body>
<article>
$body$
</article>
</body>
</html>

What've we got here?
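Roughly: pandoc fills each `$name$` placeholder from its variables (`$body$` is the converted document), and `$for(css)$ ... $endfor$` repeats its contents once per --css argument. A toy Python renderer for just those two forms makes the expansion concrete. This is an illustration, not pandoc's implementation: it ignores `$if$`, nesting, escaping, and pandoc's handling of whitespace around directives.

```python
import re


def render(template, variables):
    """Toy renderer for a tiny subset of pandoc's template syntax:
    $name$ and $for(name)$...$endfor$ (no nesting, no $if$)."""
    def expand_for(match):
        name, body = match.group(1), match.group(2)
        # Repeat the loop body once per value, substituting $name$ each time
        return ''.join(body.replace('$' + name + '$', value)
                       for value in variables.get(name, []))

    text = re.sub(r'\$for\((\w+)\)\$(.*?)\$endfor\$', expand_for,
                  template, flags=re.DOTALL)
    # Remaining $name$ placeholders; unknown variables render as empty strings
    return re.sub(r'\$(\w+)\$',
                  lambda m: str(variables.get(m.group(1), '')), text)
```

The "unknown variables render as empty" behaviour is worth remembering: it's why a template variable that never gets passed in quietly produces an empty attribute rather than an error.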

Now that we've got a working standalone page we can build on, let's add our permalinks. We'll modify the body section of our template to include a link to its "canonical" location:

<body>
<div><a href="$permalink$">permalink</a></div>
$body$
</body>

Let's put our new template in a file and tell pandoc to use it instead of the default. The template file:

cat > blog/pandoc_html_template.html.template << 'EOF'  # quoted EOF: the shell leaves pandoc's $...$ variables alone
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8" />
    <meta name="generator" content="pandoc" />
$for(css)$
    <link rel="stylesheet" href="$css$" type="text/css" />
$endfor$
</head>
<body>
<article>
<div><a href="$permalink$">permalink</a></div>
$body$
</article>
</body>
</html>
EOF

And then we update our Makefile:

blog/%.html : blog/%.md blog/pandoc_html_template.html.template
    pandoc --email-obfuscation=javascript --self-contained --css=style.css --standalone --template=blog/pandoc_html_template.html.template $< -f markdown -t html5 -o $@

Notice we also added the template to the list of dependencies for each blog entry's HTML output. This ensures that when we change the template, our make script will pick up the changes and know that it needs to regenerate every blog post.
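Under the hood, make's decision is a timestamp comparison: a target is rebuilt if it doesn't exist, or if any prerequisite was modified more recently than it. Here is a simplified Python model of that check. It ignores missing prerequisites, phony targets, and make's recursion through the dependency graph, and `needs_rebuild` is a hypothetical name:

```python
import os


def needs_rebuild(target, prerequisites):
    """Approximate make's rule: rebuild if the target is missing or any prerequisite is newer."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Touching the template bumps its modification time past every generated page, which is exactly what triggers the full rebuild.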

Let's give it a whirl, and see what we get:

permalink

Bringing things together as a blog

...

Ah, that doesn't look too good - the link is sitting above the title. Darn. Looks like we might want to do something better with how we handle titles, so we can put content below them in our templates. Taking another look at pandoc's default template, it includes the following lines:

$if(title)$
<div id="$idprefix$header">
<h1 class="title">$title$</h1>
...

The "title" in this case comes from metadata at the top of the file, which pandoc reads in and then makes available in the template rendering context. You can get the details on what kinds of things we might want to put in metadata blocks from the pandoc documentation. We're interested in the % title element. So far, I've been putting an <h1> heading at the top of every blog post, by just writing:

# Bringing things together as a blog
...

Maybe it's time to tell pandoc what the title of the post is, and then let it figure out how to render the titles. So, as part of adding this feature, I've gone through and replaced all the headings with metadata blocks; the lines above become:

% Bringing things together as a blog

and our template body changes to:

<body>
<article>
<h1 class="title">$title$</h1>
<a class="permalink" href="$permalink$">permalink</a>
$body$
</article>
</body>
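A pandoc title block is just up to three `%` lines at the very top of the file: title, author(s), and date, in that order. A simplified Python reader shows the shape of it. `title_block` is a hypothetical helper, and pandoc itself also handles multi-line fields and YAML metadata, which this ignores:

```python
def title_block(markdown):
    """Read a pandoc-style title block: up to three leading lines starting with '%'
    (title, author(s), date). Multi-line fields are not supported."""
    fields = {}
    for key, line in zip(('title', 'author', 'date'), markdown.splitlines()):
        if not line.startswith('%'):
            break  # the block ends at the first line without a leading '%'
        fields[key] = line[1:].strip()
    return fields
```

A post that still starts with a `#` heading has no title block at all, so `$title$` would come out empty for it.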

I'm not going to go through manually changing the first line of every blog post; I'll just run:

find blog -name '*.md' -exec sed -i -e '1s/^# /% /' {} +

to turn each post's leading # heading (the <h1>) into a % title line, in place.
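The same one-off rewrite in Python, for anyone who'd rather not rely on sed's -i flag (GNU and BSD sed treat it differently). Like the sed command, it only touches the first line. `heading_to_title_block` is a made-up name:

```python
import pathlib


def heading_to_title_block(path):
    """Rewrite only the first line: '# Title' becomes '% Title'. Returns True if changed."""
    p = pathlib.Path(path)
    text = p.read_text(encoding='utf-8')
    if not text.startswith('# '):
        return False  # already converted, or no leading heading
    p.write_text('% ' + text[2:], encoding='utf-8')
    return True
```

Run it over the posts with something like `for md in pathlib.Path('blog').glob('*.md'): heading_to_title_block(md)`.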

Finally, we're still missing a crucial detail - all our permalinks are missing their href attributes (or rather, they're all blank). We need to pass in the relative file path to the template when we generate the page. That's one last tweak to our Makefile:

pandoc --email-obfuscation=javascript --self-contained --css=style.css --standalone --template=blog/pandoc_html_template.html.template -V permalink=/$@ $< -f markdown -t html5 -o $@

Hey presto - we get html that looks like this:

<body>
<article>
<h1>...</h1>
<a class="permalink" href="/blog/styling_the_front_page.html">permalink</a>
<p> ...
</article>

Tidying up the main page

Now that we've made the changes we needed to add permalinks, it's time we clear up the main page. Let's stop the compile_blog step from pulling in extraneous HTML from the sub pages, for a start. The approach I'm going to take is to parse sub-pages and extract just their body elements. It's easy to do with the very adequate Beautiful Soup library. This is the first piece of third-party software I'm using so far for this project - I'll start a local virtualenv to keep my external dependencies tidily separated from the rest of my system:

sudo pip install --upgrade pip setuptools virtualenv wheel # update everything
virtualenv . --python=python3
. bin/activate
pip install --upgrade beautifulsoup4

I'll also add some entries to .gitignore so the virtualenv doesn't turn into a huge pain. Helpfully, there's a GitHub project that collects together commonly used .gitignore snippets here, or if you prefer to have someone do your text-file-gluing-together for you, there's a neat wrapper around it here.

One more thing - I'll want to check in a list of dependencies:

pip freeze > requirements.txt
git add requirements.txt

Now we'll need to do

. bin/activate

whenever we start working on the blog, to set up our working environment correctly.

Right, back to the business of sorting out the front page. We'll modify compile_blog to extract just the <article> section of our sub-pages:

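from bs4 import BeautifulSoup

class BlogPost():
    ...
    def content(self):
        with open(self.file_path, 'r') as f:
            # name the parser explicitly; bs4 warns if left to guess
            soup = BeautifulSoup(f.read(), 'html.parser')
        # just the <article> element, not the page's <head> and wrapper markup
        return soup.body.article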

BlogPost.content() now returns a Beautiful Soup element (a Tag) rather than a string - let's update the compile step to be aware of that, and apply any top-level styling:

def compile(...):
    ...
    for b in reversed(sorted(blog_posts, key=lambda b: b.timestamp())):
        post = b.content()
        post['class'] = 'blog_post' # add styling that applies at the top level
        output_file.write(str(post))

Styling the main page

So we're finally in a place where the main page has some reasonable markup on it, embedding individual blog posts that can be linked to as standalone pages. Nice going. Finally, let's add a header that links back to the site's main page, plus some simple styling with borders, and then let's call it a day. Maybe next time we'll come back and add date information to individual posts.

First, I'm just going to add a header to the top of every page. We'll modify the front page as follows:

def compile(output_file, blog_posts):
    ...
    output_file.write('<header><h1>jelford\'s blog</h1><nav><ul><li><a href="/">home</a></li><li><a href="https://github.com/jelford">github</a></li></ul></nav></header>')

That gives us the main links we'll want at the top, along with an originally named header. One last thing - add the CSS to get the links to flow horizontally:

nav ul {
    list-style-type: none;
    margin: 0;
    padding: 0;
}

nav ul li {
    display: inline;
    margin-right: 0.3em;
}

Building a better blog: organising output

In the last entry, I walked through using a simple Makefile to generate the HTML for a simple blog - it converts a collection of markdown files into HTML snippets, then concatenates them into a single "blog" page.

In this entry, we'll build on that by managing multiple entries in a nicer way:

Faster feedback

See it in a browser

First, let's get ourselves set up with a fast feedback loop. We'd like to see fully-rendered output, as it'll look to readers, in a browser, immediately as we make updates.

We can open our rendered output in a web browser easily enough:

firefox ./blog.html

... but most browsers will behave a little differently when we view file:/// URLs compared to if they were on the web. We can get the "real thing" without much extra hassle though:

python3 -m http.server --bind 127.0.0.1

will run a simple HTTP server that serves up the current working directory. The --bind on the end instructs it to only listen for connections from localhost - it's not much, but it's probably best not to open up a socket to the whole world if we don't have to. If you're still on Python 2, you can do the following instead (it has no --bind option, so it listens on all interfaces):

python2 -m SimpleHTTPServer
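If you'd rather start the server from a script (say, alongside the build), the standard library gives you the same thing in a few lines. This assumes Python 3.7 or later (for ThreadingHTTPServer and the handler's directory argument), and `serve` is a hypothetical helper:

```python
import functools
import http.server
import threading


def serve(directory='.', port=8000):
    """Serve `directory` on 127.0.0.1 only, like `python3 -m http.server --bind 127.0.0.1`."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    server = http.server.ThreadingHTTPServer(('127.0.0.1', port), handler)
    # Run the accept loop in a background thread so the caller keeps control
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() to stop it
```

Passing port=0 asks the OS for any free port, which you can then read back from server.server_address.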

Make all the time

If we were building a JavaScript app, we'd probably have set up a file watch by now. We should do that here:

watch make

That'll run make every couple of seconds. It's not as event-driven as we might be used to from inotify-type file watching, but it'll do us. Now, every time we save a file, we should see the update in our browsers as soon as we refresh.
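`watch make` is a perfectly good answer. If you'd rather not invoke make when nothing has changed, a small polling loop that compares modification times does the job. This is a sketch: `snapshot` and `watch_and_make` are made-up names, and it still polls rather than using inotify.

```python
import os
import subprocess
import time


def snapshot(root):
    """Map every file under root to its modification time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[path] = os.path.getmtime(path)
    return mtimes


def watch_and_make(root='.', interval=2.0):
    """Poll like `watch make`, but only run make when something under root changed."""
    previous = None
    while True:
        if snapshot(root) != previous:
            subprocess.run(['make'])
            # Re-snapshot after make, so its own outputs don't retrigger a build
            previous = snapshot(root)
        time.sleep(interval)
```

Scanning '.' also walks .git and the virtualenv, so pointing it at the source directories keeps each pass cheap.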

See it straight away

One last thing: while we're working on things, let's just have our browser auto-refresh the page.

We'll put this in the <head> of our HTML:

<meta http-equiv="refresh" content="2" >

So our final make step now looks like this:

blog.html : $(blog_objects) 
    echo '<html><head><link rel="stylesheet" type="text/css" href="style.css" ><meta http-equiv="refresh" content="2" ></head><body>' > blog.html 
    cat $(blog_objects) >> blog.html 
    echo '</body></html>' >> blog.html

Now, nobody likes pages that auto-refresh while they're reading them, so I'd like to take this out before pushing anything up, but there are a couple of things that mean it wouldn't be the worst thing ever if I forget:

Which is kind of nice...
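One way to make forgetting impossible is to let the build decide whether the refresh tag goes in at all. Once the compile step is in Python, a sketch could look like this. `page_head` and the BLOG_DEV environment variable are names invented for illustration:

```python
import os


def page_head(stylesheet='style.css', refresh_seconds=None):
    """Opening <html><head>...<body> markup; the auto-refresh tag only when asked for."""
    parts = ['<html><head>',
             f'<link rel="stylesheet" type="text/css" href="{stylesheet}" >']
    if refresh_seconds is not None:
        parts.append(f'<meta http-equiv="refresh" content="{refresh_seconds}" >')
    parts.append('</head><body>')
    return ''.join(parts)


# Hypothetical switch: only auto-refresh when BLOG_DEV is set in the environment
head = page_head(refresh_seconds=2 if os.environ.get('BLOG_DEV') else None)
```

With that, a plain `make` produces the publishable page, and `BLOG_DEV=1 make` gives the self-refreshing one for local work.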

Move our compilation into a script

I don't like writing raw bash (or makefiles, for that matter) any more than the next person. Let's move our bash step into a separate script:

blog.html : $(blog_objects)
    ./compile_blog $(blog_objects)

--- ./compile_blog ---
#! /usr/bin/env sh
echo '<html><head><link rel="stylesheet" type="text/css" href="style.css" ><meta http-equiv="refresh" content="2" ></head><body>' > blog.html 
cat "$@" >> blog.html
echo '</body></html>' >> blog.html

Phew. That's better. Okay, now let's move to using a nicer language than sh for manipulating files. We'll use python, since that's what I like.

#! /usr/bin/env python3

import argparse
import sys

def parse_args(args):
    parser = argparse.ArgumentParser(description='Combine blog posts into a single page')
    parser.add_argument('output_file', type=argparse.FileType('w'))
    parser.add_argument('input_files', nargs=argparse.REMAINDER)

    args = parser.parse_args(args)
    return args.output_file, args.input_files


if __name__ == '__main__':
    output_file, input_files = parse_args(sys.argv[1:])
        
    output_file.write('<html><head><link rel="stylesheet" type="text/css" href="style.css" ><meta http-equiv="refresh" content="2" ></head><body>')

    for entry_path in input_files:
        with open(entry_path, 'r') as blog_entry:
            output_file.write(blog_entry.read())

    output_file.write('</body></html>')

Finally, let's make it so that our main page re-compiles any time we make changes to its build script:

blog.html : $(blog_objects) compile_blog
    ...

Ordering our blog posts

Now we're in proper programming land, we can start to be more sophisticated with how we work with our files.

Let's start by moving them into some sort of very basic domain model:

class BlogPost():
    def __init__(self, file_path):
        self.file_path = file_path

    def content(self):
        with open(self.file_path, 'r') as f:
            return f.read()

def compile(output_file, blog_posts):
    output_file.write('<html><head><link rel="stylesheet" type="text/css" href="style.css" ><meta http-equiv="refresh" content="2" ></head><body>')

    for b in blog_posts:
        output_file.write(b.content())

    output_file.write('</body></html>')

def compile_files(output_file, input_paths):
    compile(output_file, (BlogPost(p) for p in input_paths))

if __name__ == '__main__':
    compile_files(*parse_args(sys.argv[1:]))

Now that we've got a model, we can sort the blog posts by their details - in this case, I want the time the file was created (or rather, added to git, as a reasonable proxy, since I want it to survive renames, checkouts elsewhere, that sort of thing).

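import subprocess
from datetime import datetime, timezone


class BlogPost():
    ...
    def timestamp(self):
        # Author date (RFC 2822, via %aD) of the commit that added the file;
        # --follow keeps tracking it across renames
        output = subprocess.check_output(
            ['git', 'log', '--follow', '--diff-filter=A', '--pretty=%aD', '--', self.file_path]
        ).decode().strip()
        if not output:
            return datetime.now(tz=timezone.utc)  # the file's not yet in git
        # git log lists newest first, so the original "add" is the last line
        return datetime.strptime(output.splitlines()[-1], '%a, %d %b %Y %H:%M:%S %z')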

...

def compile(...):
    ...
    for b in reversed(sorted(blog_posts, key=lambda b: b.timestamp())):
        ...
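The fallback and the newest-first ordering can be checked without a repository by feeding in strings shaped like git's %aD output. This is a sketch with hypothetical helper names. Note that %a and %b are matched against the process's LC_TIME names, so it assumes the default C (or an English) locale, since git always emits English day and month names:

```python
from datetime import datetime, timezone


def parse_git_date(text):
    """Parse git's %aD (RFC 2822) output; fall back to 'now' for files not yet committed."""
    text = text.strip()
    if not text:
        return datetime.now(tz=timezone.utc)
    return datetime.strptime(text, '%a, %d %b %Y %H:%M:%S %z')


def newest_first(posts):
    """posts: (name, git_date_text) pairs; returns names, most recently added first."""
    ordered = sorted(posts, key=lambda p: parse_git_date(p[1]), reverse=True)
    return [name for name, _ in ordered]
```

Because every parsed date carries a UTC offset, posts written in different timezones still compare correctly, and an uncommitted draft sorts to the top.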

Finally, let's put something around each post, so it's easier to see when one post stops and the next one starts:

def compile(...):
    ...
    for b in ...
        output_file.write('<section class="blog_post">')
        output_file.write(b.content())
        output_file.write('</section>')

And we'll put something in style.css to make that visible:

section.blog_post {
    border-top: 1px solid grey;
}

Building a static blog with Markdown, Make, and Python

Background

I wanted to build a simple way to publish blog posts. I'm happy without any server-side gubbins; all I want is a simple place to put some static content, so GitHub Pages is fine.

The other must for me is to be able to use markdown to write in. I've tried just doing raw HTML in the past, and frankly it doesn't fill me with joy.

GitHub has a tool built in, which you can also install locally and test out - Jekyll - but I wanted to use pandoc for compiling my markdown to HTML, and besides, I don't have Ruby installed.

Managing the build

All the build needs to do is:

Sounds like a job for make. Besides, that gives me a chance to brush up on Makefiles.

The rest of this post goes through each of those steps, one by one. For reference, this whole site is checked in as a GitHub page; you can easily check out the source for yourself here.

Gathering the markdown files

Make has built-in functions for gathering a series of files from one place and deriving build targets from them:

blog_sources := $(wildcard blog/*.md)
blog_outputs := $(patsubst %.md,%.html,$(blog_sources))

Those will translate to something like:

blog_sources := blog/page_1.md blog/this_blogpost.md blog/that_blogpost.md
blog_outputs := blog/page_1.html blog/this_blogpost.html blog/that_blogpost.html
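For intuition, here's what those two functions do, modelled in Python. This is a simplified sketch (a single `%`, with the words already split into a list), and the output is sorted for a stable order since, depending on your make version, $(wildcard) may not sort its results:

```python
import glob


def wildcard(pattern):
    """Rough equivalent of $(wildcard pattern), sorted for a stable order."""
    return sorted(glob.glob(pattern))


def patsubst(pattern, replacement, words):
    """$(patsubst pattern,replacement,words) for patterns containing a single '%'."""
    prefix, suffix = pattern.split('%')
    out = []
    for word in words:
        if (word.startswith(prefix) and word.endswith(suffix)
                and len(word) >= len(prefix) + len(suffix)):
            stem = word[len(prefix):len(word) - len(suffix)]  # the part '%' matched
            out.append(replacement.replace('%', stem, 1))
        else:
            out.append(word)  # words that don't match pass through unchanged
    return out
```

So patsubst('%.md', '%.html', wildcard('blog/*.md')) gives the same list as blog_outputs above.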

Using these variables, we can define a set of make rules to compile the blog pages, and then generate a single front page that combines them together for easy browsing:

blog.html : $(blog_outputs)
    combine_pages $(blog_outputs)

$(blog_outputs) : $(blog_sources)
    convert_to_html $< $@

So then we just have to implement the conversion and combining steps.

Converting to HTML

I'll be using pandoc to convert markdown files to HTML. This couldn't be easier:

blog/%.html : blog/%.md 
    pandoc --email-obfuscation=javascript $< -f markdown -t html5 -o $@

Notice this doesn't use $(blog_outputs) or $(blog_sources). When $(blog_outputs) was the target, make wanted to rebuild all of them at once (which makes sense: every output depended on every source, and $< only ever named the first source), so this is a pattern rule instead.
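The point of the pattern rule is that each output gets exactly one prerequisite, the markdown file with the same stem, so $< is always the right file. Here is a small Python model of that matching. It is simplified (a single `%`, none of make's directory-stripping rules for patterns without a slash), and `match_pattern_rule` is a hypothetical name:

```python
def match_pattern_rule(target, target_pattern='blog/%.html', prereq_pattern='blog/%.md'):
    """Return the prerequisite a pattern rule gives `target`, or None if it doesn't match.
    The part matched by '%' (the stem, which must be non-empty) fills in the prerequisite."""
    prefix, suffix = target_pattern.split('%')
    if not (target.startswith(prefix) and target.endswith(suffix)
            and len(target) > len(prefix) + len(suffix)):
        return None
    stem = target[len(prefix):len(target) - len(suffix)]
    return prereq_pattern.replace('%', stem, 1)
```

For example, blog/intro.html maps to blog/intro.md, while blog.html doesn't match the rule at all and is left to its own explicit rule.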

Combining

Here's the simplest implementation I can think of for creating an easy-to-browse front page from all the inputs:

blog.html : $(blog_outputs)
    echo '<html><body>' > blog.html
    cat $(blog_outputs) >> blog.html
    echo '</body></html>' >> blog.html

That'll just concatenate all the pages together (in any old order) into one long page with all the content. Pretty spartan, but it does the job. Just one more thing so we don't all hate ourselves every time we load the page: adding a style sheet:

blog.html : $(blog_outputs)
    echo '<html><head><link rel="stylesheet" type="text/css" href="style.css" ></head><body>' > blog.html
    cat $(blog_outputs) >> blog.html
    echo '</body></html>' >> blog.html

Over the course of the next few blog posts, I'll write about adding niceties on top of this starting point: