
Optimizing Static Sites

As an excuse to play around with some web stuff, I pretended my website was getting so much traffic that it was costing me an arm and a leg to host, and that my users were complaining it was too slow. Because my website is statically generated, optimizing it was low-hanging fruit, so I made a few tweaks and wrote them down here for future reference.

These instructions are mostly geared towards the Nanoc site generator and the NGINX webserver, but should apply to other generators and server software as well. You can browse all the code in my Blog Skeleton repository.

Optimizing the content

The obvious first step is to optimize the actual content that takes up most of the space: images, scripts, and styles. To do this, I pipe the contents of all these files through different optimizers at build time:

  • All the JavaScript is piped through UglifyJS, which compresses and mangles the code. To save several round trips to the server when loading the pages, I combine all scripts into one big all.js script, and only reference the combined script from the pages:

    <script async src="/js/all.js" 
            type="text/javascript"></script>

    I’m using async to avoid the script blocking the rendering of the page.

    Nanoc comes with an UglifyJS filter out of the box, so all I needed here was a combined all.js.

  • For the CSS styles, I use an approach similar to the one for the scripts, except that they are piped through CSSMinify before being combined into one all.css file. To keep the CSS from blocking the rendering, I would have to inline the styles, but this can’t be done transparently and requires a lot of work, so I didn’t go through the trouble.

    Besides a combined all.css, I also added a tiny custom Nanoc filter for minifying the CSS.

  • All the images are run through lossless image compressors. For PNGs, I use OptiPNG:

    optipng -out output.png -quiet input.png

    For JPEG, I use JPEGtran:

    jpegtran -copy none -optimize -progressive -outfile output.jpg input.jpg

    When the images are still too large, I manually feed them through lossy compressors once and save the result: for JPEG, I reduce the standard quality setting; for PNG, I feed the image to TinyPNG (which, as far as I know, mostly uses pngquant). The image compressor is implemented as a custom Nanoc filter.
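As a rough sketch of what the lossless part of such an image filter might do, the snippet below picks the compressor command from the file extension. The helper names compressor_command and compress_image are hypothetical and not part of Nanoc; the command lines are the ones shown above.

```ruby
require "fileutils"

# Hypothetical helper (not part of Nanoc): choose the lossless compressor
# command line for an image based on its file extension.
def compressor_command(input, output)
  case File.extname(input).downcase
  when ".png"
    ["optipng", "-out", output, "-quiet", input]
  when ".jpg", ".jpeg"
    ["jpegtran", "-copy", "none", "-optimize", "-progressive",
     "-outfile", output, input]
  end # any other extension yields nil
end

# Run the compressor if one applies; otherwise copy the file unchanged.
# Assumes optipng and jpegtran are installed and on the PATH.
def compress_image(input, output)
  command = compressor_command(input, output)
  if command
    system(*command) or raise "#{command.first} failed for #{input}"
  else
    FileUtils.cp(input, output)
  end
end
```

Returning the command as an array and passing it to system keeps filenames with spaces from being split by a shell.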

These automatic compilation steps cut down the biggest bandwidth consumers of the website, and speed up delivery quite a bit.

Finally, all the textual content can be gzip-compressed by the server, giving the most efficient delivery. For NGINX, this just means ensuring gzip is turned on for all the types I use:

gzip on;
gzip_types text/plain text/css 
  application/json application/javascript
  text/javascript text/xml application/xml 
  application/xml+rss image/svg+xml;

Optimizing cache usage

After ensuring all the content is delivered efficiently, the next step is to avoid browsers having to download the content at all. The problem with turning on caching for files is finding the right balance between how likely a given resource is to change and how often you want it to be requested. However, you can avoid this balancing act by giving different versions of your files unique names, in which case you can turn on infinite caching for them (also referred to as revving).

To do this, I added an extra step to my site compilation that goes through all referenced files (images, scripts, styles, binaries), computes a checksum (e.g. SHA-1) of each file, renames it to <filename>-rev-<file-checksum>.<extension> (e.g. all-rev-f9ab8d1ae09.js), and updates all references to it from other files to the new filename. Note that the order in which you process files is important here, because some revved files refer to other revved files (e.g. CSS files referencing images), and updating a reference changes the checksum of the referring file. The order in which I process things is: images, scripts, styles, and HTML files (that last one only to update the links, not to actually rename them).
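The revving step could be sketched roughly as follows. This is not the actual deployer code: the helpers revved_name and update_references are hypothetical, the file contents are placeholders, and truncating the checksum to 10 hex characters is an assumption.

```ruby
require "digest"

# Hypothetical helper: build the revved name for a file from its content.
# The SHA-1 checksum is truncated to `length` hex characters (an assumption).
def revved_name(path, content, length = 10)
  checksum = Digest::SHA1.hexdigest(content)[0, length]
  ext = File.extname(path)        # e.g. ".js"
  base = path.delete_suffix(ext)  # e.g. "/js/all"
  "#{base}-rev-#{checksum}#{ext}"
end

# Replace every reference to an original name with its revved name.
def update_references(text, renames)
  renames.reduce(text) { |result, (from, to)| result.gsub(from, to) }
end

# Process in dependency order: images first, then the CSS that refers to
# them, and finally the HTML (which is updated but not renamed).
renames = {}
image = "placeholder image bytes"
renames["/img/logo.png"] = revved_name("/img/logo.png", image)

css = update_references('body { background: url("/img/logo.png"); }', renames)
renames["/css/all.css"] = revved_name("/css/all.css", css)

html = update_references('<link rel="stylesheet" href="/css/all.css">', renames)
puts html
```

The checksum of all.css is computed only after its image references have been rewritten, which is why images have to be processed first.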

Once this is in place, it’s just a matter of telling the webserver to cache all these files infinitely:

location ~* ".*-rev-\w{10}\..*$" {
  expires 365d;
}

Although this reaches the goal of caching all the heavy assets as much as possible, there’s still some room for improvement. Sometimes, other websites (mostly crawlers and indexers) store links to your images. When you update an image, the hash changes, and so those links no longer work (unless you keep old versions around, which I don’t for space reasons). To still serve the latest version of the image in that case, I keep a copy of the file under the original name, and I added a fallback rule to my NGINX config that first tries the revved file, and then falls back to the original name (without cache headers, of course, because this one can change):

location ~* "^(?<prefix>.*/(?<basename>[^/]*))-rev-\w{10}\.(?<suffix>.*)$" {
  expires 365d;
  try_files $uri $prefix.$suffix;
}

Alternatively, I could have returned a 307 redirect.
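To see which name the try_files fallback above resolves to, the same pattern can be tried out in Ruby, whose regular expressions support the same named groups. This is only an illustration: the helper fallback_path and the 10-character checksum in the example URI are hypothetical.

```ruby
# The pattern from the NGINX location above, as a Ruby regex.
# `~*` in NGINX means case-insensitive matching, hence the /i flag.
REVVED = %r{^(?<prefix>.*/(?<basename>[^/]*))-rev-\w{10}\.(?<suffix>.*)$}i

# Return the un-revved fallback path ($prefix.$suffix in the NGINX rule),
# or nil if the URI does not look like a revved file.
def fallback_path(uri)
  match = REVVED.match(uri)
  match && "#{match[:prefix]}.#{match[:suffix]}"
end

# Hypothetical URI with a 10-character checksum.
puts fallback_path("/img/photo-rev-0123456789.png")
```

For this URI, the prefix group captures /img/photo and the suffix group captures png, so the fallback is /img/photo.png.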

A second problem is that the referenced filenames now contain an ugly -rev-f9ab8d1ae09. For images, styles, and scripts, this generally isn’t a problem, since they’ll never be user-visible. However, for files that are downloaded (in my case PDF files and binaries), this does end up visible. To avoid this, I added a special rule to my server that sends a Content-Disposition HTTP header, so that the downloaded file is saved under the original filename:

location ~* "^(?<prefix>.*/(?<basename>[^/]*))-rev-\w{10}\.(?<suffix>.*)$" {
  expires 365d;
  try_files $uri $prefix.$suffix;

  location ~* "\.(pdf|bz2)$" {
    try_files $uri $prefix.$suffix;
    add_header Content-Disposition 
      'attachment; filename="$basename.$suffix"';
  }
}

I implemented the revving step in a custom Nanoc deployer. I didn’t put it in the build step, because:

  • It cross-cuts a lot of compile and route rules, cluttering up the code.
  • It relies on Nanoc processing the files in a specific order, and I haven’t found a nice way to impose this order.
  • It makes the next step easier.

Optimizing delivery

The final step in my optimization was to use a Content Delivery Network to get all the assets as quickly as possible to the site visitors. By putting these assets behind Amazon CloudFront, visitors can get all the assets quickly from a caching local server close to them.

A static website makes it very easy to use a CDN. All I had to do was create a new CloudFront distribution, point it to my website, and update my custom deployer to replace all the links to the assets with links to Amazon CloudFront (e.g. rewrite /blog/all-rev-f9ab8d1ae09.js into //d18sc3w29ndn46.cloudfront.net/blog/all-rev-f9ab8d1ae09.js). Since I already have a step rewriting links to these assets from the previous revving step, adding this was trivial. Also, since this is done in a pre-deploy step instead of a build step, I can still locally test my website without hitting external servers (which wouldn’t be able to cache the resources anyway, since they don’t have them yet).
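A minimal sketch of such a link rewrite, assuming the links to revved assets are root-relative and appear inside quotes or url(...), might look like this. The helper cdn_links is hypothetical and not the actual deployer code; the host is the one from the example above.

```ruby
# Host taken from the example above.
CDN_HOST = "d18sc3w29ndn46.cloudfront.net"

# Hypothetical helper: point links to revved assets at the CDN.
# Only root-relative links ("/...") containing "-rev-<checksum>" are
# rewritten; protocol-relative links ("//host/...") and un-revved links
# are left alone, so running the rewrite twice is harmless.
def cdn_links(text, host = CDN_HOST)
  text.gsub(%r{(?<=["'(])/(?!/)[^"'()\s]*-rev-\w+\.\w+}) do |path|
    "//#{host}#{path}"
  end
end

puts cdn_links('<script async src="/blog/all-rev-f9ab8d1ae09.js"></script>')
```

Restricting the rewrite to revved names means pages and other un-revved files keep being served from the original server.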

There’s one pitfall that took me a bit longer than expected: it seemed that CloudFront wasn’t returning gzip-compressed content, even though the browser requested it. The reason here is that, by default, NGINX doesn’t compress content when requested by proxies, and CloudFront doesn’t compress content itself. Fortunately, the solution is simple: just tell NGINX to compress content for proxies as well:

gzip_proxied any;

Published by

Remko Tronçon

Software Engineer · Hobby musician · BookWidgets