Static site generators make mirroring easy

After evaluating a couple more site generators - Jekyll and Lanyon - I decided to stick with ikiwiki for now, as it is written in Perl, which I grok better than Ruby or Python.

Posts are written in markdown, so IF I decide later to switch to another system, it should be easier than the great blogspot migration I plan for my main blog - the-edge.blogspot.com. The number of things that are going to break in that conversion is going to HURT.

I dislike ikiwiki’s default templates and CSS. They use up far too much vertical space.

However, I am loving how short my edit-revise-post-edit cycle has become, the ability to have multiple fragmentary posts online on my laptop, and having control over the source code and templates so I can fix them when the urge strikes me. I am hoping my new system will look like:

1) Write outline in org mode.
2) When the outline is done, convert it to markdown format and put it in the appropriate posts directory for the blog I’m writing for. Link to that mdwn inside of org mode.
3) Edit/Post/Revise until it is done.
4) Push out to the main server.
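Step 2 is the only mechanical part of that list. As a rough sketch of the idea (not the tool actually used here; org mode's own exporters or pandoc would be the serious options), a few lines of Python can turn org headline stars into markdown headings and pass everything else through. The function name is made up for illustration.

```python
import re

# Hypothetical helper: only handles headlines, nothing else in org syntax.
HEADLINE = re.compile(r"^(\*+)\s+(.*)$")

def org_outline_to_markdown(org_text: str) -> str:
    """Turn '* Title' / '** Sub' org headlines into '# Title' / '## Sub'."""
    out = []
    for line in org_text.splitlines():
        m = HEADLINE.match(line)
        if m:
            # One '#' per leading '*', so the outline depth is preserved.
            out.append("#" * len(m.group(1)) + " " + m.group(2))
        else:
            # Body text, lists and blank lines are copied unchanged.
            out.append(line)
    return "\n".join(out) + "\n"
```

Lines that start with `*bold*` are left alone, because a headline needs whitespace after its stars.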

One of the big advantages of continuing to use blogspot for my work is that they take care of the backups. Backups are good. I don’t ever want to lose work, but I don’t want to use blogspot anymore. By using git as the repository I can make as many copies of the blog as I like and mirror it to offsite and local machines, thus reducing the chance I’ll lose anything to nearly zero.

So I set up name service for the two blogs in bind9 thusly:

    the-edge        IN      AAAA 2001:4f8:3:36:2e0:81ff:fe23:90d3
                    IN      AAAA 2002:4b47:9bd6:FFFF::1
                    IN      A
                    IN      A

    nex-6           IN      CNAME the-edge.taht.net.

    edit.the-edge   IN      A
                    IN      AAAA 2001:4f8:3:36:2e0:81ff:fe23:90d3
    edit.nex-6      IN      A
                    IN      AAAA 2001:4f8:3:36:2e0:81ff:fe23:90d3

Readers can pull the blog from any of the mirror servers over either IPv4 or IPv6. Under IPv6 they should get it from the “closest” server: a reader with a native 2001 address goes to my native 2001 server out on the net, and a reader with a 2002 (6to4) address goes directly to the gateway in my house.
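To make the "closest" idea concrete, here is a toy model in Python. It is only a sketch: the real choice is made by the address selection rules in each reader's operating system, not by anything on the server, and this collapses those rules down to "6to4 talks to 6to4, native talks to native". The function names and the two client addresses in the usage note are made up; the mirror addresses are the AAAA records from the zone file.

```python
import ipaddress
from typing import Sequence

# 2002::/16 is the 6to4 prefix; everything else is treated as "native" here.
SIX_TO_FOUR = ipaddress.ip_network("2002::/16")

MIRRORS = [
    "2001:4f8:3:36:2e0:81ff:fe23:90d3",  # native server out on the net
    "2002:4b47:9bd6:FFFF::1",            # 6to4 gateway at home
]

def is_6to4(addr: str) -> bool:
    """True if addr falls inside the 6to4 prefix."""
    return ipaddress.ip_address(addr) in SIX_TO_FOUR

def pick_mirror(client_addr: str, mirrors: Sequence[str] = MIRRORS) -> str:
    """Prefer a mirror of the same kind (6to4 or native) as the client."""
    want_6to4 = is_6to4(client_addr)
    for mirror in mirrors:
        if is_6to4(mirror) == want_6to4:
            return mirror
    # No mirror of the matching kind: fall back to the first one listed.
    return mirrors[0]
```

With these records, `pick_mirror("2001:db8::1")` picks the 2001 server and `pick_mirror("2002:c000:204::1")` picks the home gateway.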

The latter has the nice property of keeping reads of my blog and wikis from inside my network fast and local, too.

    root@toutatis:/etc/bind/taht# ping6 the-edge.taht.net
    PING the-edge.taht.net(mainmail.teklibre.org) 56 data bytes
    64 bytes from mainmail.teklibre.org: icmp_seq=1 ttl=64 time=0.087 ms

(toutatis is my name server on the Net, and I have reverse lookups working on that one.) The darn blogs and wikis are primarily for ME, anyway (call me selfish), so inside my network I get:

    d@cruithne:~$ ping6 the-edge.taht.net
    PING the-edge.taht.net(2002:4b47:9bd6:FFFF::1) 56 data bytes
    64 bytes from 2002:4b47:9bd6::1: icmp_seq=1 ttl=64 time=1.16 ms

cruithne is my laptop, and the address above is, at least temporarily, my internal IPv6 address. I’m migrating to a new address shortly, having ordered a fixed IP from Comcast.

I could also use geodns on my main server, but haven’t set it up.

Problem 1

The first flaw in the mirroring scheme is that keeping accurate site statistics becomes hard. I’ll have to merge the server logs or use Google Analytics everywhere. However: I haven’t looked at Google Analytics in MONTHS and I haven’t seen the current state of the weblog analysis software… but I’m sure I can come up with something usable. If I care. Looking at Google Analytics is more depressing than useful.
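Merging the logs is less work than it sounds, since each server's log is already in time order. A minimal sketch, assuming both servers write a common-log-style line with a bracketed timestamp such as `[10/Feb/2011:19:30:47 -0800]`; the sample lines and names below are invented for illustration.

```python
import heapq
import re
from datetime import datetime

# First bracketed field on the line is taken to be the request timestamp.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")

def log_time(line: str) -> datetime:
    """Parse the bracketed timestamp out of one access-log line."""
    m = TIMESTAMP.search(line)
    if m is None:
        raise ValueError("no timestamp in line: %r" % line)
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")

def merge_logs(*logs):
    """Merge several already-sorted logs into one list ordered by time."""
    return list(heapq.merge(*logs, key=log_time))

# Hypothetical sample data, one list per mirror.
APACHE_LOG = [
    '2001:db8::10 - - [10/Feb/2011:19:30:47 -0800] "GET / HTTP/1.1" 200 5120',
    '2001:db8::10 - - [10/Feb/2011:19:31:10 -0800] "GET /style.css HTTP/1.1" 200 900',
]
LIGHTTPD_LOG = [
    '2002:c000:204::1 - - [10/Feb/2011:19:30:59 -0800] "GET / HTTP/1.1" 200 5120',
]
```

`merge_logs(APACHE_LOG, LIGHTTPD_LOG)` interleaves the lighttpd hit between the two apache hits, and the result can be fed to whatever log analyzer is handy.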

Problem 2 - solved!

I set up an apache server on the main box and a lighttpd server on the openrd home gateway, and got the sites mirrored.

I started off with the idea that I’d set up the ikiwiki compiler on both the (dual core) server and the 1.2GHz openrd (ARM-based) box, git push the wiki database on each commit, and have both regenerate the wiki in parallel…

Um, er, ah, no. It takes about 2 minutes to regenerate the wiki on the openrd box, where 20 seconds is about all I could stand. (Even THAT is too much!) I may have missed a configuration option to lessen the workload, but….

So I switched to using rsync to update the mirror. It was NICE. It updates the mirrored website (currently) in 1.2 seconds FLAT, and barely tweaks the cpu meter on the openrd. It also means I don’t have to have perl installed on the mirror (or anything other than a webserver, really). This gives me some ideas towards using local web servers on my wisp6 (/wiki/wisp6) for useful, local content.
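A minimal sketch of what such an rsync push could look like, wrapped in Python so it can be hooked into a post-commit script. The host name and paths in the usage note are hypothetical, not taken from the setup described here; the flags are standard rsync ones (`-a` archive mode, which preserves modification times, `-z` compression, `--delete` to remove files that no longer exist in the source).

```python
import subprocess
from typing import List

def rsync_command(src_dir: str, dest: str) -> List[str]:
    """Build the rsync argv that mirrors the generated site to dest."""
    # A trailing slash makes rsync copy the directory's contents,
    # not the directory itself.
    src = src_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", src, dest]

def push_mirror(src_dir: str, dest: str) -> None:
    """Run the rsync; raises CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(src_dir, dest), check=True)
```

For example, `push_mirror("/srv/www/the-edge", "mirror.example.net:/srv/www/the-edge/")` would bring the mirror in line with the freshly generated site. Only changed files cross the wire, which is why the mirror needs nothing but a web server.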

Problem 3 - Fixed

Then I noticed that the pretty highlight plugin wasn’t working on the server box (toutatis). :(

As I write, Toutatis has been up for a REALLY long time.

    toutatis $: uptime
    19:30:47 up 346 days, 1:25, 3 users, load average: 0.00, 0.01, 0.00

It’s running Ubuntu Lucid, and it does so much stuff for me that updating it to a new release of Ubuntu scares me to death.

I need to schedule the upgrade for a day when someone I trust is at the co-lo and I have a plan for migrating all the code on it. I’m not heavily into compiling my own versions of libhighlight and libhighlight-perl, either. This is kind of a showstopper for now.

I have a new server coming up but it’s not installed yet.

Update: I bit the bullet and installed a new apt repo, updated perl, added libhighlight, and rebuilt ikiwiki. I’m scared to reboot this box now.

Problem 4

I chose to use two different web servers for a reason. Lighttpd is precisely that - it’s light. It scales really well under workloads that apache needs tuning for, so I put that on the openrd box.

Apache… I TRUST. I also have apache on this box doing multiple other things, so switching it out for lighttpd wasn’t an option.

The problem I introduced by using two different web servers is that each calculates ETags differently. This makes web caching problematic, as any browser that gets load balanced across the mirrors is going to hit both sites and invalidate its ETag cache. This totally defeats my “speed at any cost” quest…

Since I’m now using rsync on the website itself, I THINK, but am not sure, that I can get etags to “do the right thing” based solely on the actual file size and modification time, and drop the inode entirely from the calculation.
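To illustrate why dropping the inode matters, here is a small Python sketch of an ETag built from size and modification time only. This is NOT the format either apache or lighttpd actually emits (each has its own, and Apache's `FileETag` directive is what selects the components there); it only shows that two copies of a file with the same size and mtime, as rsync in archive mode produces, get the same tag even though their inodes differ. The function names and file names are made up.

```python
import os
import tempfile

def etag_for(path: str) -> str:
    """Hypothetical ETag: hex size and hex mtime, no inode involved."""
    st = os.stat(path)
    return '"%x-%x"' % (st.st_size, int(st.st_mtime))

def demo():
    """Two separate files standing in for the same page on two mirrors."""
    with tempfile.TemporaryDirectory() as d:
        main_copy = os.path.join(d, "main-index.html")
        mirror_copy = os.path.join(d, "mirror-index.html")
        for path in (main_copy, mirror_copy):
            with open(path, "wb") as f:
                f.write(b"<html>hello</html>\n")
            # Same arbitrary fixed timestamp on both, as a
            # time-preserving copy would leave them.
            os.utime(path, (1297366247, 1297366247))
        inodes_differ = os.stat(main_copy).st_ino != os.stat(mirror_copy).st_ino
        return etag_for(main_copy), etag_for(mirror_copy), inodes_differ
```

The two tags returned by `demo()` are equal. Whether the two real servers can be configured to agree byte for byte is a separate question, since their tag formats differ.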

But according to some old posts on configuring lighttpd, it may be best to remove etags from the equation and rely on expires headers only.


February 10, 2011
1102 words

latency usability