I like all my external links to be valid, so I wanted a link checker. The Python-based “linkchecker” looked perfect for the job, so I installed it this morning and ran it:
linkchecker http://mylaptop.example.org/
It got stuck spawning 10 copies of the ikiwiki CGI. Each invocation takes about 10 seconds (which worries me, long term), so let’s NOT check CGI URLs:
linkchecker --ignore-url=".cgi" http://mylaptop.example.org/Nex6/
That took 10 seconds (too long!), so I arbitrarily upped it to 400 threads from the default of 10…
linkchecker -t 400 --ignore-url=".cgi" http://mylaptop.example.org/Nex6/
This pegged my CPU at 110%, but then the local webserver started missing requests and things started timing out. 20 threads was also bad… I could tune Apache or try lighttpd at some point. 10 threads, with the CGI excluded, seems optimal.
So then, just for grins, I decided to run this against my original blog still hosted on blogspot. It has - probably - 1000s of broken links.
linkchecker -t 400 --ignore-url=".cgi" http://the-edge.blogspot.com
Ha. This (quite!) effectively executed a denial of service attack on my own blog. Blogspot (quite rightly) started throwing 503 errors at me. Don’t do this to other services, people!
ID         9979
URL        `http://the-edge.blogspot.com/2004_06_27_archive.html' (cached)
Name       `06/27/2004 - 07/04/2004'
Parent URL http://the-edge.blogspot.com/2005/04/vacillation-road.html, line 427, col 5
Real URL   http://the-edge.blogspot.com/2004_06_27_archive.html
Result     Error: 503 Service Unavailable
Since I develop the Nex-6 blog and wiki locally, I can link-check it on my own webserver and only DoS myself rather than a site on the internet… but I think fully checking my old blog on blogspot for errors is going to take a VERY long time if I have to limit myself to 10 threads.
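To keep myself from repeating the 400-thread mistake, something like the following wrapper could clamp the thread count before linkchecker ever runs. This is only a sketch: `MAX_THREADS`, `clamp_threads` and `polite_check` are names made up for illustration, and the cap of 10 is simply the value that behaved well above. It uses only the `-t` and `--ignore-url` options already shown.

```shell
#!/bin/sh
# Sketch: never run linkchecker with more threads than a chosen cap.
# MAX_THREADS is an illustrative limit, not a linkchecker setting.
MAX_THREADS=10

# Print the thread count to use: the requested value, clamped to MAX_THREADS.
clamp_threads() {
    if [ "$1" -gt "$MAX_THREADS" ]; then
        echo "$MAX_THREADS"
    else
        echo "$1"
    fi
}

# Usage: polite_check THREADS URL
# Runs linkchecker with the clamped thread count, skipping CGI URLs.
polite_check() {
    linkchecker -t "$(clamp_threads "$1")" --ignore-url=".cgi" "$2"
}
```

With this sourced into a shell, `polite_check 400 http://the-edge.blogspot.com` would run with 10 threads rather than 400.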
So how can I avoid DoSing myself in the future? I can rate-limit incoming connections to the webserver itself with an iptables (and ip6tables) rule, and I can also (using Apache) reject large numbers of requests from individual IP addresses. I don’t know if lighttpd has this feature or not.
The core problem here is that although the main webserver is REALLY fast with the static content, the one CGI in the system is REALLY slow: it takes over 10 seconds to complete. It hangs off a different URL from the default, so I can probably tarpit or rate-limit requests to it at the OS level rather than at the webserver level, via IPv6 anyway. Unfortunately that won’t work for IPv4 connections, as I only have one IP address available.
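As a rough sketch of the iptables idea, the script below caps simultaneous connections per source address using the `connlimit` match. The port (80) and the cap (10) are assumptions picked for illustration, not values I have tested on this setup. It only prints the rules, so nothing changes until the output is reviewed and piped to `sh` as root.

```shell
#!/bin/sh
# Sketch: cap simultaneous HTTP connections per source address.
# PORT and MAX_CONN are illustrative assumptions, not tested values.
PORT=80
MAX_CONN=10

# Print one rule for IPv4 and one for IPv6 instead of applying them.
# Review the output, then pipe it to "sh" as root to apply.
print_rules() {
    for cmd in iptables ip6tables; do
        echo "$cmd -A INPUT -p tcp --syn --dport $PORT -m connlimit --connlimit-above $MAX_CONN -j REJECT --reject-with tcp-reset"
    done
}

print_rules
```

A limit like this applies per client address, so a single runaway linkchecker gets refused while other visitors are unaffected.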
I have lighttpd running on the (relatively weak) [[/wiki/openrd]] box. Hmm… after I get mirroring set up, let’s see what happens if I SLAM that.
The whole post and publish thing was getting to me from a typing perspective, so I wrote a tiny little shell script called pp:
#!/bin/sh
git commit -a -m "$*" && git push
“pp” thusly creates a commit and pushes out the blog to the local server. ppp (who uses ppp anymore?) pushes the thing out to the main servers.
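A slightly more defensive variant might refuse to commit when no message is given. This is a sketch rather than the script I actually run: the usage check is an addition, and it is written as a function here so it can be sourced into a shell.

```shell
#!/bin/sh
# pp: commit all tracked changes with the given message, then push.
# Sketch only: the empty-message check is not part of the original script.
pp() {
    if [ $# -eq 0 ]; then
        echo "usage: pp commit message words" >&2
        return 1
    fi
    # "$*" joins all arguments into one commit message.
    git commit -a -m "$*" && git push
}
```

Called as `pp fixed the broken links`, it commits with the message “fixed the broken links” and pushes; called with no arguments, it prints the usage line and does nothing.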