So I ran across something insanely cool today (at least if you're me), but it'll take some explaining. However, for the hardcore geeks in the audience, the following output sums up the story nicely:
...@ebexgs:~$ uname -a
Linux ebexgs 2.6.18-3-686 #1 SMP Mon Dec 4 16:41:14 UTC 2006 i686 GNU/Linux
Looped 1000 times:
Used timeout of 50 microseconds.
Slept average of 7989.000000 microseconds.
...@vader:~$ uname -a
Linux vader 2.6.22-2-686 #1 SMP Fri Aug 31 00:24:01 UTC 2007 i686 GNU/Linux
Looped 1000 times:
Used timeout of 50 microseconds.
Slept average of 56.000000 microseconds.
So let's back up a bit. If you've used practically any computer in the last twenty years (and if not, how exactly are you reading this?) you may have noticed that the machine is capable of doing several things at once. Say, playing an MP3 while loading a web page. That's not as simple as it sounds, because for most of that time, most personal computers had just one CPU that was capable of running just one instruction at a time. Thus your computer, all appearances to the contrary, was truly single-minded. This was visible in primitive operating systems like DOS: you could only run one program at a time. (And DOS really was primitive; this hadn't been true of "real" operating systems since the mid-60s at the latest.)
When more advanced operating systems (also, Windows) made it to personal computers, multitasking came with them. This is the juggling act that lets a computer appear to divide its attention between multiple users or programs. The operating system contrives this by running each program in sequence, switching between them many times a second. Just like the persistence of vision that keeps you from seeing the gaps between frames of a film, if the context switches come fast enough you never notice that each of your programs is actually spending most of its time frozen in place.
There are a number of ways you could do this, but almost all systems today use some kind of timer interrupt. The problem, of course, is that while your program is running, the operating system isn't; so how can the system manage the context switches? One way is to require that all programs occasionally give the operating system a chance to run -- such "cooperative" multitasking is unreliable, since just one unhelpful program can jam up the works, but it is fairly simple and was used by early versions of both Windows and MacOS. Better to use a signal that periodically triggers the operating system, regardless of what the programs are doing. Conveniently, when IBM originally designed the PC they included a clock chip that can be programmed to send out just such a periodic signal. It ran at about 18 Hz, and was mostly intended to help DOS keep track of the time.
While not really ideal, this was already good enough to enable some "preemptive" multitasking; instead of hoping your programs all cooperate, the operating system could start and stop programs on its own, by using that clock tick to trigger a bit of its own code. And while the interrupt frequency has gone from around 18 Hz to as much as 1000 Hz, this scheme has gone mostly unchanged over the last few decades.
This has some nasty side-effects. When computers now can run a billion or more instructions per second, a thousandth of a second is kind of an eternity. Usually that isn't too important, because now lots of other things can cause the operating system to kick in and stop or restart a program -- things like moving a mouse, or a packet arriving on the network, or a disk finishing a write -- and most programs spend most of their time waiting for events like that to happen. But if you have a program that needs to wait some arbitrary amount of time before doing something, rather than being woken up when an event occurs, with most operating systems there's no reliable way to do that precisely. For instance, say you're controlling a widget where you need to send some data, then wait a couple of milliseconds before sending more. On a modern computer a millisecond is a long time, but if your system's clock only ticks once every four milliseconds (fairly common), there's no way to do it. If your program gives up control and asks to be woken up in two milliseconds, the operating system itself probably won't kick in again to do the scheduling for twice that long. On the other hand, if you try to just spin your wheels for that time, there's a good chance the operating system will wake up in the meanwhile and take control away from you, only to return who knows when.
This bit me hard during one of my college jobs. I was working for a psychology lab that wanted software to measure people's reaction times with high accuracy in various situations, and these unpredictable few-millisecond delays made the task much more complicated than it needed to be. The simplest thing would, of course, be to program the timer to run faster (this still involves hacking around in the operating system's guts, but mostly in straightforward ways). However, there's a limit to how fast anyone wants one of these to run: eventually the system would be spending all of its time running the operating system's scheduling code, and most programs don't need high-resolution timing. So in most systems the actual clock chip won't be driven more than a few times faster, anyway.
There's a much cleverer solution. Way back when, we all learned that the code to handle your clock interrupt had to be as simple as possible, so that it wouldn't take too much time. But a millisecond is forever these days, so you can do a bit more with it. Plus, there's lots of other hardware lying around that will wake up the operating system if something needs to happen. So, if you're willing to live without the regular tick, it turns out there's another mode you can use. While these clock chips might have a restricted range of tick frequencies, they're very precise, and besides sending periodic ticks they can also do one-shot countdowns. Rather than wake up every so many milliseconds, before finishing its work the operating system can instead look at a list of upcoming timeouts and set the clock chip like an alarm for the next one. If there's nothing to do for half a second, the system can go idle and save power, but if a program wants to get woken up a few microseconds from now, in this scheme the operating system can set its wake-up call for then.
In very recent versions the Linux kernel has started using this scheme, and in Debian testing the 2.6.22 kernel has the required bits enabled (NO_HZ turns off the regular tick, and HIGH_RES_TIMERS enables the wake-up calls). So, for the output I started with, I wrote a little program that tries to sleep for 50 microseconds and keeps track of how long each sleep actually takes. Under the 2.6.18 kernel shipped with the Debian Etch release, each one takes about 8000 microseconds -- 160 times longer than the requested delay. (That's 8 milliseconds, or two 250 Hz timer ticks -- hilariously, this loop takes half as long on a much slower computer with only one CPU.) Under the new kernel: 56 microseconds. That's a feature I've been wanting for about eight years now, and it is therefore awesome.