Recently in Tech Notes Category

Optimization

I'd always heard that compiler optimization[1] is uniformly risk-positive -- i.e. turning on optimizations might make your code not run, but never the other way around.

On a seemingly-unrelated note, I'd always heard that you can't do proper tail calls in C, because it is so tightly wedded to the function call paradigm -- with the result that recursive algorithms can't be safely used in C.

The other day, I discovered that, taken together, both of the above statements are under some circumstances false.

Consider the following program:

#include <stdio.h>

int x;

unsigned long long int foo(unsigned long long int a) {
    if (a % 2 == 0) ++x;
    return a > 0 ? foo(--a) : a;   /* the recursive call is the last thing foo() does */
}

int main() {
    unsigned long long int i = 2;
    while (1) {
        printf("Recursing %llu levels (got x = %d)\n", i, x);
        x = 0;
        foo(i);
        i *= 2;   /* recurse twice as deep each time around */
    }
    return 0;
}

Formally, you can see that foo() is recursive, so as the variable i gets bigger and bigger, the program should run forever, taking twice as long each time through. In practice, if you compile this program normally, the stack will overflow because it can only hold so many function calls, and the program will abruptly crash after a fraction of a second.

With gcc, turn on "-foptimize-sibling-calls" (which happens by default if you use "-O2" or higher), and the compiler will employ tail call optimization, meaning that the recursive bit (call a function and return the result) gets turned into an iterative operation (jump to a new function and let it return its own result as if it were a simple continuation of this function). As a result, that recursive call at the end of foo() doesn't take up an extra slot in the call stack, so the program really will run forever, exactly as it should. (Eventually the variable i will overflow and wrap around, so you'll get the wrong number, but the program will never crash. And since I used a 64-bit integer (that's what the "long long" thing means), you probably won't be around for the billions of years it will take your computer to actually count that high.)
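To see what the optimizer is effectively doing, here's a hand-written iterative version of foo() -- just a sketch of the transformation (the function name is mine, and gcc performs this on the generated machine code, not on the C source):

unsigned long long int foo_loop(unsigned long long int a) {
    for (;;) {
        if (a % 2 == 0) ++x;     /* same bookkeeping as before */
        if (a > 0) {
            --a;                 /* the "call" just updates the argument... */
            continue;            /* ...and jumps back to the top */
        }
        return a;                /* base case, unchanged */
    }
}

Compile the original without optimization and it dies within a second; compile it with "-O2" (or just "-foptimize-sibling-calls") and it behaves like this loop.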

[1]: For non-programmers, a compiler is the program that turns a programming language source code -- something that usually looks like written instructions or mathematical formulas -- into the binary instructions that a CPU can run as a program. Compiler optimizations are tricks the compiler can use, that result in binary instructions that don't exactly reflect what you wrote in the source code, but should have the same effect and run faster or take less space. C is one of the older programming languages, and one of the most widespread.

Time and the Modern Computer

So I ran across something insanely cool today (at least if you're me), but it'll take some explaining. However, for the hardcore geeks in the audience, the following output sums up the story nicely:

...@ebexgs:~$ uname -a
Linux ebexgs 2.6.18-3-686 #1 SMP Mon Dec 4 16:41:14 UTC 2006 i686 GNU/Linux

...@ebexgs:~$ ./nanosleep
Looped 1000 times:
Used timeout of 50 microseconds.
Slept average of 7989.000000 microseconds.

...@vader:~$ uname -a
Linux vader 2.6.22-2-686 #1 SMP Fri Aug 31 00:24:01 UTC 2007 i686 GNU/Linux

...@vader:~$ ./nanosleep
Looped 1000 times:
Used timeout of 50 microseconds.
Slept average of 56.000000 microseconds.

So let's back up a bit. If you've used practically any computer in the last twenty years (and if not, how exactly are you reading this?) you may have noticed that the machine is capable of doing several things at once. Say, playing an MP3 while loading a web page. That's not as simple as it sounds, because for most of that time, most personal computers had just one CPU that was capable of running just one instruction at a time. Thus your computer, all appearances to the contrary, was truly single-minded. This was visible in primitive operating systems like DOS: you could only run one program at a time. (And DOS really was primitive; this hadn't been true of "real" operating systems since the mid-60s at the latest.)

When more advanced operating systems (also, Windows) made it to personal computers, multitasking came with them. This is the juggling act that lets a computer appear to divide its attention between multiple users or programs. The operating system contrives this by running each program in sequence, switching between them many times a second. Just like the persistence of vision that keeps you from seeing the gaps between frames of a film, if the context switches come fast enough you never notice that each of your programs is actually spending most of its time frozen in place.

There are a number of ways you could do this, but almost all systems today use some kind of timer interrupt. The problem, of course, is that while your program is running, the operating system isn't; thus, how can the system manage the context switches? One way is to require that all programs occasionally give the operating system a chance to run -- such "cooperative" multitasking is unreliable since just one unhelpful program can jam up the works, but it is fairly simple and was used by early versions of both Windows and MacOS. Better to use a signal that periodically triggers the operating system, regardless of what the programs are doing. Conveniently, when IBM originally designed the PC they included a clock chip that can be programmed to send out just such a periodic signal. It ran at about 18 Hz (18.2 ticks a second), and was mostly intended to help DOS keep track of the time.

While not really ideal, this was already good enough to enable some "preemptive" multitasking; instead of hoping your programs all cooperate, the operating system could start and stop programs on its own, by using that clock tick to trigger a bit of its own code. And while the interrupt frequency has gone from about 18 Hz to as much as 1000 Hz, this scheme has gone mostly unchanged over the last few decades.

This has some nasty side-effects. Now that computers can run a billion or more instructions per second, a thousandth of a second is kind of an eternity. Usually that isn't too important, because now lots of other things can cause the operating system to kick in and stop or restart a program -- things like moving a mouse, or a packet arriving on the network, or a disk finishing a write -- and most programs spend most of their time waiting for events like that to happen. But if you have a program that needs to wait some arbitrary amount of time before doing something, rather than being woken up when an event occurs, with most operating systems there's no reliable way to do that precisely. For instance, say you're controlling a widget where you need to send some data, then wait a couple of milliseconds before sending more. On a modern computer a millisecond is a long time, but if your system's clock only ticks once every four milliseconds (fairly common), there's no way to do it. If your program gives up control and asks to be woken up in two milliseconds, the operating system itself probably won't kick in again to do the scheduling for twice that long. On the other hand, if you try to just spin your wheels for that time, there's a good chance the operating system will wake up in the meantime and take control away from you, only to return who knows when.

This bit me hard during one of my college jobs. I was working for a psychology lab that wanted software to measure people's reaction times with high accuracy in various situations, and these unpredictable few-millisecond delays made the task much more complicated than it needed to be. The simplest thing would, of course, be to program the timer to run faster (it still involves hacking around in the operating system's guts, but mostly in straightforward ways). However, there's a limit to how fast anyone wants one of these to run, since eventually the system would be spending all of its time running the operating system's scheduling code, and most programs don't need high-resolution timing. So in most systems the actual clock chip won't go more than a few times faster, anyway.

There's a much cleverer solution. Way back when, we all learned that the code to handle your clock interrupt had to be as simple as possible, so that it wouldn't take too much time. But a millisecond is forever these days, so you can do a bit more with it. Plus, there's lots of other hardware lying around that will wake up the operating system if something needs to happen. So, if you're willing to live without the regular tick, it turns out there's another mode you can use. While these clock chips might have a restricted range of tick frequencies, they're very precise, and besides sending periodic ticks they can also do one-shot countdowns. Rather than wake up every so many milliseconds, before finishing its work the operating system can instead look at a list of upcoming timeouts and set the clock chip like an alarm for the next one. If there's nothing to do for half a second, the system can go idle and save power, but if a program wants to get woken up a few microseconds from now, in this scheme the operating system can set its wake-up call for then.

In very recent versions the Linux kernel has started using this scheme, and in Debian testing the 2.6.22 kernel has the required bits enabled (NO_HZ turns off the regular tick, and HIGH_RES_TIMERS enables the wake-up calls). So in the output that I started with, I wrote a little program that tries to sleep for 50 microseconds and keeps track of how long it actually takes. Under the 2.6.18 kernel shipped with the Debian Etch release, each one takes about 8000 microseconds -- 160 times longer than the requested delay. (8 milliseconds, or two 250 Hz timer ticks -- hilariously, this loop takes half as long on a much slower computer with only one CPU.) Under the new kernel, 56 microseconds. Which is a feature I've been wanting for about eight years now, and is therefore awesome.
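The test program is nothing more than a loop around nanosleep(). I haven't reproduced the original here, but a minimal version of the same measurement might look something like this (the loop count, clock source, and output format are my guesses, not the real code):

/* Build with something like: gcc -o nanosleep nanosleep.c
 * (older glibc may also need -lrt for clock_gettime). */
#include <stdio.h>
#include <time.h>

int main(void) {
    const int loops = 1000;
    const struct timespec req = { 0, 50 * 1000 };   /* 50 microseconds */
    struct timespec before, after;
    double total_us = 0.0;
    int i;

    for (i = 0; i < loops; i++) {
        clock_gettime(CLOCK_MONOTONIC, &before);
        nanosleep(&req, NULL);                      /* ask for the 50 us nap */
        clock_gettime(CLOCK_MONOTONIC, &after);
        total_us += (after.tv_sec - before.tv_sec) * 1e6
                  + (after.tv_nsec - before.tv_nsec) / 1e3;
    }

    printf("Looped %d times:\n", loops);
    printf("Used timeout of %ld microseconds.\n", req.tv_nsec / 1000);
    printf("Slept average of %f microseconds.\n", total_us / loops);
    return 0;
}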

XML Tree Pruning

A little while back I had an XML document that I needed to prune down. Now tools like XPath will easily let you pick out certain bits of an XML tree, but what I needed was kind of the complement: keep the document intact, but zap specific bits of it. Turns out that libxml2 provides an easy way to do this. For instance, in Python:

import libxml2

doc = libxml2.parseFile("file.xml")
xpc = doc.xpathNewContext()
# Select every item, plus every folder whose title isn't "Keeper",
# and unlink each one from the tree.
for n in xpc.xpathEval('/root/item|/root/folder[./title/text()!="Keeper"]'):
    n.unlinkNode()
doc.saveFormatFile("pruned.xml", True)
xpc.xpathFreeContext()  # free the XPath context and document when done
doc.freeDoc()

I.e. if I have an XML tree with a root element containing a bunch of item and folder nodes, this will toss out all of them except the folder I want to keep (here, the one with title "Keeper"). The procedure should be about the same from any language that can use libxml2.
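For example, here's roughly the same thing directly in C (a sketch assuming the same file names and XPath expression, with most error checking omitted):

/* Build with something like: gcc prune.c $(xml2-config --cflags --libs) */
#include <libxml/parser.h>
#include <libxml/xpath.h>

int main(void) {
    xmlDocPtr doc = xmlReadFile("file.xml", NULL, 0);
    xmlXPathContextPtr xpc = xmlXPathNewContext(doc);
    xmlXPathObjectPtr res = xmlXPathEvalExpression(
        (const xmlChar *) "/root/item|/root/folder[./title/text()!=\"Keeper\"]",
        xpc);
    int i;

    if (res != NULL && res->nodesetval != NULL)
        for (i = 0; i < res->nodesetval->nodeNr; i++)
            xmlUnlinkNode(res->nodesetval->nodeTab[i]);  /* drop it from the tree */

    xmlSaveFormatFile("pruned.xml", doc, 1);
    xmlXPathFreeObject(res);
    xmlXPathFreeContext(xpc);
    xmlFreeDoc(doc);   /* the unlinked nodes themselves just leak until exit */
    return 0;
}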
