## December 28, 2006

### Finding the right one

Here's my way to find the position in a 32-bit word of the rightmost 1 bit. Back in late '96 I used this for certain purposes, other portable methods being too slow. The technique works equally well for 64-bit or larger word sizes. It uses the well-known trick that given a word x, we can compute a word containing a single 1 bit in the position of the rightmost 1 bit of x by bitwise anding x with its arithmetic negation.

Here's why: if I add a word to its bitwise negation, there are no carries and I get a word of all ones, which is -1 in the standard two's complement representation of signed words. Why is this word of all ones the representation of -1? Well, just imagine adding 1 onto it --- carries ripple all the way through the word and fall off the end, leaving a word of all zeros, which is 0. Just as 999 behaves like -1 on a three-digit decimal counter.

Now the arithmetic negation of a word, on the other hand, obeys this obvious rule: x + (-x) = 0.

Comparing the two rules, we see the relation between the two kinds of negation: -x = ~x + 1, where ~x denotes the bitwise negation of x.

So here's what happens when you add one onto the bitwise negation to get the arithmetic negation, followed by bitwise anding that with the original word, illustrated by an example with eight-bit words: take x = 01101100; then ~x = 10010011, so -x = ~x + 1 = 10010100, and x & -x = 00000100.

A non-zero word consists of an arbitrary sequence of bits, followed by the last 1 bit, followed by all zeros. (In the example there are two such zeros.) So its bitwise negation consists of the negation of the arbitrary sequence of bits, followed by a 0 bit, followed by all ones. Note that if we anded the bitwise negation with the original word we would get zero, because in each bit position we would be anding together a 0 bit and a 1 bit, giving a 0 bit.

However, we first add 1 onto the bitwise negation to get the arithmetic negation. This causes a carry to ripple through the sequence of ones at the right-hand end, putting a 1 bit into the position of the 0 bit to the left of that sequence --- which is the position of the rightmost 1 bit in the original word. So the arithmetic negation consists of the negation of the arbitrary sequence of bits, followed by a 1 bit, followed by all zeros. Anding it with the original word causes the arbitrary leading bits to be anded with their negations, making all the upper bits of the result 0, and the zeros on the right-hand end lead to zeros in the result. So the only bit position that is 1 in both is that of the last 1 in the original pattern. So we may conclude that x & (-x) = 2^k, where k is the position of the rightmost 1 bit in the word x, with position zero defined to be the rightmost position.

Now to the real problem --- how do we find k given 2^k? We could keep shifting it to the right until we get 1, but that looks clumsy and inefficient. Instead we use some mathematics! The integers modulo a prime number p form a finite field, and the multiplicative group of any finite field is cyclic. Here is a concrete example, with p = 7, of what this all means. First, the multiplicative group of the integers modulo 7 is just the multiplication table of the possible non-zero remainders when dividing by 7. The term "modulo 7" just means "ignoring multiples of 7", so everything is a remainder on dividing by 7. We throw out zero because it behaves badly with multiplication. The result is as follows.

| × | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | 2 | 4 | 6 | 1 | 3 | 5 |
| 3 | 3 | 6 | 2 | 5 | 1 | 4 |
| 4 | 4 | 1 | 5 | 2 | 6 | 3 |
| 5 | 5 | 3 | 1 | 6 | 4 | 2 |
| 6 | 6 | 5 | 4 | 3 | 2 | 1 |

This is the "multiplicative group of the integers modulo 7", and it is cyclic, which means that there is an element whose powers generate all of the others. In this case 3 does the job: 3^1 = 3, 3^2 = 2, 3^3 = 6, 3^4 = 4, 3^5 = 5, and 3^6 = 1, all modulo 7.

The powers of 3 cycle through all six non-zero values. This does not work for 2 in this case: its powers are 2, 4, 1, 2, 4, 1, ..., reaching only three of the six values. We say that 3 is a generator and 2 is not, for multiplication modulo 7.

Now for my little trick: if we choose a prime p bigger than the word size (32 bits, say) for which 2 is a generator, then the different powers of 2 that appear when we and a word with its arithmetic negation will all be different modulo p. 37 is such a prime (for 32-bit words). So the expression (x & -x) mod 37 takes 32 distinct values, all less than 37, one for each position of the last 1 bit. Here "mod 37" just means "take the remainder upon dividing by 37", written "% 37" in some programming languages.

So if we create a constant array with a maximum index of 36 that contains the table below, we may use it for table lookup of the position of the rightmost 1 bit, given the value of the previous expression. The entries are discrete logarithms modulo 37: entry i holds the k for which 2^k mod 37 = i. In index order 0 to 36, the values are -, 0, 1, 26, 2, 23, 27, -, 3, 16, 24, 30, 28, 11, -, 13, 4, 7, 17, -, 25, 22, 31, 15, 29, 10, 12, 6, -, 21, 14, 9, 5, 20, 8, 19, 18, where "-" marks the five indices (0, 7, 14, 19 and 28) that can never arise for a 32-bit word and so may hold anything.

Note that it is the fact that the values are now small (less than 37, in fact) that makes the array practicable. Now we may write the final expression for the position of the rightmost 1 bit using this array: k = table[(x & -x) mod 37].

If the word size is 64 or 128 or 256 bits, we may use the same method mutatis mutandis with the respective primes 67, 131 and 269.

## December 14, 2006

### Alternating three ways without branching

Suppose you want a variable x to cycle between three different values in a loop, like this: if x is currently a, set it to b; if it is b, set it to c; and if it is c, set it back to a.

If this code is executed in a loop, then x will cycle through the values a, b, and c repeatedly. But how may we achieve the same effect without any branching? Tricky... here's a little scheme I've cooked up to do just that. It cheats by using two variables x and y. The mysterious "plus-in-a-circle" operation, ⊕, is bitwise exclusive-or (exor).

First, before the loop starts, precompute the magic constant s = a ⊕ b ⊕ c, and initialize x = a ⊕ s and y = b ⊕ s.

Then inside the loop, use x ⊕ s wherever the cycling value is needed, and execute the following non-branching updates: y ← x ⊕ y, followed by x ← x ⊕ y.

Note that x cycles between the following three values, because of the self-inverse property of exor (see Alternating without a branch): a ⊕ s, then b ⊕ s, then c ⊕ s.

So if we exor x with the magic constant s we get a value that cycles between a, b, and c, because of the self-inverse property of exor, once again.

There's a secret connection between this scheme and the Fibonacci numbers...

## December 13, 2006

### Alternating without a branch

Suppose you want to achieve the effect of a variable x repeatedly alternating between two values, a and b,

(for example inside a loop) without using a branch. If the values are just bit patterns such as integers or references (pointers) then you may use bitwise exclusive-or (exor, written ⊕ here) as follows, and eliminate the branch: initialize x = a, and in each iteration set x ← x ⊕ (a ⊕ b).

Of course, the exor of a and b can be precomputed outside the loop. This relies upon exor being not only commutative and associative, but also acting as its own inverse: v ⊕ v = 0 for every v, so (x ⊕ d) ⊕ d = x for any d.

Informally speaking, exor acts as its own inverse; it is both addition and subtraction in its domain.

Soon (after doing some real work) I will say how to alternate cyclically between three different values in a loop, without branching ... this is a little more amusing.