Fast Power of Two Equal or Larger than a 32-bit Integer



I have seen a naive implementation at work today (written by a colleague). My colleague intends to generate a number that is power of two, and the number is the least small number that is equal or larger than the given number. The following code will compute a next largest power of two given a unsigned 32-bit integer if it is not itself power of two otherwise it will return itself. For example, w(2) = 2 and w(3) = 4.

1
2
3
4
5
6
inline unsigned int
w(register unsigned int x) {
    int a = 1;
    while (a < x) a <<= 1;
    return a;
}
inline unsigned int
w(register unsigned int x) {
    int a = 1;
    while (a < x) a <<= 1;
    return a;
}

The for loop is costly especially when input x is considerable large, the loop takes several cycle and each cycle shift the value to one position left (equivalent to multiply by two) until the value is equal or larger than the input. The implementation is straightforward, but kinda slow, and we will need to improve this using bit tweaking technique. The naive approach however has one advantage, which works for several data types. For example, if you change the input type from int to int64 and the same algorithm still works! To optimise for 32-bit integer only, we can use the following technique. The first is to get the most-significant-bit (MSB) for a 32-bit integer. The MSB is the highest numbered bit (the left-most bit) and it can be computed using SWAR algorithm by iteratively folding upper bits into lower bits. For 32-bit integer, the following implementation will compute the MSB.

1
2
3
4
5
6
7
8
9
10
inline unsigned int
msb32(register unsigned int x)
{
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
        return (x & ~(x >> 1));
}
inline unsigned int
msb32(register unsigned int x)
{
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
        return (x & ~(x >> 1));
}

In C/C++, the keyword register is a hint to the compiler that the following variable is better to be stored using registers instead of memory, such as stacks, but with the state-of-the-art compilers, which do the optimisation well, then the keyword is as good as whitespace. The inline keyword is also a hint to the compiler to include the function code where it is needed without the function calling overhead. To get the next largest power of two, it is also based on SWAR algorithm except the last return statement.

1
2
3
4
5
6
7
8
9
10
inline unsigned int
nlpo2(register unsigned int x)
{
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
        return (x + 1);
}
inline unsigned int
nlpo2(register unsigned int x)
{
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
        return (x + 1);
}

With these two functions, we can check if the input is power of two by using (x && ((x & (x – 1)) == 0)) [see here],

1
2
3
4
5
6
7
inline unsigned int
h(register unsigned int x) {
    if (x && ((x & (x - 1)) == 0)) { // if power of two
        return msb32(x);
    }
    return nlpo2(x);
}
inline unsigned int
h(register unsigned int x) {
    if (x && ((x & (x - 1)) == 0)) { // if power of two
        return msb32(x);
    }
    return nlpo2(x);
}

Wait, the MSB is equal to itself when the number is power of two, therefore,

inline unsigned int
h(register unsigned int x) {
    if (x && ((x & (x - 1)) == 0)) { // if power of two
        return x;
    }
    return nlpo2(x);
}

Isn’t this faster, the comparison roughly shows that the improved version is around 200 times faster than the naive approach (of course, this depends on the input number), if the numbers to check are large, the speedup gain is much more obvious (the naive approach spends more cycles in the loop) but in general, the improved approach is faster because there is no loop involved and only one condition check. The rest is lighting fast using the binary operations, e.g. bit shifting, logic or etc.

Reference:

The Aggregate Magic Algorithms

–EOF (Coding For Speed) —

GD Star Rating
loading...
700 words Last Post: High Performance FillChar Method Comparison for 32 bit (every four bytes)
Next Post: How Many One's Between Number 1 to N ?

The Permanent URL is: Fast Power of Two Equal or Larger than a 32-bit Integer (AMP Version)

Leave a Reply