Counting the number of Leading Zeros for a 32-bit Integer (Signed or Unsigned)

To count the number of leading zeros for a integer, we can use the following intuitive (bruteforce) method to increment the number of zeros until a first one (from left to right) is met, using a loop.

unsigned leading_zero_naive(int x)
{
    unsigned n = 0;
    const unsigned bits = sizeof(x) * 8;
    for (int i = 1; i < bits; i ++) {
        if (x < 0) break;
        n ++;
        x <<= 1;
    }
    return n;
}

unsigned leading_zero_naive(int x)
{
    unsigned n = 0;
    const unsigned bits = sizeof(x) * 8;
    for (int i = 1; i < bits; i ++) {
        if (x < 0) break;
        n ++;
        x <<= 1;
    }
    return n;
}

This method can be easily adapted to various integer data types for 8, 16, 32, or 64 bits integer. Just change the data type of x and the code will still work.

It has a loop, and the loop will terminate once a ‘1’ is met. The parameter x is shifted one position to the left, if it is smaller than 0, (the most significant byte is one, the number is negative), then we end the loop.

In Hackers’ Delight, Henry Warren proposes the following method to count the number of leading zeros, specifically for 32-bit unsigned integers.

unsigned leading_zero(unsigned x)
{
    unsigned n = 0;
    if (x <= 0x0000ffff) n += 16, x <<= 16;
    if (x <= 0x00ffffff) n +=  8, x <<= 8;
    if (x <= 0x0fffffff) n +=  4, x <<= 4;
    if (x <= 0x3fffffff) n +=  2, x <<= 2;
    if (x <= 0x7fffffff) n ++;
    return n;
}

unsigned leading_zero(unsigned x)
{
    unsigned n = 0;
    if (x <= 0x0000ffff) n += 16, x <<= 16;
    if (x <= 0x00ffffff) n +=  8, x <<= 8;
    if (x <= 0x0fffffff) n +=  4, x <<= 4;
    if (x <= 0x3fffffff) n +=  2, x <<= 2;
    if (x <= 0x7fffffff) n ++;
    return n;
}

There are five conditions, which are time consuming, all the rest operations seem fast enough. However, since five conditions are always needed for any given 32-bit integers, it will suffer from cache miss if the condition is mis-predicted. Multiply x bya power of two each round. Allow the counter to step proportional to the logarithm (base 2).

How fast is this? we compare the two methods using a loop.

int main()
{
    unsigned n = 0;
    for (int i = 1; i < 999999999; i ++) {
        n += leading_zero(n); // 6.056
    }
    cout << n << endl;
    return 0;
}

int main()
{
    unsigned n = 0;
    for (int i = 1; i < 999999999; i ++) {
        n += leading_zero(n); // 6.056
    }
    cout << n << endl;
    return 0;
}

And,

int main()
{
    unsigned n = 0;
    for (int i = 1; i < 999999999; i ++) {
        n += leading_zero_naive(n);// 1.834
    }
    cout << n << endl;
    return 0;
}

int main()
{
    unsigned n = 0;
    for (int i = 1; i < 999999999; i ++) {
        n += leading_zero_naive(n);// 1.834
    }
    cout << n << endl;
    return 0;
}

Surprisingly enough, on GNU GCC4.8.1, Release Compilation (using CodeBlocks), the naive version runs 3 times faster than the second version that looks more sophisticated.

Experiment is important if you don't trust your analysis. Always test your solutions if you are not sure.

Thanks for Marc's suggestions, we can do a bit tweaking on the naive version.

The for loop can be actually improved a little bit by counting from top to bottom, to avoid access BITS everytime.

unsigned leading_zero_naive2(int x)
{
    unsigned n = 0;
    const unsigned bits = sizeof(x) * 8;
    for (int i = bits; --i; ) {
        if (x < 0) break;
        n ++;
        x <<= 1;
    }
    return n;
}

unsigned leading_zero_naive2(int x)
{
    unsigned n = 0;
    const unsigned bits = sizeof(x) * 8;
    for (int i = bits; --i; ) {
        if (x < 0) break;
        n ++;
        x <<= 1;
    }
    return n;
}

This is slightly faster.. since there is no comparison... However, the compilers get better and better, and will probably identify this stuff anyway...

For input = 0, we return the size of bits, otherwise, for non-zeros, we just need to check if x is smaller than zero (check for 1) and shift one position to the left at each loop, so

unsigned leading_zero_naive3(int x)
{
    unsigned n = 0;
    if (x == 0) return sizeof(x) * 8;
    while (1) {
        if (x < 0) break;
        n ++;
        x <<= 1;
    }
    return n;
}

unsigned leading_zero_naive3(int x)
{
    unsigned n = 0;
    if (x == 0) return sizeof(x) * 8;
    while (1) {
        if (x < 0) break;
        n ++;
        x <<= 1;
    }
    return n;
}

We get rid of the comparison straight away. This is again slightly faster than above.

For GCC Compiler, it provides a built-in function __builtin_clz() that will do the exactly same thing. And it is pretty optimised so it outperforms the above..

Well, please note that the above performance comparison is not always similar, it really depends on the input numbers. For bigger numbers, the naive version tends to stop and break earlier, which is faster than the Hacker's Delight version, which always takes 5 conditions checks.

For GCC Compiler, use __builtin_clz(), otherwise, choose the one that suits you best (analyse your input number range first).

--EOF (Coding For Speed) --

GD Star Rating
loading...

722 words Last Post: Using Faster Exponential Approximation
Next Post: High Performance FillChar Method Comparison for 32 bit (every four bytes)

The Permanent URL is: Counting the number of Leading Zeros for a 32-bit Integer (Signed or Unsigned) (AMP Version)

One Response

sonofusion82

__builtin_clz() intrinsic function usually use processor’s special instruction, so it is a lot faster than any C implementation unless the compiler is able recognize the pattern and optimize it out.
I usually like to use godbolt.org compiler explorer to see what the compiler is doing with my code.
I’ve copied your code to there and you can see how each implementation is being compiled. https://godbolt.org/g/VcW4Mo

CodingForSpeed.COM

Counting the number of Leading Zeros for a 32-bit Integer (Signed or Unsigned)

One Response

Leave a Reply

Related posts:

One Response

Leave a Reply