Unrolling the Loop



Unrolling the Loop can be considered the simple-but-effective optimisation method.

Usually, the operations done in a loop can be grouped into two or more. In this case, unrolling them speeds up the code.

For example, suppose you want to update the elements at even index to its plus one but decrement the elements at odd index by one, instead of having code like this:

1
2
3
4
5
6
7
for (int i = 0; i <= 100; i ++) {
  if (i & 1 == 0) {
    arr[i] ++;
  } else {
    arr[i] --;
  }
}
for (int i = 0; i <= 100; i ++) {
  if (i & 1 == 0) {
    arr[i] ++;
  } else {
    arr[i] --;
  }
}

You should unrolling the loop like this:

1
2
3
4
5
6
7
for (int i = 0; i <= 100; i +2) {
  arr[i] ++;
}
 
for (int i = 1; i <= 100; i +2) {
  arr[i] --;
}
for (int i = 0; i <= 100; i +2) {
  arr[i] ++;
}

for (int i = 1; i <= 100; i +2) {
  arr[i] --;
}

The loop unrolling is faster because unnecessary if-checks are avoided. The branching is costly especially if mis-predicted by the processors.

Loop unrolling sometimes can be referred to removing for-loops for known small loops. For example,

1
2
3
for (int i = 0; i < 4; i ++) {
  arr[i] = i;
}
for (int i = 0; i < 4; i ++) {
  arr[i] = i;
}

should be changed to:

1
2
3
4
arr[0] = 0;
arr[1] = 1;
arr[2] = 2;
arr[3] = 3;
arr[0] = 0;
arr[1] = 1;
arr[2] = 2;
arr[3] = 3;

This can be useful in multiplication of small-size matrix, where sizes of both matrix are known e.g. 2×2 times 2×4. In this case, unrolling the loop and multiply each element is a lot faster that removes the unnecessary loop (condition jumping)

Why do we need to unroll the loop? In a short word, to avoid condition mis-predict and cache miss due to condition checks in a loop.

–EOF (Coding For Speed) —

GD Star Rating
loading...
288 words Last Post: Hello world!
Next Post: Put the Most-likely If-checks in the Front

The Permanent URL is: Unrolling the Loop (AMP Version)

Leave a Reply