## Introduction

How many multiplications are needed to calculate \(x^6\)? The naïve way to do it is \(x^6 = x\cdot x\cdot x\cdot x\cdot x\cdot x\). That is five multiplications. But we can do better: \((x\cdot x\cdot x)^2\) only requires three. Using as few operations as possible is important for the efficient evaluation of expressions on a computer. Is there a general way to find the smallest number of multiplications needed to compute any given power \(n\)?

This problem is equivalent to finding shortest *addition chains*. An addition chain is a sequence of integers such that each one can be expressed as the sum of two prior elements of the chain. For example, 23 can be constructed through an addition chain of only 6 elements:

Correspondingly, \(x^{23}\) can be computed using 6 multiplications:

Notice that we used the already computed value of \(x^3\) not once, but twice: for \(x^5\) and for \(x^{23}\).

Computing the shortest addition chain for a given integer \(n\) is difficult, and is shown to be an NP-complete problem. The minimal length is given by OEIS A003313. Luckily, there are a few simple and practical ways to generate non-optimal but still short addition chains that we can use for computing powers.

### Practical applications

Some computer languages, like Fortran, often implement such optimizations for computing integer powers. Others, like C and C++, do not have a builtin exponentiation operator at all. To compute general powers in C, we can resort to the `pow(base, exp)`

standard library function. This, however, is designed to work with arbitrary real exponents and is usually much slower than direct multiplication. In fact, on my computer one call to `std::pow()`

with `double`

s takes roughly the same amount of time as 45 (!) elementary `double`

multiplications.

Once realizing this, many people will write small utility functions for efficiently computing specific powers, e.g.,

```
inline double pow4(double x) {
double x2 = x*x;
return x2*x2;
}
```

Doing this for each power is tedious. Below I show a general implementation using C++ templates that generates a short (though not always shortest) addition chain for computing any positive integer power.

```
template<unsigned N> struct power_impl;
template<unsigned N> struct power_impl {
template<typename T>
static T calc(const T &x) {
if (N%2 == 0)
return power_impl<N/2>::calc(x*x);
else if (N%3 == 0)
return power_impl<N/3>::calc(x*x*x);
return power_impl<N-1>::calc(x)*x;
}
};
template<> struct power_impl<0> {
template<typename T>
static T calc(const T &) { return 1; }
};
template<unsigned N, typename T>
inline T power(const T &x) {
return power_impl<N>::calc(x);
}
```

To compute e.g. \(x^6\), we can now simply call `power<6>(x)`

. Modern optimizing compilers will inline all recursive calls within the implementation of `power()`

and produce efficient machine code similar to what we would have gotten from writing `temp = x*x*x; result = temp*temp;`

directly.

How does the code above work? It breaks down the computation recursively:

- For \(n=2k\), it computes \(x^n = y\cdot y,\;\, y = x^{n/2}\), then continues with \(y = x^{n/2}\) recursively.
- For \(n=3k\), it computes \(x^n = y\cdot y\cdot y,\;\, y = x^{n/3}\).
- Otherwise it computes \(x^n = x\cdot y,\;\, y = x^{(n-1)}\).

This recursive breakdown does not always generate the optimal solution, but it is not much less efficient either. The following table shows the 17 cases of \(1 \le n \le 100\) for which the simple recursive algorithm uses more multiplications than the optimal one. In all but one case it generates only one additional multiplication compared to the optimal solution.

```
n optimal simple
-----------------------
23 6 7
33 6 7
43 7 8
46 7 8
47 8 9
59 8 9
66 7 8
67 8 9
69 8 9
77 8 9
83 8 9
85 8 9
86 8 9
92 8 9
94 9 10
95 9 11
99 8 9
```

In those cases where we need to squeeze out every bit of performance, we can still provide specializations for the cases where the simple recursive solution is not the shortest possible. For example,

```
template<> struct power_impl<23> {
template<typename T>
static T calc(const T &x) {
T x2 = x*x;
T x3 = x2*x;
T x5 = x3*x2;
T x10 = x5*x5;
T x20 = x10*x10;
return x20*x3;
}
};
```

Personally, I have not yet encountered the need to quickly compute 23rd or higher powers in an inner loop, so I do not include these specializations in my code.

For those who do need it, there are tables of optimal solutions, as well as a lot of other information on addition chains on Achim Flammenkamp’s website.

## Comments !