It is not enough for a program to work *correctly*; it should also finish its job in a *timely* manner and with reasonable *resource requirements*. When analyzing these aspects of an algorithm, we talk about its *time and space complexity*. When it is not specified which one is meant, the *time* complexity is usually implied.

Consider the following problem:

We define Fibonacci numbers as follows: $$F_n := \begin{cases} n, & n \in \{0,1\}, \\ F_{n-1} + F_{n-2}, & \text{otherwise}.\end{cases}$$ Compute $F_n$ for a given $n$.

Mathematically, the problem is trivial: we just have to apply the definition. For example, for $n = 5$, we get:

$$\begin{align*} F_n = F_5 &= F_4 + F_3 \\ &= (F_3 + F_2) + (F_2 + F_1) \\ &= ((F_2 + F_1) + (F_1 + F_0)) + ((F_1 + F_0) + 1) \\ &= (((F_1 + F_0) + 1) + (1 + 0)) + ((1 + 0) + 1) \\ &= (((1 + 0) + 1) + (1 + 0)) + ((1 + 0) + 1) \\ &= 5. \end{align*}$$

Of course, we can do so for any $n$, but the time this takes might not be very encouraging. The following table shows the results and the timings of an algorithm that is a direct copy of the above definition (`fib_def`) and of a simple yet fast one, similar to the one we've seen in the lecture about algorithms and variables (`fib_fast`).
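The two implementations are not listed in this text; as a sketch (assuming the straightforward recursive and iterative approaches, with names matching the table below), they could look like this:

```python
def fib_def(n):
    """Fibonacci numbers computed directly from the recursive definition."""
    if n in (0, 1):
        return n
    return fib_def(n - 1) + fib_def(n - 2)

def fib_fast(n):
    """Fibonacci numbers computed iteratively, with O(n) additions."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b  # slide the window (F_{k}, F_{k+1}) one step forward
    return a
```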

| n | fib_def(n) | fib_fast(n) | time(def) [s] | time(fast) [s] |
|---|---|---|---|---|
| 1 | 1 | 1 | 0.0000029 | 0.0000012 |
| 2 | 1 | 1 | 0.0000021 | 0.0000014 |
| 3 | 2 | 2 | 0.0000029 | 0.0000014 |
| 4 | 3 | 3 | 0.0000033 | 0.0000017 |
| 5 | 5 | 5 | 0.0000050 | 0.0000017 |
| 6 | 8 | 8 | 0.0000074 | 0.0000019 |
| 7 | 13 | 13 | 0.0000112 | 0.0000021 |
| 8 | 21 | 21 | 0.0000167 | 0.0000021 |
| 9 | 34 | 34 | 0.0000262 | 0.0000024 |
| 10 | 55 | 55 | 0.0000412 | 0.0000026 |
| 11 | 89 | 89 | 0.0000660 | 0.0000029 |
| 12 | 144 | 144 | 0.0001061 | 0.0000029 |
| 13 | 233 | 233 | 0.0001805 | 0.0000029 |
| 14 | 377 | 377 | 0.0002797 | 0.0000031 |
| 15 | 610 | 610 | 0.0004530 | 0.0000033 |
| 16 | 987 | 987 | 0.0007396 | 0.0000036 |
| 17 | 1597 | 1597 | 0.0011926 | 0.0000038 |
| 18 | 2584 | 2584 | 0.0019245 | 0.0000041 |
| 19 | 4181 | 4181 | 0.0030890 | 0.0000041 |
| 20 | 6765 | 6765 | 0.0049942 | 0.0000043 |
| 21 | 10946 | 10946 | 0.0080504 | 0.0000045 |
| 22 | 17711 | 17711 | 0.0130370 | 0.0000050 |
| 23 | 28657 | 28657 | 0.0210469 | 0.0000055 |
| 24 | 46368 | 46368 | 0.0362742 | 0.0000083 |
| 25 | 75025 | 75025 | 0.0549119 | 0.0000072 |
| 26 | 121393 | 121393 | 0.0912960 | 0.0000086 |
| 27 | 196418 | 196418 | 0.1439888 | 0.0000074 |
| 28 | 317811 | 317811 | 0.2361903 | 0.0000083 |
| 29 | 514229 | 514229 | 0.3863642 | 0.0000093 |
| 30 | 832040 | 832040 | 0.6122549 | 0.0000100 |
| 31 | 1346269 | 1346269 | 0.9895782 | 0.0000112 |
| 32 | 2178309 | 2178309 | 1.5994761 | 0.0000093 |
| 33 | 3524578 | 3524578 | 2.5780375 | 0.0000100 |
| 34 | 5702887 | 5702887 | 4.1696584 | 0.0000100 |
| 35 | 9227465 | 9227465 | 6.7392943 | 0.0000098 |
| 36 | 14930352 | 14930352 | 10.9062860 | 0.0000100 |
| 37 | 24157817 | 24157817 | 17.6386068 | 0.0000098 |
| 38 | 39088169 | 39088169 | 28.5867844 | 0.0000098 |
| 39 | 63245986 | 63245986 | 46.3197675 | 0.0000119 |
| 40 | 102334155 | 102334155 | 75.2086654 | 0.0000117 |

As you can see, for $n = 40$, `fib_def` takes over a minute to produce the result, while `fib_fast` takes less than one millisecond. It is worth noting that very small timings are not very precise, which is why the numbers in the rightmost column are not always ascending, but the order of magnitude is correct.

The number of steps that `fib_def` takes grows quickly. On the same computer, computing `fib_def(100)` would take approximately 6.7 million years, while `fib_fast(100)` takes only 0.0203 milliseconds. Mathematics may not care, but anyone using your program (including you) will.

A program that was used to produce the above results can be downloaded here.

So, why does the computation by definition get so slow so quickly? Without much detail, this can be shown quite easily. Let us denote the number of additions (all of which we assume to require an equal amount of time) needed to compute $F_n$ as $T_n$. The exact value of $T_n$ is easy to obtain from the definition of $F_n$:

$$ F_n := \begin{cases} n, & n \in \{0,1\}, \\ F_{n-1} + F_{n-2}, & \text{otherwise},\end{cases} \quad\quad T_n := \begin{cases} 0, & n \in \{0,1\}, \\ T_{n-1} + 1 + T_{n-2}, & \text{otherwise}.\end{cases} $$

Disregarding the trivial cases $n=0,1$, what can we say about the speed of growth of $T_n$? Obviously, $T_i > T_j$ for all $i > j \ge 1$, i.e., $T$ grows. So,
$$T_{n+2} = T_{n+1} + 1 + T_n > T_{n} + T_{n} = 2T_{n}.$$
In other words, when $n$ grows **only by 2**, the time $T_n$ needed to compute $F_n$ **more than doubles!**
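The recurrence for $T_n$ can be checked directly in code; this small sketch simply mirrors the definition above:

```python
def T(n):
    """Number of additions performed when computing F(n) by definition."""
    if n in (0, 1):
        return 0
    return T(n - 1) + 1 + T(n - 2)  # one addition joins the two subresults

# T more than doubles whenever n grows by 2:
doubling = all(T(n + 2) > 2 * T(n) for n in range(2, 18))
```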

And, indeed, if you look at the time needed to compute $F_{38}$ and the time needed to compute $F_{40}$, the latter is more than double the former ($28.59$ seconds vs. $75.21$ seconds: $2.63$ **times** more time!).

This goes on at a similar rate and it would take (much) more than $2^{(100-40)/2} = 2^{30} \approx 10^9$ **times** longer to compute $F_{100}$ than it did to compute $F_{40}$ (using the same algorithm on the same computer, of course). A better estimate is roughly $2.6^{30} \approx 2.8 \cdot 10^{12}$ **times** more time to compute $F_{100}$ than to compute $F_{40}$.

Take a stopwatch, run the program, measure the time (just like we did above ☺).

Why not like this?

The speed of a program's execution depends on many factors, among which the quality of an algorithm is just one. Here are just a few things that often affect this speed:

- **The hardware.** Some processors can perform certain operations directly, while others have to simulate them. The differences in the speed of communication between various components of the computer, and even inside the processor itself, are more than significant. There are also cache speeds and sizes, memory speed and size, etc.

- **The operating system.** How well does it use the hardware that is available? How well does it parallelize and prioritize tasks, given that other processes run alongside your program?

- **The libraries used.** A library is a collection of functions and other objects, often used as a black box. Simply using a different library can speed up or slow down a program significantly.

- **The programming language and even its implementation.** An interpreter/compiler translates the commands to so-called machine language. This can be done in many ways, with more or less overhead and with various levels of efficiency. A Python program run on Windows may run faster or slower than the same program on Linux, even with identical input data.

- **The data.** When running a program, it is impossible to test all possible inputs. How representative is the data chosen for the timing? How much worse can the timings be for carefully chosen data?

Ideally, by counting the basic steps of the algorithm (not even a program, as this already depends on the choice of a language!). While that sounds simple enough, it is next to impossible to do for all but the simplest of algorithms. There are several reasons for this:

- The basic steps (arithmetic operations, assignments, comparisons, ...) are often "hidden" in modern languages.

- The number of steps almost always depends on the data.

Here is a very simple example:

Write a function that takes one integer and returns its number of digits (return $1$ for $n = 0$).

Here are three solutions:

In [1]:

```
from math import floor, log10

def count_digits_log(n):
    """Return the number of digits of an integer n using logarithms."""
    if n == 0:
        return 1
    return floor(log10(abs(n))) + 1

def count_digits_str(n):
    """Return the number of digits of an integer n using string len."""
    return len(str(abs(n)))

def count_digits_loop(n):
    """Return the number of digits of an integer n using repeated division."""
    if n == 0:
        return 1
    n = abs(n)
    res = 0
    while n > 0:
        res += 1
        n //= 10
    return res

n = int(input("n = "))
print("count_digits_log({}) = {}".format(n, count_digits_log(n)))
print("count_digits_str({}) = {}".format(n, count_digits_str(n)))
print("count_digits_loop({}) = {}".format(n, count_digits_loop(n)))
```

Let us determine the number of the basic steps.

`count_digits_log`

We have a comparison and, if `n == 0`, nothing more. Otherwise, we have an absolute value of an integer, a logarithm, and a rounding. While changing a sign and rounding a number are quite simple operations, computing a logarithm is definitely not. Whatever the steps of this function are, they are unlikely to be the same across programming languages, their implementations, and various libraries.

`count_digits_str`

This whole function is based on how Python treats strings, which varies widely between languages. Even if we ignore that, what does `str` actually do? It is probably something similar to `count_digits_loop`, because numbers are stored in the computer in binary (zeros and ones), so they cannot be directly split into decimal digits (zero through nine).

Somewhat surprisingly, `len` is highly dependent on the language:

- A string in C is terminated by a special character at the end, meaning that its `len` (called `strlen`) must check each character in the string to find its length.
- In Pascal, the length is simply read from the zeroth character of the string (imposing a limit of 255 characters on the length of strings).
- In Python, it changed through the versions (the last change being introduced as recently as Python 3.3, as proposed by PEP 393). You don't need to know these details; only that it is nowhere near as simple as it looks.
`count_digits_loop`

This one is the cleanest in terms of counting basic steps, as we use only basic operations (assignment, comparison, increment, division). However, notice that the `while` loop has one step for each digit of `n`, meaning that the total number of steps depends on the number of digits of `n`.
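To see this dependence concretely, here is a sketch of `count_digits_loop` instrumented with a step counter (the counter is added here purely for illustration):

```python
def count_digits_loop_steps(n):
    """Like count_digits_loop, but also return the number of loop iterations."""
    if n == 0:
        return 1, 0
    n = abs(n)
    res = steps = 0
    while n > 0:
        steps += 1  # one iteration per decimal digit
        res += 1
        n //= 10
    return res, steps
```

The loop runs exactly once per digit, so the step count equals the result.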

Obviously, our approach is too pedantic. Relating the number of steps to the data itself is unnecessarily convoluted.

The first thing to do in order to simplify this process is to relate the number of the operations needed to execute an algorithm with the **size of the data**, instead of the data itself.

Now, what is *the size of the input*?

This is left to the one doing the analysis to decide, as it is a thing highly specific to the problem that the analyzed algorithms are solving.

Here are a few examples:

When we are doing something with a single number (like counting its digits), the only reasonable measures are the number itself and the number of its digits.

However, if we denote the number as $n$ and, for simplicity, assume that it is not zero, then its number of digits is $\lfloor \log_{10} |n| \rfloor + 1$, so these two measures are clearly related and it doesn't really matter which one we use.

For various list algorithms, the size of the list is a good measure.

It is a single number describing the size of the data while disregarding the individual elements of the list. Assuming that all the elements of the list have more-or-less the same size (say, all are numbers), the size of all the elements of the list together would be `length_of_the_list * size_of_an_element`.

Note that if the elements vary radically in size, we might want to use some other measure. For example, a list of different-sized lists is not described well enough by the number of those lists. Instead, a good measure might be the sum of the lengths of all those lists, or something similar.

For algorithms dealing with matrices of order $n$, one could consider using the number of elements as the describing value. However, this is just $n^2$, which is clearly related to the order $n$ of the matrices, so we often use $n$ instead of $n^2$.

The most important thing to remember here is that **we are comparing different algorithms that solve the same problem (and take the same input)**. This means that we can choose the measure of the size of the input, but the choice must be the same for all the compared algorithms.

In other words, if we have two algorithms working with matrices of order $n$, we can take the input size to be either $n$ or $n^2$, but we must make the same choice for both of the algorithms. Taking $n$ for one algorithm and $n^2$ for the other will almost never produce a meaningful result.

This is very much like comparing lengths: two roads maintain the same ratio of lengths regardless of the unit we use (miles, kilometers, millimeters, light years, ...).

From now on, we shall denote the size of the input as $n$, unless emphasised to be different.

Even with the simplified "observe only the size of the input" approach, it is very hard to find the exact number of operations needed to execute an algorithm.

For example, let's say that we have two algorithms, with the following number of operations for an input of size $n$:

Algorithm 1: $f(n) = 2^n + n^3$ operations,

Algorithm 2: $g(n) = 10^{10} n^7$ operations.

For $n = 1$, Algorithm 1 will need $3$ operations, while Algorithm 2 will need $10^{10}$ operations. So it seems that Algorithm 1 is faster, especially due to the ridiculously big constant $10^{10}$ in $g(n)$. However, this is only true for $n \le 77$. For $n > 77$, Algorithm 1 will need more steps than Algorithm 2.
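The crossover point can be verified with a few lines of Python (exact integer arithmetic, so there are no overflow concerns):

```python
def f(n):
    return 2**n + n**3          # Algorithm 1

def g(n):
    return 10**10 * n**7        # Algorithm 2

# the first n for which Algorithm 1 needs more operations than Algorithm 2
crossover = next(n for n in range(1, 1000) if f(n) > g(n))  # 78, so f(n) <= g(n) up to n = 77
```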

So, how best to compare functions $f$ and $g$ as the representatives of the complexity of Algorithms 1 and 2, respectively?

The main concern of this analysis is what happens with big data. There is not much concern whether a solution to some small problem is reached in a millisecond or two. But there is a big difference between 75 seconds and a millisecond, or millions of years and milliseconds (as seen in the Fibonacci example).

For that reason, we don't really need the **exact number of steps**. Instead, we are usually concerned with the asymptotic behaviour of algorithms, i.e., in the **approximate number of steps** as the input size grows limitlessly ($n \to \infty$).

The most often used measure is described by the big O notation:

Let f and g be two functions defined on some subset of the real numbers. One writes

$$f(x) = O(g(x)) \text{ as $x \to \infty$}$$

if and only if there is a positive constant $M$ such that for all sufficiently large values of $x$, the absolute value of $f(x)$ is at most $M$ multiplied by the absolute value of $g(x)$.

That is, $f(x) = O(g(x))$ if and only if there exists a positive real number $M$ and a real number $x_0$ such that

$$|f(x)| \le M |g(x)| \text{ for all $x \ge x_0$}.$$

In layman's terms, albeit a bit imprecisely, this means that

$f(x) = O(g(x))$ if and only if there is $x_0$ after which $f$ grows at most as quickly as $g$.

In all the generality, this sounds confusing. However, it is quite simple:

- $n = O(n)$,
- $n = O(n+1)$,
- $n = O(n-1)$,
- $n-1 = O(n)$,
- $n+1 = O(n)$,
- $n = O(n^2)$,
- $n^2 \ne O(n)$,
- $n^2 + 10^{100}n = O(n^2)$,
- $10^{10^{10}} n^{100} = O(2^n)$,
- $2^n \ne O(10^{10^{10}} n^{100})$,
- $10^{10^{10}} n^k = O(n^k)$ for all $k \in \mathbb{N}$,
- $\log(n) = O(p(n))$ for any non-constant polynomial $p$,
- $p(n) \ne O(\log(n))$ for any non-constant polynomial $p$.

These are very simple to read in a somewhat informal manner. For example, the last two:

Logarithm grows at most as fast as a polynomial, but a polynomial grows faster than a logarithm.
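As a quick sanity check of one of the claims above, $n^2 + 10^{100}n = O(n^2)$ holds with, e.g., $M = 2$ and $x_0 = 10^{100}$; Python's exact integers let us spot-check this directly:

```python
# |f(n)| <= M * |g(n)| for all n >= x0, spot-checked at a few sample points
f = lambda n: n**2 + 10**100 * n
g = lambda n: n**2
M, x0 = 2, 10**100

samples = [x0, 2 * x0, 10 * x0, x0**2]
ok = all(f(n) <= M * g(n) for n in samples)
```

For $n \ge 10^{100}$ we have $10^{100} n \le n \cdot n$, so $f(n) \le 2n^2$; below $x_0$ the bound may fail, which is exactly why the definition only asks for "sufficiently large" $x$.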

Let us denote $$f(n) = O(g(n)), \quad \text{but } g(n) \ne O(f(n))$$ as $O(f(n)) < O(g(n))$, i.e., that $g$ grows asymptotically strictly faster than $f$.

The following is true: $$\begin{align*} O(1) &< O(\operatorname{\underbrace{\log\cdots\log}_{k}}n) < \cdots < O(\log n) \\ &< O(\sqrt[a]{n}) < O(\sqrt[b]{n}) \\ &< O(p(n)) < O(q(n)) \\ &< O(2^n) < O(3^n) < \cdots \\ &< O(n!) \\ &< O(n^n), \end{align*}$$ where $k \ge 2$, $a > b$, and $p$ and $q$ are real polynomials such that $\deg p < \deg q$.

So, the algorithms with constant complexity ($O(1)$) are, quite expectedly, the fastest ones. Algorithms with *logarithmic complexity* ($O(\log n)$) are considered better than those with *linear complexity* ($O(n)$), which are better than those with a nonlinear *polynomial complexity* ($O(n^k)$), where a lower degree is better; and all of these are considered better than algorithms with *exponential complexity* ($O(b^n)$), where a smaller base is better.
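A quick numerical illustration of this ordering at a single point, say $n = 64$ (a single point is of course not a proof, but the values ascend just as the chain predicts):

```python
from math import log2, sqrt, factorial

n = 64
# 1 < log log n < log n < sqrt(n) < n < n^2 < 2^n < n! < n^n
values = [1, log2(log2(n)), log2(n), sqrt(n), n, n**2, 2**n, factorial(n), n**n]
ascending = all(a < b for a, b in zip(values, values[1:]))
```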

If algorithms $A_1$ and $A_2$ have respective complexities $f$ and $g$ such that $O(f(n)) < O(g(n))$, but $A_1$ is significantly slower for smaller inputs, it is desirable to use both of them, picking the right one depending on the data. So,

```
if an_appropriate_condition_about_input:
    result = function_that_implements_A2(data)
else:
    result = function_that_implements_A1(data)
```

A common example of this is matrix multiplication: the Strassen algorithm is asymptotically faster, but on smaller matrices it is actually slower than the conventional matrix multiplication.

There is a theoretically faster matrix multiplication algorithm: the Coppersmith–Winograd algorithm. However, this one is not used in practice, as it pays off only on matrices that are too big for modern computers.

Another example of combining algorithms is when some extra properties can be exploited. For example, the matrix algorithms that exploit some structure:

```
if is_symmetric(M):
    eigenvalues = eigenvalue_solver_for_symmetric_matrices(M)
else:
    eigenvalues = eigenvalue_solver_for_general_matrices(M)
```

Notice that the test `is_symmetric` becomes a part of this new, combined algorithm, so its speed must be taken into account as well. If the test is too complex, it might not pay off to combine the algorithms; instead, just use the more general one.
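Here the test is cheap: a straightforward `is_symmetric` for a matrix stored as a list of lists takes $O(n^2)$ comparisons, negligible next to the $O(n^3)$ of a typical eigenvalue solver (a sketch, not a specific library routine):

```python
def is_symmetric(M):
    """O(n^2) test: compare each entry above the diagonal with its mirror."""
    n = len(M)
    return all(M[i][j] == M[j][i] for i in range(n) for j in range(i + 1, n))
```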

Of course, more than two algorithms can be combined.

**Problem:** Write a function that checks if an integer is prime or not.

Recall that a number is prime if it is greater than 1 and divisible only by 1 and itself.

In [2]:

```
def is_prime_def(n):
    """Returns `True` if an integer `n` is a prime and `False` otherwise."""
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

def is_prime_sqrt(n):
    """Returns `True` if an integer `n` is a prime and `False` otherwise."""
    if n < 2:
        return False
    d = 2
    while d*d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

n = int(input("n = "))
bool2text = { False: "is not", True: "is" }
print("is_prime_def says that number {} {} a prime.".format(n, bool2text[is_prime_def(n)]))
print("is_prime_sqrt says that number {} {} a prime.".format(n, bool2text[is_prime_sqrt(n)]))
```

Let us compare these two algorithms. First, a short description of their main loops:

- In **is_prime_def**, the main loop goes from $2$ to $n-1$ and checks if $n$ is divisible by any of these numbers. It stops and returns `False` as soon as it finds a divisor.
- In **is_prime_sqrt**, the main loop goes from $2$ to $\lfloor\sqrt{n}\rfloor$ and checks if $n$ is divisible by any of these numbers. It stops and returns `False` as soon as it finds a divisor.

Let us now compare the number of steps that each of these algorithms (programs, actually) performs:

If $n < 2$: both perform only 1 comparison.

If $n \ge 2$ is composite, denote its smallest divisor greater than 1 as $p$. Then:

- `is_prime_def`: 1 comparison (`n < 2`), $p-1$ assignments and increments (implicitly in the `for`), $p-1$ divisions (`n % d`), and $2(p-1)$ comparisons (one implicitly in the `for` and one in `n % d == 0` for each step of the loop).
- `is_prime_sqrt`: 1 comparison (`n < 2`), 1 assignment (`d = 2`), $p-1$ assignments and increments (`d += 1`), $p-1$ multiplications (`d*d`), $2(p-1)$ comparisons (`d*d <= n` and `n % d == 0` for each step of the loop), and $p-1$ divisions (`n % d`).

If $n \ge 2$ is prime:

- `is_prime_def`: 1 comparison (`n < 2`), $n-1$ assignments and increments (implicitly in the `for`), $n-1$ divisions (`n % d`), and $2(n-1)$ comparisons (one implicitly in the `for` and one in `n % d == 0` for each step of the loop).
- `is_prime_sqrt`: 1 comparison (`n < 2`), 1 assignment (`d = 2`), $\lfloor\sqrt{n}\rfloor-1$ assignments and increments (`d += 1`), $\lfloor\sqrt{n}\rfloor-1$ multiplications (`d*d`), $2(\lfloor\sqrt{n}\rfloor-1)$ comparisons (`d*d <= n` and `n % d == 0` for each step of the loop), and $\lfloor\sqrt{n}\rfloor-1$ divisions (`n % d`).

So, which one is faster (i.e., performs fewer steps)?

If $n < 2$: there is no difference.

If $n \ge 2$ is composite:

- `is_prime_def` performs $5(p-1)+1$ operations,
- `is_prime_sqrt` performs $6(p-1)+2$ operations.

If $n \ge 2$ is prime:

- `is_prime_def` performs $5(n-1)+1$ operations,
- `is_prime_sqrt` performs $6(\lfloor\sqrt{n}\rfloor-1)+2$ operations.

So, depending on $n$, one or the other is faster. However, notice that

$$6(p-1) + 2 \le 10(p-1) + 2 = 2(5(p-1)+1),$$

so $6(p-1)+2 = O(5(p-1)+1)$. However,

$$6(\lfloor\sqrt{n}\rfloor-1)+2 = O(5(n-1)+1), \quad \text{but} \quad 5(n-1)+1 \ne O(6(\lfloor\sqrt{n}\rfloor-1)+2).$$

In other words, if $n$ is composite, the algorithms have the same complexity; if $n$ is prime, `is_prime_sqrt` has a strictly lower complexity than `is_prime_def`.

**So, which is better on average?**

This is actually impossible to say for almost any practical purpose. How many numbers that we use these algorithms on will be prime and how many will be composite?

Statistics can be of help here, but only if we have very solid information on the input data: the distribution of primes and composites in it, plus their sizes (as the number of steps depends on these).

In other words, to find an average number of steps, we'd need to first solve the problem, by which time it is too late to pick the faster of the two algorithms.

Luckily, it is more often useful to check the worst case than the average one, because we are interested in a guarantee of how fast our program will run.

This simplifies things a lot, because it is very clear that the worst case scenario in our example is to let both of the loops finish. In other words,

- the (worst case) complexity of `is_prime_def` is $O(n)$, while
- the (worst case) complexity of `is_prime_sqrt` is $O(\sqrt{n})$,

which makes `is_prime_sqrt` the "faster" of the two (not in all cases, but in those that matter the most).

So, at the end of the day, we are interested in **the order of magnitude of the number of operations that an algorithm performs in the worst case**. Even though the above analysis is lengthy and quite complicated, once we know what we actually need (and why), this is trivial to compute:

- Everything outside of the loops is performed once, so it doesn't affect the complexity.

- Everything in the outermost loop is performed as many times as that loop has steps.

- Everything in a loop directly inside the outermost one is performed $s_1 \cdot s_2$ times, where $s_1$ and $s_2$ are the numbers of steps of the two loops.

- Etc.

**Beware of the function calls!** Some functions may contain loops themselves and this must be accounted for!

So, what is usually enough to check is the number of runs of the innermost loop. However, be very careful that something more complex is not "hidden" somewhere. For example, let `matrix1` and `matrix2` be matrices of order `n`. Observe the following algorithm sketch:

```
for i in range(n):
    something = matrix1 * matrix2
    for j in range(n):
        something_else = number + number
```

Here, we have $n$ matrix multiplications and $n^2$ number additions. However, the number of operations is **not** of the order of magnitude $n^2$, because each matrix multiplication takes $n^3$ operations which, when performed $n$ times, gives us an order of magnitude of $n^4$ operations.
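The hidden $n^3$ comes from the definition of the matrix product itself; a naive implementation makes the three nested loops explicit (a sketch for matrices stored as lists of lists):

```python
def matmul(A, B):
    """Naive product of two n-by-n matrices: n^2 entries, each an n-term sum."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```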

However, we are comparing algorithms here. We *can* count more complex operations (for example, matrix multiplications instead of number additions), but then we have to do the same for all the compared algorithms. In other words, we **can say**:

- *Algorithm 1 is faster because it has $n^4$ operations and Algorithm 2 has $n^5$*, or
- *Algorithm 1 is faster because it has $n$ matrix multiplications and Algorithm 2 has $n^2$*,

but it is wrong to say

*Algorithm 1 is slower because it has $n^4$ operations and Algorithm 2 has $n^2$ matrix multiplications*.

So, here is a quick estimate for our example above (i.e., how we usually do it):

In [3]:

```
def is_prime_def(n):
    """Returns `True` if an integer `n` is a prime and `False` otherwise."""
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

def is_prime_sqrt(n):
    """Returns `True` if an integer `n` is a prime and `False` otherwise."""
    if n < 2:
        return False
    d = 2
    while d*d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

n = int(input("n = "))
bool2text = { False: "is not", True: "is" }
print("is_prime_def says that number {} {} a prime.".format(n, bool2text[is_prime_def(n)]))
print("is_prime_sqrt says that number {} {} a prime.".format(n, bool2text[is_prime_sqrt(n)]))
```

Observation:

- `is_prime_def`: the loop has $n$ steps ($n-1$ actually, but that is the same order of magnitude as $n$), each containing only basic operations (however many, as long as it is at most a constant, which is $5$ here).
- `is_prime_sqrt`: the loop has $\sqrt{n}$ steps (again, only the order of magnitude; it doesn't even matter that it's not an integer!), each containing only basic operations (however many, as long as it is at most a constant, which is $6$ here).

How we actually count:

- `is_prime_def`: the loop performs $n$ basic operations.
- `is_prime_sqrt`: the loop performs $\sqrt{n}$ basic operations.

Since $O(\sqrt{n}) < O(n)$, `is_prime_sqrt` has a lower (computational) complexity.
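These counts can be confirmed empirically by instrumenting the loops with a counter (a sketch; the counting versions below are added purely for illustration):

```python
def steps_def(n):
    """Loop iterations performed by is_prime_def for n >= 2."""
    steps = 0
    for d in range(2, n):
        steps += 1
        if n % d == 0:
            break  # the real function returns False here
    return steps

def steps_sqrt(n):
    """Loop iterations performed by is_prime_sqrt for n >= 2."""
    steps, d = 0, 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            break
        d += 1
    return steps
```

For a prime such as $104729$, the first loop runs $104727$ times while the second runs only $322$ times.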

Note that this doesn't say that `is_prime_sqrt` cannot be made faster. By using a single `sqrt`, we can remove `d*d` (which is recomputed in every step of the loop):

In [4]:

```
from math import floor, sqrt

def is_prime_sqrt(n):
    """Returns `True` if an integer `n` is a prime and `False` otherwise."""
    if n < 2:
        return False
    for d in range(2, floor(sqrt(n)) + 1):
        if n % d == 0:
            return False
    return True

n = int(input("n = "))
bool2text = { False: "is not", True: "is" }
print("is_prime_sqrt says that number {} {} a prime.".format(n, bool2text[is_prime_sqrt(n)]))
```

n = 1234577 is_prime_sqrt says that number 1234577 is a prime.

This one has the same complexity as the previous version, but it is somewhat faster.

As always, here is a more Pythonic way to check if a number is prime:

In [5]:

```
from math import floor, sqrt

def is_prime_py(n):
    """Returns `True` if an integer `n` is a prime and `False` otherwise."""
    return (n >= 2) and all(n % d != 0 for d in range(2, floor(sqrt(n)) + 1))

n = int(input("n = "))
bool2text = { False: "is not", True: "is" }
print("is_prime_py says that number {} {} a prime.".format(n, bool2text[is_prime_py(n)]))
```

n = 1234577 is_prime_py says that number 1234577 is a prime.

We say (just for the purpose of this problem) that two natural numbers $a \ne b$ are *friendly* if $a$ equals the sum of all natural divisors of $b$ except $b$ itself and $b$ equals the sum of all natural divisors of $a$ except $a$ itself.

**Problem:** For a given $n$ write all pairs of friendly numbers smaller than or equal to $n$, each of them once (say, $a < b$).

As one should always do, let us analyze the problem before writing any code.

Denote the sum of all divisors of $n$ except $n$ itself as `divisors_sum(n)`. This is a function that we can write quite easily.

What we now need is:

For each $a \in \{1,\dots, n\}$:

- For each $b \in \{a+1,\dots, n\}$:
  - Check that $a = {\tt divisors\_sum}(b)$ and $b = {\tt divisors\_sum}(a)$; print the pair if `True`.

The code should be trivial:

In [3]:

```
from time import time

def divisors_sum(n):
    """Returns the sum of all natural divisors of a natural number `n` except `n` itself."""
    res = 0
    for d in range(1, n):
        if n % d == 0:
            res += d
    return res

# More Pythonic version
def divisors_sum_py(n):
    """Returns the sum of all natural divisors of a natural number `n` except `n` itself."""
    return sum(d for d in range(1, n) if n % d == 0)

n = int(input("n = "))
st = time()
for a in range(1, n+1):
    for b in range(a+1, n+1):
        if a == divisors_sum(b) and b == divisors_sum(a):
            print((a, b))
print("Done in {:.6f} seconds.".format(time() - st))
```

n = 500 (220, 284) Done in 3.520527 seconds.

**Remark:** Notice that `print` has two pairs of parentheses:

```
print((a, b))
```

This is because we want to print a tuple: the outer parentheses belong to `print`, while the inner ones define the tuple `(a, b)`, which is then printed in the form appropriate for pairs of numbers.

This is equivalent to

```
print("({}, {})".format(a, b))
```

but, obviously, much simpler.

The complexity of this algorithm is $O(n^3)$:

- the `a`-loop runs $n$ times,
- the `b`-loop runs $n-a$ times for each step of the `a`-loop, which gives the following total (during one run of the program): $$(n-1)+(n-2)+\cdots+1+0 = n(n-1)/2 = \frac{1}{2}n^2 - \frac{1}{2}n = O(n^2),$$
- `divisors_sum(x)` takes $x$ steps, and it is called twice in each step of the `b`-loop. Just as in the previous item, this brings us to a grand total of $O(n^3)$ steps.

The exact number of operations is, of course, smaller than $n^3$ (because the `b`-loop doesn't run $n^2$ times, but "only" $n(n-1)/2 < n^2$ times). However, it asymptotically behaves like $n^3$, which means that the program will take approximately $2^3 = 8$ times longer for an input of size $2n$ than for one of size $n$.

Can we do better?

Notice the following:

For each number $a$, there is only one number that can possibly be its friend. That is, of course,

$$b := {\tt divisors\_sum}(a).$$

So, why check for all $b \in \{a+1,\dots,n\}$?

We can achieve the same like this:

For each $a \in \{1,\dots, n\}$:

- Define $b := {\tt divisors\_sum}(a)$.
- Check that $a < b$ and $a = {\tt divisors\_sum}(b)$; print the pair if `True`.

Translated from English to Python:

In [4]:

```
n = int(input("n = "))
st = time()
for a in range(1, n+1):
    b = divisors_sum(a)
    if a < b and a == divisors_sum(b):
        print((a, b))
print("Done in {:.6f} seconds.".format(time() - st))
```

n = 500 (220, 284) Done in 0.015062 seconds.

**Always analyze your problem before writing any code that solves it!**

Sure, this can "lose" you a few minutes, but in the long run your programs will be faster, more compact, probably less buggy, and easier to maintain.

The analysis can get trickier than what we've seen above. In the following problem, the loose "worst case analysis" shown above will not reflect how big the difference between the two algorithms really is; it takes some common sense (or a rather hard formal analysis) to observe that difference.

**Problem:** Write a function that prints all the prime factors of a given integer, each of them once.

Mathematically, the solution is as follows:

Check all the numbers between $2$ and $|n|$. For those that divide $n$, check whether they are prime. If they are, print them.

Translated to Python, we get:

In [5]:

```
from time import time
from math import floor, sqrt

def is_prime_sqrt(n):
    """Returns `True` if an integer `n` is a prime and `False` otherwise."""
    if n < 2:
        return False
    for d in range(2, floor(sqrt(n)) + 1):
        if n % d == 0:
            return False
    return True

def print_prime_factors_def(n):
    """Prints all the prime factors of `n`, each of them once."""
    for p in range(2, abs(n)+1):
        if n % p == 0 and is_prime_sqrt(p):
            print(p)

n = int(input("n = "))
print("The prime factors of {} are:".format(n))
st = time()
print_prime_factors_def(n)
print("Total time:", time() - st)
```

n = 1234567 The prime factors of 1234567 are: 127 9721 Total time: 0.12981295585632324

What is the complexity of this algorithm?

Since the algorithm uses $|n|$, the sign of $n$ does not matter, so we can assume $n \ge 0$ without loss of generality, to make the analysis easier to follow.

The function's loop always has $n - 1$ steps. Each step performs a division, and some steps also check whether the number is a prime. Those checks themselves have complexity $O(\sqrt{p})$, which we can replace by $O(\sqrt{n})$ to avoid dependence on the exact data.

If we were checking whether $p$ is prime or not for every $p$ (and not just for divisors of $n$), the complexity of our program would have been $O(n\sqrt{n})$.

However, we are doing this check only for the divisors of $n$, so our complexity is between $O(n)$ and $O(n\sqrt{n})$. In layman's terms, this means that our algorithm is slower than the linear one, but not by much. It remains open by how much.
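To get a feeling for where between $O(n)$ and $O(n\sqrt{n})$ the algorithm actually lands, we can count its operations instead of timing it. The following sketch uses a rough cost model of our own (not part of the original code): one unit per divisibility check, plus at most $\sqrt{p} - 1$ units for each primality check.

```
from math import isqrt

def count_steps(n):
    """Rough upper bound on the work done by `print_prime_factors_def`:
    one unit per `n % p` check, plus up to `isqrt(p) - 1` units for
    each call to `is_prime_sqrt(p)` (it may return earlier)."""
    steps = 0
    for p in range(2, n + 1):
        steps += 1                         # the `n % p` divisibility check
        if n % p == 0:
            steps += max(0, isqrt(p) - 1)  # worst case of `is_prime_sqrt(p)`
    return steps

for n in (1000, 10000, 100000):
    print(n, count_steps(n))
```

Since a number has few divisors on average, the total stays much closer to $n$ than to $n\sqrt{n}$.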

For a different algorithm, let us analyze the problem a bit.

Let $n = s p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$, where $s \in \{-1, 1\}$ is a sign, $p_i$ are primes such that $p_1 < p_2 < \dots < p_r$, and $k_i \in \mathbb{N}$, be a factorization of $n$ into prime factors.

What is the smallest number $x > 1$ such that $x | n$ ($x$ divides $n$, i.e., $n$ is divisible by $x$)?

It is, of course, $x = p_1$. Our algorithm doesn't need to verify that it is a prime, because we know that it is.

Now, let us define $n' := n / p_1^{k_1} = s p_2^{k_2} \cdots p_r^{k_r}$. What is the smallest $x > 1$ such that $x|n'$?

Of course, this is $p_2$ which is also a prime.

If we repeat this until nothing is left (i.e., until we get $n' = s \in \{-1, 1\}$), we will get all the prime factors of $n$.

Here is the code, with a minor change of dropping the sign at the beginning:

In [6]:

```
from time import time

def print_prime_factors_div(n):
    """Prints all the prime factors of `n`, each of them once."""
    p = 2
    n = abs(n)
    while n > 1:
        if n % p == 0:
            print(p)
            while n % p == 0:
                n //= p
        p += 1

n = int(input("n = "))
print("The prime factors of {} are:".format(n))
st = time()
print_prime_factors_div(n)
print("Total time:", time() - st)
```

n = 1234567
The prime factors of 1234567 are:
127
9721
Total time: 0.0015151500701904297

The worst case scenario here is $O(n)$, if the number is prime, which is only slightly better than the previous algorithm.

But, as the numbers grow, the ratio of primes among all the observed numbers falls towards zero (see, for example, the prime number theorem).

Now, a more accurate estimate of the number of steps here is $O(p_r + k_1 + k_2 + \cdots + k_r)$, which is, on average, significantly smaller than $n$. We shall not go into detail, but consider numbers like $n = 2p$ or $n = 3p$ for some prime number $p$: for them, $p_r = n/2$ or $p_r = n/3$, respectively. This is still linear in $n$, but with a noticeably smaller constant.

If we try to measure the times for the above problems, we shall notice that for big composite numbers (for example, 1234567) the second algorithm is faster (usually several times). However, for big primes (for example, 1234577), the second algorithm is a bit slower!
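The difference can be made visible without a stopwatch by counting the outer-loop iterations of the division-based algorithm (the helper name `div_iterations` is ours, introduced just for this sketch):

```
def div_iterations(n):
    """Counts the outer-loop iterations of `print_prime_factors_div`."""
    p, count = 2, 0
    n = abs(n)
    while n > 1:
        count += 1
        while n % p == 0:
            n //= p
        p += 1
    return count

print(div_iterations(1234567))  # composite 127 * 9721: the loop stops at p = 9721
print(div_iterations(1234577))  # prime: the loop has to walk all the way up to n
```

For the composite number the loop ends as soon as the largest prime factor is divided out, while for a prime it cannot stop before reaching $n$ itself.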

This can be further improved, at the expense of some code elegance. For example:

In [10]:

```
from time import time
from math import floor, sqrt

def print_prime_factors_div(n):
    """Prints all the prime factors of `n`, each of them once."""
    p = 2
    n = abs(n)
    n2 = floor(sqrt(n))
    while p <= n2:
        if n % p == 0:
            print(p)
            while n % p == 0:
                n //= p
            n2 = floor(sqrt(n))
        p += 1
    if n > 1:
        print(n)

n = int(input("n = "))
print("The prime factors of {} are:".format(n))
st = time()
print_prime_factors_div(n)
print("Total time:", time() - st)
```

n = 1234567
The prime factors of 1234567 are:
127
9721
Total time: 0.00019025802612304688

Note that the above algorithm with `sqrt` is not reliable for truly large numbers. In Python, integers can be as big as we want (most other languages have a limit here), but real numbers have a fixed precision of roughly 16 significant digits. This means that the square root of any number with more than 32 digits will be imprecise, and the algorithm will thus be prone to errors.

To overcome this problem, one would have to implement an integer square root function, using only integer arithmetic (by Newton's method or some similar procedure).
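A minimal sketch of such a function, using Newton's method on integers only (since Python 3.8, the standard library already provides exactly this as `math.isqrt`):

```
def isqrt_newton(n):
    """Integer square root of a non-negative integer `n`, i.e., the
    largest integer `x` such that `x * x <= n`, computed with integer
    arithmetic only, so it stays exact for arbitrarily large `n`."""
    if n < 0:
        raise ValueError("integer square root is undefined for n < 0")
    if n == 0:
        return 0
    x = n  # any starting point >= sqrt(n) works
    while True:
        y = (x + n // x) // 2  # one Newton step, rounded down
        if y >= x:             # the sequence stops decreasing at floor(sqrt(n))
            return x
        x = y

# For huge numbers the integer version stays exact:
m = (10**17 + 3) ** 2  # a 35-digit perfect square
print(isqrt_newton(m) == 10**17 + 3)  # True
```

The same check with `floor(sqrt(m))` is not guaranteed to give $10^{17} + 3$, because `m` cannot be represented exactly as a 53-bit float.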

Besides the `sqrt` speedup, this algorithm is also attractive because it is compact and elegant (it doesn't need an additional function to check if a number is a prime), and it is easily modifiable. In the following examples, we shall modify the original, $O(n)$ version of the algorithm. Each of these can be made faster using the same `sqrt` trick that we used above.

Try to solve the following problem:

Write a function that prints all the prime factors of a given integer $n$, each of them as many times as it appears in the factorization of $n$.

So, $p_1$ has to be printed $k_1$ times, $p_2$ has to be printed $k_2$ times, etc.

Try to modify `print_prime_factors_def` to achieve this. Here, we modify `print_prime_factors_div` (the version without `sqrt`).

In [ ]:

```
def print_prime_factors_div(n):
    """Prints all the prime factors of `n`, each of them
    repeated according to its multiplicity."""
    p = 2
    n = abs(n)
    while n > 1:
        if n % p == 0:
            while n % p == 0:
                print(p)
                n //= p
        p += 1

n = int(input("n = "))
print("The prime factors of {} are:".format(n))
print_prime_factors_div(n)
```

That's right: we only swapped two lines (i.e., we moved the `print` into the inner `while`).

However, in

```
if some_condition:
    while that_same_condition:
        do_something()
```

the `if` part is redundant (check what happens when `some_condition` is `True` and what happens when it is `False`).

So, the final version is, somewhat surprisingly, even simpler than the original:

In [12]:

```
def print_prime_factors_div(n):
    """Prints all the prime factors of `n`, each of them
    repeated according to its multiplicity."""
    p = 2
    n = abs(n)
    while n > 1:
        while n % p == 0:
            print(p)
            n //= p
        p += 1

n = int(input("n = "))
print("The prime factors of {} are:".format(n))
print_prime_factors_div(n)
```

n = 1234567
The prime factors of 1234567 are:
127
127
9721

**Problem:** Print each prime factor once, but also write its multiplicity next to it, i.e., each line of the output has to have the form $p_i \ast\ast k_i$.

Again, we modify `print_prime_factors_div`. We already have a loop that makes exactly one step for each occurrence of each $p_i$. We just need to count these steps and adapt the printing.

In [13]:

```
def print_prime_factors_div(n):
    """Prints the factors `p**k` of the prime factorization of `n`."""
    p = 2
    n = abs(n)
    while n > 1:
        if n % p == 0:
            i = 0
            while n % p == 0:
                i += 1
                n //= p
            print("{}**{}".format(p, i))
        p += 1

n = int(input("n = "))
print("The prime factors of {} are:".format(n))
print_prime_factors_div(n)
```

n = 1234567
The prime factors of 1234567 are:
127**1
9721**1

**Final remark:** Returning all of these by a generator is very simple: all we need to do is replace `print` with an appropriate `yield`.

For example, the last one:

In [14]:

```
def prime_factors_div(n):
    """A generator yielding the factors `p**k` of the prime
    factorization of `n`, as strings."""
    p = 2
    n = abs(n)
    while n > 1:
        if n % p == 0:
            i = 0
            while n % p == 0:
                i += 1
                n //= p
            yield "{}**{}".format(p, i)
        p += 1

n = int(input("n = "))
print(n, "=", "+".join(prime_factors_div(n)))
```

n = 1234567
1234567 = 127**1+9721**1