- A+

I've been researching and benchmarking various Fibonacci algorithms recently for my own amusement and more or less by accident came up with an alternate implementation of the classic O(n) time and O(1) space dynamic programming implementation.

Consider the following two functions:

`BigInt fib_dp_classic(int n) { if (n == 0) { return 0; } BigInt x = 0, y = 1, z; for (int i = 2; i <= n; ++i) { z = x + y; x = y; y = z; } return y; } `

and

`BigInt fib_dp_mod(int n) { BigInt x = 0, y = 1, z = 1; for (int i = 0; i < n; ++i) { switch (i % 3) { case 0: y = x + z; break; case 1: z = x + y; break; case 2: x = y + z; break; } } switch (n % 3) { case 0: return x; break; case 1: return y; break; case 2: return z; break; } } `

On my machine, calculating the millionth Fibonacci number takes 6.55s with fib_dp_classic and 2.83 seconds with fib_dp_mod, and even turning on -O3 doesn't change this too much. I don't really have any good ideas as to why the mod version is faster. Is it because the extra store instructions in the classic version are more expensive than the mod in the second? It's my understanding that the compiler should be putting all three variables in registers in both versions and computing the mod is actually fairly expensive; is this not the case?

In fact, I just put both of these through compiler explorer and both are using only registers once you turn optimizations on. Granted, this is only using ints, not the GMP-based bigints I was actually using for my benchmark. Is there some weird GMP implementation detail that might be the cause here?

Update: I even strace'd both to see if malloc() might be the culprit and fib_dp_classic uses 130 syscalls (for n=1000000) while fib_dp_mod uses 133. So still no real clues...

Your first function uses one addition and 3 copies for `BigInt`

-- all of the are quite time consuming operations. The second function uses one addition and one copy operations -- that's where the savings come from, you save 2 copy operations with `BigInt`

.