Karatsuba Multiplication

Everyone knows how to multiply large numbers. But there is a faster way. What’s interesting is that it was only discovered as recently as the 1960s, by Karatsuba. It makes me wonder how many other things right under our noses, known to millions (billions?) of people over the centuries, still have a better way that nobody has thought of… yet.

I implemented the algorithm in C. The key idea is that it’s possible to save a multiplication by trading it for a handful of extra additions and subtractions.
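To see where the saving comes from, split each factor in half: x = a \cdot 10^m + b and y = c \cdot 10^m + d, so that xy = ac \cdot 10^{2m} + (ad + bc) \cdot 10^m + bd. Computed naively, the middle term costs two extra multiplications, but ad + bc = (a + b)(c + d) - ac - bd, so the three products ac, bd and (a + b)(c + d) suffice; everything else is additions, subtractions and shifts. Applied recursively to the half-size numbers, this takes \Theta(n^{\log_2 3}) \approx \Theta(n^{1.585}) digit operations instead of the schoolbook \Theta(n^2).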

Learning Go

I find that a good way to figure out if I like a new language is to write a few classic algorithms in it.

Merge Sort

Insertion Sort

Quick Sort

And the unit test framework

This might not be the most idiomatic Go, but I really like the spartan syntax, the array slices, and the ability to swap two variables’ values in one line. I am impressed.

Exercise 1.45 of SICP

Exercise 1.45: We saw in section 1.3.3 that attempting to compute square roots by naively finding a fixed point of y \mapsto \frac{x}{y} does not converge, and that this can be fixed by average damping. The same method works for finding cube roots as fixed points of the average-damped y \mapsto \frac{x}{y^2}. Unfortunately, the process does not work for fourth roots — a single average damp is not enough to make a fixed-point search for y \mapsto \frac{x}{y^3} converge. On the other hand, if we average damp twice (i.e., use the average damp of the average damp of y \mapsto \frac{x}{y^3}) the fixed-point search does converge. Do some experiments to determine how many average damps are required to compute nth roots as a fixed-point search based upon repeated average damping of y \mapsto \frac{x}{y^{n-1}}. Use this to implement a simple procedure for computing nth roots using fixed-point, average-damp, and the repeated procedure of exercise 1.43. Assume that any arithmetic operations you need are available as primitives.

Doing experiments, the number of average damps needed to get the fixed-point procedure to converge seems to be:

Degree of root    Power of 2    Number of average damps
2                 2^1           1
4                 2^2           2
8                 2^3           3
16                2^4           4
32                2^5           5

Using this as a basis, it makes sense to write a procedure discrete-log that computes the number of powers of 2 in a given number. It’s a bit like a very primitive implementation of a log2 function, hence the name. discrete-log is \Theta(\log_2 n) because it halves its input on every iteration.
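In sketch form the whole program looks something like this (assuming the fixed-point and average-damp procedures of section 1.3.3; repeated-compose is my name for the repeated procedure of exercise 1.43, and close-enough? uses the relative tolerance of exercise 1.7; edge cases such as (root 2 0) in the transcript below depend on details not shown here):

(define tolerance 0.00001)

(define (fixed-point f first-guess)
  ;; close-enough? improved as in exercise 1.7: the change is
  ;; measured relative to the size of the guess.
  (define (close-enough? v1 v2)
    (< (abs (- v1 v2)) (* (abs v2) tolerance)))
  (define (try guess)
    (let ((next (f guess)))
      (if (close-enough? guess next)
          next
          (try next))))
  (try first-guess))

(define (average-damp f)
  (lambda (x) (/ (+ x (f x)) 2)))

;; n-fold composition of f with itself (exercise 1.43).
(define (repeated-compose f n)
  (if (= n 0)
      (lambda (x) x)
      (lambda (x) (f ((repeated-compose f (- n 1)) x)))))

;; The number of powers of 2 in n; a primitive log2.
(define (discrete-log n)
  (if (< n 2)
      0
      (+ 1 (discrete-log (quotient n 2)))))

;; nth root of x as a fixed point of y -> x/y^(n-1),
;; average-damped (discrete-log n) times.
(define (root x n)
  (fixed-point
   ((repeated-compose average-damp (discrete-log n))
    (lambda (y) (/ x (expt y (- n 1)))))
   1.0))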

> (root 2 0)
1
> (root 2 1)
2
> (root 2 2)
1.4142135623746899
> (root (expt 2 50) 50)
1.9999956962054166
> (root 0.00000001 2)
1.0000000000082464e-4

This program summarizes most of Chapter 1. It has lexical scope, functions passed as parameters, functions that construct and return other functions, and the square root algorithm with an improved close-enough? procedure from exercise 1.7.
The repeated-compose and discrete-log procedures could have been written iteratively instead of recursively, as they are here.

Exercise 1.26 of SICP

Exercise 1.26: Louis Reasoner is having great difficulty doing exercise 1.24. His fast-prime? test seems to run more slowly than his prime? test. Louis calls his friend Eva Lu Ator over to help. When they examine Louis’s code, they find that he has rewritten the expmod procedure to use an explicit multiplication, rather than calling square:
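The rewritten expmod in question, as given in the book:

(define (expmod base exp m)
  (cond ((= exp 0) 1)
        ((even? exp)
         (remainder (* (expmod base (/ exp 2) m)
                       (expmod base (/ exp 2) m))
                    m))
        (else
         (remainder (* base (expmod base (- exp 1) m))
                    m))))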

“I don’t see what difference that could make,” says Louis. “I do,” says Eva. “By writing the procedure like that, you have transformed the \Theta(\log n) process into a \Theta(n) process.” Explain.

Just as the tree-recursive procedure for computing the Fibonacci numbers took a number of steps exponential in the depth of its call tree, this process is exponential in its tree depth because of the double call to expmod.
The double call creates a tree of depth \log_2 n, and the number of nodes in a tree of depth \log_2 n (the number of computational steps to perform the algorithm) is 2^{\log_2 n} = n. This means the algorithm is indeed \Theta(n).

Exercise 1.25 of SICP

Exercise 1.25: Alyssa P. Hacker complains that we went to a lot of extra work in writing expmod. After all, she says, since we already know how to compute exponentials, we could have simply written
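Her proposed procedure, as given in the book (fast-expt is the fast exponentiation procedure of section 1.2.4):

(define (expmod base exp m)
  (remainder (fast-expt base exp) m))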

Is she correct? Would this procedure serve as well for our fast prime tester? Explain.

While the procedure is mathematically accurate and would produce the right results, fast-expt would produce enormous numbers that would then have to be reduced modulo m. For example:

(expmod 2 10 7)
(remainder (fast-expt 2 10) 7)
(remainder 1024 7)

The 1024 might not seem so bad at first.

(expmod 2 100 7)
(remainder (fast-expt 2 100) 7)
(remainder 1267650600228229401496703205376 7)
That’s a bit excessive.

The version below keeps the numbers in check.
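This is the book’s expmod from section 1.2.6, which reduces modulo m at every step:

(define (square x) (* x x))

(define (expmod base exp m)
  (cond ((= exp 0) 1)
        ((even? exp)
         (remainder (square (expmod base (/ exp 2) m))
                    m))
        (else
         (remainder (* base (expmod base (- exp 1) m))
                    m))))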

I am taking some liberty here in skipping a few minor steps.
(expmod 2 10 7)
(remainder (square (expmod 2 5 7)) 7)
(remainder (square (remainder (* 2 (expmod 2 4 7)) 7)) 7)
(remainder (square (remainder (* 2 (remainder (square (expmod 2 2 7)) 7)) 7)) 7)
(remainder (square (remainder (* 2 (remainder (square (remainder (square (expmod 2 1 7)) 7)) 7)) 7)) 7)
(remainder (square (remainder (* 2 (remainder (square (remainder (square (remainder (* 2 1) 7)) 7)) 7)) 7)) 7)
(remainder (square (remainder (* 2 (remainder (square (remainder (square (remainder 2 7)) 7)) 7)) 7)) 7)
(remainder (square (remainder (* 2 (remainder (square (remainder (square 2) 7)) 7)) 7)) 7)
(remainder (square (remainder (* 2 (remainder (square (remainder 4 7)) 7)) 7)) 7)
(remainder (square (remainder (* 2 (remainder (square 4) 7)) 7)) 7)
(remainder (square (remainder (* 2 2) 7)) 7)
(remainder (square (remainder 4 7)) 7)
(remainder (square 4) 7)
2

The key difference is that at no point do any of the intermediate numbers grow to be much larger than the modulus m.

Exercise 1.24 of SICP

Exercise 1.24: Modify the timed-prime-test procedure of exercise 1.22 to use fast-prime? (the Fermat method), and test each of the 12 primes you found in that exercise. Since the Fermat test has \Theta(\log n) growth, how would you expect the time to test primes near 1,000,000 to compare with the time needed to test primes near 1000? Do your data bear this out? Can you explain any discrepancy you find?

I arbitrarily chose to run the fast-prime? test 100 times. As in 1.22, I ran it only on larger ranges than the book suggested.
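A sketch of the harness: fermat-test and fast-prime? are the book’s (section 1.2.6), and start-prime-test from exercise 1.22 is switched over to fast-prime? with my arbitrary 100 trials. search-for-primes is the same procedure as in 1.22 and isn’t repeated here; runtime is MIT Scheme’s clock.

(define (square x) (* x x))

(define (expmod base exp m)
  (cond ((= exp 0) 1)
        ((even? exp)
         (remainder (square (expmod base (/ exp 2) m)) m))
        (else
         (remainder (* base (expmod base (- exp 1) m)) m))))

;; One round of the Fermat test with a random base a, 1 <= a < n.
(define (fermat-test n)
  (define (try-it a)
    (= (expmod a n n) a))
  (try-it (+ 1 (random (- n 1)))))

(define (fast-prime? n times)
  (cond ((= times 0) true)
        ((fermat-test n) (fast-prime? n (- times 1)))
        (else false)))

;; Timed test from exercise 1.22, now asking fast-prime? for 100 rounds.
(define (timed-prime-test n)
  (newline)
  (display n)
  (start-prime-test n (runtime)))

(define (start-prime-test n start-time)
  (if (fast-prime? n 100)
      (report-prime (- (runtime) start-time))))

(define (report-prime elapsed-time)
  (display " *** ")
  (display elapsed-time))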

> (search-for-primes (expt 10 10) (- (expt 10 11) 1))

10000000019 *** .016000032424926758
10000000033 *** .016000032424926758
10000000061 *** .016000032424926758

> (search-for-primes (expt 10 11) (- (expt 10 12) 1))

100000000003 *** .016000032424926758
100000000019 *** .016000032424926758
100000000057 *** .015000104904174805

> (search-for-primes (expt 10 12) (- (expt 10 13) 1))

1000000000039 *** .016000032424926758
1000000000061 *** 0.
1000000000063 *** .015999794006347656

> (search-for-primes (expt 10 13) (- (expt 10 14) 1))

10000000000037 *** .015999794006347656
10000000000051 *** .016000032424926758
10000000000099 *** .014999866485595703

The test is too fast to give meaningful times to compare against. I would expect numbers of order 10^6 to take 2 times longer to test than numbers of order 10^3. The reasoning follows from the properties of logarithms: \log 10^6 = 6 and \log 10^3 = 3.

In my case: testing primes near 10^{10} took .016.
I would predict: testing primes near 10^{20} should take .032.
> (search-for-primes (expt 10 20) (- (expt 10 21) 1))

100000000000000000039 *** .03099989891052246
100000000000000000129 *** .03099989891052246
100000000000000000151 *** .03099989891052246

Exercise 1.23 of SICP

Exercise 1.23: The smallest-divisor procedure shown at the start of this section does lots of needless testing: After it checks to see if the number is divisible by 2 there is no point in checking to see if it is divisible by any larger even numbers. This suggests that the values used for test-divisor should not be 2, 3, 4, 5, 6, …, but rather 2, 3, 5, 7, 9, …. To implement this change, define a procedure next that returns 3 if its input is equal to 2 and otherwise returns its input plus 2. Modify the smallest-divisor procedure to use (next test-divisor) instead of (+ test-divisor 1). With timed-prime-test incorporating this modified version of smallest-divisor, run the test for each of the 12 primes found in exercise 1.22. Since this modification halves the number of test steps, you should expect it to run about twice as fast. Is this expectation confirmed? If not, what is the observed ratio of the speeds of the two algorithms, and how do you explain the fact that it is different from 2?
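The change itself is tiny. A sketch, with smallest-divisor, find-divisor and divides? as in section 1.2.6:

;; Skip the even candidates after 2.
(define (next n)
  (if (= n 2) 3 (+ n 2)))

(define (divides? a b) (= (remainder b a) 0))

(define (smallest-divisor n) (find-divisor n 2))

(define (find-divisor n test-divisor)
  (cond ((> (* test-divisor test-divisor) n) n)
        ((divides? test-divisor n) test-divisor)
        (else (find-divisor n (next test-divisor)))))  ; was (+ test-divisor 1)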

12 Primes from 1.22:

  1. 10000000019 *** .14100003242492676
  2. 10000000033 *** .1399998664855957
  3. 10000000061 *** .1400001049041748
  4. 100000000003 *** .5310001373291016
  5. 100000000019 *** .5320000648498535
  6. 100000000057 *** .5469999313354492
  7. 1000000000039 *** 1.7660000324249268
  8. 1000000000061 *** 1.75
  9. 1000000000063 *** 1.7650001049041748
  10. 10000000000037 *** 5.672000169754028
  11. 10000000000051 *** 5.672000169754028
  12. 10000000000099 *** 5.6559998989105225

Using timed-prime-test with the new next function:

  1. 10000000019 *** .07799983024597168
  2. 10000000033 *** .07800006866455078
  3. 10000000061 *** .07800006866455078
  4. 100000000003 *** .2969999313354492
  5. 100000000019 *** .29600000381469727
  6. 100000000057 *** .2969999313354492
  7. 1000000000039 *** .9849998950958252
  8. 1000000000061 *** .9840002059936523
  9. 1000000000063 *** .9839999675750732
  10. 10000000000037 *** 3.171999931335449
  11. 10000000000051 *** 3.1570000648498535
  12. 10000000000099 *** 3.171999931335449
Using next     Exercise 1.22    Factor of speedup
0.07799983     0.141000032      1.807696658
0.078000069    0.139999866      1.794868503
0.078000069    0.140000105      1.79487156
0.296999931    0.531000137      1.787879664
0.296000004    0.532000065      1.797297493
0.296999931    0.546999931      1.841751036
0.984999895    1.766000032      1.792893625
0.984000206    1.75             1.778454912
0.983999968    1.765000105      1.793699353
3.171999931    5.67200017       1.788146372
3.157000065    5.67200017       1.796642399
3.171999931    5.655999899      1.78310215

Judging by the speedup, it does look like there is about a two-fold increase in the speed of determining whether a number is prime, though the observed ratio sits closer to 1.8 than to a full 2.

Iterative version of expmod

Recursive version:
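(This is the book’s expmod from section 1.2.6.)

(define (square x) (* x x))

(define (expmod base exp m)
  (cond ((= exp 0) 1)
        ((even? exp)
         (remainder (square (expmod base (/ exp 2) m))
                    m))
        (else
         (remainder (* base (expmod base (- exp 1) m))
                    m))))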

I was a bit surprised that the problem set accompanying the above algorithm never asks to rewrite it in an iterative form. So here it is.
Iterative version:
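A sketch of one way to write it: carry a running result so the call is tail-recursive, maintaining the invariant that result \cdot base^{exp} \bmod m never changes between iterations.

(define (square x) (* x x))

(define (expmod base exp m)
  ;; Invariant: (result * base^exp) mod m is the same on every
  ;; iteration, so when exp reaches 0, result is the answer.
  (define (iter base exp result)
    (cond ((= exp 0) result)
          ((even? exp)
           (iter (remainder (square base) m) (/ exp 2) result))
          (else
           (iter base (- exp 1) (remainder (* result base) m)))))
  (iter (remainder base m) exp 1))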

Exercise 1.20 of SICP

Exercise 1.20: The process that a procedure generates is of course dependent on the rules used by the interpreter. As an example, consider the iterative gcd procedure given above. Suppose we were to interpret this procedure using normal-order evaluation, as discussed in section 1.1.5. (The normal-order-evaluation rule for if is described in exercise 1.5.) Using the substitution method (for normal order), illustrate the process generated in evaluating (gcd 206 40) and indicate the remainder operations that are actually performed. How many remainder operations are actually performed in the normal-order evaluation of (gcd 206 40)? In the applicative-order evaluation?

GCD Algorithm:
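(gcd as given in section 1.2.5 of the book:)

(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (remainder a b))))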

Normal Order:
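The full normal-order expansion gets unwieldy, so here is a condensed sketch; R_k stands for the remainder expression created at the k-th step, and #{R_k} for the number of remainder operations inside it:

(gcd 206 40)
(gcd 40 R1)   ; R1 = (remainder 206 40),  #{R1} = 1, R1 evaluates to 6
(gcd R1 R2)   ; R2 = (remainder 40 R1),   #{R2} = 2, R2 evaluates to 4
(gcd R2 R3)   ; R3 = (remainder R1 R2),   #{R3} = 4, R3 evaluates to 2
(gcd R3 R4)   ; R4 = (remainder R2 R3),   #{R4} = 7, R4 evaluates to 0
; at each step the predicate (= b 0) forces the R expression in b;
; when (= R4 0) is finally true, the consequent a = R3 is evaluated
; once more and returned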

Every time an R appears in the (= ... 0) predicate of the if, that R gets evaluated:
#{R1} + #{R2} + #{R3} + #{R4} = 1 + 2 + 4 + 7 = 14 remainder operations are performed in the predicates.
R3 is then evaluated once more in order to return the actual result: 14 + #{R3} = 18.
Number of times the remainder function is called: 18.

Applicative Order:
(gcd 206 40)
(gcd 40 (remainder 206 40))
(gcd 40 6)
(gcd 6 (remainder 40 6))
(gcd 6 4)
(gcd 4 (remainder 6 4))
(gcd 4 2)
(gcd 2 (remainder 4 2))
(gcd 2 0)
2
Number of times the remainder function is called: 4.

Exercise 1.19 of SICP

Exercise 1.19: There is a clever algorithm for computing the Fibonacci numbers in a logarithmic number of steps. Recall the transformation of the state variables a and b in the fib-iter process of section 1.2.2: a \leftarrow a + b and b \leftarrow a. Call this transformation T, and observe that applying T over and over again n times, starting with 1 and 0, produces the pair Fib(n + 1) and Fib(n). In other words, the Fibonacci numbers are produced by applying T^n, the nth power of the transformation T, starting with the pair (1, 0). Now consider T to be the special case of p = 0 and q = 1 in a family of transformations T_{pq}, where T_{pq} transforms the pair (a, b) according to a \leftarrow bq + aq + ap and b \leftarrow bp + aq. Show that if we apply such a transformation T_{pq} twice, the effect is the same as using a single transformation T_{p'q'} of the same form, and compute p' and q' in terms of p and q. This gives us an explicit way to square these transformations, and thus we can compute T^n using successive squaring, as in the fast-expt procedure. Put this all together to complete the following procedure, which runs in a logarithmic number of steps:
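The procedure to complete, as given in the book (the two <??> blanks are p' and q'):

(define (fib n)
  (fib-iter 1 0 0 1 n))

(define (fib-iter a b p q count)
  (cond ((= count 0) b)
        ((even? count)
         (fib-iter a
                   b
                   <??>   ; compute p'
                   <??>   ; compute q'
                   (/ count 2)))
        (else (fib-iter (+ (* b q) (* a q) (* a p))
                        (+ (* b p) (* a q))
                        p
                        q
                        (- count 1)))))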

Definition of a_1 and b_1:
a_1 \leftarrow bq + aq + ap
b_1 \leftarrow bp + aq
T^2_{pq} = T_{p'q'}
a_1 = bq + aq + ap
b_1 = bp + aq
a_2 = b_1q + a_1q + a_1p
b_2 = b_1p + a_1q
The key here is to substitute the definitions of a_1 and b_1 into a_2 and b_2 in order to get p' and q'.
I start with b_2 because the expression is simpler to work with; the values I get for p' and q' must also work with a_2.
b_2 = b_1p + a_1q = (bp+aq)p + (bq+aq+ap)q
b_2 =  bp^2+aqp + bq^2+aq^2+aqp
b_2 =  b(p^2+ q^2)+a(2qp+q^2)
p'=(p^2+ q^2)
q'= (2qp+q^2)
b_2 = bp'+aq'
a_2 is trickier to check because of the number of terms, so I’ll take a few shortcuts in showing the math:
a_2 = bpq + aq^2 + bq^2 + aq^2 + aqp + bqp + aqp + ap^2
a_2 = b(2qp+q^2) + a(2qp+q^2) + a(p^2+ q^2)
a_2 = bq'+aq'+ap'
Finally: T^2_{pq} = T_{p'q'} where p' = p^2 + q^2 and q' = 2qp + q^2
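Plugging p' and q' into the book’s skeleton:

(define (fib n)
  (fib-iter 1 0 0 1 n))

(define (fib-iter a b p q count)
  (cond ((= count 0) b)
        ((even? count)
         (fib-iter a
                   b
                   (+ (* p p) (* q q))     ; p' = p^2 + q^2
                   (+ (* 2 q p) (* q q))   ; q' = 2qp + q^2
                   (/ count 2)))
        (else (fib-iter (+ (* b q) (* a q) (* a p))
                        (+ (* b p) (* a q))
                        p
                        q
                        (- count 1)))))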

> (fib 10)
55
> (fib 100)
354224848179261915075