Karatsuba Multiplication

Everyone knows how to multiply large numbers. But there is a faster way. What’s interesting is that it was only discovered as recently as the 1960s, by Karatsuba. It makes me wonder how many other things are right under our noses that millions(billions?) of people know about over the course of centuries but nobody has thought of a better way… yet.

I implemented the algorithm bellow in C. The key idea is that it’s possible to save a multiplication by doing arithmetic.

sin(n) = -1 where n is an integer in radians

I saw a fun math problem on reddit the other day.

find a number n, so that sin(n)=-1, where n is an integer in radians; so sin(270 degrees) doesn’t count. Obviously it will never be exactly -1 but close enough for the difference to be lost in rounding.

\sin(\frac{3\pi}{2} + 2\pi \cdot n) = -1
I need the argument of sin function to be as close to an integer as possible. Call this integer m.
\frac{3\pi}{2}+2\pi \cdot n \approx m
Solving for \pi leads to:
\pi \approx \frac{p}{q} \approx \frac{2m}{4n+3}

If I have a rational approximation to \pi with an even numerator I can divide it by two get my m. I also have to make sure that the denominator is in the form of 4n+3.
It’s possible to use continued fractions to approximate real numbers. Here’s a continued fraction sequence for pi: https://oeis.org/A001203

The first rational approximation I learned in elementary school is 22/7 which is perfect.

> (sin 11)
-.9999902065507035

For the others I’ll have to evaluate the continued fraction to get my approximation of a simple fraction.

> (eval-terms (list 3 7 15 1 292 1))
104348/33215

> (sin (/ 104348 2))
-.9999999999848337

> (eval-terms (list 3 7 15 1 292 1 1 1 2 1 3 1 14 2 1))
245850922/78256779

> (sin (/ 245850922 2))
-.99999999999999999532
Looks like a good candidate was found.

This is the code to evaluate a continued fraction coefficients. It’s very convenient that scheme has a native rational data type.

Exercise 2.38 of SICP

Exercise 2.38: The accumulate procedure is also known as fold-right, because it combines the first element of the sequence with the result of combining all the elements to the right. There is also a fold-left, which is similar to fold-right, except that it combines elements working in the opposite direction:

What are the values of

3/2

1/6

(1 (2 (3 ())))

(((() 1) 2) 3)
Give a property that op should satisfy to guarantee that fold-right and fold-left will produce the same values for any sequence.
(fold-right op i (list a1 a2 a3))
(a_1 \cdot (a_2 \cdot ( a_3 \cdot i)))
(fold-left op i (list a1 a2 a3))
(((i \cdot a_1) \cdot a_2) \cdot a_3)

Any binary associative operation will be invariant under fold-right and fold-left.

Since 2.37 involved matrix multiplication which is associative, I will use that as an example.
> (define i (list (list 1 0 0) (list 0 1 0) (list 0 0 1)))
> (define a1 (list (list 8 3 2) (list 1 0 9) (list 3 4 5)))
> (define a2 (list (list 5 6 7) (list 1 2 8) (list 6 7 7)))
> (define a3 (list (list 6 7 9) (list 4 3 1) (list 3 4 5)))
> (fold-left matrix-*-matrix i (list a1 a2 a3))
((884 965 1033) (840 900 950) (802 878 942))
> (fold-right matrix-*-matrix i (list a1 a2 a3))
((884 965 1033) (840 900 950) (802 878 942))

Notice that i is the identity matrix because if it wasn’t, I would also be testing commutativity of matrix product. Matrix product doesn’t commute. If i is changed to another matrix fold-left and fold-right will return differing results.

Exercise 2.37 of SICP

Exercise 2.37: Suppose we represent vectors v=(v_i) as sequences of numbers, and matrices m = (m_{ij}) as sequences of vectors (the rows of the matrix). For example, the matrix

M = \begin{bmatrix} 1&2&3&4\\ 4&5&6&6\\ 6&7&8&9 \end{bmatrix}

is represented as the sequence ((1 2 3 4) (4 5 6 6) (6 7 8 9)). With this representation, we can use sequence operations to concisely express the basic matrix and vector operations. These operations (which are described in any book on matrix algebra) are the following:

We can define the dot product as:

Fill in the missing expressions in the following procedures for computing the other matrix operations. (The procedure accumulate-n is defined in exercise 2.36.)

Solutions:

> (define n (list (list 1 2) (list 3 4)))
> (matrix-*-vector n (list 10 20))
(50 110)
> (transpose n)
((1 3) (2 4))
> (define m (list (list 1 2 3) (list 4 5 6) (list 7 8 9)))
> (matrix-*-matrix m m)
((30 36 42) (66 81 96) (102 126 150))
> (define q (list (list 9 1) (list 3 4) (list 6 7) (list 1 8)))
> (matrix-*-matrix (list (list 1 2 3 4) (list 5 6 7 8)) q)
((37 62) (113 142))

Exercise 2.34 of SICP

Exercise 2.34: Evaluating a polynomial in x at a given value of x can be formulated as an accumulation. We evaluate the polynomial
p(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x+ a_0
using a well-known algorithm called Horner’s rule, which structures the computation as
(\cdots (a_n x + a_{n-1})x + \cdots + a_1)x + a_0
In other words, we start with an, multiply by x, add an-1, multiply by x, and so on, until we reach a0. Fill in the following template to produce a procedure that evaluates a polynomial using Horner’s rule. Assume that the coefficients of the polynomial are arranged in a sequence, from a0 through an.

For example, to compute 1 + 3x + 5x^3 + x^5 at x = 2 you would evaluate

I included a non-abstracted version of Horner’s method for clarity.

> (horner-eval 2 (list 1 3 0 5 0 1))
79
> (horner-eval 2 (list 1))
1
> (horner-eval 2 (list ))
0
> (horner-eval 2 (list 1 1))
3

Exercise 2.32 of SICP

Exercise 2.32: We can represent a set as a list of distinct elements, and we can represent the set of all subsets of the set as a list of lists. For example, if the set is (1 2 3), then the set of all subsets is (() (3) (2) (2 3) (1) (1 3) (1 2) (1 2 3)). Complete the following definition of a procedure that generates the set of subsets of a set and give a clear explanation of why it works:

In order for me to understand what is going on here, I understand how powersets are constructed first and then derive this function.

This program wants to create what is called a powerset.
If S is a set of {a1,a2,a3}
\mathcal{P}(S) = \{\{\} \{a_3\} \{a_2\} \{a_2 a_3\} \{a_1\} \{a_1 a_3\} \{a_1 a_2\} \{a_1 a_2 a_3\}\}
The reasoning to construct a powerset is similar to the way the count change example is thought about.
Create a powerset for the cdr elements without using the car element, cons the car element with powerset of cdr elements. To the result append powerset of cdr elements.

When the arguments are an element and a set inject the element into every element of the set creating a new set. Append that result with the original powerset injected into.

Here is the process above in code:

Finally:

It’s worth noticing that because of the let expression (subsets (cdr s)) is only called once, as opposed to twice in my earlier derivation.

Exercise 2.16 of SICP

Exercise 2.16: Explain, in general, why equivalent algebraic expressions may lead to different answers. Can you devise an interval-arithmetic package that does not have this shortcoming, or is this task impossible? (Warning: This problem is very difficult.)

The errors are a lot tighter for par2 than par1. The reason that this is true is due to the fact that any operation between two intervals will increase errors. If two algebraically equivalent statements are evaluated, the statement with the least operations between intervals will produce less errors.

Here’s a slightly exaggerated example:
A = 10 \pm 0.1
A = \frac{A^7}{A^6} = \frac{A \cdot A \cdot A \cdot A \cdot A \cdot A \cdot A}{A \cdot A \cdot A \cdot A \cdot A \cdot A}

> (define A (make-center-percent 10 1))
> (print A)
[10.,.9999999999999963]
> (print (div-interval (interval-pow A 7) (interval-pow A 6)))
[10.084120477031101,12.92768447766068]

The most noticeable difference is between percent errors of 1% and 13%. Even though the fraction is algebraically equivalent to A evaluating the intervals gives very different answers.

In order to devise a system that doesn’t have this problem would mean it would have to be smart enough to detect duplicate entries and rewrite them in a more interval arithmetic efficient manner. At a minimum it would have to analyze code of this form:

and reduce it to A and than evaluate it.

Transforming the resistor example is even tougher.

The program would have to recognize division and simplify it by dividing the numerator and denominator by r1 and r2. Both the exponent and parallel resistor examples would have to be transformed symbolically since simplifying arithmetically would only introduce errors. I do not think it is feasible to write a program like this with the current lisp knowledge learned in the book so far.

Exercise 2.15 of SICP

Exercise 2.15: Eva Lu Ator, another user, has also noticed the different intervals computed by different but algebraically equivalent expressions. She says that a formula to compute with intervals using Alyssa’s system will produce tighter error bounds if it can be written in such a form that no variable that represents an uncertain number is repeated. Thus, she says, par2 is a “better” program for parallel resistances than par1. Is she right? Why?

> (print (div-interval (make-center-percent 6.8 1) (make-center-percent 6.8 1)))
[1.0002000200020003,1.9998000199979908]
> (print (div-interval (make-center-percent 6.8 .1) (make-center-percent 6.8 .1)))
[1.000002000002,.19999980000021655]

> (print (div-interval (make-center-percent 6.8 .1) (make-center-percent 3.4 .1)))
[2.000004000004,.19999980000021655]
> (print (div-interval (make-center-percent 6.8 .1) (make-center-percent 1.7 .1)))
[4.000008000008,.19999980000021655]
> (print (div-interval (make-center-percent 6.8 .1) (make-center-percent 1.7 .01)))
[4.000000440000004,.1099999890000035]

> (print (par1 (make-center-percent 6.8 .1) (make-center-percent 1.7 .01)))
[1.3600022771855311,.19199981581619266]
> (print (par2 (make-center-percent 6.8 .1) (make-center-percent 1.7 .01)))
[1.3599998237438813,.028000014256019334]

> (print (par1 (make-center-percent 6.8 .1) (make-center-percent 6.8 .1)))
[3.4000136000136,.2999992000024115]
> (print (par2 (make-center-percent 6.8 .1) (make-center-percent 6.8 .1)))
[3.4000000000000004,.10000000000000857]

> (print (par1 (make-center-percent 6.8 10) (make-center-percent 4.7 5)))
[2.844199964577264,22.613352145193346]
> (print (par2 (make-center-percent 6.8 10) (make-center-percent 4.7 5)))
[2.777440701636504,7.05260392723452]

Alyssa’s system will produce tighter error bounds. The errors are a lot tighter for par2 than par1. The reason that this is true is due to the fact that any operation between two intervals will increase errors. If two algebraically equivalent statements are evaluated, the statement with the least operations between intervals will produce less errors.

Here’s a slightly exaggerated example:
A = 10 \pm 0.1
A = \frac{A^7}{A^6} = \frac{A \cdot A \cdot A \cdot A \cdot A \cdot A \cdot A}{A \cdot A \cdot A \cdot A \cdot A \cdot A}

> (define A (make-center-percent 10 1))
> (print A)
[10.,.9999999999999963]
> (print (div-interval (interval-pow A 7) (interval-pow A 6)))
[10.084120477031101,12.92768447766068]

The most noticeable difference is between percent errors of 1% and 13%.