Machine Learning: a review

During my hiatus from this blog I completed two Coursera courses. I thought a brief review might be in order.

The first course I took was Machine Learning. The course was taught by Andrew Ng of Stanford University. Machine Learning (hereafter, ML) was my first exposure to the world of massive open online courses. I had heard good things about the ML course, so I decided to take the plunge.

I was very curious about one thing. A course on machine learning would presumably require the prospective student to have some level of mathematical sophistication, and some ability to program. I have known very good mathematicians who could not program at all, and some very good programmers who were either indifferent to or afraid of math. It seemed unlikely that the ML course would be populated solely by students skilled in both mathematics and programming. I had heard that Professor Ng was a gifted instructor; I wanted to see how he would tackle the difficult task of teaching both math and programming to a large audience of widely varying backgrounds.

The ML course was ten weeks long. Each week featured multiple brief video lectures, say between 5 and 20 minutes long. Along with the lectures, there were weekly quizzes and programming assignments. The programming language for the course was GNU Octave. There was also a discussion forum where one could ask questions, talk about the assignments, etc.

Here is a brief outline of the topics covered.

  • linear regression, 1 variable
  • linear regression, multiple variables
  • logistic regression
  • neural networks
  • design of machine learning systems
  • support vector machines
  • clustering
  • dimensionality reduction
  • anomaly detection

Along the way we had a very brief review of some ideas from linear algebra, some tutorials on the Octave language, and many valuable protips (practical advice from a seasoned machine learning practitioner).

A typical (supervised) machine learning problem represents its data as points in a high dimensional space. The task of identifying the points with some property P is cast as the problem of finding a surface (a plane, for instance) that separates the points with property P from those without. Typically there is no such surface, so the problem is to construct an appropriate cost function, and then find a surface that minimizes the cost. Minimization is often handled through some flavor of gradient descent.
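
To make the recipe concrete, here is a minimal sketch of one-variable linear regression by gradient descent. It is in Clojure rather than the course's Octave, and the names (step, alpha, xs, ys) are my own, purely for illustration:

(defn step
  "One gradient-descent step with learning rate alpha. xs and ys are the
  training data; [t0 t1] are the current parameters of h(x) = t0 + t1*x."
  [alpha xs ys [t0 t1]]
  (let [m     (count xs)
        errs  (map (fn [x y] (- (+ t0 (* t1 x)) y)) xs ys) ;; h(x) - y
        grad0 (/ (reduce + errs) m)
        grad1 (/ (reduce + (map * errs xs)) m)]
    ;; move the parameters against the gradient of the mean squared error
    [(- t0 (* alpha grad0))
     (- t1 (* alpha grad1))]))

;; Iterating the step fits the line. For the made-up data y = 2x:
;; (nth (iterate (partial step 0.1 [1.0 2.0 3.0] [2.0 4.0 6.0]) [0.0 0.0]) 500)
;; converges toward [0.0 2.0].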

Professor Ng’s approach was very pragmatic. In the time available he could not begin to teach the mathematical methods of optimization, particularly in a class where neither calculus nor linear algebra was a prerequisite. Instead he presented plausible appeals to geometric intuition, and urged us to trust that appropriate Octave libraries could be used to perform the harder minimizations. He put practice ahead of theory, which I believe is the correct pedagogical approach.

The heart of the course was a collection of extremely well-thought-out programming problems. Professor Ng and his staff prepared problem sets, presented as detailed PDF instructions along with enough Octave scaffolding so that the student programmer only had to consider the essential core of the task at hand. The grunt work was already taken care of. The problems were graded by a well-tuned automated system: you submit your answers to it, and it lets you know whether they are correct.

My overall assessment? The class was outstanding. Professor Ng’s reputation is well deserved. My thanks and congratulations to him and to everyone involved. A formidable amount of IT support was required to construct the course, the forums, the video production, and the automated grading. A lot of people did good work there, and they deserve recognition.

If you are interested in machine learning, the course is offered again starting Oct 14th. If you decide to go for it, here is a bit of advice.

First, brush up on your linear algebra. Nothing fancy is required, just matrix multiplication. I was amazed at the clarity that framing neural networks in terms of matrices produced. What had been a tangle of nested loops and multiple indices became much simpler. Simpler to work with, and simpler to understand.
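
As an illustration of the point, here is a sketch of my own (not code from the course): one network layer expressed as a matrix-vector product followed by the logistic function, with no explicit loops over units. The weights W and input x below are made-up values.

(defn sigmoid [z] (/ 1.0 (+ 1.0 (Math/exp (- z)))))

(defn mat*vec
  "Multiply matrix W (a sequence of rows) by vector x."
  [W x]
  (map (fn [row] (reduce + (map * row x))) W))

(defn layer
  "Activations of one layer with weight matrix W and input x: g(Wx)."
  [W x]
  (map sigmoid (mat*vec W x)))

;; (layer [[0.1 0.2 0.3]
;;         [0.4 0.5 0.6]]
;;        [1.0 0.0 1.0])
;; => the activations of a two-unit layer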

Second, and this applies especially to programmers, take the time to learn the rudiments of Octave. Time invested up front on learning Octave, especially its methods for constructing and manipulating arrays, will be repaid in time saved during the assignments. Here’s how Doug Crockford puts it in “JavaScript: The Good Parts”:

Most people … don’t even bother to learn JavaScript first, and then they are surprised when JavaScript turns out to have significant differences from SOME OTHER LANGUAGE they would rather be using, and that those differences matter.

Don’t be that guy.

Swap alt and ctrl keys

Another addition to the cybernetic memory: how to use xmodmap to swap the alt and ctrl keys on linux. A very useful change if you happen to use Emacs, especially on a laptop.

Step zero

Use the xev program to determine the keycodes for all four keys to be swapped. On my system they were as follows:

  • right-alt 108
  • right-ctrl 105
  • left-alt 64
  • left-ctrl 37

Step one

Create a .Xmodmap file. Here’s mine; if your key codes are not the same as mine, adjust accordingly.

clear control
clear mod1
keycode 37 = Alt_L Meta_L
keycode 64 = Control_L
keycode 105 = Alt_R Meta_R
keycode 108 = Control_R
add control = Control_L Control_R
add mod1 = Alt_L Meta_L Alt_R Meta_R


Step two

Find a way to get the .Xmodmap file to run at startup. This is unfortunately distro- and version-specific. I am running Linux Mint 14; the following worked for me. Open menu::preferences::Startup Applications. Then add a new entry containing the command xmodmap ~/home/drc/.Xmodmap (adjust the path to your .Xmodmap as needed). I suspect using ~/.Xmodmap would have worked, but I did not bother to try.

A quine in SAS

I’ve been playing around with quines in various languages. I searched for one written in SAS, but had no luck. So in accordance with the principle of plenitude, I wrote one. Here it is; tested under SAS versions 9.2 and 9.3.

data _null_;
q = put(byte(34),$char1.);
length C1-C10 $ 80;
array C{10} $ C1-C10 (
" data _null_; " ,
" q = put(byte(34),$char1.); " ,
" length C1-C10 $ 80; " ,
" array C{10} $ C1-C10 ( " ,
" ) ; " ,
" do i = 1 to 4; put C{i}; end; " ,
" do i = 1 to dim(C)-1; put q C{i} q ','; end; " ,
" put q C{dim(C)} q ; " ,
" do i = 5 to dim(C); put C{i}; end; " ,
" run; "
) ;
do i = 1 to 4; put C{i}; end;
do i = 1 to dim(C)-1; put q C{i} q ','; end;
put q C{dim(C)} q ;
do i = 5 to dim(C); put C{i}; end;

xft enabled dmenu

This is a note re getting an xft-capable version of dmenu, so that I won’t have to dig next time.

Grab dmenu from the suckless.org site. The current version at the time of this post is 4.5. Download, unzip.

Out of the box, dmenu does not support xft fonts. There is a patch for this on the suckless.org wiki; grab the patch matching the version of dmenu you are using.

Apply the patch:

beavis::  ~/Downloads/dmenu-4.5 
λ patch -p1 < ../dmenu-4.5-xft.diff 
patching file
patching file dmenu.1
patching file dmenu.c
patching file draw.c
patching file draw.h

Build and install:

beavis::  ~/Downloads/dmenu-4.5 
λ sudo make clean install

Initially, I saw this:

dmenu build options:
CFLAGS   = -std=c99 -pedantic -Wall -Os -I/usr/X11R6/include 
-I/usr/local/include/freetype2 -D_BSD_SOURCE 
LDFLAGS  = -s -L/usr/X11R6/lib -lX11 -lXinerama -lXft -lXrender -lfreetype -lz -lfontconfig
CC       = cc
CC -c dmenu.c
In file included from /usr/include/X11/Xft/Xft.h:39:0,
                 from draw.h:3,
                 from dmenu.c:14:
/usr/include/ft2build.h:56:38: fatal error: 
freetype/config/ftheader.h: No such file or directory
compilation terminated.
make: *** [dmenu.o] Error 1

This fixed it for me:

λ sudo ln -s /usr/include/freetype2/freetype /usr/include/freetype

Two remarks: The xft version of dmenu looks great, well worth the effort. The lambda character, λ, is not part of the various commands. It is part of my two-line zsh prompt.

The Devil and the language designers

The reader may be surprised to learn that the Devil is an accomplished programming language designer. Indeed, he has made significant contributions to the field. But the Devil takes no credit for his work, because it is a strategic desideratum of applied deviltry that devilish involvement in human affairs should go unnoticed. This strategy of concealment has been so effective that most readers will take the story I am about to tell as allegory. By all means, believe that. You’ll sleep better.

A long, long time ago, in a land far away, the Devil approached the great language designer Mr. C with an idea.

Devil: Mr. C, I have a delightful notational innovation I’d like to share. May I show it to you?

Mr. C: By all means.

Devil: It is a modest extension to your own work, but one of great power. Consider a simple expression, say

λx.λy. x + y

My proposed extension is

λx.λy. δz. x + y + z

Mr. C: What does this new expression mean?

Devil: The δ symbol is used to introduce an additional ‘special’ value z, which takes on random values.

Mr. C: I see. This would make it difficult to reason about the value of the resultant expression, would it not?

Devil: Yes, exactly. You have grasped the matter immediately. Are you on board?

Mr. C, an old-fashioned mathematician obsessed with mere correctness, was not on board. The Devil was disappointed, but he is nothing if not patient. He spent some years thinking about the matter and approached another language designer, the celebrated Mr. S.

Devil: Mr. S, I have a wonderful notational advance I’d like to share. May I show it to you?

Mr. S: Please do.

Devil: It is a simple extension to your own work, but one of transformative power. Consider a simple expression, say

(define (f x y) (+ x y))

My proposed extension is

(define (f x y [z]) (+ x y z))

Mr. S: What does your proposed extension accomplish?

Devil: The [z] notation is used to introduce an additional ‘special’ value z, which can effectively take on random values. At runtime, any process whatsoever is allowed to change the value of z, at any time. The caller does not even have to provide a value for z; the runtime will do it. Or if a value is provided, the runtime may change it. Or not.

Mr. S: An interesting idea. Would this not lead to pain, madness, chaos, and despair among users of this new feature?

Devil: Why yes, yes it would. I can see you have insight, and are a man after my own heart.

Alas, the Devil had misjudged Mr. S, who was not tempted.

The Devil could hardly believe Mr. S would turn down such an opportunity. He had a nagging suspicion that his wonderful idea was being rejected for essentially syntactic reasons. Language designers are funny that way. Besides, what else could it be? So the Devil waited and thought, and redesigned, and waited.

Eventually the Devil approached another language designer, one whom propriety bids me leave nameless.

Devil: My good fellow, I have a most wonderful construct I’d like to show you.

Designer: What do you have?

Devil: Check this out.

class Foo {
  int z;
  void set_z (int w) { this.z = w; }
  int bar(int x, int y) { return x + y + z; }
}

Designer: You do realize the language is multi-threaded?

Devil: Why yes, yes I do. I’m counting on it. Allows mutation too, as I recall.

Designer: The syntax may need a bit of work, but I think we have something here.

The rest is history.

A note on fractions

I ran across a blog post called Why 0.1 Does Not Exist In Floating-Point. The author uses long division in binary to demonstrate his point. This is understandable, given that the host blog is called Exploring Binary, but there is an easier way to determine whether the fraction p/q has a terminating representation in a given base.

Say we want to know whether p/q = 20/24 terminates when expressed in base B = 18. First divide out any common factors of p and q; in our case we are left with p/q = 5/6. Now we can forget p; it plays no further role. The fraction terminates if and only if every prime factor of q is a factor of B. Here q = 6 = 2*3. Since B = 18 = 2*3*3, the fraction does terminate. On the other hand, 5/6 does not terminate in base 10, because 3 is not a factor of 10.
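
The test is mechanical enough to code up. Here is a small Clojure sketch (the helper names gcd and terminates? are mine): repeatedly strip from q the factors it shares with B; the fraction terminates exactly when q is whittled down to 1.

(defn gcd [a b] (if (zero? b) a (recur b (mod a b))))

(defn terminates?
  "True when p/q has a terminating representation in base B."
  [p q B]
  (loop [q (quot q (gcd p q))]   ;; divide out common factors; p drops out
    (let [g (gcd q B)]
      (cond
        (= q 1) true             ;; every prime factor of q divided B
        (= g 1) false            ;; q retains a prime factor not in B
        :else   (recur (quot q g))))))

;; (terminates? 20 24 18) => true
;; (terminates? 5 6 10)   => false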

Let’s see why this works. What I’m going to show you works in any base, so we might as well use the familiar base 10.

Consider 1/8 = .125; what that means is that 1/8 = 125/1000. To say that the fraction p/q has a terminating representation in base B means that p/q = x/(B^k) for some integers x and k. By looking at the prime factors of q and B, it is easy to either construct x and k, or see that the construction fails. Here are some examples.

1/8 = (1/2)(1/2)(1/2)
    = (1/2)(5/5) (1/2)(5/5) (1/2)(5/5)
    = (5/10)(5/10)(5/10)
    = 125/(10^3)

Here’s the 5/6 in base 18 example.

5/6 = (5/2)(1/3)
    = (5/2)(9/9) (1/3)(6/6)
    = (45/18)(6/18)
    = 270/(18^2)

I mentioned that 5/6 does not terminate in base 10. Here’s what happens if you try as above.

5/6 = (5/2)(1/3)
    = (5/2)(5/5) (1/3)
    = (25/10)(1/3)

There is no integer x such that 3x = 10, so the process fails. I hope this is intuitively clear.

The failure case is the more interesting one, so I want to say a bit more about it. We tried to show that p/q = x/(B^k), where all the variables must be integers. This is the same as saying that p(B^k)/q is an integer, and so q must be a divisor of p(B^k). Let m be an integer. It is a trivial fact that if m divides q and q divides p(B^k), then m divides p(B^k). Now suppose m is a prime divisor of q. It is a deep and non-trivial fact that if a prime m divides a product p(B^k), then either m divides p or m divides B^k (or both). Recall that we began by dividing out common factors of p and q, so m does not divide p. Then m must divide (B^k), and by the same argument, m must divide B. To recap, the only way q can be a divisor of p(B^k) is if each of its prime factors divides B.

That’s all folks.

Equational Reasoning

Alan Dipert tweeted an interesting puzzle:

pop quiz: solve point-free. answer must be a function value! #clojure

The 4clojure version of the problem is simple enough, but the added requirement of a point-free solution makes Alan’s version of the problem harder. And harder to resist. So I took a shot at it.

My plan was simple: solve the problem in the natural way, and then transform the solution into the point-free style. Piece of cake. Not.

Here’s my starting place:

   (fn [x] (fn [y] (reduce * (repeat x y))))

That solves the 4clojure version of the problem, but I need to get rid of the arguments x and y, and the lambdas. To make a long story short, I couldn’t do it. The best I could manage was this:

   #(comp (partial reduce *) (partial repeat %))

I managed to get rid of one lambda and one variable, but could not get rid of the remaining argument, now slightly disguised as %, and the remaining lambda, disguised as #().
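
For what it’s worth, the half-way version does behave correctly; here it is checked in the REPL:

   ((#(comp (partial reduce *) (partial repeat %)) 2) 16)
   ;; => 256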

I wanted to know the answer, so I asked Alan, who directed me to a tweet by Baishampayan Ghose. Now the story starts to get interesting. This was @ghoseb’s answer:

   (partial partial (comp (partial reduce *) repeat))

Yow! I would not want to run into that in the wild. No matter how hard I stared at that code, I could not get a clear understanding of how it works. But there are methods other than staring, say abstraction and equational reasoning. Let’s use them to analyze @ghoseb’s code.

I mentioned equational reasoning, so let’s introduce a couple of equations that describe the action of the functions partial and comp.

  • P: ((partial f g) x) == (f g x)
  • C: ((comp f g) x) == (f (g x))

I have labelled the equations P and C, so that I can refer to them later. The equations combine infix and prefix notation. Don’t let that bother you; it’s ok to be less than rigorous in meta-language. Mathematicians do it all the time. The symbols f, g, and x are just variables.

Each equation states that two expressions are equivalent. Here are concrete examples of P and C, written in executable Clojure.

   (= ((partial * 2 3) 4)  ;; illustrates P
      (* 2 3 4))
   (= ((comp #(* % % ) inc) 1) ;; illustrates C
      (#(* % %) (inc 1)) )

Now let’s put P and C to work on @ghoseb’s code. We’ll start by writing out what we know from the problem. We know, and can test by execution, that the following expression evaluates to 256.

   (((partial partial (comp (partial reduce *) repeat)) 2) 16)

I mentioned that we would use abstraction; by abstraction I mean giving things names. There is one chunk of code in there, enclosed in the innermost parens, that is easy to understand. Let’s give it a name.

   (def mult (partial reduce *))

The mult function applied to a finite sequence of numbers returns their product:

   user> (mult [1 2 3 4])
   24

We can rewrite our expression using mult and get a slightly simpler expression that still evaluates to 256.

   (((partial partial (comp mult repeat)) 2) 16)

Next we’ll simplify a sub-expression. P allows us to infer that

  ;; ((partial partial (comp mult repeat)) 2) == (partial (comp mult repeat) 2)

Protip: forget the semantics for a moment, and just look at the syntax. P allows us to drop one set of parens, and one instance of partial.

Now we go back to our expression that evaluates to 256, and substitute the simplified subexpression.

   ((partial (comp mult repeat) 2) 16)

The P rule has been working for us, let’s use it again. Think symbolically, algebraically. Lose a set of parens, and a partial.

   ((comp mult repeat) 2 16)

We have eliminated all instances of partial. So let’s use C to get rid of the comp.

   (mult (repeat 2 16))

This expression is easy to understand, provided you understand mult and repeat.

Just one thing. Did you buy that last substitution? I claimed above that the C rule works as follows:

   ((comp f g) x) == (f (g x))

But the rule I actually used was this:

   ((comp f g) x y) == (f (g x y))

The thing to understand is that because Clojure functions can be variadic, C is really a pattern of rules, dependent on the arity of g.

  • ((comp f g) x) == (f (g x))
  • ((comp f g) x y) == (f (g x y))
  • ((comp f g) x y z) == (f (g x y z))
  • … etc …
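
The two-argument case can be checked in executable Clojure, just like the earlier examples:

   (= ((comp inc +) 2 3)  ;; C with a two-argument g
      (inc (+ 2 3)))      ;; => true; both sides are 6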

Similarly, P is also a family of rules:

  • ((partial f g) x) == (f g x)
  • ((partial f g) x y) == (f g x y)
  • ((partial f g) x y z) == (f g x y z)
  • … etc …

The situation with P is even trickier, because we have to also account for the fact that partial is variadic:

  • ((partial f g) x) == (f g x)
  • ((partial f g h) x) == ((partial f g) h x) == (f g h x)
  • … etc …

In general, P allows us to transform (f x1 x2 … xn) into ((partial f x1 … xk) xk+1 … xn) for our choice of 1 <= k < n. Something to note is that the P and C rules are bidirectional. We can take advantage of that to run the analysis of @ghoseb’s code backwards, and maybe get some insight into how to construct a point-free solution in the first place.

The solution will have to implement exponentiation. Whatever the implementation, it will have to be true that ((solution 2) 16) evaluates to 256. We don’t know the details, but we see the shape of it. We can start with a fairly natural solution with the wrong shape, below, and use the P and C rules to mold the solution into the right shape.

   (mult (repeat 2 16)) ;; now use C
   ((comp mult repeat) 2 16) ;; use P
   ((partial (comp mult repeat) 2) 16) ;; use P
   (((partial partial (comp mult repeat)) 2) 16) ;; use def of mult
   (((partial partial (comp (partial reduce *) repeat)) 2) 16)

And done. The trick, as in algebra, is to know which rule to apply when. Practice helps, as does a clear understanding of what rules apply, and why. I hope anyone who got this far finds some value in this analysis.

My thanks to @alandipert and @ghoseb for the lovely puzzle. It was instructive, in more than one way.

Years ago I made a promise to my wife that I would not allow my daughter to “check out from math”. I have kept that promise, and in so doing have tried to bear in mind that it is hard to learn to think with symbols. This fall my daughter came home on the first day of school with a math assignment, a variety of simple algebra problems. It was a diagnostic; the problems were nothing she hadn’t seen before, but she was out of practice after a summer off. I walked her through the assignment, but I too was out of practice. Not with the math; that I could do without thinking. And I did, without thinking. And without the patience and understanding that I should have shown. My daughter thought I was disappointed with her. My fault. This problem reminded me of what it’s like to struggle with symbolic thinking. I hacked without plan or self-awareness, and I floundered.

After I saw @ghoseb’s solution I took the time to think systematically, and was rewarded with insight. I learned a bit about programming, and a bit about humility and patience. It was a good day.

An invitation to FP for Clojure noobs

I’ve heard newcomers to Clojure ask how to get started with functional programming. I believe that learning to program in the functional style is mostly a matter of practice. The newcomer needs to become familiar with a handful of higher order functions, and how they are used in common idioms. This can be done by practice with simple, well defined problems. I assume that the prospective reader already has a grasp of the rudiments of Clojure, and can operate the Leiningen build tool.

Here is what I propose. I have prepared a set of annotated exercises illustrating typical Clojure fp idioms. The exercises are the first 31 Project Euler problems, one for each day of the month. I believe these problems are ideal for the purpose at hand. Each problem is succinctly stated, interesting, and well defined. Each lends itself to a natural functional solution.

I have packaged the problems as a Leiningen project. Each problem is stated, and then solved. Each problem is in its own namespace, to enable code reuse across problems. Each problem is documented using the fabulous Marginalia literate programming tool. Running commentary tries to point out common idioms, and provides links to ClojureDocs documentation for newly introduced functions. Unit tests are provided to check the provided solution against the official Project Euler solution. The solutions use current Clojure (1.4.0 beta). Nothing is used from the old clojure.contrib. Each solution is written in an explicitly functional style. Many of the solutions consist solely of function definitions. Vars are used only to name data given as part of a problem statement, or data structures constructed from such data. All data is treated as immutable.

What I recommend is deceptively simple. Read each problem statement. Think about how you might solve it; do so if you like. Then read the provided solution. Read it actively: take the time to run the code, and to understand every function and every construct in the solution. Play with the code. Make changes, run things in the repl. Use ‘lein test’ to see whether your modifications still get the same answer.

Here’s the important point: once you understand the solution, delete it and then reconstruct it. Type it in. Look up functions you don’t remember. Did I mention you should type the code in? Knowledge enters the mind through the fingers, not just the eyes. Do this and I believe you will quickly jump-start your understanding of functional programming in Clojure.

Here are a few things to avoid.

Do not attempt to learn Clojure and new tooling, say Emacs+slime, at the same time. Keep it simple. Edit code using whatever editor you normally use, and run a Clojure repl in whatever terminal emulator you normally use. No fancy IDE or editor hacking is required. Separation of concerns applies to learning as much as to coding.

Don’t worry about parts of Clojure not immediately related to functional programming. In particular, don’t worry about concurrency constructs, STM, protocols, multi-methods, macros, transients, or whatever. Instead, worry about knowing how to define functions using any and all of (defn …), (def …), #(…), and (fn …). Learn map, filter, reduce, for, let, letfn, and iterate.
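
To give one small taste of those functions in action, here is a quick sketch of my own, in the spirit of the project’s exercises: Project Euler problem 1, the sum of all multiples of 3 or 5 below 1000.

(defn multiple-of-3-or-5? [n]
  (or (zero? (mod n 3)) (zero? (mod n 5))))

(reduce + (filter multiple-of-3-or-5? (range 1000)))
;; => 233168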

The Project Euler problems do require a bit of math. Don’t let that put you off. Worst comes to worst, I have already done the math for you. The goal here is to learn fp not math, so don’t get hung up on solving the problems ab initio. The problems are a device to present solutions written in a functional style. If a particular problem hangs you up, skip it.

Don’t worry about lambda calculus, type theory, category theory, monads, morphisms, or any such abstract concerns. Your functional programming journey may eventually take you to those, but you need not start there. Abstraction is best appreciated in the light of experience with concrete examples. Begin by reading and writing code to solve many small, concrete problems. See what the solutions have in common. That is what you want to learn. The more these solutions seem like variations on a theme, the better.

When I first started learning Clojure, I did so by working Project Euler problems. PE is a lovely and tasteful collection of problems, one that has provided countless hours of instruction and fun to many, many people. While the PE problems are outstanding, I do not claim any special virtue for my solutions. But they should not require any special virtue; that is the point. I want you to see functional programming as a simple and natural way to express intent, as one more way to get things done. If these solutions seem routine, then I have done what I set out to do.

The project repository is available here. If you want a preview, you can jump straight to the Marginalia docs. Enjoy.

Tree reparenting

Chris Houser contributed the very pretty problem 130 to 4clojure. I won’t repeat the problem statement here, just my thoughts on how to approach the problem, and my solution.

The strategy is to decompose a complex task into a sequence of simpler tasks. The complex task is to transform a tree: we are given a tree with root r, and some designated node n. The goal is to rewrite the tree representation so that n is the new root. The further n is from r, the more complex the task. It turns out that the task is fairly simple if n is an immediate descendant of r. So the strategy is this. Find the path (sequence of nodes) from r to n, say it is [r a b c n]. We first rewrite the tree so a is the root. Then rewrite that tree so b is the root, etc. It’s that simple.

In the code I will refer to labels rather than nodes. I think of the trees this way. We have a set L of labels. Trees are constructed by induction, as follows.

  • For any element x in L, [x] is a tree.
  • Any vector whose first element is a label and remaining elements are trees is itself a tree.

We consider only trees wherein all labels are distinct, otherwise the problem has no unique solution. Here’s the code.

(defn get-label [x]
  (when (sequential? x) (first x)))

(defn path-to
  "The path from the root of the tree to the label."
  ([label tree] (path-to [] label tree))
  ([path label tree]
     (let [lbl (get-label tree) path (conj path lbl)]
       (if (= label lbl)
         path
         (mapcat (partial path-to path label)
                 (rest tree))))))

(defn lift-local
  "Rewrite the tree so that label is the root. Here label is required to
  belong to a subtree immediately descended from the root."
  [tree label]
  (let [match       #(= label (get-label %))
        subtree     (first (filter match tree))
        but-subtree (remove match tree)]
    ;; append but-subtree to subtree
    (concat subtree (list but-subtree))))

(defn lift
  "Rewrite the tree so that label is the root."
  [label tree]
  (reduce lift-local
          tree
          (rest (path-to label tree))))
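
For a quick sanity check, here is a small made-up tree lifted to a new root:

(def t [:a [:b] [:c [:d]]])

(path-to :c t)  ;; => (:a :c)
(lift :c t)     ;; => (:c [:d] (:a [:b]))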

Final thoughts: I inductively defined trees in terms of vectors, but the code does not depend on vector specific operations, and should work with trees represented by either vectors or lists. For a very different take on this problem, see Meikel Brandmeyer’s solution using logic programming.

A simple parallel computation

The 2 June 2011 issue of The Economist contains an article on parallel programming. The article points out what is obvious to some, mostly because they have learned it the hard way.

You might expect a six-core machine to be six times faster than a machine with a single-core microprocessor. Yet for most tasks it is not.

Indeed. Every once in a while, if you pick the right problem and the right approach, you can get a significant speedup on a multi-core box. I’ll give you a simple example.

The problem is that old chestnut, computing the nth Fibonacci number, for large n. By “large n” I mean n=1,000,000 or so. By then fib(n) has enough digits to fill a book. For example, fib(1,000,000) has 208,988 digits and fib(2,000,000) has 417,975 digits.

To have a basis for comparison, here is a standard approach to computing the nth Fibonacci number.

(defn std-fib
  "A slow but simple way to compute nth Fibonacci number."
  [n]
  (loop [a 0 b 1 k n]
    (if (zero? k)
      a
      (recur b (+ a b) (dec k)))))

The algorithm is familiar, but a couple of points are worth noting. The first is that computing the nth Fibonacci number this way will require n additions. The second is that though this approach is tail recursive, it does not use constant space. The space usage will be linear in n, for the following reason. The number fib(n) grows exponentially with n. The size of the standard representation of a number n grows logarithmically with n. The log and the exponential functions more or less cancel, and the space required will grow proportionally to n.
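
To make the growth rate concrete, a back-of-the-envelope check using Binet’s formula:

\[ \mathrm{fib}(n) \approx \frac{\varphi^n}{\sqrt{5}}, \quad \varphi = \frac{1+\sqrt{5}}{2}, \qquad \text{so} \quad \log_{10} \mathrm{fib}(n) \approx n \log_{10}\varphi \approx 0.20899\,n. \]

For n = 1,000,000 this predicts about 208,988 digits, which matches the count quoted above.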

We’ll turn the iterative std-fib into a parallel computation in two steps. The first step is to rethink our approach, and switch to a matrix computation. Then we’ll tweak the matrix approach to make it parallel.

Here’s the quick review of matrix multiplication:

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} ae+bg & af+bh \\ ce+dg & cf+dh \end{pmatrix} \]

Because it is tedious to work with LaTeX in WordPress, as above, I am going to change the notation a bit and write the 2 x 2 matrix A as [a b c d], etc. This simpler notation works well with both WordPress and Clojure.

Here’s the basis of computing Fibonacci numbers with matrices. Consider the matrix

F = [0 1 1 1]  = [fib(0) fib(1) fib(1) fib(2)]

The matrix powers of F have a nice property. For integers n > 0:

F^(n+1) = [fib(n) fib(n+1)  fib(n+1) fib(n+2)]

So we’ll compute the nth Fibonacci number by computing F^(n+1) and extracting its first entry. At first glance that might seem crazy. Multiplying two matrices will involve several multiplications and several additions, whereas std-fib did only a single addition. How can the proposed matrix approach possibly be faster?

Suppose we want to compute F^1000. We will not perform 1000 matrix multiplications. Instead we’ll note that F^1000 is the square of F^500. Suddenly half of our multiplications are gone. And of course F^500 is the square of F^250, and so on, recursively.

We’ll compute matrix powers by squaring when we can, and multiplying when we have to.

(defn mat-power
  "2×2 matrix M raised to nth power"
  [M n]
  (cond
    (= 0 n) [1 0 0 1]
    (even? n) (recur (mat-sq M) (/ n 2))
    :else (mat-mult M (mat-power M (dec n)))))

The mat-power function is defined in terms of two other functions, mat-mult and mat-sq. Let’s take a look at them.

(defn mat-mult
  "Matrix multiplication for 2×2 matrices, both arguments are powers of the same
  symmetric matrix. Note that this version of multiplication is _not_ correct in the
  general case. In particular, the product of symmetric matrices need not be symmetric.
  This function should be used only to compute a positive integer power of a symmetric
  matrix. Such a product is guaranteed to be symmetric."
  [[a b _ d] [e f _ h]]
  (let [[ae bf dh af bh] (map *
                              [a b d a b]
                              [e f h f h])
        [w x y] (map +
                     [ae af bf]
                     [bf bh dh])]
    [w x x y]))

(defn mat-sq
  "Square of a symmetric 2×2 matrix"
  [[a b b c]]
  (let [[aa bb cc x] (map *
                          [a b c b]
                          [a b c (+ a c)])
        [w y] (map +
                   [aa bb]
                   [bb cc])]
    [w x x y]))

(defn fib
  "Compute the nth Fibonacci number."
  [n]
  (let [F [0 1 1 1]
        M (mat-power F (inc n))]
    (M 0)))

Above I exploit a feature of the problem at hand to reduce the amount of work done. We are interested in computing powers of the symmetric matrix F = [0 1 1 1]. A positive integer power of a symmetric matrix is guaranteed to be symmetric. The two off-diagonal elements must be equal, so instead of computing four entries, I can get away with computing only three, cutting the work to be done by 25% or so.

Let’s see how the matrix approach compares to std-fib when timed on largish values of n. On my machine, I see

fast-fib.fib> (time (do (std-fib 1000000) nil))
"Elapsed time: 44284.767864 msecs"

fast-fib.fib> (time (do (fib 1000000) nil))
"Elapsed time: 3208.876229 msecs"

Down from 44 seconds to 3 seconds; std-fib took 13.8 times as long as fib. Nice. Let’s go a little higher.

fast-fib.fib> (time (do (std-fib 2000000) nil))
"Elapsed time: 168510.795379 msecs"

fast-fib.fib> (time (do (fib 2000000) nil))
"Elapsed time: 12722.213307 msecs"

Again, std-fib took over 13 times as long as fib.

Now let’s see how to use those extra cores. We’ll write parallel versions of our earlier functions, and distinguish them by prefixing the function name with a “p”.

(defn pmat-mult
  "Matrix multiplication for 2×2 matrices, both powers of a single symmetric matrix"
  [[a b _ d] [e f _ h]]
  (let [[ae bf dh af bh] (pmap *
                               [a b d a b]
                               [e f h f h])
        [w x y] (pmap +
                      [ae af bf]
                      [bf bh dh])]
    [w x x y]))

(defn pmat-sq
  "Square of a symmetric 2×2 matrix"
  [[a b b c]]
  (let [[aa bb cc x] (pmap *
                           [a b c b]
                           [a b c (+ a c)])
        [w y] (pmap +
                    [aa bb]
                    [bb cc])]
    [w x x y]))

(defn pmat-power
  "Symmetric 2×2 matrix M raised to nth power"
  [M n]
  (cond
    (= 0 n) [1 0 0 1]
    (even? n) (recur (pmat-sq M) (/ n 2))
    :else (pmat-mult M (pmat-power M (dec n)))))

(defn pfib
  "The k-th Fibonacci number, parallel version."
  [k]
  (let [F [0 1 1 1]
        M (pmat-power F (inc k))]
    (M 0)))

Basically, I replaced map with pmap. That was the plan all along, and the reason I wrote the matrix multiplication functions using map. So does it work? Well, I have a four core box. Check it out.

fast-fib.fib> (time (do (fib 1000000) nil))
"Elapsed time: 3184.672283 msecs"

fast-fib.fib> (time (do (pfib 1000000) nil))
"Elapsed time: 1029.386271 msecs"

That’s a ratio of about 3.1. Let’s double n.

fast-fib.fib> (time (do (fib 2000000) nil))
"Elapsed time: 12660.566862 msecs"

fast-fib.fib> (time (do (pfib 2000000) nil))
"Elapsed time: 3617.482009 msecs"

That’s a ratio of 3.5. We see fib taking about three times as long as pfib. Recall that we are computing three numbers. A speedup of three is about all we could hope for. If you want to grab the code and play around with it yourself, you can get it here.

I wrote this code because I wanted an unequivocal example of the four cores on my machine speeding up a computation. An earlier attempt at much the same computation was misconceived. I tried a similar approach, but the devil is in the details. I used the matrix Fibonacci computation, but explicitly computed what binary powers of F would need to be multiplied together, computed them using only squaring, and then used a home-brew parallel reduce operator to multiply the resultant matrices in parallel. This was no faster than fib, and more complicated. I was taken in by a too-clever scheme which focused on making the wrong thing parallel. Multiplying whole matrices in parallel did not work nearly so well as the simpler approach I showed you here.

Final thought: I raced through the explanation of the matrix approach to calculating Fibonacci numbers. The method is well known, and there are detailed explanations available online. I like one I ran across not too long ago, on Code Overflow.