I read the following in chapter seven of Beautiful Code. The author wants to make some points about testing, and opts to use binary search as a compact but non-trivial example.

If you have never implemented binary search, or haven’t done so in a few years, I suggest you try that yourself before going forward; it will give you greater appreciation for what follows.

Binary search is a great example because it’s so simple and yet it’s so easy to implement incorrectly. In Programming Pearls, Jon Bentley shares how, over the years, he asked hundreds of professional programmers to implement binary search after providing them with a description of the basic algorithm. He gave them a very generous two hours to write it, and even allowed them to use the high-level language of their choice (including pseudocode). Surprisingly, only about 10 percent of the professional programmers implemented binary search correctly.

I took the author’s advice and wrote my own binary search. I had three separate errors in my first iteration; all were flushed out by unit testing. My unit test strategy, one I try to use whenever possible, was to generate large numbers of random test cases on the fly. I suppose I should have used eunit, but I just rolled my own. Here’s the code.

-module(bs). -export([search/2, test/1]). %------------------------------------------------------------------------------ %% Return a random integer in 2 .. 5, or return 0 %% In particular, do not return 1; used for generating test cases %------------------------------------------------------------------------------ rint() -> rint(5). rint(Limit) -> Chance = random:uniform(Limit), if Chance == Limit -> 0; true -> 1 + random:uniform(Limit) end. %------------------------------------------------------------------------------ %% Return a length N list of integers in non-decreasing order %% 1: repeated elements can occur, %% 2: distinct consecutive elements must differ by at least 2 %% rint and rlist were designed to force these properties, useful in tests %------------------------------------------------------------------------------ rlist(N) -> rlist(N, []). rlist(0, Result)-> lists:reverse(Result); rlist(N, []) -> rlist(N-1, [rint()]); rlist(N, [H|T]) -> rlist(N-1, [H+rint(),H |T]). %%non-decreasing by construction %% avoid magic constant failure() -> -1. %% take care of trivial 0 element case in binary searh search(_, []) -> failure(); %% List is a list of integers %% K is an integer; return index of K in List, or failure() search(K, List)-> search(K, List, 1, length(List) ). %% Lo and Hi inclusively bound the portion %% of List to be searched for K search(K, List, Lo, Hi) -> if Hi < Lo -> failure(); true -> Probe = (Lo + Hi) div 2, Elt = lists:nth(Probe, List), if K < Elt -> search(K,List,Lo,Probe-1); K > Elt -> search(K,List,Probe+1,Hi); true -> %% when K == elt Probe end end. testcase(N) -> List = rlist(N), Value = lists:nth(random:uniform(N), List), %% we know Value is in List Value = lists:nth(search(Value, List), List), %% we know some other values not in the list Failed = failure(), Failed = search(Value+1, List), Failed = search(Value-1, List), Failed = search(lists:nth(N,List)+1, List), Failed = search(lists:nth(1,List)-1, List), %% return 1 if the searches do not cause an exception 1. test(NumCases) -> test(NumCases, 0). test(0, NumCorrect) -> NumCorrect; test(M, NumCorrect) -> test(M-1, NumCorrect + testcase(random:uniform(100))).

A million random tests passed, so I feel reasonably confident. Interestingly, Beautiful Code article comments at length about a subtle bug arising in a Java implementation. In the Java equivalent of my

Probe = (Hi + Lo) div 2

one needs to watch out for a possible integer overflow which would render Probe negative. It seems to me the flaw is in Java’s practice of allowing integer overflows without throwing an exception. But better than overflowing with an exception is not overflowing at all, that is, having a proper implementation of integers as do erlang, Haskell, lisp, Mathematica, and a host of other languages.

A final note: this code is merely an exercise, done for the sake of learning some erlang. The code, though very likely logically correct, suffers from a terrible performance flaw: accessing the kth element of an erlang list requires time proportional to k. Normally the whole point of binary search is to find things in logarithmic time. That clearly cannot not happen with this implementation. Caveat emptor.

hey, nice post!

I concerned about the following line:

Elt = lists:nth(Probe, List)

Do u know how complex is lists:nth()? Is it O(n) or O(1)?

Do u agree that the entire algorithm complexity ll be O(n*log(n)) if nth() is O(n)?

Best regards.

October 29, 2007 @ 4:45 pm

Hi – I saw the discussion on “The Erlane project” about binary tree v binary search in Erlang. I (also an erlang newbie) found it pretty easy to code a binary tree implementation which I posted on my blog http://mark.aufflick.com/blog/2007/11/30/trees-in-erlang

A kind commenter on my blog entry pointed me to the fact that there is a good binary tree implementation in the standard library that comes with your erlang installation in lib/stdlib/src/gb_trees.erl

December 18, 2007 @ 7:08 am