Archives for Halfbaked Ideas

Erlang surprises me.

I recently wanted to calculate some binomial coefficients in Erlang, and so needed an implementation of the factorial function. No problem. The factorial function is the canonical illustration of how to define a recursive function in Erlang. It goes something like this:

fac(0) -> 1;
fac(N) -> N * fac(N-1).

I have to confess that definition has always bothered me because it is not tail recursive. And it only takes one more line to write a tail recursive version, just the way God intended:

fac1(N) -> fac1(N,1).
fac1(0,R) -> R;
fac1(N,R) -> fac1(N-1, N*R).

The advantage of fac1 over fac is that fac1 operates in constant space, while fac requires space linear in N. For large N one has to wonder whether fac's chain of deferred multiplications will overflow the stack. But is this really a problem? Just how large a value of N is required to overflow the stack? I decided to try to find out.

I ran some test cases: fac(1000), fac(10000), fac(50000). No stack overflow yet, but I could see it might get old waiting for the enormous answers to be computed and displayed. I decided to switch gears. To avoid suffering through monster integers, I modified the test. Rather than testing with fac, I settled on using the structurally similar function sum:

sum(0) -> 0;
sum(N) -> N + sum(N-1).

Just for grins, I decided to try the same thing in a couple of other languages. Here is sum in Clojure:

(defn sum [n]
   (if (= n 0)
       0
       (+ n (sum (- n 1)))))


And here it is in Haskell, where I changed the spelling to summ to avoid a name collision with a Prelude function called sum:

summ 0 = 0
summ n = n + summ (n-1)

I was surprised by the results of my tests.

Just to build up the suspense, I'll pause to describe the test machine. It's a Dell T105, quad-core AMD Opteron, 8GB of ram, running Ubuntu Intrepid. Clojure is revision 1173, running on top of Java 1.6.0_10. Haskell is ghci 6.10.1, and Erlang is R12B-5.

On to the results. First up is Clojure:

user=> (sum 10000)
50005000
user=> (sum 20000)
200010000
user=> (sum 30000)
450015000
user=> (sum 40000)
java.lang.StackOverflowError (NO_SOURCE_FILE:0)

The stack overflow is just the sort of thing that makes me prefer fac1 over fac. Let's see what happens in Haskell.

*Main> summ 10000
50005000
*Main> summ 30000
450015000
*Main> summ 40000
800020000
*Main> summ 50000
1250025000
*Main> summ 75000
2812537500
*Main> summ 100000
5000050000
*Main> summ 150000
11250075000
*Main> summ 175000
*** Exception: stack overflow

Haskell, or rather ghci, did roughly four times better. It's probably worth pointing out that this is not really a comparison of the languages so much as a comparison of how the language implementations handle one very specific detail.

Time to see how the Erlang virtual machine fares. I stuck the sum function in a module called fac. No matter.


Eshell V5.6.5  (abort with ^G)
1> fac:sum(1000).
500500
2> fac:sum(10000).
50005000
3> fac:sum(50000).
1250025000
4> fac:sum(100000).
5000050000
5> fac:sum(200000).
20000100000
6> fac:sum(500000).
125000250000
7> fac:sum(1000000).
500000500000
8> fac:sum(5000000).
12500002500000
9> fac:sum(10000000).
50000005000000
10> fac:sum(50000000).
1250000025000000
11> fac:sum(100000000).
5000000050000000
12> fac:sum(200000000).
20000000100000000


At that point I went for broke, and tried to compute sum(1000000000). Maybe it would have worked eventually, but I killed it. My machine had gone nearly non-responsive, and I could hear my hard drive thrashing. Those disk sounds incline me to guess that when Erlang ran out of stack space, it moved the computation into virtual memory and soldiered on. Whatever the actual means, N=200,000,000 is pretty impressive. That's three orders of magnitude better than ghci or the jvm. Kudos to the authors of the Erlang virtual machine.

One final thought. In spite of what I saw here, I'll still go with the tail recursive version of the factorial function. It better satisfies my sense of aesthetics. But it's nice to know I don't have to.

A fix for the ipod recovery mode loop

My daugter’s ipod shuffle fell into the dreaded recovery mode loop. Briefly, itunes claims that the ipod is in recovery mode, and must be restored. Restoration and reboot is followed by itunes claiming that the ipod is in recovery mode, and must be restored. The linked discussion on an Apple discussion forum offers various suggestions, most of which either refer to Windows specific issues (no Windows here) or to diagnostic mode (shuffles have no such mode).

I came up with a solution that worked for me. I make no claim it will work for you, but here it is. Fire up the Apple disk utility ( /Applications/Utilities/Disk Utility ). Select the volume corresponding to your ipod, and then select the Erase tab. Yes, the procedure is to erase all data on the ipod. But not just any erase, notice the Security Options button. Click it, and check the Zero Out Data box. The goal is to actually overwrite the data with zeros. In my early tries, simple erase did nothing, but this version did the trick. YMMV. Do keep in mind your music will not survive this process. Don’t try it unless you have a backup or are willing to lose the tunes.

After erasing the data, one more round of itunes restore did the trick. If you try this, leave a comment and let me know whether it worked for you. Good luck.

installing gem on ubuntu ibex

Installing ruby and gem on Ubuntu (Intrepid Ibex) is somewhat problematic because of differences of opinion between the apt folk and the gem folk. If you want a working version of gem on Ubuntu Ibex, the RailsOnUbuntu site has the instructions. Worked for me.

An excellent domain registrar

I’ve used a number of domain name registars, including GoDaddy, Network Solutions, Yahoo, and the dreadful Registerfly. After kissing too many toads, I’ve finally found a princess, gandi.net.

Their prices are fair, their site is well designed and easy to use, and they don’t try to cross sell, up sell, or otherwise try to get you to buy services you don’t want. My impression is that the folks at gandi believe technical competence combined with ethical business practices will win and keep customers. Truly a breath of fresh air.

Mirroring an SVN repository

One advantage of a distributed version control system such as Mercurial is that any clone of a project is complete. The clone contains the entire history of the project. That makes backing up a repository trivial.

Not so for a centralized vcs such as Subversion. If you check out a copy of a project, you have a local copy sans the history of the project. Backing up the entire history of the project takes a bit of doing. This post concerns a method for creating a live mirror of a Subversion repository.

Here is the problem we are going to solve. You have a working Subversion (svn) repository. It contains valuable code, so naturally you want to back it up, both on a local machine, and maybe an offsite machine as insurance against disaster striking your site.

You could try to back up your repository by backing up the physical media containing the files which constitute your repository, say using rsynch-backup or some such. I don’t like that approach because if you ever have to use your backup, your first step will be to try to turn a bunch of files into a working svn repository. No thanks, I’d rather have a running svn instance all set to go.

I’ll show you how to back up your svn repository using three very simple bash scripts, and later give a link to to where you can download the scripts. Here’s the first one, make-svnrepo.

#!/bin/bash
## The user should edit this script to insert appropriate repository paths.
## The goal is to mirror ${REMOTESVN}/${REPO} in ${LOCALSVN}/${REPO}, which is created here.
## The population of ${LOCALSVN}/${REPO} is done via another script, sync-svnrepo

REPO=${1}                             # name of repository passed in by user
REMOTESVN='10.0.0.11'                 # omit final "/" since script provides it later
LOCALSVN='/home/drc/repo/saffronsvn'  # ditto
LOCALREPO=${LOCALSVN}/${REPO}         # the file path to local svn repository about to be created
PROTOCOL=svn                          # the transfer protocol to be used : one of {svn,  svn+ssh, file}

## Sanity check: uncomment the svn info command below, and make sure it works.
## If it does not, you are not talking to the remote repo, and may not have correctly specified some component above.
## svn info ${PROTOCOL}://${REMOTESVN}/${REPO}

## do the magic
svnadmin create ${LOCALREPO}
echo "#!/bin/bash" > ${LOCALREPO}/hooks/pre-revprop-change
chmod +x ${LOCALREPO}/hooks/pre-revprop-change
svnsync init file://${LOCALREPO} ${PROTOCOL}://${REMOTESVN}/${REPO}

How would I use this? Well, I have an existing svn repository running on my LAN, on a host with internal ip address 10.0.0.11. Suppose it hosts a project called MyProject. I want to mirror that entire project on my local machine, in the existing directory /home/drc/repo/svn. I have already installed Subversion on my local machine, and make-svnrepo is marked executable and is in my path. What I have to do is run the following in the console:

make-svnrepo My-Project

This will create an empty project. To populate it, use the next script, sync-svnrepo:

#!/bin/bash
## You will want to run this periodically, say via a cron job. I run it 4x daily.
LOCALSVN='/home/drc/repo/saffronsvn'  # should agree with LOCALSVN in make-svnrepo
REPO=${1}
REPOSITORY=${LOCALSVN}/${REPO}
svnsync sync file://${REPOSITORY}

As you might expect, we would populate the mirror created in the first step by running the following in a console:

sync-svnrepo My-Project

This has to be done on a regular basis to keep up with changes; cron does the trick for me. I imagine it would be possible to use hooks within the original repo to trigger backups whenever code is committed, but I prefer not to touch the original repository.

What I’ve done at work is to mirror our working repository in three separate mirrors. Two are onsite, one is offsite. I use the following script, query-svnrepo, to check up on the status of the mirrors.

#!/bin/bash
NAME=${1}
echo "---> " ${NAME}

## The idea here is that we have mirrored the repo named by NAME is in several places
## We want to check that revision numbers on the mirrors more or less agree with the original repo's numbers

ORIGINAL=10.0.0.11                       ## on our lan
MIRROR1="/home/drc/repo/svn"             ## file on localhost
MIRROR2=10.0.0.66/svn                    ## on our LAN
MIRROR3="foo.bar.org/home/drc/repo/svn"  ## offsite, requires ssh access
COMMAND='svn info '

## adjust the protocol as required for each mirror
REPOSITORY=${ORIGINAL}
PROTOCOL='svn://'
RESULT=`${COMMAND} ${PROTOCOL}${REPOSITORY}/${NAME} | grep "Revision:"`
MSG="original:  ${RESULT}"
echo $MSG

REPOSITORY=${MIRROR1}
PROTOCOL='file:///'   ## the third / is needed as an escape because the path begins with /
RESULT=`${COMMAND} ${PROTOCOL}${REPOSITORY}/${NAME} | grep "Revision:"`
MSG=" mirror1:  ${RESULT}"
echo $MSG

REPOSITORY=${MIRROR2}
PROTOCOL='svn://'
RESULT=`${COMMAND} ${PROTOCOL}${REPOSITORY}/${NAME} | grep "Revision:"`
MSG=" mirror2:  ${RESULT}"
echo $MSG

REPOSITORY=${MIRROR3}
PROTOCOL='svn+ssh://'
RESULT=`${COMMAND} ${PROTOCOL}${REPOSITORY}/${NAME} | grep "Revision:"`
MSG=" mirror3:  ${RESULT}"
echo $MSG

This system has worked like a champ for me. I have never had to use the backups, but it’s nice to know they are there. And it is real nice to know that they work. This can easily be tested by checking out the project from one or all of the mirrors.

Screen scraping code can be a pain because blogging software sometimes does weird things to certain characters. You can get clean copies of the scripts from my repository on BitBucket.

If you actually use any of these scripts, let me know how it goes. I’d be happy to hear any suggested improvements.

Dumb luck and disk drives

The CEO of the company I work for mentioned that she was having problem with a USB drive containing a bunch of her data. I offered take a look. Sure enough, her Mac could not see the drive. Worse, while spinning up the drive made a lot of odd, periodic noises that did not inspire confidence.

I brought the device back to my ubuntu box, plugged it in, and nothing. The drive did not auto-mount, there was no sign of it in /media, and I could not see any new device using “sudo fdisk -l”. At that point things did not look entirely promising. It’s hard to use recovery tools on a device if you cannot see it.

And then I got lucky.

The drive was powered up, spinning away, and I was holding it in my hand, thinking about what to try next. I don’t know the rotational speed of the drive, but it must be pretty high. I could feel the gyroscopic precession effect resisting changes to the drive’s axis of rotation when I moved my hand. It is an interesting, almost eerie sensation. I played with drive, feeling the effect, for about 10, maybe 15 seconds. As I did so I heard a clicking sound, and suddenly the drive auto-mounted and a Nautilus window came up showing the contents of the drive. I quickly dumped the 14GB of data to another drive.

It’s good to be lucky.

An email tip

Many email clients share a user interface mis-feature. The user starts to compose an email, and the first thing the client does is put the user in the input area for the to: field. The user dutifully fills in the addressee, and most of the time things work out fine. Once in a while, the user sends a half completed message by mistake. This is especially annoying if you happen to be writing to a mailing list with many subscribers.

I make a habit of leaving the to: field empty until the body of the email is completed. That way if I hit send by mistake, the client has no one to send to. This has proved useful several times.

Installing antialiased Leo on Ubuntu

A little over a month ago I wrote about my off and on efforts to get the Leo text editor running under linux with decent looking antialiased fonts. At long last I have succeeded. Not through any heroic efforts on my part, but by waiting for python and tcltk to finally provide the needed font support. Here’s the drill.

First I downloaded the source for the recently released python 2.6, along with the source for the current versions of tcl and tk. I put all three in /home/drc/local/:

cd /home/drc/local
wget http://www.python.org/ftp/python/2.6/Python-2.6.tgz
wget http://prdownloads.sourceforge.net/tcl/tcl8.5.4-src.tar.gz
wget http://prdownloads.sourceforge.net/tcl/tk8.5.4-src.tar.gz

I then unpacked all three downloads, and built them in this order. First tcl:

cd /home/drc/local/tcl8.5.4/unix
./configure; make; sudo make install

To get the nice fonts with tk requires a little preparation.

sudo aptitude install libxft-dev

Now we can build tk.

cd /home/drc/local/tk8.5.4/unix
./configure --enable-xft; make; sudo make install

Next we build python 2.6. We can take care of getting the dependencies set up by picking up the dependencies for python 2.5.

sudo apt-get build-dep python2.5

Finally we are ready to compile python 2.6. I kept it local so not to mess up Ubuntu Hardy, which relies on python 2.5.

cd /home/drc/local/Python-2.6
./configure --prefix=/home/drc/local/python2.6
make; make install

Everything is built. All we need now is a simple bash script to launch Leo. Here’s what I use:

#!/bin/bash
LOCAL=/home/drc/local
PYTHON=${LOCAL}/python2.6/bin/python
LEO=${LOCAL}/Leo-4-5-1-final/
${PYTHON} ${LEO}/launchLeo.py ${1}

If you happen to find this useful, or have any corrections, suggestions, or comments, I’d love to hear from you. Leo is a wonderful and very useful program. I am delighted to finally have a nice looking version running on linux.

Here’s the obligatory screen shot.

apt-get build-dep

I ran across a  tip I want to preserve and pass along.  The original post is at The Telarah Times.

apt-get build-dep <package_name>

This will get all the dependencies needed to compile your package – providing that package_name is something that would normally install using a simple apt-get install command.

For example:
If you wanted to download and install the latest version of gnucash from tar.gz. You would run:

apt-get build-dep gnucash

Intro to gen_fsm and gen_server

Mitchell Hashimoto, writing at the aptly named spawn_link blog, has put up a couple of very nice tutorials on gen_fsm and gen_server. It is clear that Mitchell put a lot of thought and effort into these posts. He is a skilled expositor with an obvious enthusiasm for the material. As someone who is trying to master Erlang, I want to express my thanks.