The HiPE is well worth it
After being so badly disillusioned with Lisp, I've spent the last couple of weeks getting acquainted with Erlang. Since concurrency, scheduling, clustering, and scaling are my biggest interests, I (sceptically) figured that I'd give it a try. I've been pretty impressed. I do find that it's hard to explain to people why Erlang is useful or worth learning, since most people that I speak to seem to want to simple scripting tasks, and Erlang is pretty bad at those (its string-handling has room for improvement, and it takes a pretty large system to get it to shine).
It's been my first truly functional language, and making the jump from loops and threads to recursion-only and Erlang's processes has been a bit of a mind-jump, but I seem to have gotten used to it.
One of my new-language programs is one to calculate items in the Fibonacci set. It's good for learning the "isms" of a language; for instance, I used it while learning Lisp to experiment with Lisp's killer feature, macros. So here's how it looks in Erlang (scaffolding and all):
-module(fib). -export([fib/1]). fib(N) when N<2 -> 2; fib(N) -> fib(N-1)+fib(N-2).
Of course, that's a recursive implementation, that can be quite slow, so if you really needed to be calculating numbers in the Fibonacci set, don't use this implementation. But that doesn't make much use of Erlang's killer feature, concurrency. Since fib(N-1) and fib(N-2) can be done at the same time, and Erlang makes concurrency easy, we should try to do these concurrently, So here's another implementation that does. It looks a lot more complicated, but I'll summarise it.
-module(fib). -export([fib/1,fib/2]). % for spawn() -export([fibber/3]). -define(fib_threads,4). fibber(Result_Pid, N, Thread_Level) -> % io:format("Spawned for fib(~p,~p)~n",[N,Thread_Level]), Result_Pid ! {fib_result, fib(N, Thread_Level)}. fib(N, _Thread_Level) when N<2 -> 1; fib(N, Thread_Level) when Thread_Level < 1 -> fib(N-1,0)+fib(N-2,0); fib(N, Thread_Level) -> spawn(fib,fibber,[self(),N-1, Thread_Level-1]), spawn(fib,fibber,[self(),N-2, Thread_Level-1]), ((receive {fib_result, Result1} -> Result1 end) + (receive {fib_result, Result2} -> Result2 end)). fib(N) -> fib(N,?fib_threads).
In this version, fib/1 (that is, fib() that takes one argument) is a stub that calls fib/2 with the default "thread level" (set using the define in the header, that does what you'd expect from C). That function spawns two fibber processes to calculate fib(N-1) and fib(N-2) respectively with a Thread_Level one less than its own. The fibber sends the result in a message back to its spawner, who waits for their arrival, and adds them together once it has both, and returns the result. When the Thread_Level gets down to 0, it doesn't spawn any, and just does the calculation. So this ends up spawning 2Thread_Level processes in total, which can be quite a few. A Thread_Level of 4 turned out to be optimal after some experimentation.
Now lets try to compare implementations. Here's the same module with some timing functions:
-module(fib). -export([fib/1,fib/2]). -export([time_takes/3]). % for spawn() -export([fibber/3]). -define(fib_threads,4). time_takes(Mod,Fun,Args) -> Start=erlang:now(), Result = apply(Mod,Fun,Args), Stop=erlang:now(), io:format("~p~n",[time_diff(Start,Stop)]), Result. time_diff({A1,A2,A3}, {B1,B2,B3}) -> (B1 - A1) * 1000000 + (B2 - A2) + (B3 - A3) / 1000000.0 . fibber(Result_Pid, N, Thread_Level) -> % io:format("Spawned for fib(~p,~p)~n",[N,Thread_Level]), Result_Pid ! {fib_result, fib(N, Thread_Level)}. fib(N, _Thread_Level) when N<2 -> 1; fib(N, Thread_Level) when Thread_Level < 1 -> fib(N-1,0)+fib(N-2,0); fib(N, Thread_Level) -> spawn(fib,fibber,[self(),N-1, Thread_Level-1]), spawn(fib,fibber,[self(),N-2, Thread_Level-1]), ((receive {fib_result, Result1} -> Result1 end) + (receive {fib_result, Result2} -> Result2 end)). fib(N) -> fib(N,?fib_threads).
My machine has two processors, so I launch Erlang with erl -smp enable. Since calling fib/2 with a Thread_Level of 0 is the same as our previous implementation, we'll use it for comparison.
4> fib:time_takes(fib,fib,[36,0]). 30.9048 24157817
So the original implementation takes about 30.9 seconds on my machine. Here it is with a Thread_Level of 4:
5> fib:time_takes(fib,fib,[36,4]). 17.1832 24157817
That's 17.2 seconds when done concurrently, or 55.7% of the time.
The title of this post involves HiPE, so you must be here to read about the improvement that it offers in this situation. HiPE is an extension to Erlang that compiles native code along-side the BEAM VM code that Erlang normally runs, and includes both in the BEAM file (like a fat binary). First, let's compile that module with HiPE:
1> c(fib,[native]).
{ok,fib}
And try both tests. Without concurrency:
2> fib:time_takes(fib,fib,[36,0]). 2.77270 24157817
2.7 seconds down from 30.9! And here it is with concurrency:
3> fib:time_takes(fib,fib,[36,4]). 1.82664 24157817
1.8 seconds, down from 17.2. Clearly there is some performance to be had! So why isn't HiPE the default?
Edit (2007-07-15): Someone asked me for some more in-depth performance data, so I ran some tests and here it is. This is the time in seconds that fib(40,Thread_Level) took at various values of Thread_Level with HiPE turned on and off
| Thread_Level | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| With-HiPE | 57.4888 | 38.7449 | 37.4630 | 35.6048 | 35.0019 | 34.0274 | 33.7795 | 33.4127 | 33.5097 | 33.4096 | 33.5109 |
| No-HiPE | 265.256 | 176.483 | 166.955 | 179.193 | 169.433 | 152.756 | 153.820 | 151.185 | 150.249 | 149.857 | 151.183 |
fib(N) when N<2 -> 1;
Posted by mnp on September 05, 2007 at 05:37 AM PDT #