Strangelights

“Comparisons are odious”, as my Mum used to say, but I hope this little comparison doesn’t pong too much. I have arranged a little competition of my own, a language benchmark based on “The Computer Language Benchmarks Game” (http://shootout.alioth.debian.org), between IronPython and F#, with measurements for C# thrown in as a kind of control group. I would have liked to chuck in the new IronRuby language too, but I had problems finding a current release. This was largely inspired by the benchmarks Ben Jackson posted to hubfs.net (http://cs.hubfs.net/forums/thread/3196.aspx), thanks Ben!

Now obviously, as the author of a book on F#, I have a vested interest in the outcome of such a competition, but I have tried to be as scientific as possible. My methods were as follows: I took the source from the shootout website (source for Python, OCaml and Mono C#) and compiled it to an executable (using the -X:SaveAssemblies option in Python’s case) using the highest level of optimization available from the compiler. This is perhaps a little unfair to IronPython, as its compiler doesn’t offer optimization, but it seemed silly not to use the optimization offered by the other compilers. The Python and OCaml source both needed some minor adjustments to compile correctly, but nothing that affected the structure of the programs, so one would not expect these changes to affect the results much. The tests were run 4 times, with the first result thrown away and the remaining three averaged. The source of the programs I used and the compilation commands are available for download here, so you can check the results for yourself if you like.
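
The timing procedure described above (run each test 4 times, throw away the first result, average the rest) can be sketched roughly as follows. This is only an illustration in Python, not the script used to produce the measurements in this post; the function names `time_command`, `average_after_warmup` and `benchmark`, and the use of `subprocess`, are assumptions made for the example.

```python
import subprocess
import time


def time_command(argv):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def average_after_warmup(timings):
    """Discard the first timing (the warm-up run) and average the remainder."""
    if len(timings) < 2:
        raise ValueError("need at least two runs to discard one and average the rest")
    rest = timings[1:]
    return sum(rest) / len(rest)


def benchmark(argv, runs=4):
    """Time `argv` `runs` times, dropping the first result before averaging."""
    return average_after_warmup([time_command(argv) for _ in range(runs)])
```

Calling `benchmark` with the command line of a compiled test (the executable name and its N argument) would give one averaged figure per test. Throwing away the first run keeps one-off start-up costs, such as JIT compilation and disk caching, out of the average.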

The versions of each compiler used were:
C# 8.00.50727.312
F# 1.9.1.18
IronPython 1.1

The results are summarised in the tables and graphs below.

Test	N	C# lines	C# time (sec)	F# lines	F# time (sec)	IronPython lines	IronPython time (sec)
binarytrees	16	81	2.146667	43	3.881333	44	31.90967
nbody	200000	169	0.282	115	0.572667	132	32.909
nsieve	9	40	0.527667	39	1.164667	30	15.93933
pidigits	2500	-	[1]	59	15.419	39	11.02267
recursive	11	46	7.595333	34	6.377	38	[2]
fannkuch	11	76	14.60233	49	12.42067	49	829.6697
partialsums	2500000	42	3.191	30	3.481333	46	586.463

[Edit 7-9-2007: Some of the N values in the table were not the N values used; this is now corrected]
[1] This test relies on a Mono library that provides infinite precision floating point numbers. I did not have this library, so I couldn’t compile the test.
[2] This test gave a stack overflow exception, so the test could not complete.

Results Analysis

F# and IronPython both score well on the number of lines needed to implement each solution, and the two come out very similar. C# generally lags quite a bit behind, taking nearly double the number of lines in some cases (81 lines against 43 for binarytrees).

In terms of execution speed, C# and F# score very similarly: in some tests C# has the edge, in others F# does. IronPython generally lags a long way behind both of them. It does score better than F# in one case, the pidigits test, but in the worst case, partialsums, it was 168.5 times slower than F# and 183.8 times slower than C#.
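
For anyone who wants to check the slowdown figures, the arithmetic can be reproduced with a few lines of Python. The timings are copied from the results table; the dictionary layout and the `slowdown` helper are just names chosen for this sketch, and only the tests that completed for all three languages are included.

```python
# Times in seconds, copied from the results table above.
times = {
    "binarytrees": {"C#": 2.146667, "F#": 3.881333, "IronPython": 31.90967},
    "nbody": {"C#": 0.282, "F#": 0.572667, "IronPython": 32.909},
    "nsieve": {"C#": 0.527667, "F#": 1.164667, "IronPython": 15.93933},
    "fannkuch": {"C#": 14.60233, "F#": 12.42067, "IronPython": 829.6697},
    "partialsums": {"C#": 3.191, "F#": 3.481333, "IronPython": 586.463},
}


def slowdown(test, baseline):
    """How many times slower IronPython was than `baseline` on `test`."""
    return times[test]["IronPython"] / times[test][baseline]


# The test with the largest IronPython/F# ratio.
worst = max(times, key=lambda test: slowdown(test, "F#"))

print(worst)                            # partialsums
print(f"{slowdown(worst, 'F#'):.1f}")   # 168.5
print(f"{slowdown(worst, 'C#'):.1f}")   # 183.8
```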

So why is IronPython so slow? I think the clue is the fact that IronPython overflowed the stack in one test, whereas C# and F# did not. This suggests that IronPython makes more method calls than C# or F#, leading to the slower execution times. I took a look at the generated code using Reflector, and I could see that to perform mathematical operations such as addition and subtraction, IronPython calls into an Operations class, whereas C# and F# just use IL instructions; presumably an IL instruction is much quicker than a method call. If this is true, why does Python score better than F# in the pidigits test? The F# team discovered some performance issues with the F# printf function (http://cs.hubfs.net/forums/thread/3196.aspx), which is used heavily in this test; they have promised to fix this.
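
The point about addition going through a method call rather than a single instruction is easier to see with a small example. The sketch below is plain Python and says nothing about how IronPython's Operations class is actually written; the `Metres` class and the `total` function are hypothetical names invented for the illustration. It only shows that the meaning of `+` in Python is not known until run time, so the runtime has to look it up on every operation.

```python
class Metres:
    """Hypothetical user-defined type that overrides the + operator."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Called by the runtime whenever a Metres instance is the left operand of +.
        return Metres(self.value + other.value)


def total(items, start):
    """Add up `items` with +; what + does depends on the run-time types."""
    result = start
    for item in items:
        result = result + item  # resolved at run time, not at compile time
    return result


print(total([1, 2, 3], 0))                              # 6, integer addition
print(total([Metres(1), Metres(2)], Metres(0)).value)   # 3, via Metres.__add__
```

The same `total` function works for integers, strings and `Metres`, which is exactly why the compiler cannot replace `+` with a fixed add instruction the way the C# and F# compilers can for statically typed numbers.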

Conclusion

It’s important not to read too much into this sort of performance benchmark, as often the way an application is written in a particular language makes a bigger difference than the language it is implemented in. Also, with this sort of test there is always a danger of ending up with the results you were looking for through some kind of error. But it does look as though, for the type of number crunching app tested in this benchmark, implementing your application in F# or C# will give you performance gains over IronPython. So if you’re looking for a nice pithy language to do some scientific or mathematical number crunching, you’re probably best off going for F#.

Update (18 June 2007): I’m informed that the .NET platform will get infinite precision numbers at some point; this will allow C# to compete in the pidigits test more easily and smooth out the results for F#.
I’d also like to point out that the F# code is OCaml code untuned for the CLR. I’ll be running the tests with tuned code at some point to see if this makes any difference to the code base size and the performance. If anyone would like to tune the C# or IronPython programs, or contribute programs for other .NET languages, just send me the updates. It would be nice to build a platform like the “shootout game” for the CLR, but that’s going to take a lot of rainy Sunday afternoons :-)
Finally, in the coming months, I will be contributing the tuned F# code to the “shootout game” itself, once I’ve assured myself that everything works okay under Mono/Linux. Thanks to Isaac Gouy for this suggestion.

Feedback:

Feedback was imported from my old blog engine; it's no longer possible to post feedback here.

contribute? - Isaac Gouy

You could always contribute your F# programs to the benchmarks game ;-)

http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=all&lang=fsharp

http://shootout.alioth.debian.org/gp4sandbox/faq.php#contribute

re: F# versus IronPython - Jack Diederich

I'm a CPython dev so while I can't speak to the IronPython implementation directly I do know what they have to do (implement Python).

Because Python the language is highly dynamic, Python interpreters have to check every instance at every operation to see if the operation (e.g. addition) has been overridden. In CPython this means some macros that do type checks to see if they can fast-path builtin C-based types, and fall back on doing some function calls on pure-Python types (does this class's namespace have an "add" function?).

At a guess all that logic happens in the Operations class for IronPython.

re: F# versus IronPython - anon

You should try the same tests with Boo simply because it should be very easy to translate your IronPython code to Boo.

re: F# versus IronPython - Szabolcs

The N values in the result table are messed up a bit.

Also, please run each test with N big enough so that the running time will be at least several seconds. The nbody results for C# and F# are completely meaningless.

re: F# versus IronPython - Robert Pickering

I've corrected the N values in the table. I agree the comparison between F# and C# for nbody is not particularly meaningful, but I was not really trying to compare C# and F#; C# was added more as a control group. If you start to increase N for nbody, the IronPython task takes too long to finish.
