Floating Point Benchmark: Raku (Perl 6) Language Added

I have posted an update to my trigonometry-intense floating point benchmark which adds Raku to the list of languages in which the benchmark is implemented. A new release of the benchmark collection including Raku is now available for downloading.

[Image: Camelia, the Raku logo]

Raku has had a gestation period which may be longer than that of any other programming language ever actually released. Following the release of version 5 of Perl, Larry Wall began the process of designing its successor, dubbed, not surprisingly, “Perl 6”. This was around the year 2000, and design documents began to appear. Perl 6 was intended to be a major overhaul of the language, in which upward compatibility with existing code would be sacrificed where necessary to clean up some of the infelicitous syntax and semantics that Perl had accreted over the years, which had given it a reputation of being, while powerful, expressive, and concise, ugly to read and confusing and error-prone to write.

Over the years, it became apparent that Perl 6 was a moving target, as many of the design documents superseded aspects of those which preceded them. Further, the landscape of programming languages was changing over time, with techniques such as functional programming, asynchronous concurrent processing, optimisation for vector and parallel architectures, and strong type checking to guard against common programming errors, coming into use and expected to be present in any new language to enter the arena.

Perl version 5, which once dominated the system administration and tool-building space, began to look increasingly long in the tooth, with newer entrants such as Ruby and especially Python becoming the tools of choice for new projects and for programmers entering the job market. By 2019, it was judged that the language, while clearly descended from Perl, had evolved into something sufficiently different that a new name was called for, and Perl 6 was henceforth called “Raku”. Unlike Perl, whose behaviour was essentially defined by what its implementation did, Raku has a formal specification, allowing multiple implementers to build their own compilers and libraries for the language. At the moment, the most widely used and actively developed compiler is Rakudo, which is available for a variety of machines and operating systems.

I developed the Raku implementation of the floating point benchmark with Rakudo v2021.09, which implements Raku language specification v6.d, running on Xubuntu Linux 20.04 LTS on a 64-bit Intel x86 machine. I developed and benchmarked two separate versions of the program. The first was a minimal port of the existing Perl implementation of fbench: I simply fixed the code to accommodate changes in the language, but used none of the new features or program structuring tools introduced in Raku; I call this the “port” version. The second was a clean-sheet Raku implementation based upon the object-oriented architecture used by the C++ version and other modern language implementations such as Haskell, Scala, Erlang, Rust, and Go. This I call the “native” version; it uses Raku's object-oriented features, strong typing, enumeration and constant types, and improved control structures to organise the code, as sketched below.
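To give a flavour of what the “native” version looks like, here is a hypothetical sketch, not taken from the actual benchmark source, of the kind of typed, object-oriented declarations it relies upon; the names and values below are illustrative only.

     # Hypothetical sketch, not the actual fbench source: the kind of typed,
     # object-oriented declarations the "native" version uses.
     enum RayKind <Marginal Paraxial>;          # enumeration instead of magic numbers

     class Surface {
         has Num $.curvature-radius is required;    # radius of curvature
         has Num $.refractive-index is required;    # 1e0 denotes air
         has Num $.dispersion       is required;    # Abbe number
         has Num $.edge-thickness   is required;    # distance to next surface
     }

     # Note the "e0" suffixes: they force the literals to be Nums, not Rats.
     my Surface $s .= new(
         curvature-radius => 27.05e0,
         refractive-index => 1.5137e0,
         dispersion       => 63.6e0,
         edge-thickness   => 0.52e0,
     );
     say $s.refractive-index.WHAT;              # (Num)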

For timing comparisons, I used the C version, compiled with GCC version 9.3.0, which executed the C benchmark with a timing of 0.795 microseconds per iteration, and Perl version v5.30.0, which ran the Perl implementation at 32.1619 microseconds/iteration, or 40.46 times slower than C.

So, how does Raku stack up against these two mainstays of the systems programming world? Well, the good news is that it got identical answers to the eleven decimal places we validate. The bad news is that it is hideously, nay, appallingly slow. How slow? Well, the native version, in which I used Raku the way I understand it is supposed to be used, and with the benefit of more than a day spent experimenting, tweaking the code, and trying to understand what was going on, produced a timing of 163.42 microseconds per iteration, which is five times slower than Perl and two hundred and six times slower than C. And the minimal port, representative of what you get if you take an existing Perl program performing numerical calculations and migrate it to Raku by simply patching it for the changes in the language? It runs at a rate of 584.6 microseconds per iteration, which is eighteen times slower than Perl and seven hundred and thirty-five times slower than C.
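The slowdown ratios quoted above follow directly from the measured per-iteration times; here is a quick sanity check, run in Raku itself (my own illustration, not part of the benchmark):

     # Slowdown ratios from the measured microseconds/iteration above
     say 163.42 / 0.795;       # about 205.6: native Raku versus C
     say 584.6  / 0.795;       # about 735.3: ported Raku versus C
     say 163.42 / 32.1619;     # about 5.1:   native Raku versus Perl
     say 584.6  / 32.1619;     # about 18.2:  ported Raku versus Perl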

What is going on here? First of all, I suspect that in the development of a compiler and libraries for such an ambitious and, until recently, rapidly evolving language, priority has rightly been given to a complete and correct implementation of the language specification rather than to optimisation. This is to be expected, and performance should improve over time, especially since language features in Raku such as strong typing and immutable and private variables should permit better compiler optimisation than is possible for Perl, whose heritage is that of a typeless, interpreted language.

But for numerical programming (which, to be fair, was never Perl's strong point or a major application area), Raku's type system is distinctly odd and full of pitfalls for the incautious programmer unaware of what is going on under the hood. Raku has base numeric types of Int (arbitrary precision integers), Rat (arbitrary precision rational numbers [fractions]), Num (IEEE 754 double precision floating point), and Complex (complex numbers made up of two Nums). These, in turn, are grouped into “roles” such as Real, which encompasses the Int, Rat, and Num types. This seems eminently reasonable. But now consider the following innocent statement:

     my Num $f = 0;

which declares a floating point variable and sets it to zero. What happens? Why, you get a fatal error message, because you've tried to initialise a variable of type Num with an Int value, the constant zero. All right, you say, this seems a bit reminiscent of 1950s programming languages where every floating point number had to have a decimal point or an exponent, so you replace “0” with “0.0” and try again. (At least you don't have to repunch the card, hand your deck in across the counter, and then wait six hours to see what happens.) And the result is…blooie! Now you've tried to assign a Rat (rational number) to a Num, because everybody knows that “0.0” is just shorthand for “0/10”, a rational number if anybody's ever seen one. The only way to get this statement past the compiler is to write the right hand side as “0e0” or “0.0e0”, which it deems to be a Num, or else to convert the type explicitly with 0.Num or Num(0), as illustrated below.
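Here, in summary, is my own illustration of the behaviour described above (not code from the benchmark); the first two lines die with type check failures if uncommented:

     # my Num $f = 0;          # dies: type check failure (0 is an Int)
     # my Num $g = 0.0;        # dies: type check failure (0.0 is a Rat)
     my Num $h = 0e0;          # accepted: the exponent makes the literal a Num
     my Num $i = 0.Num;        # accepted: explicit conversion
     my Num $j = Num(0);       # accepted: explicit coercion
     say $h.WHAT, ' ', $i.WHAT, ' ', $j.WHAT;    # (Num) (Num) (Num)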

This may seem to be runaway pedantry which will probably get fixed in a future release, but it's actually more pernicious than you might think. Suppose you declare all of your floating point variables as Real, which encompasses all the (non-complex) numeric types. Now you can assign integers, decimal numbers, and floating point numbers with exponents to them without error. But if you assign a value like, say, 5895.944 (the wavelength, in angstroms, of the “D” spectral line used in evaluating optical designs) to your variable, it takes on a type of Rat, and calculations with it will be performed in library-implemented rational arithmetic, which is much slower than hardware floating point. And when you have a mixed-type expression involving a Num, it has to promote the value from rational to floating point, another slow operation. Note that this will happen if you so much as use a decimal constant in an expression involving floating point values. If you fail to tack an exponent onto it, everything slows down like molasses in mid-winter. Before I figured this out and explicitly typed all of the constants in my program, the “native” version ran more than three times slower than the results I report here.
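A quick way to see what the compiler has made of a literal is to ask for its type with .WHAT; the following illustration (mine, not from the benchmark) shows the trap and the cure:

     my Real $wavelength = 5895.944;     # legal, but the value is a Rat
     say $wavelength.WHAT;               # (Rat): slow rational arithmetic ahead
     my Real $sodium-d = 5895.944e0;     # the exponent forces a Num
     say $sodium-d.WHAT;                 # (Num): hardware floating point
     say (5895.944 * 1e0).WHAT;          # (Num): mixed expressions promote to Num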

A suitably smart compiler should be able to analyse the code and do much of this conversion at compile time, but the existence of polymorphic types such as Real may render this impossible in some cases. In any case, a programming language which requires such extreme fussiness to avoid painful and non-obvious speed penalties will have a steep hill to climb in competition with others that impose no such burden on their users.

The relative performance of the various language implementations (with C taken as 1) is as follows. All language implementations of the benchmark listed below produced identical results to the last (11th) decimal place. In the table below, I show Perl as 23.6 times slower than C, not the 40.46 times I measured as part of these tests. I suspect this is because the value in the table was measured on a 32-bit machine, where the advantage of C-generated machine code is smaller than on the 64-bit machine on which I ran this test.

Language              Relative Time  Details
C                       1            GCC 3.2.3 -O3, Linux
JavaScript              0.372        Mozilla Firefox 55.0.2, Linux
                        0.424        Safari 11.0, MacOS X
                        1.334        Brave 0.18.36, Linux
                        1.378        Google Chrome 61.0.3163.91, Linux
                        1.386        Chromium 60.0.3112.113, Linux
                        1.495        Node.js v6.11.3, Linux
Chapel                  0.528        Chapel 1.16.0, -fast, Linux
                        0.0314       Parallel, 64 threads
Visual Basic .NET       0.866        All optimisations, Windows XP
C++                     0.939        G++ 5.4.0, -O3, Linux, double
                        0.964        long double (80 bit)
                       31.00         __float128 (128 bit)
                      189.7          MPFR (128 bit)
                      499.9          MPFR (512 bit)
Modula-2                0.941        GNU Modula-2 gm2-1.6.4 -O3, Linux
FORTRAN                 1.008        GNU Fortran (g77) 3.2.3 -O3, Linux
Pascal                  1.027        Free Pascal 2.2.0 -O3, Linux
                        1.077        GNU Pascal 2.1 (GCC 2.95.2) -O3, Linux
Swift                   1.054        Swift 3.0.1, -O, Linux
Rust                    1.077        Rust 0.13.0, --release, Linux
Java                    1.121        Sun JDK 1.5.0_04-b05, Linux
Visual Basic 6          1.132        All optimisations, Windows XP
Haskell                 1.223        GHC 7.4.1 -O2 -funbox-strict-fields, Linux
Scala                   1.263        Scala 2.12.3, OpenJDK 9, Linux
FreeBASIC               1.306        FreeBASIC 1.05.0, Linux
Ada                     1.401        GNAT/GCC 3.4.4 -O3, Linux
Go                      1.481        Go version go1.1.1 linux/amd64, Linux
Julia                   1.501        Julia version 0.6.1 64-bit -O2 --check-bounds=no, Linux
Simula                  2.099        GNU Cim 5.1, GCC 4.8.1 -O2, Linux
Lua                     2.515        LuaJIT 2.0.3, Linux
                       22.7          Lua 5.2.3, Linux
Python                  2.633        PyPy 2.2.1 (Python 2.7.3), Linux
                       30.0          Python 2.7.6, Linux
Erlang                  3.663        Erlang/OTP 17, emulator 6.0, HiPE [native, {hipe, [o3]}]
                        9.335        Byte code (BEAM), Linux
ALGOL 60                3.951        MARST 2.7, GCC 4.8.1 -O3, Linux
PHP                     5.033        PHP (cli) 7.0.22, Linux
PL/I                    5.667        Iron Spring PL/I 0.9.9b beta, Linux
Lisp                    7.41         GNU Common Lisp 2.6.7, Compiled, Linux
                       19.8          GNU Common Lisp 2.6.7, Interpreted
Smalltalk               7.59         GNU Smalltalk 2.3.5, Linux
Ruby                    7.832        Ruby 2.4.2p198, Linux
Forth                   9.92         Gforth 0.7.0, Linux
Prolog                 11.72         SWI-Prolog 7.6.0-rc2, Linux
                        5.747        GNU Prolog 1.4.4, Linux (limited iterations)
COBOL                  12.5          Micro Focus Visual COBOL 2010, Windows 7
                       46.3          Fixed decimal instead of computational-2
Algol 68               15.2          Algol 68 Genie 2.4.1 -O3, Linux
Perl                   23.6          Perl v5.8.0, Linux
BASICA/GW-BASIC        53.42         Bas 2.4, Linux
QBasic                148.3          MS-DOS QBasic 1.1, Windows XP Console
Raku                  205.6          Rakudo v2021.09/v6.d, Linux, object-oriented rewrite
                      735.3          Minimal port of Perl version
Mathematica           391.6          Mathematica 10.3.1.0, Raspberry Pi 3, Raspbian

Download floating point benchmark collection

Kudos for the rigorous cross-environment FP benchmarking! I made a Perl Data Language (PDL) version which (by being parallelisable) can go much faster than a naive C implementation on the right hardware, just by setting an environment variable: