wiki:OctPerformance

Here are some simple benchmarks of the performance of this quad-double implementation. These benchmarks were run using CMUCL on a 1.42 GHz PPC. Each entry is the time for the operation divided by the time for the same operation on double-floats. The %quad-double column is the time using the internal implementation, without the overhead of CLOS. The QD-REAL column includes the cost of CLOS dispatch.

Operation        %quad-double   QD-REAL   Notes
Addition         36             73
Multiplication   420            950
Division         900            1200
Square root      125            133       No FP sqrt instruction on a PPC, so the double-float baseline is itself slow

Here are timing results, in the same units, using CMUCL on a 1.5 GHz UltraSparc IIIi:

Operation        %quad-double   QD-REAL   Notes
Addition         120            240
Multiplication   390            660
Division         1100           1450
Square root      13400          13600     UltraSparc has an FP sqrt instruction, so the double-float baseline is fast

Here are timing results, in the same units, using CMUCL with SSE2 support on a 3.06 GHz Core i3:

Operation        %quad-double   QD-REAL   Notes
Addition         288            390
Multiplication   536            673
Division         2528           2785
Square root      3572           3739

Hida's QD package includes a few timing tests. Equivalent tests were written in Lisp, and the timing results are below. The Lisp versions try to be exactly the same as the QD reference tests, but there is no guarantee that they are.

Test   QD (µs/op)   Oct (µs/op)   Relative time (Oct/QD)
add    0.236        1.16          4.91
mul    0.749        1.54          2.06
div    3.00         3.11          1.03
sqrt   10.57        12.2          1.15
sin    57.33        64.5          1.12
log    194          119           0.613

The QD and Oct columns give the time per operation in microseconds. The last column is Oct's time divided by QD's, so values above 1 mean Oct is slower. All of these tests were run on a 1.5 GHz UltraSparc III. Sun Studio 11 was used to compile the C++ code, and CMUCL 2007-10 was used for the Lisp code.
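As a quick check on how the last column reads, the short Python sketch below recomputes each ratio as Oct time divided by QD time from the two time columns. The names TIMINGS_US, relative_time and summarize are illustrative only and are not part of Oct or QD. The recomputed ratios agree with the printed column to within 0.01. The small differences (div comes out as 1.04 rather than 1.03, for example) are most likely due to the displayed times being rounded.

```python
# Per-operation times in microseconds, copied from the QD/Oct table,
# together with the ratio printed in its last column.
TIMINGS_US = {
    # test: (QD time, Oct time, printed Oct/QD ratio)
    "add": (0.236, 1.16, 4.91),
    "mul": (0.749, 1.54, 2.06),
    "div": (3.00, 3.11, 1.03),
    "sqrt": (10.57, 12.2, 1.15),
    "sin": (57.33, 64.5, 1.12),
    "log": (194.0, 119.0, 0.613),
}


def relative_time(qd_us: float, oct_us: float) -> float:
    """Oct time divided by QD time: > 1 means Oct is slower, < 1 faster."""
    return oct_us / qd_us


def summarize(timings=TIMINGS_US):
    """Recompute each ratio and print it next to the printed value."""
    rows = []
    for test, (qd, oct_, printed) in timings.items():
        ratio = relative_time(qd, oct_)
        verdict = "slower" if ratio > 1 else "faster"
        rows.append((test, ratio, printed, verdict))
        print(f"{test:5s} recomputed {ratio:6.3f}  printed {printed:5.3f}  Oct {verdict}")
    return rows


if __name__ == "__main__":
    summarize()
```

Only log comes out below 1, and apart from add and mul no ratio exceeds about 1.15.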

It's surprising that Oct does as well as it does. For a fair comparison, the Oct times include the cost of CLOS dispatch, since the QD tests use C++ templates and classes. Apart from add and mul, Oct is at most about 15% slower than QD. The sin test is somewhat slower in Oct. I don't know why, but the test does include accurate argument reduction. The log test is much faster in Oct, probably because it uses a different algorithm: QD computes the log with Newton's iteration, while Oct uses Halley's iteration.
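The Newton and Halley iterations mentioned above can be compared directly. Both find x = log(a) as the root of f(x) = exp(x) - a. Newton's update is x + a*exp(-x) - 1. Halley's update, using f' = f'' = exp(x), simplifies to x + 2*(a - exp(x))/(a + exp(x)). The Python sketch below is illustrative only and is not Oct's or QD's code. It uses the standard decimal module at 80 digits as a stand-in for quad-double arithmetic. The function names, the starting guess 0.7 and the tolerance 1e-60 are assumptions chosen for the demonstration.

```python
from decimal import Decimal, localcontext


def newton_step(x: Decimal, a: Decimal) -> Decimal:
    # Newton's method on f(x) = exp(x) - a, with f'(x) = exp(x):
    #   x' = x - f/f' = x + a*exp(-x) - 1
    return x + a * (-x).exp() - 1


def halley_step(x: Decimal, a: Decimal) -> Decimal:
    # Halley's method on the same f, where f' = f'' = exp(x):
    #   x' = x - 2 f f' / (2 f'^2 - f f'') = x + 2*(a - exp(x)) / (a + exp(x))
    ex = x.exp()
    return x + 2 * (a - ex) / (a + ex)


def iterations_to_converge(step, a, x0, tol, prec=80, max_iter=20):
    """Apply `step` until |x - ln(a)| < tol and return (iterations, x).

    All arithmetic runs at `prec` significant decimal digits, a bit more
    than the roughly 64 digits a quad-double carries.
    """
    with localcontext() as ctx:
        ctx.prec = prec
        ref = a.ln()  # reference value, used only to measure the error
        x = x0
        for n in range(1, max_iter + 1):
            x = step(x, a)
            err = abs(x - ref)
            print(f"{step.__name__}: iteration {n}, error {err:.2e}")
            if err < tol:
                return n, x
    raise RuntimeError("no convergence within max_iter iterations")


if __name__ == "__main__":
    a, x0, tol = Decimal(2), Decimal("0.7"), Decimal("1e-60")
    n_newton, _ = iterations_to_converge(newton_step, a, x0, tol)
    n_halley, _ = iterations_to_converge(halley_step, a, x0, tol)
    print(f"Newton: {n_newton} iterations, Halley: {n_halley} iterations")
```

From the deliberately crude guess 0.7, Newton needs 5 iterations to reach 60 correct digits and Halley needs 3. Newton's error roughly squares at each step, while Halley's roughly cubes, at the cost of an extra division per step. Whether this accounts for Oct's faster log depends on details the sketch ignores, such as the starting approximation and the cost of exp in each library.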

Last modified on 02/10/13 18:01:08