Here are some simple benchmarks on the performance of this quad-double
implementation. These benchmarks were run using CMUCL on a 1.42 GHz
PPC. The columns are times relative to a double-float. The
%quad-double column represents the time using the internal
implementation, without the overhead of CLOS. The QD-REAL column
shows the effect of CLOS dispatch.
Operation | %quad-double | QD-REAL | Notes |
Addition | 36 | 73 | |
Multiplication | 420 | 950 | |
Division | 900 | 1200 | |
Square root | 125 | 133 | There is no FP sqrt instruction on a PPC |
Here are some timing results using CMUCL on a 1.5 GHz UltraSparc IIIi:
Operation | %quad-double | QD-REAL | Notes |
Addition | 120 | 240 | |
Multiplication | 390 | 660 | |
Division | 1100 | 1450 | |
Square root | 13400 | 13600 | UltraSparc has an FP sqrt instruction |
Here are some timing results using CMUCL with SSE2 support on a 3.06 GHz Core i3:
Operation | %quad-double | QD-REAL | Notes |
Addition | 288 | 390 | |
Multiplication | 536 | 673 | |
Division | 2528 | 2785 | |
Square root | 3572 | 3739 | |
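The cost of CLOS dispatch can be read off the tables above by dividing each QD-REAL entry by the matching %quad-double entry. The following is only a small arithmetic sketch in Python, not part of Oct; the names `results` and `dispatch_overhead` are made up for illustration, and the numbers are copied from the three tables.

```python
# (%quad-double, QD-REAL) relative times copied from the three tables above.
results = {
    "PPC": {
        "Addition": (36, 73),
        "Multiplication": (420, 950),
        "Division": (900, 1200),
        "Square root": (125, 133),
    },
    "UltraSparc IIIi": {
        "Addition": (120, 240),
        "Multiplication": (390, 660),
        "Division": (1100, 1450),
        "Square root": (13400, 13600),
    },
    "Core i3": {
        "Addition": (288, 390),
        "Multiplication": (536, 673),
        "Division": (2528, 2785),
        "Square root": (3572, 3739),
    },
}


def dispatch_overhead(internal, generic):
    """Return the QD-REAL time divided by the %quad-double time."""
    return generic / internal


# Overhead factor for every machine and operation.
overhead = {
    machine: {op: dispatch_overhead(*times) for op, times in ops.items()}
    for machine, ops in results.items()
}

for machine, ops in overhead.items():
    for op, factor in ops.items():
        print(f"{machine:16s} {op:15s} {factor:.2f}")
```

On all three machines the factor is largest for the cheap operations (addition and multiplication) and close to 1 for square root, where the arithmetic itself dominates the dispatch cost.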
Hida's QD package has a few timing tests. A Lisp equivalent of these
tests was written, and the timing results are below. The Lisp version
tries to match the QD reference exactly, but there are no guarantees
on that.
Test | QD (microsec) | Oct (microsec) | Relative time Oct/QD |
add | 0.236 | 1.16 | 4.91 |
mul | 0.749 | 1.54 | 2.06 |
div | 3.00 | 3.11 | 1.03 |
sqrt | 10.57 | 12.2 | 1.15 |
sin | 57.33 | 64.5 | 1.12 |
log | 194 | 119 | 0.613 |
The second and third columns are microseconds per operation. The last
column is the time for Oct relative to QD, so values above 1 mean Oct
is slower. All of these were run on a 1.5 GHz UltraSparc III. Sun
Studio 11 was used to compile the C code. CMUCL 2007-10 was used for
the Lisp code.
It's surprising that Oct does as well as it does. To be fair, the
times for Oct include the cost of CLOS dispatch since QD uses
templates and classes in the tests. Except for add and mul, Oct is at
most about 15% slower than QD, and div is within about 3%. The sin
test is a bit slower in Oct (about 12%). I don't know why, but the
test did include the accurate argument reduction. The log test is
quite a bit faster for Oct. This is probably due to using a different
algorithm: QD uses a Newton iteration to compute the log, while Oct
uses Halley's iteration.
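To illustrate the difference between the two methods, here is a minimal Python sketch of the textbook Newton and Halley iterations for solving exp(y) = x. It uses ordinary double-precision floats, not quad-doubles, and it is not the code from either QD or Oct; the function names, starting guess, and tolerance are assumptions chosen for the example. Both iterations need one exp per step, but Halley's converges cubically rather than quadratically, so it typically needs fewer steps.

```python
import math


def log_newton(x, y0, tol=1e-15, max_iter=50):
    """Newton's iteration for exp(y) = x: y <- y + x*exp(-y) - 1."""
    y = y0
    for i in range(1, max_iter + 1):
        step = x * math.exp(-y) - 1.0
        y += step
        if abs(step) <= tol * max(1.0, abs(y)):
            return y, i
    return y, max_iter


def log_halley(x, y0, tol=1e-15, max_iter=50):
    """Halley's iteration for exp(y) = x: y <- y + 2*(x - e)/(x + e), e = exp(y)."""
    y = y0
    for i in range(1, max_iter + 1):
        e = math.exp(y)
        step = 2.0 * (x - e) / (x + e)
        y += step
        if abs(step) <= tol * max(1.0, abs(y)):
            return y, i
    return y, max_iter


# Same input and same rough starting guess for both methods.
x, guess = 10.0, 2.0
newton_value, newton_steps = log_newton(x, guess)
halley_value, halley_steps = log_halley(x, guess)

print("Newton:", newton_value, "after", newton_steps, "steps")
print("Halley:", halley_value, "after", halley_steps, "steps")
```

The step counts here only hint at the trend. How many exp evaluations a quad-double implementation actually saves depends on the quality of its starting guess and on the cost of its own exp routine.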