Here are some simple benchmarks on the performance of this quad-double
implementation. These benchmarks were run using CMUCL on a 1.42 GHz
PPC. The columns are times relative to the same operation on a
`double-float`. The `%quad-double` column represents the time using the
internal implementation, without the overhead of `CLOS`. The `QD-REAL`
column shows the effect of `CLOS` dispatch.

| Operation | `%quad-double` | `QD-REAL` | Notes |
|---|---|---|---|
| Addition | 36 | 73 | |
| Multiplication | 420 | 950 | |
| Division | 900 | 1200 | |
| Square root | 125 | 133 | There is no FP sqrt instruction on a PPC |

Here are some timing results using CMUCL on a 1.5 GHz UltraSparc IIIi:

| Operation | `%quad-double` | `QD-REAL` | Notes |
|---|---|---|---|
| Addition | 120 | 240 | |
| Multiplication | 390 | 660 | |
| Division | 1100 | 1450 | |
| Square root | 13400 | 13600 | UltraSparc has a FP sqrt instruction |

Here are some timing results using CMUCL with SSE2 support on a 3.06 GHz Core i3:

| Operation | `%quad-double` | `QD-REAL` | Notes |
|---|---|---|---|
| Addition | 288 | 390 | |
| Multiplication | 536 | 673 | |
| Division | 2528 | 2785 | |
| Square root | 3572 | 3739 | |
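The relative times in the tables above are ratios: the time for an operation divided by the time for the same operation on a `double-float`. The following is a minimal sketch of how such a ratio could be measured using only standard Common Lisp. The helpers `time-calls` and `relative-time` are hypothetical names, not part of Oct, and this is not necessarily how the numbers above were obtained.

```lisp
;; Hypothetical helpers for illustration; not part of Oct.

(defun time-calls (fn n)
  "Return the run time, in seconds, taken by N calls of the
zero-argument function FN."
  (let ((start (get-internal-run-time)))
    (dotimes (i n)
      (declare (ignorable i))
      (funcall fn))
    (/ (- (get-internal-run-time) start)
       (float internal-time-units-per-second 1d0))))

(defun relative-time (fn baseline-fn n)
  "Return the time for N calls of FN divided by the time for N calls
of BASELINE-FN. N must be large enough that the baseline time is
nonzero; otherwise the division is meaningless."
  (/ (time-calls fn n)
     (time-calls baseline-fn n)))
```

In practice the timed functions need to be compiled, and the loop count has to be large enough to swamp the timer resolution and the cost of `funcall` itself.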

Hida's `QD` package has a few timing tests. A Lisp equivalent
was written, and here are the timing results. Note that the Lisp
equivalent tries to match the `QD` reference exactly, but there are
no guarantees on that.

| Test | QD (µs/op) | Oct (µs/op) | Relative time (Oct/QD) |
|---|---|---|---|
| add | 0.236 | 1.16 | 4.91 |
| mul | 0.749 | 1.54 | 2.06 |
| div | 3.00 | 3.11 | 1.03 |
| sqrt | 10.57 | 12.2 | 1.15 |
| sin | 57.33 | 64.5 | 1.12 |
| log | 194 | 119 | 0.613 |

The second and third columns are microseconds per operation. The last
column is the relative time of Oct vs `QD`. All of these were run on a
1.5 GHz UltraSparc III. Sun Studio 11 was used to compile the C++
code. CMUCL 2007-10 was used for the Lisp code.

It's surprising that Oct does as well as it does. To be fair, the
times for Oct include the cost of `CLOS` dispatch, since `QD` uses
templates and classes in the tests. Except for add and mul, Oct is at
most about 15% slower than `QD`. The sin test is a bit slower in Oct.
I don't know why, but the test did include the accurate argument
reduction. The log test is quite a bit faster for Oct. This is
probably due to using a different algorithm: `QD` uses a Newton
iteration to compute the log, while Oct uses Halley's iteration.
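To illustrate the difference between the two iterations, here is a minimal sketch in plain `double-float` arithmetic. It is not the Oct or `QD` source, and the function names are hypothetical. Both iterations solve exp(x) - a = 0 for x = log(a). Each step needs one evaluation of exp, but Halley's iteration converges cubically rather than quadratically, so it needs fewer steps to reach a given precision.

```lisp
;; Illustrative double-float sketch only; not the Oct or QD source.
;; Both iterations solve exp(x) - a = 0, whose root is x = log(a).

(defun newton-log-step (a x)
  "One Newton step: x + a*exp(-x) - 1 (quadratic convergence)."
  (- (+ x (* a (exp (- x)))) 1))

(defun halley-log-step (a x)
  "One Halley step: x - 2*(exp(x) - a)/(exp(x) + a) (cubic convergence)."
  (let ((e (exp x)))
    (- x (/ (* 2 (- e a)) (+ e a)))))

(defun iterate-log (step a x0 n)
  "Apply the function STEP to the initial guess X0 N times and
return the final approximation to log(A)."
  (let ((x x0))
    (dotimes (i n x)
      (declare (ignorable i))
      (setf x (funcall step a x)))))
```

For example, `(iterate-log #'halley-log-step 10d0 2d0 4)` and `(iterate-log #'newton-log-step 10d0 2d0 6)` both approximate `(log 10d0)`. In a quad-double setting the initial guess would typically come from the `double-float` log, which is an assumption here rather than something the text above states.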