Benchmarks

This (fairly old) page describes benchmarks we used to demonstrate DyC's capabilities and to compare with other dynamic-compilation systems, particularly Tempo and `C. Our most recent published results can be found in our paper in PLDI `99.

Speedup

One must be cautious in directly comparing speedup numbers because both the static compiler used to compile the baseline (gcc for Tempo and `C, Multiflow for us) and the architecture can dramatically affect the final numbers.  The issue width and instruction latencies of the processor on which the dynamically generated code runs have a large impact on speedup due to dynamic compilation.  In particular, on a multi-issue machine, if the instructions removed by dynamic compilation are not on the critical path of the computation, the total running time of the computation may not be affected.

For example, all instructions in the inner loop of the pow benchmark's kernel can be removed by dynamic compilation except one floating-point multiplication.  Unfortunately, on the DEC Alpha 21064, the 6-cycle latency of that operation dominates the loop's critical path, so removing the other instructions does not significantly reduce the time required to execute the loop.  For this reason, that benchmark is unlikely to benefit from dynamic compilation on modern architectures.
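
To make the example concrete, here is a minimal sketch (ordinary C, not the actual `C benchmark source; the names are hypothetical) of a kernel of this form, which DyC would specialize on a run-time-constant exponent n:

    /* A minimal sketch, not the actual benchmark source; names are hypothetical. */
    double power(double x, int n)
    {
        double result = 1.0;
        int i;
        for (i = 0; i < n; i++)     /* loop overhead: removable by specialization */
            result = result * x;    /* serial chain of 6-cycle FP multiplies
                                       on the Alpha 21064 */
        return result;
    }

Specializing on a particular n unrolls the loop, eliminating the counter, compare, and branch, but each remaining multiply still depends on the previous one, so the chain of floating-point-multiply latencies continues to bound the loop's running time.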

A more detailed analysis of the architectural impact on speedup due to dynamic compilation for the pow benchmark can be found here.

Breakeven Point

The breakeven point measures how many invocations of the dynamically compiled kernel are necessary to amortize the dynamic-compilation overhead. It therefore depends on both the speedup due to dynamic compilation and the overhead itself.
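
As a rough sketch of the arithmetic involved (the function and variable names below are hypothetical, not part of our measurement infrastructure): if each invocation of the specialized kernel saves some number of cycles over the statically compiled version, the breakeven point is the smallest number of invocations whose accumulated savings cover the one-time compilation overhead.

    /* Hypothetical sketch of the breakeven computation.  static_cycles and
       dynamic_cycles are per-invocation running times of the statically and
       dynamically compiled kernels; overhead_cycles is the one-time
       dynamic-compilation cost. */
    #include <math.h>

    long breakeven_invocations(double static_cycles,
                               double dynamic_cycles,
                               double overhead_cycles)
    {
        double saved = static_cycles - dynamic_cycles;  /* cycles saved per invocation */
        if (saved <= 0.0)
            return -1;              /* no speedup: the overhead is never recovered */
        return (long)ceil(overhead_cycles / saved);
    }

For example, an overhead of 10,000 cycles and a saving of 200 cycles per invocation would give a breakeven point of 50 invocations.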

The dynamic-compilation overhead, measured in cycles per generated instruction, varies so widely among the benchmarks because it includes the time spent executing static code that is part of the original program.  The more of the dynamic-compilation work that is spent executing static code, the fewer instructions are generated for a given amount of overhead and, therefore, the larger the overhead per generated instruction.  This is most apparent in the dotproduct benchmark in the case where no code is generated for 90% of the iterations of the kernel's loop.  A better metric would exclude the cost of executing these static instructions and measure only the true overhead, but we have not yet developed a method to do that.
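
To illustrate the distortion with made-up numbers (these are not measurements from our benchmarks): the same total overhead divided by fewer generated instructions yields a much larger per-instruction figure.

    /* Invented numbers, for illustration only: the measured overhead is the
       same in both cases, but when most of the dynamic-compilation work is
       executing static code, few instructions are generated and the
       cycles-per-generated-instruction figure balloons. */
    #include <stdio.h>

    int main(void)
    {
        double overhead_cycles = 100000.0;

        double generated_a = 2000.0;   /* most of the work emits code */
        double generated_b = 200.0;    /* most of the work executes static code */

        printf("case A: %.0f cycles per generated instruction\n",
               overhead_cycles / generated_a);    /* prints 50 */
        printf("case B: %.0f cycles per generated instruction\n",
               overhead_cycles / generated_b);    /* prints 500 */
        return 0;
    }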

The overhead also includes, of course, the costs of actually generating instructions, inserting constants into immediate fields, patching branches, maintaining i-cache coherency, managing memory, caching, and saving and restoring context.

Benchmarks

mipsi is an instruction-level simulator for the MIPS R3000. chebyshev and romberg are benchmarks provided by the Tempo group, and are described in Automatic, Template-Based Run-Time Specialization: Implementation and Experimental Study by L. Hornof, C. Consel, and J. Lawall. binary, pow, and query are `C benchmarks, described in `C and tcc: A Language and Compiler for Dynamic Code Generation by M. Poletto, W. Hsieh, D. Engler, and M. F. Kaashoek. dotproduct is a benchmark used by both groups.


Last updated January 24, 2000.
Brian Grant (grant@cs.washington.edu)