Ideas for a submission to PLDI `98
The submission deadline is 5pm CST November 7.
Priorities
What follows is a list of Things To Do for the
submission. We need to prioritize the list, decide which subset to
target for the submission, and which (slightly larger) subset to
target for the final paper. Any single item on the list (though not
all of them together!) could be finished by the deadline for the final
paper, but not necessarily by the submission deadline.
- *Name for our system
- My latest proposal is Art Deco (for
At Run Time Dynamic Compilation)
- Previous proposals:
- Sorcerer (for
Statically Optimized
Run-time Compiler)
- Duchamp (dynamist artist)
- Defacto name: DyC
- *Benchmarks
- Find the code
- Run it through the 64-bit version of Multiflow
and get help from Charlie, if required
- Determine what functionality is required from our system
- Annotate it
- Examples
- Our previous benchmarks
- Byte-code interpreter
- New SPIN dispatcher mockup
- `C benchmarks (e.g., xv)
- Tempo benchmarks (e.g., FFT)
- Java
- OpenGL
- Perl, Tcl, etc.
- Andrew Certain's multi-resolution viewer
- *Regression tests
- Collect past test cases and re-annotate them
- Dust off the script(s)
- Develop a more comprehensive testing suite
- *Instrumentation
- Figure out how to use the Alpha cycle counter or another way
to accurately measure user cycles
- Based on what we want to claim, decide what to measure and why
- Overhead (partially, to estimate benefit of optimizations we
don't do)
- Categorized unit boundary crossings
(promotions, discordant merges, etc.)
- Cache lookups
- Work-list operations
- Function-call overhead (for lazy GE calls)
- Per-instruction-generated (with & without
above overheads)
- Speedup
- Breakeven points
- Amount of code reuse
- Features
- *Static annotation (@) for other unsafe
operations, such as calls and divide
- *Register actions
- Split the graph for multiple divisions
- Function annotations
- promote_all_invalidate,
promote_one_invalidate, and
promote_one_invalidate_unchecked, plus explicit invalidation
annotations
- cache_one
- cache(n)
- Global space management / global cache policies
- Program-point lazy annotation
- Division querying
- Run-time inlining
- Specialized varargs
- `C emulation layer (IR-building operations and a
specializing interpreter)
- Data specialization
- Optimizations
- *Multiflow's inlining for apps and for GE operations
- *Investigate excessive spilling at trace boundaries
- Static and dynamic IGOTOs
- Move loads of live static variables to their uses
- Discordant SELECTs (control-flow merge converted to
data-flow merge; we should treat them equally; convert dynamic SELECTs
of static operands back to control flow inside the BTA, saving
enough state to revert if the binding times change)
- Reduce scheduling constraints (eliminate use of trace
boundaries to stop code motion)
- Reduce caching overheads
- Make cache_one_unchecked a single jump
- Specialize lazy entries
- Sort vectors of live static variables by size to
reduce fragmentation
- PIC-style cache lookups
- Hierarchical-context-based caching
- Choose optimal cache-context composition
- Reachability conditions
- Reduce work-list overhead
- Peel / partially unroll the work-list loop
- Eliminate redundant copying of static variables
live across unit boundaries
- Reduce unit-boundary crossings
- Linearization
- Clustering
- Single-run-time-predecessor optimizations
- Further optimize memory allocation
- Eliminate redundant table-pointer construction
- Overwrite jmps with brs
- Eliminate branch laziness when possible
- Scrap merger and generate the GEs in a second pass of the
compiler; this would allow more scheduling constraints on the
template code to be removed
- Other Infrastructure
- Oracle
- Compile-time specializer (could we use Tempo?)
Other Paper Thoughts
- Slant
- Dynamic-compilation system that can provide speedups for
real apps written in C
- Useful, practical declarative system
- What functionality is required for speedups?
- How fast does dynamic compilation need to be?
- Are there applications of dynamic compilation outside
operating systems and OOPLs?
- Comparisons
- How will we compare our system with others?
- With statically compiled C
- With `C?
- With an oracle?
- Java: JIT? Toba? Vortex?
Abstract
Here is a sample abstract that I wrote a few months ago. It has a
weak slant, but no mention of contributions. Somebody munge it to
make it better or replace it.
We previously presented the design of a dynamic compilation system for
C [PEPM97]. The system requires the user to specify what code is to
be dynamically compiled and leaves most of the key cost/benefit
trade-offs open to user control through declarative annotations.
From these annotations our system automatically produces lightweight
specialized dynamic compilers for annotated regions of the program.
In this paper, we present results of applying the system to several
real programs and reflect on which functionality was required to
obtain those results and to what degree we have succeeded in making
dynamic compilation both fast and effective.
Introduction
What is DC; DC is cool
Dynamic compilation, the generation and optimization of code at run
time, has received much attention recently as a means of transforming
programs using information available only at run time. Proposed
applications for this capability have included specializing
architectural simulators for the configuration being simulated,
language interpreters for the program being interpreted, numerical
programs for the dimensions and values of frequently used arrays, and
critical paths in operating systems for the type of data being
processed and the current state of the system. Trends in
software engineering toward dynamic reconfigurability,
parameterization for reuse, and portability across hardware
architectures all suggest a promising role for dynamic compilation.
People have shown DC is viable, but not for big programs
Recent research efforts have made considerable progress toward
proving the viability of dynamic compilation. In particular,
researchers have demonstrated that, for many small, frequently used
kernels, run-time optimization overhead can be quickly amortized, and
more than repaid, by the increased efficiency of the optimized code.
Existing techniques have, however, failed to demonstrate
applicability to existing applications of the size and complexity of
the interpreters and simulators mentioned above. The underlying
reason for this failure is the tradeoff between the expressivity of
the language interface for dynamic compilation and its ease of use:
- Systems that require the programmer to explicitly construct,
compose, and compile the dynamically generated code ("imperative
systems") imply large-scale rewriting of existing code that
needs to be dynamically compiled. (The expressivity of these
systems is also in question: e.g., breaks out of loops are hard
to express.)
- Systems that offer a declarative scheme of sparse
user annotation to guide dynamic compilation ("declarative systems")
have not been able to express directions for compiling complex
programs in a natural, compact way.
This is the new stuff we do
In this paper, we describe a declarative annotation-based system which:
- Is expressive enough to exploit complex opportunities for dynamic
compilation in large, realistic programs. The required
expressiveness is significantly beyond that of existing
annotation-based DC systems.
- Requires only sparse annotations of existing statically
compilable code, thereby eliminating the rewriting hurdle of
existing imperative systems.
- Is based on a principled extension of traditional offline
PE techniques to the online case.
- Produces code of quality comparable to that of other existing
systems, at similar speeds.
This is how we do it, in a paragraph
Summarize features (~list from PEPM paper intro):
- per-program-point specialization
- polyvariant specialization and division
- on-demand specialization to avoid specializing for values not yet
computed
- interprocedural specialization with support for separate compilation
- automatic caching of specialized code
We can get speedups on reasonable inputs for _big_ programs
We demonstrate the effectiveness of our approach by profitably applying dynamic
compilation to javap, the bytecode interpreter for the Java
programming language, and to MIPSI, an instruction-level simulator
for the MIPS R3000. Translating the dynamically compiled regions of
these programs (x and y lines of C) into an imperative language would
clearly require a lengthy, painstaking effort by programmers using an
imperative scheme. In contrast, we enable dynamic compilation of
both systems by adding a few carefully selected annotations (x and y
lines) to their C source code. We also identify the features of these
applications that prevent existing annotation-based systems from optimizing
them without substantial re-writing. (These applications are representative--
talk of other similar alluring ones!)
JITs do much better than us; why? opportunities...
Using Java provides us the opportunity...
The existence of hand-written "just-in-time" (and conventional)
compilers for Java bytecode allows us to measure a few more
data-points on the code-quality vs. compilation time curve.
We do as well as other systems: not trading off expressivity/ease
for speed
The improved expressivity and ease of use of our system do not come at
the expense of dynamically generated code quality or longer dynamic
compile times. Though our current system is not configured for
extremely aggressive run-time optimizations (we do not perform global
optimizations such as register allocation or scheduling), we use a
combination of cheap local optimization, extensive static global
optimization of templates for dynamic code and customized
code-generators to achieve speedups and code-generation costs
comparable to those of other existing systems.
We will do much better, once our future work is done: microbenchmarks with
proposed optimizations
Show this (Brian's list):
- Cost of not specializing calls
- Categorized unit boundary crossings (promotions,
discordant merges, etc.)
- Cache lookups
- Work-list operations
- Function-call overhead
- Per-instruction-generated (with & without above overheads)
- Cost of not doing register actions.
Summary of sections (important: what will be the order and content of
sections?)
Outline
None yet. Anyone, anyone?