Related Work

Tempo [Consel:1996:POPL], a compile-time and run-time specialization system for C, is most similar to DyC, but there are numerous and substantial differences. Tempo is a mature system that provides polyvariant function specialization and division, performs alias and side-effect analysis, performs interprocedural analysis, and deals with partially static structures. The two systems differ in the degree of programmer control offered, in annotation strategies, in the granularity of specialization and division, in support for caching specialized code, in support for lazy specialization, in the handling of aliases and partially static structures, in the handling of unstructured code, in support for interprocedural analysis and separate compilation, in the optimization of static and dynamic code, and in implementation strategies.
In Tempo, the programmer annotates which parameters and global variables to make static at run time when certain entry-point functions are called. These entry points are listed in a file provided as input to the BTA along with the source program, rather than appearing inline in the program text as in DyC. The programmer also must provide information about aliasing outside the region of specialization. DyC's annotations support the equivalent of entry points at any program point. DyC requires no information about code outside the region of specialization, but it does require annotations to make pointer dereferences static. Tempo's alias analysis can be unsound, but it does much more than ours; on the other hand, our annotations permit specialization of pointer data structures without accurate alias information. DyC's function annotations similarly compensate for the absence of interprocedural analysis in our system: they permit interprocedural specialization and also work with separate compilation. We plan to add alias analysis and interprocedural analysis to DyC, but the annotations will continue to be useful in cases where the analyses fail to collect sufficiently accurate information.
Because of the disadvantages of CPS conversion [Jones:1993:PartialEvaluation], and based on study of our target applications, we feel that program-point specialization is particularly important for C programs. Beyond its ability to specialize functions, Tempo can only unroll simple loops; our first-generation system also implemented this capability in an ad hoc manner [APCEB96]. Cmix was a partial-evaluation system for C that provided polyvariant program-point specialization [Andersen:1992:Self-ApplicableC]. Cmix handled unstructured code, but it appears to have lacked a reachability analysis [Andersen:1994:PhDThesis]. Tempo applies Erosa and Hendren's goto-elimination algorithm to convert unstructured code to a structured equivalent [ErosaHendren94,Consel:1996:AUniform]; this approach introduces a number of additional tests and may also introduce loops. We feel that direct support of unstructured control flow is more appropriate for C. We also feel that polyvariant program-point division is important, especially when combined with our program-point annotations make_static and make_dynamic to effect conditional specialization. Others [Consel:1996:AUniform,Meyer:1991:Techniques] worried that polyvariant program-point specialization and division could cause code explosion; rather than omitting these powerful features, we place them under the control of the programmer.
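To make the annotation model concrete, the following sketch illustrates conditional specialization at a program point. Only the make_static and make_dynamic annotations themselves are DyC's; the dot-product code, the variable names, and the caller-supplied reuse hint are invented for illustration, and the exact syntax of any further annotations (e.g., for making the loads through the pointer static) is elided.

    /* Hypothetical fragment: conditional specialization with DyC's
     * program-point annotations. Only make_static/make_dynamic come
     * from the text; everything else is illustrative. */
    double dot(double *w, double *x, int n, int w_is_reused)
    {
        double sum = 0.0;
        int i;

        if (w_is_reused)
            make_static(w, n);    /* specialize the loop below to this w and n;
                                     a cached version can be reused on later calls */
        else
            make_dynamic(w, n);   /* keep the general-purpose loop */

        for (i = 0; i < n; i++)
            sum += w[i] * x[i];   /* with n static the loop can be unrolled;
                                     folding w[i] to constants additionally
                                     requires the pointer-dereference
                                     annotation mentioned above */
        return sum;
    }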
Schism's filters permitted choices about whether to unfold or residualize a function and about which arguments to generalize, given binding times for the function's parameters [Consel:1993:ATour]. Because filters are executed by the BTA, only binding-time information can be used to make these decisions, not the values of static or dynamic variables, which can be used in specialization decisions in DyC. Filters can be used, for example, to prevent unbounded unfolding and unbounded specialization. Both off-line partial evaluators such as Schism and on-line specializers such as Fuse [WeiseDanie1991a] look for dynamic conditionals as a signal that unbounded unfolding or specialization could result, and then must halt. Run-time specializers have an additional option: they can temporarily suspend specialization when dynamic conditionals are found in potential cycles and insert lazy callbacks to the specializer, which is what DyC does.
Tempo does not permit caching of multiple specialized versions of code. Fabius, a compiler and run-time specializer for a pure subset of ML, caches multiple versions but does not provide control over caching policies [LeoneMarka1995b].
Like DyC, Tempo constructs and optimizes templates for dynamically generated code and uses them to produce generating extensions that, when executed, produce specialized code at run time. However, Tempo blocks many inter-template optimizations in order to preserve template integrity; we need only restrict template instructions with holes to their units and prevent their speculation above static branches. Fabius does not optimize templates separately. We have also expended much more effort optimizing our specialized run-time specializers.
`C extends the ANSI C language to support dynamic code generation in an imperative style [EnHsKa96]. The programmer must specify the code to be generated at run time, substitute run-time values and combine code fragments, perform certain optimizations, invoke the run-time compiler, manage code reuse and code-space reclamation, and ensure correctness [LeoneMarka1996a]. Its authors claim that `C is unique in giving the programmer the ability to specify and compose arbitrary expressions and statements at run time; however, it is not: a declarative system such as DyC can do so as well, requiring much less effort from the programmer, while also facilitating debugging and ensuring correctness (if the safe subset of the annotations is used). Tempo uses `C as one of its back-end code generators [Consel97], while DyC could essentially serve as a back end to `C by specializing an ICODE interpreter [PoEnKa97]. In DyC, generation of new function signatures is handled by specializing functions with variable numbers of arguments. A more direct emulation of `C would be possible if our system added run-time inlining through function pointers, but we stress that, in practice, only a few annotations are necessary to achieve an effect comparable to common `C idioms. For example, multi-way loop unrolling can be achieved with a single annotation on the loop induction variable. In fact, DyC has several capabilities that `C does not. `C cannot multi-way unroll loops with dynamic exit tests, because jumps to labels in other tick expressions are not permitted. Also, because portions of dynamically generated functions cannot be recompiled in `C, we do not see how it could unroll the interpreter supporting the computed goto shown in section 2.
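As an invented example of that claim, the sketch below unrolls a loop whose bound is static but whose early exit is dynamic. Only the make_static annotation is DyC's (here, a single annotation naming the induction variable and its bound); the function, names, and exit condition are illustrative.

    /* Hypothetical example of multi-way loop unrolling. */
    double sum_until(double *x, int n, double limit)
    {
        double sum = 0.0;
        int i;

        make_static(n, i);        /* i is the loop induction variable */
        for (i = 0; i < n; i++) {
            if (sum > limit)      /* dynamic exit test: each unrolled copy keeps
                                     this branch, yet the loop is still fully
                                     unrolled on the static i; `C cannot express
                                     this, since the exit would have to jump to
                                     a label in another tick expression */
                break;
            sum += x[i];
        }
        return sum;
    }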
Moreover, DyC's nested and overlapping dynamic regions are not matched by `C's nested tick expressions. Furthermore, DyC permits better static optimization of dynamic code, because control flow is more easily determined. This could be one reason why `C now applies limited partial evaluation to tick expressions, including dead-code elimination and unrolling of simple static loops [PoEnKa97]; its developers even plan to add a binding-time analysis, which would yield a hybrid declarative/imperative run-time code-generation system. We could implement any of the proposed run-time register allocation strategies (e.g., ICODE [PoEnKa97], Fabius [LeoneMarka1995b], or register actions [Wall86,APCEB96]) and select among them, or choose not to perform run-time register allocation at all, through new policies in the annotations. The bottom line, however, is that DyC and other declarative systems can be applied to existing programs in a straightforward manner, whereas imperative systems such as `C require substantial rewriting of code. We therefore conclude, as did Ershov 15 years ago [Ershov:1982:MixedComputation], that the declarative approach to code generation is more general than the imperative approach.