Related Work

Tempo [Consel:1996:POPL], a compile-time and run-time specialization system for C, is most similar to DyC, but there are numerous and substantial differences. Tempo is a mature system that provides polyvariant function specialization and division, performs alias and side-effect analysis, performs interprocedural analysis, and deals with partially static structures. The two systems differ in the degree of programmer control offered, in annotation strategies, in the granularity of specialization and division, in support for caching specialized code, in support for lazy specialization, in the handling of aliases and partially static structures, in the handling of unstructured code, in support for interprocedural analysis and separate compilation, in the optimization of static and dynamic code, and in implementation strategies.
In Tempo, the programmer annotates which parameters and global variables to make static at run time when certain entry-point functions are called. These entry points are listed in a file provided as input to the BTA along with the source program, rather than appearing inline in the program text as in DyC. The programmer also must provide information about aliasing outside the region of specialization. DyC's annotations support the equivalent of entry points at any program point. DyC requires no information about code outside the region of specialization, but it does require annotations to make pointer dereferences static. Tempo's alias analysis can be unsound, but it does much more than ours; on the other hand, our annotations permit specialization of pointer data structures without accurate alias information. DyC's function annotations similarly compensate for the absence of interprocedural analysis in our system: they permit interprocedural specialization and also work with separate compilation. We plan to add alias analysis and interprocedural analysis to DyC, but the annotations will continue to be useful in cases where the analyses fail to collect sufficiently accurate information.
Because of the disadvantages of CPS conversion [Jones:1993:PartialEvaluation], and based on study of our target applications, we feel that program-point specialization is particularly important for C programs. Beyond its ability to specialize functions, Tempo can only unroll simple loops; our first-generation system also implemented this capability in an ad hoc manner [APCEB96]. Cmix was a partial-evaluation system for C that provided polyvariant program-point specialization [Andersen:1992:Self-ApplicableC]. Cmix handled unstructured code, but it appears to have lacked a reachability analysis [Andersen:1994:PhDThesis]. Tempo applies Erosa and Hendren's goto-elimination algorithm to convert unstructured code to a structured equivalent [ErosaHendren94,Consel:1996:AUniform]; this approach introduces a number of additional tests and may also introduce loops. We feel that direct support of unstructured control flow is more appropriate for C. We also feel that polyvariant program-point division is important, especially when combined with our program-point annotations make_static and make_dynamic to effect conditional specialization. Others [Consel:1996:AUniform,Meyer:1991:Techniques] worried that polyvariant program-point specialization and division could cause code explosion; rather than omitting these powerful features, we place them under the control of the programmer.
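To make the annotation model concrete, the following sketch illustrates conditional specialization at a program point. Only the make_static and make_dynamic annotations themselves are DyC's; the dot-product code, the variable names, and the caller-supplied reuse hint are invented for illustration, and the exact syntax of any further annotations (e.g., for making the loads through the pointer static) is elided.

    /* Hypothetical fragment: conditional specialization with DyC's
     * program-point annotations. Only make_static/make_dynamic come
     * from the text; everything else is illustrative. */
    double dot(double *w, double *x, int n, int w_is_reused)
    {
        double sum = 0.0;
        int i;

        if (w_is_reused)
            make_static(w, n);    /* specialize the loop below to this w and n;
                                     a cached version can be reused on later calls */
        else
            make_dynamic(w, n);   /* keep the general-purpose loop */

        for (i = 0; i < n; i++)
            sum += w[i] * x[i];   /* with n static the loop can be unrolled;
                                     folding w[i] to constants additionally
                                     requires the pointer-dereference
                                     annotation mentioned above */
        return sum;
    }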
Schism's filters permitted choices about whether to unfold or residualize a function and about which arguments to generalize, given binding times for the function's parameters [Consel:1993:ATour]. Because filters are executed by the BTA, only binding-time information can be used to make these decisions, not the values of static or dynamic variables, which can be used in specialization decisions in DyC. Filters can be used, for example, to prevent unbounded unfolding and unbounded specialization. Both off-line partial evaluators such as Schism and on-line specializers such as Fuse [WeiseDanie1991a] look for dynamic conditionals as a signal that unbounded unfolding or specialization could result, and then must halt. Run-time specializers have an additional option: they can temporarily suspend specialization when dynamic conditionals are found in potential cycles and insert lazy callbacks to the specializer, which is what DyC does.
Tempo does not permit caching of multiple specialized versions of code. Fabius, a compiler and run-time specializer for a pure subset of ML, caches multiple versions but does not provide control over caching policies [LeoneMarka1995b].
Like DyC, Tempo constructs and optimizes templates for dynamically generated code and uses them to produce generating extensions that, when executed, produce specialized code at run time. However, Tempo blocks many inter-template optimizations in order to preserve template integrity; we need only restrict template instructions with holes to their units and prevent their speculation above static branches. Fabius does not optimize templates separately. We have also expended much more effort optimizing our specialized run-time specializers.
`C extends the ANSI C language to support dynamic code generation in an imperative style [EnHsKa96]. The programmer must specify the code to be generated at run time, substitute run-time values and combine code fragments, perform certain optimizations, invoke the run-time compiler, manage code reuse and code-space reclamation, and ensure correctness [LeoneMarka1996a]. Its authors claim that `C is unique in giving the programmer the ability to specify and compose arbitrary expressions and statements at run time; however, it is not: a declarative system such as DyC can do so as well, requiring much less effort from the programmer, while also facilitating debugging and ensuring correctness (if the safe subset of the annotations is used). Tempo uses `C as one of its back-end code generators [Consel97], while DyC could essentially serve as a back end to `C by specializing an ICODE interpreter [PoEnKa97]. In DyC, generation of new function signatures is handled by specializing functions with variable numbers of arguments. A more direct emulation of `C would be possible if our system added run-time inlining through function pointers, but we stress that, in practice, only a few annotations are necessary to achieve an effect comparable to common `C idioms. For example, multi-way loop unrolling can be achieved with a single annotation on the loop induction variable. In fact, DyC has several capabilities that `C does not. `C cannot multi-way unroll loops with dynamic exit tests, because jumps to labels in other tick expressions are not permitted. Also, because portions of dynamically generated functions cannot be recompiled in `C, we do not see how it could unroll the interpreter supporting the computed goto shown in section 2.
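As an invented example of that claim, the sketch below unrolls a loop whose bound is static but whose early exit is dynamic. Only the make_static annotation is DyC's (here, a single annotation naming the induction variable and its bound); the function, names, and exit condition are illustrative.

    /* Hypothetical example of multi-way loop unrolling. */
    double sum_until(double *x, int n, double limit)
    {
        double sum = 0.0;
        int i;

        make_static(n, i);        /* i is the loop induction variable */
        for (i = 0; i < n; i++) {
            if (sum > limit)      /* dynamic exit test: each unrolled copy keeps
                                     this branch, yet the loop is still fully
                                     unrolled on the static i; `C cannot express
                                     this, since the exit would have to jump to
                                     a label in another tick expression */
                break;
            sum += x[i];
        }
        return sum;
    }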
Moreover, DyC's nested and overlapping dynamic regions are not matched by `C's nested tick expressions. Furthermore, DyC permits better static optimization of dynamic code, because control flow is more easily determined. This could be one reason why `C now applies limited partial evaluation to tick expressions, including dead-code elimination and unrolling of simple static loops [PoEnKa97]; its developers even plan to add a binding-time analysis, which would yield a hybrid declarative/imperative run-time code-generation system. We could implement any of the proposed run-time register allocation strategies (e.g., ICODE [PoEnKa97], Fabius [LeoneMarka1995b], or register actions [Wall86,APCEB96]) and select among them, or choose not to perform run-time register allocation at all, through new policies in the annotations. The bottom line, however, is that DyC and other declarative systems can be applied to existing programs in a straightforward manner, whereas imperative systems such as `C require substantial rewriting of code. We therefore conclude, as did Ershov 15 years ago [Ershov:1982:MixedComputation], that the declarative approach to code generation is more general than the imperative approach.