********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: PEPM `97 Paper
Date: Thu, 30 Jan 1997 14:17:19 PST
From: Brian Grant

This message will create a discussion list named pepm97.  Please direct
messages about the paper to that list.

Susan called.  Here are some of her comments about the submission.

-- Too long
-- Focus
   -- Annotations
   -- BTA
   -- Contributions
      -- wrt. PE (respecialization? intraprocedural?)
      -- wrt. DC (?)
-- More
   -- Motivation
   -- Examples
   -- Definitions
      -- polyvariant specialization and division
   -- Explanation
      -- Nested/overlapping regions
      -- Respecialization
      -- Sub-function annotations
   -- Appendix of annotations
      -- Syntax
      -- Concise description
-- Should omit
   -- Tangents
      -- Digressions into `C, PE vs. DC, off-line vs. on-line, etc.
         => Discuss each topic only in one place and keep to the point
   -- Excessive forward references
   -- Policy details
      -- Cache replacement policies
      -- Weaker specialization
   -- Some implementation details
      -- Implementation discussion about units/backend (save for another paper)
   -- Results (save for another paper)

Something that Craig mentioned at the meeting was that we should think about
separating concepts that are separable so that we can better claim that our
annotations are in some way "primitives" for PE-based DC.  For example, we
should separate mono- vs. polyvariant specialization and division, and also
automatic dynamic-to-static promotion.  We could maintain our current
annotations as syntactic sugar for the more primitive ones.

--Brian
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: specializer pseudo-code
Date: Fri, 14 Mar 1997 11:50:33 PST
From: Brian Grant

Here is my specification of the run-time actions and their arguments.  The
pseudo-code is somewhat baroque and probably needs to be debugged and
simplified, but I think it is a good start.  I couldn't figure out how to
live without my "ContextBuild" primitive in this first, very detailed pass.

I'm assuming that none of these primitives is specialized in this version.
That is, I specify all necessary arguments and leave specialization of the
primitives as an optimization.

Comments/questions welcome.

--Brian

******************************************************************************

/*****************************************************************************
 Type definitions
*****************************************************************************/

/* Per-operation and per-operand binding time annotation */
Type BindingTime { Static | Dynamic };

/* Inter-unit edge classification */
Type SpecializeTime { Eager | Lazy };

/* Whatever type of cache policy we decide to allow */
Type CachePolicy { /* CacheAll, Cache(n), replacement policy?
*/ }; /* This is what information we need about a Unit */ Type Unit { Int Id; /* Identifier */ Set ContextType; /* Set of variables in context */ CachePolicy Policy; /* Cache policy */ DAG CFG; /* DAG of instructions (some IR) */ List Successors; /* List of successor units */ }; Type BB { List Instructions; }; Type Instruction { Operation Op; List Operands; List Results; BindingTime OpBT; List OperandBTs; List ResultBTs; . . /* whatever other information is needed for codegen */ . }; Type Variable { Name VariableName; DataType VariableType; . . /* whatever other information is needed for codegen */ . }; /* An Operand can either be a variable or a constant */ Type Operand { Variable | Value }; /* Lists of Operands are useful for varargs */ Type EmitOperand { Operand | List }; /* Name->Value mapping */ Type Definition { Name VariableName; Value VariableValue; }; Type Operation { Call | Jump | Beq | Add | ... }; /* Abstract inter-unit edge information */ Type Edge { Unit Destination; SpecializeTime Policy; /* Eager or lazy */ List VariablesPromoted; /* D->S promos occur on edges */ }; /* Information needed about an instance of an inter-unit edge */ Type BranchEdge { Address BranchAddr; Int SuccessorNum; }; /***************************************************************************** Primitive declarations/definitions *****************************************************************************/ Function CacheLookup Input: Int UnitId; Set Context; Output: Bool UnitFound; Address UnitAddress; Function CacheInsert Input: Int UnitId; Set Context; Address UnitAddress; Output: Void Function EmitInstruction Input: Operation Op; List Operands; List Results; Output: Void (side-effects the writable instruction space and pointer) Function ReduceAndResidualize Input: DAG CFG; Set StaticVariables; Output: Address ResidualAddr; List OutBranches; Set UpdatedStaticVariables; Function BackPatch Input: Address BranchAddr; Int SuccessorNum; Address DestinationAddr; Output: Void Function ContextBuild Input: Set StaticVariables; Set NewContextType; List PromotedVariables; Output: Set NewContext; Function Specialize Input: Unit Unit; Bool Patch; Address PredecessorAddr; Int SuccessorNum; List StaticVariables; Output: Address SuccessorAddr; { (Found, UnitAddr) = CacheLookup(Unit.Id, Unit.ContextType, StaticVariables); If (! Found) { (UnitAddr, OutBranches, UpdatedStaticVariables) = ReduceAndResidualize(Unit.CFG, StaticVariables); CacheInsert(Unit.Id, Unit.ContextType, StaticVariables, UnitAddr); If (Patch) { BackPatch(PredecessorAddr, SuccessorNum, UnitAddr); } /* The Edge and BranchEdge lists must be correlated */ Foreach (Edge, BranchEdge) In (Unit.Successors, OutBranches) { If (Edge.Policy == Eager) { Specialize(Edge.Destination, True, BranchEdge.BranchAddr, BranchEdge.SuccessorNum, ContextBuild(UpdatedStaticVariables, Edge.Destination.ContextType, ())); } Else { If (! 
Empty(Edge.VariablesPromoted)) { EmitInstruction(Call, (ContextBuild, UpdatedStaticVariables, Edge.Destination.ContextType, Edge.VariablesPromoted), (Variable("NewStaticVariables", List))); EmitInstruction(Call, (Specialize, Edge.Destination, False, BranchEdge.BranchAddr, BranchEdge.SuccessorNum, Variable("NewStaticVariables",List)), (Variable("NextPC", Address))); } Else { EmitInstruction(Call, (Specialize, Edge.Destination, True, BranchEdge.BranchAddr, BranchEdge.SuccessorNum, ContextBuild(UpdatedStaticVariables, Edge.Destination.ContextType, ())), (Variable("NextPC", Address))); } EmitInstruction(Jump, (Variable("NextPC", Address)), ()); } } } Else { If (Patch) { BackPatch(PredecessorAddr, SuccessorNum, UnitAddr); } } Return UnitAddr; }; ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: New version of the specializer code with accompanying declarations Date: Mon, 17 Mar 1997 16:44:48 PST From: Brian Grant This is a multipart MIME message. --boundary=_0 Content-Type: text/plain; charset="us-ascii" -------- This version separates static information from dynamic information and partitions the list of static variables into the context list and other live static variables. Hence, ContextBuild is now ValueListBuild because it is used to build both lists. I've also included support for unchecked units. Let me know what you think. --Brian --boundary=_0 Content-Type: text/plain; charset="us-ascii" Content-Description: specializer /***************************************************************************** Type definitions *****************************************************************************/ /* Unit * * This is what information we need about a Unit. */ Type Unit { Int Id; /* Identifier */ /* These must be lists so that we can correlate names with values */ List ContextVariables; /* Set of variables in context */ List LiveVariables; /* Set of incoming live static variables not in context */ List StaticVariables; /* Set of static variables known following unit setup computations */ CachePolicy Policy; /* Cache policy */ DAG CFG; /* DAG of instructions (some IR) */ List Successors; /* List of successor units */ }; /* Edge * * Abstract inter-unit edge information. */ Type Edge { Unit Destination; Int SuccessorNum; /* which successor of the branch */ SpecializeTime Policy; /* Eager or lazy */ List VariablesPromoted; /* D->S promos occur on edges */ }; /* SpecializeTime * * Inter-unit edge classification. */ Type SpecializeTime { Eager | Lazy }; /* CachePolicy * * Whatever type of cache policy we decide to allow. */ Type CachePolicy { /* CacheAll, Cache(n), replacement policy? */ }; /* BB * * IR for a basic block. */ Type BB { List Operations; . . /* whatever other information is needed for codegen */ . }; /* Operation * * IR for an operation. */ Type Operation { OpType Op; List Operands; List Results; BindingTime OpBT; List OperandBTs; List ResultBTs; . . /* whatever other information is needed for codegen */ . }; /* BindingTime * * Per-operation and per-operand binding time annotation. */ Type BindingTime { Static | Dynamic }; /* Variable * * IR for a variable. */ Type Variable { Name VariableName; DataType VariableType; . . /* whatever other information is needed for codegen */ . }; /* Operand * * An Operand can either be a variable or a constant. 
 */
Type Operand { Variable | Value };

/* OpType
 *
 * Type of operations in the IR.
 */
Type OpType { Call | Jump | Beq | Add | ... };

/* EmitOperand
 *
 * Lists of Operands are useful for varargs.
 */
Type EmitOperand { Operand | List };

/*****************************************************************************
 Primitive declarations/definitions
*****************************************************************************/

/* CacheInsert
 *
 * Inserts the location of a version of a unit into the cache with the key
 * (UnitId, Context).
 */
Function CacheInsert
  Input:  Int UnitId; List Context; Address UnitAddress; CachePolicy Policy;
  Output: Void
  /* side-effects the code cache */

/* CacheLookup
 *
 * Retrieves the location from the cache of the code keyed by (UnitId, Context).
 */
Function CacheLookup
  Input:  Int UnitId; List Context;
  Output: Bool UnitFound; Address UnitAddress;

/* RequiresLookup
 *
 * Queries the cache policy to determine whether a cache lookup is needed or not.
 */
Function RequiresLookup
  Input:  CachePolicy Policy;
  Output: Bool Required;

/* ValueListBuild
 *
 * Constructs a list of values of the static variables listed in NewVariables.
 * The variables listed in NewVariables must be present in either the
 * OldVariables or AdditionalVariables list (but not in both).  The list
 * of values, NewValues, is then constructed from the corresponding values in
 * OldValues and AdditionalValues.
 */
Function ValueListBuild
  Input:  List OldVariables; List OldValues;
          List AdditionalVariables; List AdditionalValues;
          List NewVariables;
  Output: List NewValues;

/* EmitOperation
 *
 * Emits machine instruction(s) corresponding to the specified operation.
 */
Function EmitOperation
  Input:  OpType Op; List Operands; List Results;
  Output: Address OperationAddr;
  /* also side-effects the writable instruction space and pointer */

/* ReduceAndResidualize
 *
 * Residualizes the specified unit using the values of the static variables
 * provided by Context and OtherLiveVariables.  Returns the address of the
 * residual code, a list of branches to other units that require patching
 * (in the same order as Unit.Successors), and the values of all known
 * static variables (because the static variables are in SSA form, one list
 * suffices for all out edges).
 */
Function ReduceAndResidualize
  Input:  Unit Unit; List Context; List OtherLiveVariables;
  Output: Address ResidualAddr; List
OutBranches; List KnownStaticVariables; /* also side-effects the writable instruction space and pointer */ /* BackPatch * * Patch the specified branch, if needed. Passing additional information, * such as jump-table location (if applicable) or information about other * successors, could speed up this routine. */ Function BackPatch Input: Address BranchAddr; Int SuccessorNum; Address DestinationAddr; Output: Void /* also side-effects the writable instruction space */ /* Specialize * * Residualize the specified unit and all eager successors if the unit * isn't in the cache. Emit callbacks for all lazy successors. Callbacks * are spliced out, but the instruction cache need not be flushed because * the jump following the callback is still valid. */ Function Specialize Input: Unit Unit; Bool Patch; Address BranchAddr; Int SuccessorNum; List Context; List OtherLiveVariables; Output: Address UnitAddr; /* also side-effects the writable instruction space and pointer */ { If (RequiresLookup(Unit.Policy)) { (Found, UnitAddr) = CacheLookup(Unit.Id, Context); } Else { /* If no lookup is required, we'll always generate the code */ Found = False; } If (! Found) { (UnitAddr, OutBranches, KnownStaticVariables) = ReduceAndResidualize(Unit, Context, OtherLiveVariables); CacheInsert(Unit.Id, Context, UnitAddr, Unit.Policy); If (Patch) { BackPatch(BranchAddr, SuccessorNum, UnitAddr); } /* The Edge and BranchAddr lists must be correlated */ Foreach (Edge, BranchAddr) In (Unit.Successors, OutBranches) { If (Edge.Policy == Eager) { Specialize(Edge.Destination, True, BranchAddr, Edge.SuccessorNum, ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.ContextVariables), ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.LiveVariables)); } Else { If (! Empty(Edge.VariablesPromoted)) { /* This just takes a list of variables and produces a list of their names */ PromotedNames = Map(Edge.VariablesPromoted, .Name); /* Incorporate the promoted variables into the context */ CallbackAddr = EmitOperation(Call, (ValueListBuild, Unit.StaticVariables, KnownStaticVariables, PromotedNames, Edge.VariablesPromoted, Edge.Destination.ContextVariables), (Variable("NewContext", List, ...))); /* Generate the callback to the specializer */ Nil = EmitOperation(Call, (Specialize, Edge.Destination, ! 
RequiresLookup(Edge.Destination.Policy), BranchAddr, Edge.SuccessorNum, Variable("NewContext", List, ...), ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.LiveVariables)), (Variable("NextPC", Address))); } Else { /* Generate the callback to the specializer */ CallbackAddr = EmitOperation(Call, (Specialize, Edge.Destination, True, BranchAddr, Edge.SuccessorNum, ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.ContextVariables), ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.LiveVariables)), (Variable("NextPC", Address, ...))); } /* Emit the jump to the residual */ Nil = EmitOperation(Jump, (Variable("NextPC", Address, ...)), ()); /* Link in the callback */ BackPatch(BranchAddr, SuccessorNum, CallbackAddr); } } } Else { /* Back-patching may be required even if we find the code in the cache */ If (Patch) { BackPatch(BranchAddr, SuccessorNum, UnitAddr); } } Return UnitAddr; }; --boundary=_0-- ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@cs Subject: PEPM `97 web page alive again Date: Thu, 20 Mar 1997 10:05:48 PST From: Brian Grant http://www.cs.washington.edu/research/projects/unisw/DynComp/www/Internal/Documentation/DyC/Papers/PEPM97/ ******************************************************************************** ******************************************************************************** ******************************************************************************** To: matthai@thistle.cs.washington.edu cc: spin-comp@thistle.cs.washington.edu Subject: clustering Date: Thu, 20 Mar 1997 13:29:20 PST From: Brian Grant Matthai, We decided that we would like to put at least a simple clustering algorithm in the paper, and that you would continue to be in charge of working on it. We also decided a few other things wrt clustering: -- Clustering will run after the BTA and incrementally update the output of the BTA (unless computing the updates turns out to be too complicated). It looks like clustering will then go in the GEgen phase and be part of the unit identification process. -- You only have to worry about moving makeStatics and makeDynamics. A prepass to the BTA will use liveness and use-def information to insert makeStatics after every may def pt. of automatically promoted variables with uses in the dynamic region and makeDynamics after last uses. This may prove tricky because the prepass would have to determine the boundaries of the dynamic region. The BTA will remove redundant makeStatics and makeDynamics, leaving makeStatics at every possible dynamic-to-static promotion point and makeDynamics where needed. I believe Markus is working on this. -- makeStatics and makeDynamics should not be moved across uses of the variables they annotate. -- The clustering algorithm should cluster makeStatics and makeDynamics with each other and with immovable unit boundaries: discordant merges, nodes with at least one lazy incoming edge, merges with incoming edges from different units. The last of these is tricky because depending on how makeStatics and makeDynamics are moved, more such merges could be introduced. -- makeStatics should generally be moved below defs. makeDynamics could be moved in either direction across defs to accomplish clustering. 
-- Moving makeStatics (mSs) and makeDynamics (mDs) across branches in merges
   can greatly increase or decrease the number of unit boundaries.  It isn't
   clear what should be done.  Some kind of control/data dependence graphs may
   help (e.g., in moving across innocuous if/then/else diamonds).  There are
   similarities to global code motion.

I've updated my outline to reflect these changes:

http://www.cs.washington.edu/research/projects/unisw/DynComp/www/Internal/Documentation/DyC/Papers/PEPM97/final/outline3.txt

--Brian
********************************************************************************
********************************************************************************
********************************************************************************
To: chambers@franklin.cs.washington.edu
cc: spin-comp@franklin.cs.washington.edu
Subject: Interpreter example annotated for selective specialization
Date: Tue, 25 Mar 1997 13:05:29 PST
From: Matthai Philipose

Craig,

Markus said you'd like the interpreter example with annotations for selective
specialization.  I've appended the code to this mail.  The example is copied
from the final PEPM submission.  I have just added a case for function
invocation.

Matthai
-----------------
interpret(byteCodeT byteCodes[], int argList[], int numArgs)
{
    int valueStack[STACK_SIZE];
    int pc = 0, vsp = 0;
    int arg, opcode;

    /* push arguments */
    for (arg = numArgs; arg > 0; arg--)
        valueStack[vsp++] = argList[arg-1];

    /* DyC annotations for polyvariant specialization on byteCodes and pc
       at every downstream use */
    makeStatic(byteCodes : transitive);
    makeStatic(pc);

    for (;;) {
        opcode = byteCodes[pc];
        switch (opcode) {
        case ADD:
            valueStack[vsp-1] = valueStack[vsp] + valueStack[vsp-1];
            vsp--;
            pc++;
            break;
        case NEQ:
    pp0:    if (valueStack[vsp] != 0)
                pc = byteCodes[pc+1];
            else
                pc += 2;
    pp1:    break;
        case IGOTO:
    pp2:    pc = valueStack[vsp--];
            break;
        // This is the new case added... the opcode for function invocation
        case FUNCALL:
            pc = byteCodes[pc+1];   //Get the function to jump to
            incrementNumCalls(pc);  //Keep track of how many times this fn has been invoked
            if (NumCalls(pc) < MIN_NUM_CALLS)  //Don't specialize infrequent functions
                makeDynamic(pc);
            break;
        /* ... */
        case RETURN:
            return valueStack[vsp];
        }
    }
}
-----------------
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: "incremental" specialization
Date: Wed, 26 Mar 1997 13:59:37 PST
From: Brian Grant

By "incremental", Consel meant at several stages in an OS.  For example:

-- compile time
-- link time
-- load time
-- boot time
-- run time

So, this is a little different from our lazy specialization, which is all at
run time.  (from his `93 PEPM paper)

--Brian
********************************************************************************
********************************************************************************
********************************************************************************
To: "Charles Consel"
cc: spin-comp@thistle.cs.washington.edu
Subject: Tempo's BTA and PEPM `97 deadline
Date: Wed, 26 Mar 1997 15:09:14 PST
From: Brian Grant

Charles,

We're writing the related-work section for our PEPM paper, and just wanted to
confirm a few details of Tempo's BTA and action analysis.  We picked up some
understanding of what Tempo does during your visit, but your visit was too
short.  :-)
We would appreciate short answers to the following questions.  Not all of
these issues may come up in the final version of our paper, but we would like
to include some kind of concrete comparison between Tempo and our system.  We
think we understand the alias analysis, handling of partially static
structures, etc., but there are a few general points on which we're still a
little uncertain.

-- What kind of polyvariant division does Tempo perform?
   -- Multiple divisions per function?
      My understanding: possibly for those functions annotated as entry points
   -- Multiple divisions per program point?
      My understanding: only at joins following conditionals with static
      tests; is this done for switch statements as well as if statements?

-- What kind of polyvariant specialization does Tempo perform?
   -- How does Tempo pick up run-time values and use them for specialization?
      My understanding: from globals or parameters of functions annotated as
      entry points; are the values of static globals read when an associated
      function (an entry point) is specialized?
   -- Multiple specializations per function?
      My understanding: only for functions not annotated as entry points?
   -- Multiple specializations per program point?
      My understanding: only simple loop unrolling, not for arbitrary
      control-flow merges; e.g., can a loop such as the following be reduced?
      (actions: red = reduce, ev = evaluate, id = identity, reb = rebuild)

          red        ev  ev       id  ev     reb reb     ev  ev
      for ((i = 0) ; (i < size) ; v[i] ? (i++) : (i+=2)) { ... }

      (Basically, the static update of the loop induction variable is guarded
      by a dynamic conditional.)  Also, what do you do about loops with both
      static and dynamic exit tests (multiple loop exits)?

Also, at what time next Thursday are the papers due?  Can we send the
copyright form early and submit the final version of the paper electronically
on April 3rd, or should that type of thing only be done as a last resort?

Thank you for your help.  I look forward to seeing you in Amsterdam.

--Brian
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: GEgen (section 6)
Date: Thu, 27 Mar 1997 16:51:21 PST
From: Brian Grant

I've written up some details about units and unit identification.  Tonight,
I'll work more on writing up the production of generating extensions.  This
is what I have so far.  It is also on our PEPM97 web page.

Section 6: Title (GEgen)?

Steps we perform:

1) Generate new functions for different divisions of specialized functions
   (and create Function structs) and split merges with multiple divisions,
   yielding monovariant divisions
2) Unit identification
3) Separate setup and template code
4) Insert calls to the specializer at dynamic-to-static promotion points on
   region entries from static code
5) Produce optimized generating extensions

Unit Identification
----------------------------------------------------------------------

What are specialization units?

-- Specialization units are the largest subgraphs that the specialization
   algorithm can manipulate.
-- The specializer makes decisions about which code to specialize at the
   granularity of units, and consequently cache insertions, deletions, and
   lookups also operate on units.
-- Cache lookups ensure that specialized code is reused/shared if it exists
   (important at promotions, demotions, and discordant merges)

What do specialization units look like?
-- Roughly, a unit begins at a specialization point (this is a term used in
   the literature, e.g., by Bulyonkov and Kochetov `96; they cite the JGS93
   book), which is a point of which the specializer should be aware (to look
   up and/or begin specialization)
-- Units differ from basic blocks in that they can contain calls and multiple
   exits.
-- A unit must have a single entry
   -- If a predecessor to a node has a different context (and, hence, must be
      from a different unit), a cache lookup will be required either at run
      time (if there are promotions) or at specialization time (if there are
      only demotions), and so the node must start a new unit even if it has
      other predecessors with the same context
   -- Even if all predecessors to a node have the same context as the node,
      but they don't all belong to the same unit (e.g., due to "voluntary"
      laziness), then the node must start a new unit because it must be
      possible to generate code for that node if it is reached by any of its
      predecessors (remember that the specializer only generates code for
      whole units)
   -- Even if you know that a particular predecessor will be specialized
      before the node is reached, when the other predecessors are
      specialized, you need to be able to find the node in order to link
      (backpatch) the branches to it.  Thus, some kind of context-dependent
      lookup is required, which is essentially a cache lookup.

      For example: If some node d in the unit of one of the predecessors of
      node n dominates all of the predecessors and there exist downward paths
      (not traversing back edges) from d to each of the predecessors that do
      not contain lazy edges, then n _could_ be included in the same unit
      as d.

      A concrete instance of this:

          make_static(x);
      L1: ...x...
          if (d1) {
              ...x...
          } else {
              if (d2) {
                  make_static(y);
      L2:         ...y...
                  make_dynamic(y);
              }
      L3:     ...x...
          }
      L4: ...x...
          make_dynamic(x);

      The node marked L4 could be included in the same unit as the node
      marked L1.  But, without a cache lookup, how would the specializer link
      the bottom of L3 to L4?

Simple unit identification

-- A node begins a new unit if any of the following is true (a rough sketch
   of this test appears after the clustering notes below):
   -- At least one incoming edge has dynamic-to-static promotions (explicit
      in BTA output)
   -- At least one incoming edge has static-to-dynamic "demotions"
      (difference of context sets)
   -- At least one incoming edge is lazy (explicit in BTA output)
   -- It is a discordant merge (explicit in BTA output)
   -- It has predecessors from different units (for single-entry)
-- Build unit structs (input to specializer), including id, context
   variables, other live variables, all static variables, cache policy,
   successor edge information (dest. unit, successor number, eager/lazy,
   promotions), and call information (in the interprocedural version), and
   make demotions explicit as well for clustering

Clustering

Matthai is writing up the algorithm.  Here is the basic idea.  Clustering
attempts to reduce the number of units by moving, merging, and/or eliminating
unit boundaries.  It does so by moving the promotions and demotions within
the CFG.  If promotions and demotions are made to coincide on the same edges
with each other or with lazy edges, or if promotions and demotions are moved
to edges entering discordant merges or multi-unit merges, then the number of
unit boundaries is reduced.  Discordant merges and multi-unit merges can be
eliminated as well by pushing promotions or demotions past such merges.
Clustering must update the information in the unit structs.
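To make the unit-boundary test above concrete, here is a rough sketch of the
simple identification rules in the style of the specializer pseudo-code.  It
is only illustrative: the node and edge field names (IncomingEdges,
HasPromotions, IsDiscordantMerge, etc.) are hypothetical, not our actual IR.

    /* Hypothetical sketch of the simple unit-identification test.  A node
     * starts a new unit if any incoming edge carries promotions, demotions,
     * or laziness, if the node is a discordant merge, or if its
     * predecessors come from different units. */
    Function StartsNewUnit
      Input:  Node N;
      Output: Bool StartsUnit;
    {
        If (N.IsDiscordantMerge)            /* explicit in BTA output */
            Return True;
        FirstUnit = -1;
        Foreach Edge In N.IncomingEdges {
            If (Edge.HasPromotions ||       /* D->S promotions (BTA output) */
                Edge.HasDemotions ||        /* S->D demotions (context-set difference) */
                Edge.IsLazy)                /* lazy edge (BTA output) */
                Return True;
            If (FirstUnit == -1)
                FirstUnit = Edge.Source.UnitId;
            Else If (Edge.Source.UnitId != FirstUnit)
                Return True;                /* predecessors from different units */
        }
        Return False;
    };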
Produce optimized generating extensions ---------------------------------------------------------------------- -- DAG construction via splitting & merging -- Want to specialize all blocks of a unit that could potentially be reached => pursue all paths under dynamic control -- Want to know order of traversal of the blocks at compile time -- Better optimization of static code -- Eliminate run-time work-list overhead -- So, build a DAG of blocks reached -- Difficult because of -- Unstructured code -- Combinations of static and dynamic branches -- Loops / back edges -- Solution -- Pursue all static paths (static paths are determined by which static branches are taken) -- Copy blocks when they are reached in different orders on different static paths -- Algorithm - should we put the full algorithms in the appendix given that they are untested? how much detail in the body? -- Splitting -- Split all static paths -- Add nodes to a given static path as they are reached; topologically order nodes on different dynamic paths -- Merging Merge the tails of branches of the tree created by splitting -- Create a dummy node and link it to all of the leaves of the unit's tree -- Add the sink to a work list -- Until the work list is empty, remove the head of the list. Compare the sources of all incoming edges into the node. If two source nodes represent the same block and they have the same successors in this graph, then merge the two nodes and add the merged node to the work list if it isn't there already. -- Eliminate unit boundaries for eager successors with single run-time predecessors, and for those with multiple predecessors but the replicate policy, statically copying code for those with multiple compile-time predecessors -- Eliminate cache lookups for lazy successors with no dynamic-to-static promotions and single run-time predecessors or multiple predecessors and the replicate policy -- What about unchecked eager promotions for static pointer derefs? -- Further optimizations to the run-time specializer -- In-line and specialize the run-time primitives -- Everything in the Unit structures is known at compile time -- Recursion is implemented with a stack and jump (tail recursion!) -- Eliminate stack manipulation for all first eager successors -- Saving/restoring of state and argument passing can be specialized in many cases as well (e.g., at lazy callbacks) -- We don't explicitly pass around a set of all live static variables; we load/store them from/to the cache -- Hybrid caching scheme? -- Use hierarchy when adding to the context -- Reset to top of hierarchy when remove from context ******************************************************************************** ******************************************************************************** ******************************************************************************** To: chambers@klamath (Craig Chambers) cc: spin-comp@cs Subject: Re: annotations section Date: Thu, 27 Mar 1997 22:56:03 PST From: Brian Grant >> OK, I've written one more section, section 4 on the annotations. It might be a good idea to have some simple examples (e.g., reiterate that in order to unroll a loop, the induction variable should be annotated with make_static). In the PLDI submission draft and in various talks, people have had difficulty developing an intuition for how to use the annotations. Presumably, the PEPM crowd should do better, but a trivial example or two couldn't hurt. You've done a pretty good job spelling out all of the policies. 
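For instance, something as small as this might be enough to give that
intuition (a made-up fragment; I'm assuming the default polyvariant
specialization of annotated variables, so the loop head becomes a discordant
merge and the loop unrolls):

    /* Annotating the loop induction variable static causes the loop to be
     * completely unrolled once exponent's run-time value is known. */
    int power(int base, int exponent)
    {
        int result = 1;
        make_static(exponent);       /* exponent and the loop test become static */
        while (exponent > 0) {       /* static test => unrolled at specialize time */
            result = result * base;  /* only this multiply is residualized */
            exponent = exponent - 1; /* static update, evaluated by the specializer */
        }
        make_dynamic(exponent);
        return result;
    }

The residual code for exponent == 3 is then just three multiplies and a
return.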
The promote_one_unchecked_eager policy is a good idea. However, if the annotation is on a variable other than the pointer itself, it isn't clear to me how it would work. Other promotions work by inserting a lazy point after dynamic computations are performed. This one needs to instead make certain dynamic computations (i.e., the load) and values (the result of the load) static, and the computations don't involve the annotated variable. Am I missing something? Also, IMO, we shouldn't make it sound like we think alias analysis is a bad idea. I suggest we tone down the discussion about alias analysis a little, though we can certainly say that the static* annotations will still be necessary even with better alias analysis. Another idea for future work: optimization policies controlling, for example, run-time register allocation or register actions. Also, somewhere we need to mention what the default policies are for derived variables (mono_everything and no_promote) and why (to prevent all loops counting up from a constant from being unrolled, etc.). Will that go in the BTA section or the annotations section? There is a question in the text about what the difference is between using specialize and make_static. For concreteness: specialize f(x,y,z) on (x,y); vs. f(x,y,z) { make_static(x,y); ... } 1. specialize only specializes f on x and y if they are static at the call site. A make_static inside f would cause all arguments passed for x and y to be promoted. 2. specialize permits x and y to begin as static in f, without the potentially redundant promotion at the beginning of f. Thus, f may be eagerly specialized after specializing the caller, possibly eliminating a run-time cache check. 3. specialize doesn't have to pass x and y to the specialized version of f. Any other differences? Perhaps we should at least mention something about specialize and function pointers (stubs, calling general versions, topic of future work, whatever). --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: more on section 6 Date: Fri, 28 Mar 1997 09:43:26 PST From: Brian Grant I made a few changes to the text I sent out yesterday, and added more on producing the generating extensions. I included the code I wrote for the splitting algorithm, but I haven't looked at it lately, so I'm not yet convinced that it is safe to include. :-) --Brian Section 6: Title (GEgen)? ====================================================================== Steps we perform ---------------- 1) Generate new functions for different divisions of specialized functions (and create Function structs) and split merges with multiple divisions, yielding monovariant divisions 2) Unit identification 3) Separate setup and template code 4) Insert calls to the specializer at dynamic-to-static promotion points on region entries from static code 5) Produce optimized generating extensions Unit Identification ---------------------------------------------------------------------- What are specialization units? ------------------------------ -- Specialization units are the largest subgraphs that the specialization algorithm can manipulate. -- The specializer makes decisions about which code to specialize at the granularity of units, and consequently cache insertions, deletions, and lookups also operate on units. 
-- Cache lookups ensure that specialized code is reused/shared if it exists
   (important at promotions, demotions, and discordant merges)

What do specialization units look like?
---------------------------------------
-- Roughly, a unit begins at a specialization point (this is a term used in
   the literature, e.g., by Bulyonkov and Kochetov `96; they cite the JGS93
   book), which is a point of which the specializer should be aware (to look
   up and/or begin specialization)
-- Units differ from basic blocks in that they can contain calls and multiple
   exits.
-- A unit must have a single entry
   -- If a predecessor to a node has a different context (and, hence, must be
      from a different unit), a cache lookup will be required either at run
      time (if there are promotions) or at specialization time (if there are
      only demotions), and so the node must start a new unit even if it has
      other predecessors with the same context
   -- Even if all predecessors to a node have the same context as the node,
      but they don't all belong to the same unit (e.g., due to "voluntary"
      laziness), then the node must start a new unit because it must be
      possible to generate code for that node if it is reached by any of its
      predecessors (remember that the specializer only generates code for
      whole units)
   -- Even if you know that a particular predecessor will be specialized
      before the node is reached, when the other predecessors are
      specialized, you need to be able to find the node in order to link
      (backpatch) the branches to it.  Thus, some kind of context-dependent
      lookup is required, which is essentially a cache lookup.

      For example: If some node d in the unit of one of the predecessors of
      node n dominates all of the predecessors and there exist downward paths
      (not traversing back edges) from d to each of the predecessors that do
      not contain lazy edges, then n _could_ be included in the same unit
      as d.

      A concrete instance of this:

          make_static(x);
      L1: ...x...
          if (d1) {
              ...x...
          } else {
              if (d2) {
                  make_static(y);
      L2:         ...y...
                  make_dynamic(y);
              }
      L3:     ...x...
          }
      L4: ...x...
          make_dynamic(x);

      The node marked L4 could be included in the same unit as the node
      marked L1.  But, without a cache lookup, how would the specializer link
      the bottom of L3 to L4?

Single-Run-Time-Predecessor Optimizations
-----------------------------------------

We can relax the compile-time single-entry requirement for units if we know
that a node will have only a single predecessor _at_run_time_.  We can
eliminate unit boundaries for eager successors with single run-time
predecessors (e.g., at a static multi-unit merge where all predecessors have
the same context), and for those with multiple predecessors but the replicate
policy.  Thus, it is possible for multiple units to share the same code.
Similarly, we can eliminate cache lookups for lazy successors with no
dynamic-to-static promotions and single run-time predecessors (e.g., at lazy
successors to dynamic branches) or multiple predecessors and the replicate
policy.
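For intuition, this is roughly the code emitted on a lazy edge (a sketch
only; the names are hypothetical, and Context/OtherLiveVariables are assumed
to have been captured when the stub was emitted).  When a lazy successor has
a single run-time predecessor and no promotions, the cache lookup inside
Specialize can be skipped: the stub executes at most once, so the unit cannot
already be in the cache.

    /* Hypothetical lazy-edge callback stub.  Specialize generates the
     * destination unit, back-patches the pending branch so that later
     * executions bypass this stub, and returns the entry point. */
    NextPC = Specialize(DestUnit,
                        True,                /* patch: splice this stub out */
                        BranchAddr, SuccessorNum,
                        Context,             /* captured at emit time */
                        OtherLiveVariables);
    Jump(NextPC);                            /* taken only on this first execution */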
Simple unit identification
--------------------------
-- A node begins a new unit if any of the following is true:
   -- At least one incoming edge has dynamic-to-static promotions (explicit
      in BTA output)
   -- At least one incoming edge has static-to-dynamic "demotions"
      (difference of context sets)
   -- At least one incoming edge is lazy (explicit in BTA output)
   -- It is a discordant merge (explicit in BTA output)
   -- It has immediate predecessors (predecessors found on back edges are
      ignored) from different units and it is a non-constant merge and the
      cache policy is not replicate (for single-entry, called a multi-unit
      merge)
-- Unchecked eager promotions for static dereferences of static pointers do
   not create new unit boundaries
-- Build unit structs (input to specializer), including id, context
   variables, other live static variables, all known static variables, cache
   policy, successor edge information (dest. unit, successor number,
   eager/lazy, promotions), and call information (in the interprocedural
   version), and make demotions explicit as well for clustering; only the CFG
   information need not be constructed at this point

Clustering
----------

Matthai is writing up the algorithm.  Here is the basic idea.  Clustering
attempts to reduce the number of units by moving, merging, and/or eliminating
unit boundaries.  It does so by moving the promotions and demotions within
the CFG.  If promotions and demotions are made to coincide on the same edges
with each other or with lazy edges, or if promotions and demotions are moved
to edges entering discordant merges or multi-unit merges, then the number of
unit boundaries is reduced.  Discordant merges and multi-unit merges can be
eliminated as well by pushing promotions or demotions past such merges.
Clustering must update the information in the unit structs.

Produce optimized generating extensions
----------------------------------------------------------------------

In order to make run-time generation of dynamic code as efficient as
possible, the Specialize function is manually specialized for each
specialization unit, creating an optimized generating extension for each
unit.  All of the information in the unit structs is known at static compile
time, and so can be used in specializing the specializer.  All of the
run-time primitives called by the specializer (see section 2) can be inlined
into the body of the generating extension.  The bulk of the generating
extension is, of course, embodied in the specialized ReduceAndResidualize
primitive.  The other primitives, when inlined and specialized, incur little
additional overhead.

Other optimizations
-------------------

Recursive calls to the specializer are implemented with a manually
manipulated stack of units to specialize and jumps (see the sketch below).
In a few cases, this stack manipulation may be avoided.  For example, when a
dynamic branch to another unit appears on all static paths through a unit,
the specializer may proceed directly to that successor unit after
specializing the first one.

We don't explicitly pass around a set of all live static variables; we
load/store them from/to the cache as needed.  We don't include all live
static variables in the context used for cache lookups, but rather only
those that trigger polyvariant specialization (promoted and discordant
variables).  Because derived static variables are monovariantly specialized
and not promoted, they never appear in the context.

Variables promoted from dynamic to static may be passed to the generating
extension in registers.
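A rough sketch of that iterative driver (hypothetical names:
SpecializeOneUnit stands for the cache-lookup/ReduceAndResidualize/back-patch
sequence of Specialize, and BuildContext for the ValueListBuild calls; the
real system specializes this loop itself for each unit):

    /* Recursion replaced by an explicit stack of pending units. */
    UnitStack = CreateEmptyStack(2);
    PushOntoStack(UnitStack, (RootUnit, InitialContext));
    Until EmptyStack(UnitStack) {
        (U, Ctx) = PopOffStack(UnitStack);
        Addr = SpecializeOneUnit(U, Ctx);   /* lookup, residualize, patch */
        Foreach Edge In EagerSuccessors(U) {
            /* The first eager successor could instead continue directly at
             * the top of this loop, avoiding the push/pop (the
             * tail-recursion case noted above). */
            PushOntoStack(UnitStack, (Edge.Destination, BuildContext(Ctx, Edge)));
        }
    }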
A hybrid hierarchical caching scheme is possible, where a level is added to
the hierarchy when adding to the context (promotions), but the level is reset
to the top of the hierarchy when removing from the context (demotions).  The
hierarchy allows checking only the new variables in the context.  Resetting
the level to the top at demotions avoids the cost of traversing back up the
tree and possibly down another branch (e.g., at a merge of contexts {x,y} and
{x,z}).

DAG conversion
--------------

At specialize time, we must reduce and residualize all blocks of a unit that
could potentially be reached at run time, so we must pursue all paths under
dynamic control.  So that ReduceAndResidualize does not have to traverse the
unit subgraph using a work list and visited flags, the unit subgraph is
converted to a DAG that can be simply traversed top to bottom.  This process
goes beyond simple unrolling of the traversal loop.  Because converting the
unit subgraph to a DAG also lets us know the order of traversal of the blocks
at compile time, we can better optimize the generating extension using
standard compiler optimizations.

Building a DAG of the basic blocks reached is difficult in the presence of
unstructured code, combinations of static and dynamic branches, and loops or
other back edges.  Our solution is to pursue all static paths (static paths
are determined by which static branches are taken), copying blocks when they
are reached in different orders on different static paths.  Thus, the order
of traversal of the basic blocks will be known for each static path.

Our algorithm first splits all static paths, creating a tree of basic
blocks.  Nodes under dynamic control are linearized in a topological order.
Nodes in the tree with fanout greater than one correspond to static branches.
Each node can appear in a given static path at most one time.  Then, all
leaves of the tree are linked to a sink node, which is added to a work list.
Until the work list is empty, remove a node from the work list and unify all
its identical predecessors with the same successors, adding unified nodes to
the work list.  The merging algorithm can also be used to merge common tails
of different units.

The Sorting & Splitting Algorithm

Pseudocode Notation

RC(Node) returns the reachability conditions for Node
UnitHead(Node) returns the head of the unit to which Node belongs
IsLazyUnit(Node) returns true if the unit to which Node belongs must be
    stitched lazily, false if the unit can be stitched eagerly
Zero(Node) returns 0
Topo#(Node) returns the topological number of Node
IsPredecessor(Node1, Node2) is true if Node1 is a predecessor of Node2 or
    Node1 == Node2, false if Node1 succeeds Node2 or Node1 neither precedes
    nor succeeds Node2
NewNode = CopyNode(OldNode) creates a copy NewNode of node OldNode (not
    including connecting edges)

Edges are ordered by the topological numbers of their destinations;
Node->Edge[i] is the ith edge of Node; Node->Edge[0] is the first edge of Node
Edge->Destination is the destination node of Edge
Edge->Condition is the condition (NoCondition, True, False, 1, 100, etc.)
    of Edge

(F0, F1, ..., FN) is an (N+1)-tuple of objects F0, F1, ..., FN; F0 is in
    field 0, F1 is in field 1, etc.
Field(Tuple, N) returns field N of Tuple Stacks: stacks contain fixed-length tuples of pointers or scalars Stack = CreateEmptyStack(Size) Creates the empty stack Stack of tuples of size Size DestroyStack(Stack) Frees Stack EmptyStack(Stack) Returns true if Stack is empty, false otherwise PushOntoStack(Stack, Tuple) Pushes Tuple onto Stack Tuple = PopOffStack(Stack) Pops and returns the top of Stack Tuple = PeekAtStack(Stack) Returns the top of Stack without popping Lists: sorted lists contain fixed-length tuples of pointers or scalars and no duplicate tuples List = CreateEmptyList(Size, SortField, ValueFunction) Creates a list of tuples of size Size that is sorted in ascending order by the value returned by invoking ValueFunction on field SortField of each tuple DestroyList(List) Frees List EmptyList(List) Returns true if List is empty, false otherwise InsertIntoList(List, Tuple) Inserts Tuple into List after all other tuples of lesser or equal rank RemoveFromList(List, Tuple) Removes Tuple from List; a wildcard (*) may be used in any tuple field IsInList(List, Tuple) Returns true if Tuple is in List, false otherwise Tuple = HeadOfList(List) Returns the head of List (least tuple by the ValueFunction) Tuple = TailOfList(List) Returns the tail of List (greatest tuple by the ValueFunction) NewList = CopyList(OldList) Creates a copy NewList of list OldList Description This algorithm creates a list of lists of blocks for each unit. These lists are then passed to the merging algorithm for merging. The blocks in each list are in topological order. Each list corresponds to one combination of true and false paths from static branches. So, the first list contains the blocks reached when all static branches in the unit evaluate to true, and the last list contains the blocks reached when they all evaluate to false. As presented, this algorithm has a problem with back edges from one side of a static if into another static if body. This situation can create a difference between lists (i.e., blocks in one list but not the other) before the static branch causing the difference appears in the lists. The two possible solutions are to include the blocks in question in both lists, or to move the blocks below the static branch. The first solution has the disadvantage that some blocks will be unnecessarily stitched on some paths. The second solution inhibits merging. I plan to go with the second solution because its run-time efficiency is higher (no blocks unnecessarily stitched) and because it is easier to implement. Note that this algorithm assumes that units cannot contain disconnected blocks. The merging algorithm requires a tree rather than a list of lists. That tree can be reconstructed from the lists generated by this algorithm, or this algorithm can be modified to generate the tree directly. 
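A tiny worked example of the splitting output (block names are made up).
Suppose a unit's head A ends in a static branch to B or C, B flows into E,
and C ends in a dynamic branch reaching E and F:

            A           A ends in a static branch (successors B, C)
           / \
          B   C         C ends in a dynamic branch (successors E, F)
           \ / \
            E   F

    Splitting produces one topologically ordered list of blocks per
    combination of static-branch outcomes:

        static branch true:   (A, B, E)
        static branch false:  (A, C, E, F)   /* both E and F are pursued
                                                under dynamic control, in
                                                topological order */

Block E appears in both lists; reconstructing the tree and running the
merging pass can then unify such shared tails.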
Variables UnitRootList is a list of nodes sorted in ascending order by topological number; it is a list of nodes that are the roots of specialization units PathStack is a stack of 2-tuples: (List, List); the tuples contain information about paths that need to be completed NextNodeList is a list of nodes of the graph sorted in ascending order by topological number; it is a list of nodes that need to be stitched, but haven't been copied to the duplicate graph yet VisitedNodeList is a list of 2-tuples: (Node, EdgeCondition); where Node is a node of the graph and EdgeCondition indictates which path was taken from a static branch (NoCondition for nodes that aren't static branches); the list is sorted in ascending order by topological number of the nodes; it is a list of nodes along one path down a unit UnitPathList is a list of all VisitedNodeLists, and is what will be handed to the tail-merging algorithm Pseudocode procedure SortAndSplitUnits(UnitRootList) { UnitPathList = CreateEmptyList(1, 0, Zero) until EmptyList(UnitRootList) { /* Get the root of this specialization unit */ (CurrentUnit) = RemoveHeadOfList(UnitRootList) PathStack = CreateEmptyStack(2) VisitedNodeList = CreateEmptyList(2, 0, Topo#) InsertIntoList(UnitPathList, (VisitedNodeList)) NextNodeList = CreateEmptyList(1, 0, Topo#) InsertIntoList(NextNodeList, (CurrentUnit)) PushOntoStack(PathStack, (VisitedNodeList, NextNodeList)) until EmptyStack(PathStack) { /* Start on a new path through the unit */ (VisitedNodeList, NextNodeList) = PopOffStack(PathStack) /* Add to the path until it is completed */ until EmptyList(NextNodeList) { (NextNode) = RemoveHeadOfList(NextNodeList) if EndsInStaticBranch(NextNode) { /* Static branches spawn new paths through the unit. The current path follows the first edge. */ for each outgoing edge NextNode->Edge[i] except the first (in reverse topo. order) { /* Create a new path */ NewVisitedList = CopyList(VisitedNodeList) InsertIntoList(UnitPathList, (NewVisitedList)) /* Keep track of which path we took */ InsertIntoList(NewVisitedList, (NextNode, NextNode->Edge[i]->Condition)) /* Work on each new path begins at a different destination. Stay within the unit and don't revisit visited nodes (must check because we follow back edges). */ NewNextList = CopyList(NextNodeList) Destination = NextNode->Edge[i]->Destination if ((UnitHead(Destination) == CurrentUnit) && (! IsInList(NewVisitedList, (Destination, *)))) { InsertIntoList(NewNextList, (Destination)) } PushOntoStack(PathStack, (NewVisitedList, NewNextList)) } InsertIntoList(VisitedNodeList, (NextNode, NextNode->Edge[0]->Condition)) Destination = NextNode->Edge[0]->Destination if ((UnitHead(Destination) == CurrentUnit) && (! IsInList(VisitedNodeList, (Destination, *)))) { InsertIntoList(NextNodeList, (Destination)) } } else { InsertIntoList(VisitedNodeList, (NextNode, NoCondition)) /* Visit all children of this node that are in this unit and that have not already been visited. */ for each outgoing edge NextNode->Edge[i] { Destination = NextNode->Edge[i]->Destination if ((UnitHead(Destination) == CurrentUnit) && (! 
IsInList(VisitedNodeList, (Destination, *)))) { InsertIntoList(NextNodeList, (Destination)) } } } } DestroyList(NextNodeList) } DestroyStack(PathStack) } DestroyList(UnitRootList) return UnitPathList } The Actual Code STATIC VOID dyc_SortAndSplitUnits( DYC_UNIT_INFO UnitList ) { DYC_UNIT_INFO CurrentUnit; BBLOCK CurrentUnitHead; DYC_STACK PathStack; DYC_LIST VisitedNodeList, NewVisitedList; DYC_LIST NextNodeList, NewNextList; BBLOCK NextNode, NNSuccessor; BBLOCK_LIST SuccessorList, ReverseSuccessorList; DYC_GRAPH Graph; DYC_GRAPH_NODE ParentGraphNode, GraphNode; CurrentUnit = UnitList; WHILE (CurrentUnit != NULL) { /* Skip units outside of dynamic regions */ IF (! dyc_UNIT_insideDynamicRegion(CurrentUnit)) { DYC_UNIT_INFO_UnitGraph(CurrentUnit) = (DYC_GRAPH) NULL; CurrentUnit = DYC_UNIT_INFO_NextUnit(CurrentUnit); CONTINUE; } /* Get the root of this specialization unit */ CurrentUnitHead = DYC_UNIT_INFO_UnitHead(CurrentUnit); /* Create the graph we build to restructure the unit. The graph node * for each basic block is created when the node is inserted into * the NextNodeList. */ DYC_UNIT_INFO_UnitGraph(CurrentUnit) = Graph = DYC_GRAPH_Create(1, dyc_Allocate, dyc_Deallocate); /* Create the list of nodes to visit. dyc_TopologicallyPreceeds should * work even though it only expects pointers to scalars instead of * pointers to arrays. It just uses the first element of each array. */ NextNodeList = DYC_LIST_Create(1, dyc_Allocate, dyc_Deallocate, dyc_TopologicallyPreceeds); DYC_LIST_Insert(NextNodeList, CurrentUnitHead); /* Create the stack used to keep track of split paths */ PathStack = DYC_STACK_Create(3, dyc_Allocate, dyc_Deallocate); DYC_STACK_Push(PathStack, VisitedNodeList, NextNodeList, (void *) NULL); WHILE (! DYC_STACK_EmptyP(PathStack)) { /* Start on a new path through the unit */ DYC_STACK_Pop(PathStack, &VisitedNodeList, &NextNodeList, &ParentGraphNode); /* Add to the path until it is completed */ WHILE (! DYC_LIST_EmptyP(NextNodeList)) { /* Get the next node */ DYC_LIST_RemoveHead(NextNodeList, &NextNode); /* Mark the node as visited on this path */ DYC_LIST_Insert(VisitedNodeList, NextNode); /* Add the node to the graph. Does the edge-order matter? */ GraphNode = DYC_GRAPH_CreateNode(Graph, NextNode); IF (ParentGraphNode == (DYC_GRAPH_NODE) NULL) DYC_GRAPH_AddRoot(Graph, GraphNode); ELSE DYC_GRAPH_AddEdge(Graph, ParentGraphNode, GraphNode); IF (dyc_StatIsAStaticBranch(BBLOCK_last_stat(NextNode))) { /* Static branches spawn new paths through the * unit. The current path follows the first edge. */ /* Create a list of successors in the reverse order, * skipping the first successor. */ SuccessorList = BBLOCK_succ_list(NextNode); assert(SuccessorList); ReverseSuccessorList = BBLOCK_List_Reverse_List(BBLOCK_list_tail(SuccessorList)); /* NB: If the branch is an IGOTO, the successors are tag * anchors, which must always be the immediate successors * of the block. How can we guarantee this? 
*/ /* Push the successors onto the PathStack in reverse order */ LOOP { FOR_EACH_BBLOCK_LIST_BBLOCK(ReverseSuccessorList,NNSuccessor) DO { /* Create a new path */ NewVisitedList = DYC_LIST_Copy(VisitedNodeList); NewNextList = DYC_LIST_Copy(NextNodeList); dyc_AddBBlockToNextNodeList(NNSuccessor, NewNextList, NewVisitedList, CurrentUnit); DYC_STACK_Push(PathStack, NewVisitedList, NewNextList, GraphNode); } } /* The first successor is on the current path */ NNSuccessor = BBLOCK_list_bblock(SuccessorList); dyc_AddBBlockToNextNodeList(NNSuccessor, NextNodeList, VisitedNodeList, CurrentUnit); BBLOCK_List_Free_List(ReverseSuccessorList); } ELSE { /* Visit all children of this node that are in * this unit and that have not already been visited. */ LOOP { FOR_EACH_BBLOCK_SUCC_BBLOCK(NextNode,NNSuccessor) DO { dyc_AddBBlockToNextNodeList(NNSuccessor, NextNodeList, VisitedNodeList, CurrentUnit); } } } ParentGraphNode = GraphNode; } DYC_LIST_Destroy(NextNodeList); DYC_LIST_Destroy(VisitedNodeList); } DYC_STACK_Destroy(PathStack); CurrentUnit = DYC_UNIT_INFO_NextUnit(CurrentUnit); } } ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: bibliography Date: Fri, 28 Mar 1997 13:54:43 PST From: Brian Grant Here is a list of references for the paper. Most are in bibtex format. Do we need a reference for Java? Others? --Brian @InProceedings{Andersen:1992:Self-ApplicableC, author = "L.O. Andersen", title = "Self-Applicable {C} Program Specialization", booktitle = PEPM92, publisher = Yale, year = "1992", pages = "54--61", month = "June", note = ""} @InProceedings{Andersen:1993:Binding-TimeAnalysis, author = "L.O. Andersen", title = "Binding-Time Analysis and the Taming of {C} Pointers", booktitle = PEPM93, year = "1993", pages = "47-58", publisher = ACM, OPTnote = ""} Author: Auslander-J. Philipose-M. Chambers-C. Eggers-S-J. Bershad-B-N. Title: Fast, effective dynamic compilation. Source: SIGPLAN Notices. vol.31, no.5. pp. 149-59. May 1996. Conf. Title: ACM SIGPLAN '96: Programming Language Design and Implementation. Philadelphia, PA, USA. ACM. 21-24 May 1996. Year: 1996. @InProceedings{Bondorf:1992:ImprovingBinding, author = "A. Bondorf", title = "Improving Binding Times without Explicit CPS-Conversion", booktitle = "1992 ACM Conference in Lisp and Functional Programming, San Francisco, California (Lisp Pointers, vol. V, no. 1, 1992)", year = "1992", pages = "1-10", publisher = ACM, OPTnote = ""} @InProceedings{Bulyonkov:1996:PracticalAspects, author = "M.A. Bulyonkov and D.V. Kochetov", title = "Practical Aspects of Specialization of {Algol}-like Programs", crossref = "Danvy:1996:PartialEvaluation", pages = "17-32"} @INPROCEEDINGS{ Consel:1988:NewInsights, AUTHOR = "C. Consel", TITLE = "New Insights into Partial Evaluation: The {S}chism Experiment", BOOKTITLE = "ESOP '88, 2nd European Symposium on Programming, Nancy, France, March 1988 (Lecture Notes in Computer Science, vol. 300)", EDITOR = "H. Ganzinger", PUBLISHER = S-V, PAGES = "236-246", YEAR = "1988", ANNOTE = "A self-applicable partial evaluator called Schism is presented. It is written in a first-order subset of Scheme, with syntactic extensions and an extensible set of primitives, and can handle certain forms of side-effects. User-defined annotations control unfolding and specialization during partial evaluation. 
C. Chambers, S. J. Eggers, J. Auslander, M. Philipose, M. Mock, and P. Pardyak. Automatic dynamic compilation support for event dispatching in extensible systems. In Workshop on Compiler Support for Systems Software, Feb. 1996.

@InProceedings{Consel:1990:FromInterpreting:ESOP, author = "C. Consel and O. Danvy", title = "From Interpreting to Compiling Binding Times", booktitle = "ESOP '90. 3rd European Symposium on Programming, Copenhagen, Denmark, May 1990 (Lecture Notes in Computer Science, vol. 432)", year = "1990", editor = "N. Jones", pages = "88-105", publisher = S-V, OPTnote = ""}

@Manual{Consel:1990:TheSchism, author = "Charles Consel", title = "The {S}chism Manual, Version 1.0", organization = "Yale University", address = "New Haven, Connecticut", month = "December", year = "1990", note = ""}

@InProceedings{Consel:1991:ForABetter, author = "C. Consel and O. Danvy", title = "For a Better Support of Static Data Flow", booktitle = "Functional Programming Languages and Computer Architecture, Cambridge, Massachusetts, August 1991 (Lecture Notes in Computer Science, vol. 523)", year = "1991", editor = "J. Hughes", pages = "496-519", organization = "ACM", publisher = S-V, note = ""}

@InProceedings{Consel:1993:ATour, author = "C. Consel", title = "A Tour of Schism: A Partial Evaluation System for Higher-Order Applicative Languages", booktitle = PEPM93, year = "1993", pages = "145-154", publisher = ACM, OPTnote = ""}

@InProceedings{Consel:1993:IncrementalPartial, author = "C. Consel and C. Pu and J. Walpole", title = "Incremental Partial Evaluation: The Key to High Performance, Modularity and Portability in Operating Systems", booktitle = PEPM93, year = "1993", pages = "44-46", publisher = ACM, OPTnote = ""}

@InProceedings{Consel:1993:Tutorial, author = "C. Consel and O. Danvy", title = "Tutorial Notes on Partial Evaluation", booktitle = "Twentieth ACM Symposium on Principles of Programming Languages, Charleston, South Carolina, January 1993", year = "1993", pages = "493-501", organization = "ACM", publisher = ACM, note = ""}

Author: Consel-C. Noel-F. Title: A general approach for run-time specialization and its application to C. Conf. Title: Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. St. Petersburg Beach, FL, USA. pp. 145-56. ACM. 21-24 Jan. 1996. Year: 1996.

@InProceedings{Consel:1996:AUniform, author = "C. Consel and others", title = "A Uniform Approach for Compile-Time and Run-Time Specialization", crossref = "Danvy:1996:PartialEvaluation", pages = "54-72"}

@Proceedings{Danvy:1996:PartialEvaluation, title = "Partial Evaluation. Dagstuhl Castle, Germany, February 1996", year = "1996", editor = "O. Danvy and R. Gl{\"u}ck and P. Thiemann", volume = "1110", series = "Lecture Notes in Computer Science", publisher = S-V, OPTannote = "", OPTnote = ""}

@inproceedings{EnglerDaws1994a, author ="Engler, Dawson R. and Proebsting, Todd A.", title ="DCG: An Efficient, Retargetable Dynamic Code Generator", pages ="263-273", booktitle ="Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI)", month ="October", year ="1994", scope ="rcg", abstractURL ="http://www.pdos.lcs.mit.edu/~engler/papers.html", documentURL ="http://www.pdos.lcs.mit.edu/~engler/dcg.ps" }

Author: Engler-D-R. Title: VCODE: a retargetable, extensible, very fast dynamic code generation system. Source: SIGPLAN Notices. vol.31, no.5. pp. 160-70. May 1996. Conf. Title: ACM SIGPLAN '96: Programming Language Design and Implementation. Philadelphia, PA, USA. ACM. 21-24 May 1996. Year: 1996.

Author: Engler-D-R. Hsieh-W-C. Kaashoek-M-F. Title: `C: a language for high-level, efficient, and machine-independent dynamic code generation. Conf. Title: Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. St. Petersburg Beach, FL, USA. pp. 131-44. ACM. 21-24 Jan. 1996. Year: 1996.

tcc: A System for Fast, Flexible, and High-level Dynamic Code Generation, by Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. This paper will appear in PLDI 1997.

tcc: A Template-Based Compiler for `C, by Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. This paper appeared in WCSSS 1996.

Author: Engler-D-R. Proebsting-T-A. Title: DCG: an efficient, retargetable dynamic code generation system. Source: SIGPLAN Notices. vol.29, no.11. pp. 263-72. Nov. 1994. Conf. Title: Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI). San Jose, CA, USA. ACM. IEEE Comput. Soc. 4-7 Oct. 1994. Year: 1994.

@ARTICLE{Ershov:1982:MixedComputation, AUTHOR = "A.P. Ershov", TITLE = "Mixed Computation: Potential Applications and Problems for Study", JOURNAL = "Theoretical Computer Science", YEAR = "1982", VOLUME = "18", PAGES = "41-67", ANNOTE = "English version of \cite{Ershov:1980:MixedComputation:Russian}."}

@Book{Jones:1993:PartialEvaluation, author = "N.D. Jones and C.K. Gomard and P. Sestoft", title = "Partial Evaluation and Automatic Program Generation", publisher = P-H, year = "1993", isbn = "0-13-020249-5", note = ""}

@article{Keppel:91a, author={David Keppel}, title={A Portable Interface for On-The-Fly Instruction Space Modification}, journal={Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV)}, month={April}, year={1991}, pages={86-95} }

B. W. Kernighan and D. M. Ritchie. The C Programming Language, Second Edition. Prentice Hall, 1988.

Deferred Compilation: The Automation of Run-Time Code Generation. Mark Leone and Peter Lee. Technical Report CMU-CS-93-225, School of Computer Science, Carnegie Mellon University, December 1993.

@InProceedings{Leone:1994:DeferredCompilation, author = "M. Leone and P. Lee", title = "Lightweight Run-Time Code Generation", booktitle ="PEPM'94 Workshop on Partial Evaluation and Semantics-Based Program Manipulation", month ="June", year = "1994", pages = "97-106", address ="Orlando, Florida" }

@techreport{LeoneMarka1995b, author ="Leone, Mark and Lee, Peter", title ="Optimizing ML with Run-Time Code Generation", type ="Technical report", address ="Pittsburgh, Pennsylvania", number ="CMU-CS-95-205", institution ="School of Computer Science, Carnegie Mellon University", month ="December", year ="1995", scope ="implemen", abstractURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/ml-rtcg.abstract", documentURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/ml-rtcg.ps", keywords ="Run-time code generation, dynamic optimization, partial evaluation, specialization, staged computation", note ="In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 137-148, Philadelphia, May 1996." }
@inproceedings{LeoneMarka1996a, author ="Leone, Mark and Lee, Peter", title ="A Declarative Approach to Run-Time Code Generation", booktitle ="Workshop on Compiler Support for System Software (WCSSS)", month ="February", year ="1996", scope ="implemen", documentURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/declarative-rtcg.ps", keywords ="run-time code generation, dynamic optimization, specialization, partial evaluation", note ="In Workshop Record of WCSSS'96: The Inaugural Workshop on Compiler Support for System Software, pp. 8-17, Tucson, February 1996." }

Author: Lowney-P-G. Freudenberger-S-M. Karzes-T-J. Lichtenstein-W-D. Nix-R-P. ODonnell-J-S. Ruttenberg-J-C. Title: The multiflow trace scheduling compiler. Source: Journal of Supercomputing. vol.7, no.1-2. pp. 51-142. May 1993.

@InProceedings{Meyer:1991:Techniques, author = "U. Meyer", title = "Techniques for Partial Evaluation of Imperative Languages", booktitle = "Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '91), SIGPLAN Notices, vol. 26, no. 9", month = "September", year = "1991", pages = "94-105", publisher = ACM, OPTnote = ""}

@InProceedings{Sakharov:1996:Specialization, author = "A. Sakharov", title = "Specialization of Imperative Programs through Analysis of Relational Expressions", crossref = "Danvy:1996:PartialEvaluation", pages = "430-445"}

@inproceedings{WeiseDanie1991a, author ="Weise, Daniel and Conybeare, Roland and Ruf, Erik and Seligman, Scott", title ="Automatic online partial evaluation", booktitle ="Functional Programming & Computer Architecture", month ="June", year ="1991", scope ="partial", abstractURL ="ftp://quilty.stanford.edu/pub/fuse-papers/README", documentURL ="ftp://quilty.stanford.edu/pub/fuse-papers/FUSE-MEMO-91-6.ps" }
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: related work draft
Date: Sun, 30 Mar 1997 23:23:42 PST
From: Brian Grant

Here is the aforementioned draft of the related work section, followed by
some of my notes.  I'll include the same disclaimer that Craig did: the
text is not great.  However, I think it covers most of the points we want
to hit (and probably more).

--Brian

Related Work

Tempo [Consel:1996:POPL], a compile-time and run-time specialization
system for C, is most similar to DyC, but there are numerous and
substantial differences.  Tempo is a mature system that provides
polyvariant function specialization and division, performs alias and
side-effect analysis, performs interprocedural analysis, and deals with
partially static structures.  The two systems differ in the degree of
programmer control offered, in annotation strategies, in the granularity
of specialization and division, in support for caching specialized code,
in support for lazy specialization, in the way aliases and partially
static structures are handled, in the way unstructured code is handled, in
support for interprocedural analysis and separate compilation, in
optimization of static and dynamic code, and in implementation strategies.

In Tempo, the programmer annotates which parameters and global variables
to make static at run time when certain entry-point functions are called.
These entry points are listed in a file provided as input to the BTA in
addition to the source program, rather than inlined with the program text
as in DyC.
The programmer also must provide information about aliasing outside of the
region of specialization.  DyC's annotations support the equivalent of
entry points at any program point.  DyC requires no information about code
outside the region of specialization, but requires annotations to make
pointer dereferences static.  Tempo's alias analysis can be unsound, but
it does much more than ours.  On the other hand, our annotations permit
specialization of pointer data structures without accurate alias
information.  DyC's function annotations provide an analogous solution to
the absence of interprocedural analysis in our system: they permit
interprocedural specialization and also work with separate compilation.
We plan to add alias analysis and interprocedural analysis to DyC, but the
annotations will continue to be useful in cases where the analyses fail to
collect sufficiently accurate information.

Because of the disadvantages of CPS conversion
[Jones:1993:PartialEvaluation], and based on study of our target
applications, we feel that program-point specialization is particularly
important for C programs.  Beyond its ability to specialize whole
functions, Tempo can only unroll simple loops.  Our first-generation
system also implemented this capability in an ad-hoc manner [APCEB96].
Cmix was a partial-evaluation system that provided polyvariant
program-point specialization for C [Andersen:1992:Self-ApplicableC].  Cmix
handled unstructured code, but it seems to have lacked reachability
analysis [Andersen:1994:PhDThesis].  Tempo applies Erosa's and Hendren's
goto-elimination algorithm to convert unstructured code to a structured
equivalent [ErosaHendren94,Consel:1996:AUniform].  This approach
introduces a number of additional tests and may also introduce loops.  We
feel that direct support of unstructured control flow is more appropriate
for C.

We also feel that polyvariant program-point division is important,
especially when combined with our program-point annotations make_static
and make_dynamic to effect conditional specialization (see the sketch
below).  Others [Consel:1996:AUniform,Meyer:1991:Techniques] worried that
polyvariant program-point specialization and division could cause code
explosion, but rather than omit these powerful features, we put them under
the programmer's control.

Schism's filters permitted choices about whether to unfold or residualize
a function and which arguments to generalize, given binding times for the
function's parameters [Consel:1993:ATour].  Because filters are executed
by the BTA, only binding-time information can be used to make decisions,
not the values of static or dynamic variables, which can be used in
specialization decisions in DyC.  Filters can be used, for example, to
prevent unbounded unfolding and unbounded specialization.  Both off-line
partial evaluators such as Schism and on-line specializers such as Fuse
[WeiseDanie1991a] look for dynamic conditionals as a signal that unbounded
unfolding or specialization could result, and must then halt.  Run-time
specializers have an additional option: they can temporarily halt
specialization when dynamic conditionals are found in potential cycles and
insert lazy callbacks to the specializer, which is what DyC does.

Tempo does not permit caching of multiple specialized versions of code.
Fabius, a compiler and run-time specializer for a pure subset of ML,
performs caching of multiple versions, but does not provide control over
caching policies [LeoneMarka1995b].
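To make conditional specialization concrete, here is a small source-level
sketch.  make_static and make_dynamic are our annotations, but the exact
argument syntax shown is illustrative only, and is_hot() is a hypothetical
application-supplied predicate, not part of DyC:

    /* Sketch: conditional specialization at an arbitrary program point.
     * is_hot() is a hypothetical application predicate; the annotation
     * syntax is illustrative only. */
    void dispatch(int op, int *args, int nargs)
    {
        if (is_hot(op)) {
            /* From here to the matching make_dynamic, op is treated as
             * a run-time constant: the specializer generates (and
             * caches) one version of the region per distinct value of
             * op. */
            make_static(op);
        }

        /* ... code that branches on op: folded away when op is static,
         * residualized when op is dynamic ... */

        make_dynamic(op);    /* close the dynamic region for op */
    }

Only calls for which is_hot(op) is true pay the cost of specialization;
the rest execute the ordinary, unspecialized code.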
Like DyC, Tempo constructs and optimizes templates for dynamically
generated code and uses them to produce generating extensions that, when
executed, produce specialized code at run time.  However, Tempo blocks
many inter-template optimizations to preserve template integrity.  We need
only restrict template instructions with holes to their units, and prevent
their speculation above static branches.  Fabius does not optimize
templates separately.  We have expended much more effort optimizing the
specialized run-time specializers.

`C extends the ANSI C language to support dynamic code generation in an
imperative style [EnHsKa96].  The programmer must specify the code to be
generated at run time, substitute run-time values and combine code
fragments, perform certain optimizations, invoke the run-time compiler,
manage code reuse and code-space reclamation, and ensure correctness
[LeoneMarka1996a].  They claim that their flexibility is unique in giving
the programmer the ability to specify and compose arbitrary expressions
and statements at run time; however, it is not.  A declarative system such
as DyC can do so as well, requiring much less effort from the programmer,
while also facilitating debugging and ensuring correctness (if the safe
subset of the annotations is used).

Tempo uses `C as one of its back-end code generators [Consel97], while DyC
could serve as a back end to `C by specializing an ICODE interpreter
[PoEnKa97].  Generation of new function signatures is handled by
specializing functions with variable numbers of arguments.  A more direct
emulation of `C would be possible with the addition to our system of
run-time inlining through function pointers, but we stress that, in
practice, only a few annotations are necessary to achieve an effect
comparable to common `C idioms.  For example, multi-way loop unrolling can
be achieved with a single annotation on the loop induction variable.

In fact, DyC has several capabilities that `C does not.  `C cannot
multi-way unroll loops with dynamic exit tests, because jumps to labels in
other tick expressions are not permitted.  Also, because portions of
dynamically generated functions cannot be recompiled in `C, we cannot see
how it could unroll the interpreter supporting the computed goto shown in
section 2.  Moreover, DyC's nested and overlapping dynamic regions are not
matched by nested tick expressions in `C.  Furthermore, DyC permits better
static optimization of dynamic code, because control flow is more easily
determined.  This could be one reason why `C now applies limited partial
evaluation to tick expressions, including dead-code elimination and
unrolling of simple static loops [PoEnKa97].  Its authors even plan to add
a binding-time analysis, which would yield a hybrid declarative/imperative
run-time code-generation system.

We could implement any of the proposed run-time register allocation
strategies (e.g., ICODE [PoEnKa97], Fabius [LeoneMarka1995b], or register
actions [Wall86,APCEB96]), and select among them, or choose not to perform
run-time register allocation at all, through new policies in the
annotations.

The bottom line, however, is that DyC and other declarative systems can be
applied to existing programs in a straightforward manner, whereas
imperative systems such as `C require substantial rewriting of the code.
We therefore conclude, as did Ershov 15 years ago
[Ershov:1982:MixedComputation], that the declarative approach to code
generation is more general than the imperative approach.
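As an illustration of the single-annotation claim above, here is a sketch
of multi-way loop unrolling.  Only make_static is a DyC annotation; the
FIR-filter code around it is an invented example, and the exact annotation
syntax is assumed:

    /* Sketch: multi-way loop unrolling from one annotation.  Treating
     * len as a run-time constant completely unrolls the loop for each
     * distinct value of len, while x[] and w[] remain dynamic. */
    double fir(double *x, double *w, int len)
    {
        double sum = 0.0;
        int i;

        make_static(len);           /* len (and hence i) becomes static */
        for (i = 0; i < len; i++)   /* unrolled len times per version */
            sum += x[i] * w[i];
        return sum;
    }

Because specialized code is cached and lazy edges are backpatched, the
same mechanism extends to loops with dynamic exit tests, which is the case
`C cannot handle.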
**********************************************************************
Old text and notes:

While the ability to specialize at compile time is an advantage over our
system, Tempo's goal of treating compile-time and run-time specialization
uniformly restricts its treatment of run-time specialization.  For
example, arbitrary dynamic-to-static promotion of variables, lazy
specialization, and dynamic caching strategies are not compatible with
compile-time specialization, and so cannot be exposed to the programmer.

Dead-variables problem: Specializing on variables that are no longer used
[Jones:1993:PartialEvaluation].

Unstructured control flow: Tempo applies the algorithm from
[ErosaHendren94].  Could the algorithm presented in
[Sakharov:1996:Specialization] be used to do reachability analysis?

Polyvariant specialization: Worries about code explosion due to
program-point polyvariant specialization in [Meyer:1991:Techniques].  Cmix
does polyvariant program-point specialization.  Tempo does not.

Polyvariant division: Worries about code explosion due to program-point
polyvariant division in [Consel:1996:AUniform].  Mix did polyvariant
program-point division.  Tempo, Cmix, Fabius, etc. do not.

CPS: [Consel:1991:ForABetter].

Specialization units: Specialization points are points reached by several
static memory states [Jones:1993:PartialEvaluation,
Bulyonkov:1996:PracticalAspects].  Our unit partitioning is vaguely
similar to block specialization [Bulyonkov:1996:PracticalAspects].  They
insert explicit block statements around statement sequences with the same
"static memory state" (think context).  Hence, their blocks begin at
specialization points.  Their approach is more restricted because of their
breadth-first "linear traversal" specialization algorithm.  They insert
specialization points when they need to delay specialization of some
statements to satisfy their (self-induced) constraints.  They also
disallow gotos into compound statements.

Configuration analysis?

Control over specialization:

Termination: "Conditional markers" on their specializer's stack of calls
being specialized indicate "speculative" specialization.  "If there is a
conditional marker before the most recent call to the same closure, it
means that the call about to be made is speculative, and that unfolding it
may lead to specializer loops that don't correspond to runtime loops.
When such a call is detected, Fuse elects to specialize rather than
unfold.  Abstraction of arguments occurs at this point..."
[WeiseDanie1991a].

Tempo
- polyvariant function division
- polyvariant function specialization
- special hack to unroll simple loops
- can't deal with both static and dynamic loop exits
- promotions of parameters or globals only at certain function entries
  listed as entry points in an input file to the BTA; variables to be
  promoted are annotated as static
- first converts to structured code
- also extracts templates
- also produces generating extensions
- performs compile-time specialization as well
- interprocedural analysis
- partially static structures
- alias and side-effect analysis
- static return values of residual calls

Schism

Fabius

Cmix
- polyvariant specialization of C functions
- polyvariance at program points: at control-flow merges, splitting is
  done unless C-Mix can determine that the static store is "similar"; that
  allows polyvariance after dynamic ifs.  There is no mention of a
  reachability analysis, so I am not sure how this works in detail.
  There's an example where C-Mix multi-way unrolls a binary search
  function, though, so it seems to work.
- no polyvariant division
- no partially static data structures (structs must be completely dynamic
  or completely static)
- what about array splitting?

`C
- requires use of a new language; can't be used to optimize existing C
  programs with run-time code generation
- difficult to debug
- difficult to statically optimize across tick expressions
- cannot unroll loops with early exits (breaks)
- `C programs manipulate code; DyC programs manipulate data
- we can generate arbitrary code graphs (even ones with merges and loops,
  thanks to our cache lookups and backpatching) by unrolling an
  interpreter loop for a language with branches
- with specialized varargs, we can do anything they can do and more by
  specializing a simulator of their system
  [Ershov:1982:MixedComputation]
- run-time inlining through function pointers would lend a more
  straightforward emulation
- in our target applications, our system is more straightforward to use
  (just a few make_static annotations)
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: conclusion draft
Date: Sun, 30 Mar 1997 23:24:23 PST
From: Brian Grant

Here is a draft of the conclusion.

--Brian

Conclusion

We are building DyC by modifying the Multiflow compiler [LFKLNOR93].  We
began with a detailed design covering every aspect of the system; the
implementation is proceeding in stages.  We currently have a restricted,
monovariant version of the BTA working that computes all of the outputs
required by the run-time specializer.  We should soon be able to generate
code for simple dynamic regions.  Over the next few months, we hope to
complete the system.

In the near future, we plan to work out the annotations for static fields
of data structures and for pointers, and add them to our system.  We would
also like to add alias and side-effect analysis, as well as
interprocedural analysis.  Beyond those analyses, there are a number of
analyses we could add, such as termination and exception analysis to
eliminate unnecessary lazy specialization, analysis of relational
expressions [Sakharov:1996:Specialization], or even theorem proving.  We
also would like a better solution to the problem of clustering
specialization-unit boundaries.  However, at this time we do not know
whether these analyses will be useful.

In addition to more aggressive analyses, we would like to add more
run-time optimizations, controlled by the annotations through policies.
In particular, we would like to implement run-time register allocation and
run-time inlining.

Our annotations' design resulted from a search for a small set of
flexible, primitive directives to govern dynamic compilation, suitable for
use by both human programmers and tools (such as a semi-automatic
dynamic-compilation front end).  With the exception of support for static
data structures, we believe that our make_static annotation provides the
flexibility we require in a concise, even elegant manner.  Furthermore
(and to our surprise), most of the run-time optimizations (e.g., multi-way
loop unrolling) and other features we desired (e.g., conditional
specialization) fit neatly into the framework of partial evaluation.
We feel that DyC's main contribution lies in the flexibility achieved by
its blend of features:

- the ability to promote dynamic variables to static, or demote static
  variables to dynamic, at arbitrary program points by use of simple
  annotations inserted into the program text, permitting nested and
  overlapping dynamic regions,
- automatic dynamic-to-static promotion of specialization variables,
  substantially reducing the number of annotations necessary to continue
  specialization and permitting the programmer to be ignorant of the
  accuracy of the system's analyses,
- polyvariant program-point specialization and division, permitting, in
  particular, multi-way loop unrolling and conditional specialization,
- automatic caching and reuse of specialized code,
- on-demand as well as speculative specialization, and
- programmer control over these features through straightforward policies
  on specialization variables included in the annotations.

The result is a system far easier to use than a code-generation system
like `C, and one that can be easily applied to existing programs.
Furthermore, the annotations and their corresponding array of policies
provide a convenient mechanism for experimenting with dynamic compilation,
and a potential target for future tools or a fully automatic system.
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: updated bibliography
Date: Sun, 30 Mar 1997 23:24:57 PST
From: Brian Grant

@InProceedings{Andersen:1992:Self-ApplicableC, author = "L.O. Andersen", title = "Self-Applicable {C} Program Specialization", booktitle = PEPM92, publisher = Yale, year = "1992", pages = "54--61", month = "June", note = ""}

@InProceedings{Andersen:1993:Binding-TimeAnalysis, author = "L.O. Andersen", title = "Binding-Time Analysis and the Taming of {C} Pointers", booktitle = PEPM93, year = "1993", pages = "47-58", publisher = ACM, OPTnote = ""}

[APCEB96] Author: Auslander-J. Philipose-M. Chambers-C. Eggers-S-J. Bershad-B-N. Title: Fast, effective dynamic compilation. Source: SIGPLAN Notices. vol.31, no.5. pp. 149-59. May 1996. Conf. Title: ACM SIGPLAN '96: Programming Language Design and Implementation. Philadelphia, PA, USA. ACM. 21-24 May 1996. Year: 1996.

@InProceedings{Bondorf:1992:ImprovingBinding, author = "A. Bondorf", title = "Improving Binding Times without Explicit CPS-Conversion", booktitle = "1992 ACM Conference on Lisp and Functional Programming, San Francisco, California (Lisp Pointers, vol. V, no. 1, 1992)", year = "1992", pages = "1-10", publisher = ACM, OPTnote = ""}

@InProceedings{Bulyonkov:1996:PracticalAspects, author = "M.A. Bulyonkov and D.V. Kochetov", title = "Practical Aspects of Specialization of {Algol}-like Programs", crossref = "Danvy:1996:PartialEvaluation", pages = "17-32"}

@INPROCEEDINGS{Consel:1988:NewInsights, AUTHOR = "C. Consel", TITLE = "New Insights into Partial Evaluation: The {S}chism Experiment", BOOKTITLE = "ESOP '88, 2nd European Symposium on Programming, Nancy, France, March 1988 (Lecture Notes in Computer Science, vol. 300)", EDITOR = "H. Ganzinger", PUBLISHER = S-V, PAGES = "236-246", YEAR = "1988", ANNOTE = "A self-applicable partial evaluator called Schism is presented. It is written in a first-order subset of Scheme, with syntactic extensions and an extensible set of primitives, and can handle certain forms of side-effects. User-defined annotations control unfolding and specialization during partial evaluation. Work towards automating the annotations is presented."}
C. Chambers, S. J. Eggers, J. Auslander, M. Philipose, M. Mock, and P. Pardyak. Automatic dynamic compilation support for event dispatching in extensible systems. In Workshop on Compiler Support for Systems Software, Feb. 1996.

@InProceedings{Consel:1990:FromInterpreting:ESOP, author = "C. Consel and O. Danvy", title = "From Interpreting to Compiling Binding Times", booktitle = "ESOP '90. 3rd European Symposium on Programming, Copenhagen, Denmark, May 1990 (Lecture Notes in Computer Science, vol. 432)", year = "1990", editor = "N. Jones", pages = "88-105", publisher = S-V, OPTnote = ""}

@Manual{Consel:1990:TheSchism, author = "Charles Consel", title = "The {S}chism Manual, Version 1.0", organization = "Yale University", address = "New Haven, Connecticut", month = "December", year = "1990", note = ""}

@InProceedings{Consel:1991:ForABetter, author = "C. Consel and O. Danvy", title = "For a Better Support of Static Data Flow", booktitle = "Functional Programming Languages and Computer Architecture, Cambridge, Massachusetts, August 1991 (Lecture Notes in Computer Science, vol. 523)", year = "1991", editor = "J. Hughes", pages = "496-519", organization = "ACM", publisher = S-V, note = ""}

@InProceedings{Consel:1993:ATour, author = "C. Consel", title = "A Tour of Schism: A Partial Evaluation System for Higher-Order Applicative Languages", booktitle = PEPM93, year = "1993", pages = "145-154", publisher = ACM, OPTnote = ""}

@InProceedings{Consel:1993:IncrementalPartial, author = "C. Consel and C. Pu and J. Walpole", title = "Incremental Partial Evaluation: The Key to High Performance, Modularity and Portability in Operating Systems", booktitle = PEPM93, year = "1993", pages = "44-46", publisher = ACM, OPTnote = ""}

@InProceedings{Consel:1993:Tutorial, author = "C. Consel and O. Danvy", title = "Tutorial Notes on Partial Evaluation", booktitle = "Twentieth ACM Symposium on Principles of Programming Languages, Charleston, South Carolina, January 1993", year = "1993", pages = "493-501", organization = "ACM", publisher = ACM, note = ""}

Author: Consel-C. Noel-F. Title: A general approach for run-time specialization and its application to C. Conf. Title: Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. St. Petersburg Beach, FL, USA. pp. 145-56. ACM. 21-24 Jan. 1996. Year: 1996.

@InProceedings{Consel:1996:AUniform, author = "C. Consel and others", title = "A Uniform Approach for Compile-Time and Run-Time Specialization", crossref = "Danvy:1996:PartialEvaluation", pages = "54-72"}

@Proceedings{Danvy:1996:PartialEvaluation, title = "Partial Evaluation. Dagstuhl Castle, Germany, February 1996", year = "1996", editor = "O. Danvy and R. Gl{\"u}ck and P. Thiemann", volume = "1110", series = "Lecture Notes in Computer Science", publisher = S-V, OPTannote = "", OPTnote = ""}

@inproceedings{EnglerDaws1994a, author ="Engler, Dawson R. and Proebsting, Todd A.", title ="DCG: An Efficient, Retargetable Dynamic Code Generator", pages ="263-273", booktitle ="Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI)", month ="October", year ="1994", scope ="rcg", abstractURL ="http://www.pdos.lcs.mit.edu/~engler/papers.html", documentURL ="http://www.pdos.lcs.mit.edu/~engler/dcg.ps" }

Author: Engler-D-R. Proebsting-T-A. Title: DCG: an efficient, retargetable dynamic code generation system. Source: SIGPLAN Notices. vol.29, no.11. pp. 263-72. Nov. 1994. Conf. Title: Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI). San Jose, CA, USA. ACM. IEEE Comput. Soc. 4-7 Oct. 1994. Year: 1994.

Author: Engler-D-R. Title: VCODE: a retargetable, extensible, very fast dynamic code generation system. Source: SIGPLAN Notices. vol.31, no.5. pp. 160-70. May 1996. Conf. Title: ACM SIGPLAN '96: Programming Language Design and Implementation. Philadelphia, PA, USA. ACM. 21-24 May 1996. Year: 1996.

[EnHsKa96] Author: Engler-D-R. Hsieh-W-C. Kaashoek-M-F. Title: `C: a language for high-level, efficient, and machine-independent dynamic code generation. Conf. Title: Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. St. Petersburg Beach, FL, USA. pp. 131-44. ACM. 21-24 Jan. 1996. Year: 1996.

[PoEnKa97] tcc: A System for Fast, Flexible, and High-level Dynamic Code Generation, by Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. This paper will appear in PLDI 1997.

tcc: A Template-Based Compiler for `C, by Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. This paper appeared in WCSSS 1996.

ErosaHendren94: A.M. Erosa and L.J. Hendren. Taming control flow: a structured approach to eliminating goto statements. Proceedings of the IEEE 1994 International Conference on Computer Languages, May 1994, pp. 229-40.

@ARTICLE{Ershov:1982:MixedComputation, AUTHOR = "A.P. Ershov", TITLE = "Mixed Computation: Potential Applications and Problems for Study", JOURNAL = "Theoretical Computer Science", YEAR = "1982", VOLUME = "18", PAGES = "41-67", ANNOTE = "English version of \cite{Ershov:1980:MixedComputation:Russian}."}

@Book{Jones:1993:PartialEvaluation, author = "N.D. Jones and C.K. Gomard and P. Sestoft", title = "Partial Evaluation and Automatic Program Generation", publisher = P-H, year = "1993", isbn = "0-13-020249-5", note = ""}

@article{Keppel:91a, author={David Keppel}, title={A Portable Interface for On-The-Fly Instruction Space Modification}, journal={Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV)}, month={April}, year={1991}, pages={86-95} }

B. W. Kernighan and D. M. Ritchie. The C Programming Language, Second Edition. Prentice Hall, 1988.

Deferred Compilation: The Automation of Run-Time Code Generation. Mark Leone and Peter Lee. Technical Report CMU-CS-93-225, School of Computer Science, Carnegie Mellon University, December 1993.

@InProceedings{Leone:1994:DeferredCompilation, author = "M. Leone and P. Lee", title = "Lightweight Run-Time Code Generation", booktitle ="PEPM'94 Workshop on Partial Evaluation and Semantics-Based Program Manipulation", month ="June", year = "1994", pages = "97-106", address ="Orlando, Florida" }

@techreport{LeoneMarka1995b, author ="Leone, Mark and Lee, Peter", title ="Optimizing ML with Run-Time Code Generation", type ="Technical report", address ="Pittsburgh, Pennsylvania", number ="CMU-CS-95-205", institution ="School of Computer Science, Carnegie Mellon University", month ="December", year ="1995", scope ="implemen", abstractURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/ml-rtcg.abstract", documentURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/ml-rtcg.ps", keywords ="Run-time code generation, dynamic optimization, partial evaluation, specialization, staged computation", note ="In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 137-148, Philadelphia, May 1996." }

@inproceedings{LeoneMarka1996a, author ="Leone, Mark and Lee, Peter", title ="A Declarative Approach to Run-Time Code Generation", booktitle ="Workshop on Compiler Support for System Software (WCSSS)", month ="February", year ="1996", scope ="implemen", documentURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/declarative-rtcg.ps", keywords ="run-time code generation, dynamic optimization, specialization, partial evaluation", note ="In Workshop Record of WCSSS'96: The Inaugural Workshop on Compiler Support for System Software, pp. 8-17, Tucson, February 1996." }

[LFKLNOR93] Author: Lowney-P-G. Freudenberger-S-M. Karzes-T-J. Lichtenstein-W-D. Nix-R-P. ODonnell-J-S. Ruttenberg-J-C. Title: The multiflow trace scheduling compiler. Source: Journal of Supercomputing. vol.7, no.1-2. pp. 51-142. May 1993.

@InProceedings{Meyer:1991:Techniques, author = "U. Meyer", title = "Techniques for Partial Evaluation of Imperative Languages", booktitle = "Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '91), SIGPLAN Notices, vol. 26, no. 9", month = "September", year = "1991", pages = "94-105", publisher = ACM, OPTnote = ""}

@InProceedings{Sakharov:1996:Specialization, author = "A. Sakharov", title = "Specialization of Imperative Programs through Analysis of Relational Expressions", crossref = "Danvy:1996:PartialEvaluation", pages = "430-445"}
Sakharov", title = "Specialization of Imperative Programs through Analysis of Relational Expressions", crossref = "Danvy:1996:PartialEvaluation", pages = "430-445"} @inproceedings{WeiseDanie1991a, author ="Weise, Daniel and Conybeare, Roland and Ruf, Erik and Seligman, Scott", title ="Automatic online partial evaluation", booktitle ="Functional Programming & Computer Architecture", month ="June", year ="1991", scope ="partial", abstractURL ="ftp://quilty.stanford.edu/pub/fuse-papers/README", documentURL ="ftp://quilty.stanford.edu/pub/fuse-papers/FUSE-MEMO-91-6.ps" } ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: another bibliography update Date: Mon, 31 Mar 1997 10:02:28 PST From: Brian Grant The bibliography can now be found on our PEPM97 web page, as can the other text I've mailed out: http://www.cs.washington.edu/research/projects/unisw/DynComp/www/Internal/Documentation/DyC/Papers/PEPM97/ --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: running list of outstanding paper issues Date: Mon, 31 Mar 1997 13:46:50 PST From: Brian Grant I'm building a running list of outstanding issues wrt the paper. It includes our discussions from this morning and most of the mail I've sent. It can be found on our PEPM97 web page under "Outstanding Issues". --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: BTA comments Date: Wed, 02 Apr 1997 00:03:47 PST From: Brian Grant My belated comments on the BTA are finally in my list of outstanding issues. The only major flaw I found was indeed in removal of dead variables from the Division. It seems we can't remove a dead variable from the Division until its last use, or not at all if we want to take the easier route. There could be a problem with the Division meet operator, or otherwise I just don't fully understand it. What happens at a control-flow merge if a variable on one path has the mono_divide policy but is dynamic on another path? Other problems were minor. A few clarifications are needed in the text, IMO. Anyway, take a look at my full comments. In the morning, I'll try to sweep through my comments from other sections to update them for the latest version of the paper. --Brian