********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: PEPM `97 Paper
Date: Thu, 30 Jan 1997 14:17:19 PST
From: Brian Grant

This message will create a discussion list named pepm97.  Please direct
messages about the paper to that list.

Susan called.  Here are some of her comments about the submission.

-- Too long
-- Focus
   -- Annotations
   -- BTA
   -- Contributions
      -- wrt. PE (respecialization? intraprocedural?)
      -- wrt. DC (?)
-- More
   -- Motivation
   -- Examples
   -- Definitions
      -- polyvariant specialization and division
   -- Explanation
      -- Nested/overlapping regions
      -- Respecialization
      -- Sub-function annotations
   -- Appendix of annotations
      -- Syntax
      -- Concise description
-- Should omit
   -- Tangents
      -- Digressions into `C, PE vs. DC, off-line vs. on-line, etc.
         => Discuss each topic only in one place and keep to the point
   -- Excessive forward references
   -- Policy details
      -- Cache replacement policies
      -- Weaker specialization
   -- Some implementation details
      -- Implementation discussion about units/backend (save for another paper)
   -- Results (save for another paper)

Something that Craig mentioned at the meeting was that we should think about
separating concepts that are separable so that we can better claim that our
annotations are in some way "primitives" for PE-based DC.  For example, we
should separate mono- vs. polyvariant specialization and division, and also
automatic dynamic-to-static promotion.  We could maintain our current
annotations as syntactic sugar for the more primitive ones.

--Brian
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: specializer pseudo-code
Date: Fri, 14 Mar 1997 11:50:33 PST
From: Brian Grant

Here is my specification of the run-time actions and their arguments.  The
pseudo-code is somewhat baroque and probably needs to be debugged and
simplified, but I think it is a good start.  I couldn't figure out how to
live without my "ContextBuild" primitive in this first, very detailed pass.

I'm assuming that none of these primitives is specialized in this version.
That is, I specify all necessary arguments and leave specialization of the
primitives as an optimization.

Comments/questions welcome.

--Brian

******************************************************************************

/*****************************************************************************
 Type definitions
*****************************************************************************/

/* Per-operation and per-operand binding time annotation */
Type BindingTime { Static | Dynamic };

/* Inter-unit edge classification */
Type SpecializeTime { Eager | Lazy };

/* Whatever type of cache policy we decide to allow */
Type CachePolicy { /* CacheAll, Cache(n), replacement policy?
*/ }; /* This is what information we need about a Unit */ Type Unit { Int Id; /* Identifier */ Set ContextType; /* Set of variables in context */ CachePolicy Policy; /* Cache policy */ DAG CFG; /* DAG of instructions (some IR) */ List Successors; /* List of successor units */ }; Type BB { List Instructions; }; Type Instruction { Operation Op; List Operands; List Results; BindingTime OpBT; List OperandBTs; List ResultBTs; . . /* whatever other information is needed for codegen */ . }; Type Variable { Name VariableName; DataType VariableType; . . /* whatever other information is needed for codegen */ . }; /* An Operand can either be a variable or a constant */ Type Operand { Variable | Value }; /* Lists of Operands are useful for varargs */ Type EmitOperand { Operand | List }; /* Name->Value mapping */ Type Definition { Name VariableName; Value VariableValue; }; Type Operation { Call | Jump | Beq | Add | ... }; /* Abstract inter-unit edge information */ Type Edge { Unit Destination; SpecializeTime Policy; /* Eager or lazy */ List VariablesPromoted; /* D->S promos occur on edges */ }; /* Information needed about an instance of an inter-unit edge */ Type BranchEdge { Address BranchAddr; Int SuccessorNum; }; /***************************************************************************** Primitive declarations/definitions *****************************************************************************/ Function CacheLookup Input: Int UnitId; Set Context; Output: Bool UnitFound; Address UnitAddress; Function CacheInsert Input: Int UnitId; Set Context; Address UnitAddress; Output: Void Function EmitInstruction Input: Operation Op; List Operands; List Results; Output: Void (side-effects the writable instruction space and pointer) Function ReduceAndResidualize Input: DAG CFG; Set StaticVariables; Output: Address ResidualAddr; List OutBranches; Set UpdatedStaticVariables; Function BackPatch Input: Address BranchAddr; Int SuccessorNum; Address DestinationAddr; Output: Void Function ContextBuild Input: Set StaticVariables; Set NewContextType; List PromotedVariables; Output: Set NewContext; Function Specialize Input: Unit Unit; Bool Patch; Address PredecessorAddr; Int SuccessorNum; List StaticVariables; Output: Address SuccessorAddr; { (Found, UnitAddr) = CacheLookup(Unit.Id, Unit.ContextType, StaticVariables); If (! Found) { (UnitAddr, OutBranches, UpdatedStaticVariables) = ReduceAndResidualize(Unit.CFG, StaticVariables); CacheInsert(Unit.Id, Unit.ContextType, StaticVariables, UnitAddr); If (Patch) { BackPatch(PredecessorAddr, SuccessorNum, UnitAddr); } /* The Edge and BranchEdge lists must be correlated */ Foreach (Edge, BranchEdge) In (Unit.Successors, OutBranches) { If (Edge.Policy == Eager) { Specialize(Edge.Destination, True, BranchEdge.BranchAddr, BranchEdge.SuccessorNum, ContextBuild(UpdatedStaticVariables, Edge.Destination.ContextType, ())); } Else { If (! 
Empty(Edge.VariablesPromoted)) { EmitInstruction(Call, (ContextBuild, UpdatedStaticVariables, Edge.Destination.ContextType, Edge.VariablesPromoted), (Variable("NewStaticVariables", List))); EmitInstruction(Call, (Specialize, Edge.Destination, False, BranchEdge.BranchAddr, BranchEdge.SuccessorNum, Variable("NewStaticVariables",List)), (Variable("NextPC", Address))); } Else { EmitInstruction(Call, (Specialize, Edge.Destination, True, BranchEdge.BranchAddr, BranchEdge.SuccessorNum, ContextBuild(UpdatedStaticVariables, Edge.Destination.ContextType, ())), (Variable("NextPC", Address))); } EmitInstruction(Jump, (Variable("NextPC", Address)), ()); } } } Else { If (Patch) { BackPatch(PredecessorAddr, SuccessorNum, UnitAddr); } } Return UnitAddr; }; ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: New version of the specializer code with accompanying declarations Date: Mon, 17 Mar 1997 16:44:48 PST From: Brian Grant This is a multipart MIME message. --boundary=_0 Content-Type: text/plain; charset="us-ascii" -------- This version separates static information from dynamic information and partitions the list of static variables into the context list and other live static variables. Hence, ContextBuild is now ValueListBuild because it is used to build both lists. I've also included support for unchecked units. Let me know what you think. --Brian --boundary=_0 Content-Type: text/plain; charset="us-ascii" Content-Description: specializer /***************************************************************************** Type definitions *****************************************************************************/ /* Unit * * This is what information we need about a Unit. */ Type Unit { Int Id; /* Identifier */ /* These must be lists so that we can correlate names with values */ List ContextVariables; /* Set of variables in context */ List LiveVariables; /* Set of incoming live static variables not in context */ List StaticVariables; /* Set of static variables known following unit setup computations */ CachePolicy Policy; /* Cache policy */ DAG CFG; /* DAG of instructions (some IR) */ List Successors; /* List of successor units */ }; /* Edge * * Abstract inter-unit edge information. */ Type Edge { Unit Destination; Int SuccessorNum; /* which successor of the branch */ SpecializeTime Policy; /* Eager or lazy */ List VariablesPromoted; /* D->S promos occur on edges */ }; /* SpecializeTime * * Inter-unit edge classification. */ Type SpecializeTime { Eager | Lazy }; /* CachePolicy * * Whatever type of cache policy we decide to allow. */ Type CachePolicy { /* CacheAll, Cache(n), replacement policy? */ }; /* BB * * IR for a basic block. */ Type BB { List Operations; . . /* whatever other information is needed for codegen */ . }; /* Operation * * IR for an operation. */ Type Operation { OpType Op; List Operands; List Results; BindingTime OpBT; List OperandBTs; List ResultBTs; . . /* whatever other information is needed for codegen */ . }; /* BindingTime * * Per-operation and per-operand binding time annotation. */ Type BindingTime { Static | Dynamic }; /* Variable * * IR for a variable. */ Type Variable { Name VariableName; DataType VariableType; . . /* whatever other information is needed for codegen */ . }; /* Operand * * An Operand can either be a variable or a constant. 
 */
Type Operand { Variable | Value };

/* OpType
 *
 * Type of operations in the IR.
 */
Type OpType { Call | Jump | Beq | Add | ... };

/* EmitOperand
 *
 * Lists of Operands are useful for varargs.
 */
Type EmitOperand { Operand | List };

/*****************************************************************************
 Primitive declarations/definitions
*****************************************************************************/

/* CacheInsert
 *
 * Inserts the location of a version of a unit into the cache with the key
 * (UnitId, Context).
 */
Function CacheInsert
  Input:  Int UnitId; List Context; Address UnitAddress; CachePolicy Policy;
  Output: Void
  /* side-effects the code cache */

/* CacheLookup
 *
 * Retrieves the location from the cache of the code keyed by (UnitId, Context).
 */
Function CacheLookup
  Input:  Int UnitId; List Context;
  Output: Bool UnitFound; Address UnitAddress;

/* RequiresLookup
 *
 * Queries the cache policy to determine whether a cache lookup is needed or not.
 */
Function RequiresLookup
  Input:  CachePolicy Policy;
  Output: Bool Required;

/* ValueListBuild
 *
 * Constructs a list of values of the static variables listed in NewVariables.
 * The variables listed in NewVariables must be present in either the
 * OldVariables or AdditionalVariables list (but not in both).  The list
 * of values, NewValues, is then constructed from the corresponding values in
 * OldValues and AdditionalValues.
 */
Function ValueListBuild
  Input:  List OldVariables; List OldValues;
          List AdditionalVariables; List AdditionalValues;
          List NewVariables;
  Output: List NewValues;

/* EmitOperation
 *
 * Emits machine instruction(s) corresponding to the specified operation.
 */
Function EmitOperation
  Input:  OpType Op; List Operands; List Results;
  Output: Address OperationAddr;
  /* also side-effects the writable instruction space and pointer */

/* ReduceAndResidualize
 *
 * Residualizes the specified unit using the values of the static variables
 * provided by Context and OtherLiveVariables.  Returns the address of the
 * residual code, a list of branches to other units that require patching
 * (in the same order as Unit.Successors), and the values of all known
 * static variables (because the static variables are in SSA form, one list
 * suffices for all out edges).
 */
Function ReduceAndResidualize
  Input:  Unit Unit; List Context; List OtherLiveVariables;
  Output: Address ResidualAddr; List
OutBranches; List KnownStaticVariables; /* also side-effects the writable instruction space and pointer */ /* BackPatch * * Patch the specified branch, if needed. Passing additional information, * such as jump-table location (if applicable) or information about other * successors, could speed up this routine. */ Function BackPatch Input: Address BranchAddr; Int SuccessorNum; Address DestinationAddr; Output: Void /* also side-effects the writable instruction space */ /* Specialize * * Residualize the specified unit and all eager successors if the unit * isn't in the cache. Emit callbacks for all lazy successors. Callbacks * are spliced out, but the instruction cache need not be flushed because * the jump following the callback is still valid. */ Function Specialize Input: Unit Unit; Bool Patch; Address BranchAddr; Int SuccessorNum; List Context; List OtherLiveVariables; Output: Address UnitAddr; /* also side-effects the writable instruction space and pointer */ { If (RequiresLookup(Unit.Policy)) { (Found, UnitAddr) = CacheLookup(Unit.Id, Context); } Else { /* If no lookup is required, we'll always generate the code */ Found = False; } If (! Found) { (UnitAddr, OutBranches, KnownStaticVariables) = ReduceAndResidualize(Unit, Context, OtherLiveVariables); CacheInsert(Unit.Id, Context, UnitAddr, Unit.Policy); If (Patch) { BackPatch(BranchAddr, SuccessorNum, UnitAddr); } /* The Edge and BranchAddr lists must be correlated */ Foreach (Edge, BranchAddr) In (Unit.Successors, OutBranches) { If (Edge.Policy == Eager) { Specialize(Edge.Destination, True, BranchAddr, Edge.SuccessorNum, ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.ContextVariables), ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.LiveVariables)); } Else { If (! Empty(Edge.VariablesPromoted)) { /* This just takes a list of variables and produces a list of their names */ PromotedNames = Map(Edge.VariablesPromoted, .Name); /* Incorporate the promoted variables into the context */ CallbackAddr = EmitOperation(Call, (ValueListBuild, Unit.StaticVariables, KnownStaticVariables, PromotedNames, Edge.VariablesPromoted, Edge.Destination.ContextVariables), (Variable("NewContext", List, ...))); /* Generate the callback to the specializer */ Nil = EmitOperation(Call, (Specialize, Edge.Destination, ! 
RequiresLookup(Edge.Destination.Policy), BranchAddr, Edge.SuccessorNum, Variable("NewContext", List, ...), ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.LiveVariables)), (Variable("NextPC", Address))); } Else { /* Generate the callback to the specializer */ CallbackAddr = EmitOperation(Call, (Specialize, Edge.Destination, True, BranchAddr, Edge.SuccessorNum, ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.ContextVariables), ValueListBuild(Unit.StaticVariables, KnownStaticVariables, (), (), Edge.Destination.LiveVariables)), (Variable("NextPC", Address, ...))); } /* Emit the jump to the residual */ Nil = EmitOperation(Jump, (Variable("NextPC", Address, ...)), ()); /* Link in the callback */ BackPatch(BranchAddr, SuccessorNum, CallbackAddr); } } } Else { /* Back-patching may be required even if we find the code in the cache */ If (Patch) { BackPatch(BranchAddr, SuccessorNum, UnitAddr); } } Return UnitAddr; }; --boundary=_0-- ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@cs Subject: PEPM `97 web page alive again Date: Thu, 20 Mar 1997 10:05:48 PST From: Brian Grant http://www.cs.washington.edu/research/projects/unisw/DynComp/www/Internal/Documentation/DyC/Papers/PEPM97/ ******************************************************************************** ******************************************************************************** ******************************************************************************** To: matthai@thistle.cs.washington.edu cc: spin-comp@thistle.cs.washington.edu Subject: clustering Date: Thu, 20 Mar 1997 13:29:20 PST From: Brian Grant Matthai, We decided that we would like to put at least a simple clustering algorithm in the paper, and that you would continue to be in charge of working on it. We also decided a few other things wrt clustering: -- Clustering will run after the BTA and incrementally update the output of the BTA (unless computing the updates turns out to be too complicated). It looks like clustering will then go in the GEgen phase and be part of the unit identification process. -- You only have to worry about moving makeStatics and makeDynamics. A prepass to the BTA will use liveness and use-def information to insert makeStatics after every may def pt. of automatically promoted variables with uses in the dynamic region and makeDynamics after last uses. This may prove tricky because the prepass would have to determine the boundaries of the dynamic region. The BTA will remove redundant makeStatics and makeDynamics, leaving makeStatics at every possible dynamic-to-static promotion point and makeDynamics where needed. I believe Markus is working on this. -- makeStatics and makeDynamics should not be moved across uses of the variables they annotate. -- The clustering algorithm should cluster makeStatics and makeDynamics with each other and with immovable unit boundaries: discordant merges, nodes with at least one lazy incoming edge, merges with incoming edges from different units. The last of these is tricky because depending on how makeStatics and makeDynamics are moved, more such merges could be introduced. -- makeStatics should generally be moved below defs. makeDynamics could be moved in either direction across defs to accomplish clustering. 
-- Moving makeStatics (mSs) and makeDynamics (mDs) across branches in merges
   can greatly increase or decrease the number of unit boundaries.  It isn't
   clear what should be done.  Some kind of control/data dependence graphs may
   help (e.g., in moving across innocuous if/then/else diamonds).  There are
   similarities to global code motion.

I've updated my outline to reflect these changes:

http://www.cs.washington.edu/research/projects/unisw/DynComp/www/Internal/Documentation/DyC/Papers/PEPM97/final/outline3.txt

--Brian
********************************************************************************
********************************************************************************
********************************************************************************
To: chambers@franklin.cs.washington.edu
cc: spin-comp@franklin.cs.washington.edu
Subject: Interpreter example annotated for selective specialization
Date: Tue, 25 Mar 1997 13:05:29 PST
From: Matthai Philipose

Craig,

Markus said you'd like the interpreter example with annotations for selective
specialization.  I've appended the code to this mail.  The example is copied
from the final PEPM submission.  I have just added a case for function
invocation.

Matthai
-----------------
interpret(byteCodeT byteCodes[], int argList[], int numArgs)
{
    int valueStack[STACK_SIZE];
    int pc = 0, vsp = 0;
    int arg, opcode;

    /* push arguments */
    for (arg = numArgs; arg > 0; arg--)
        valueStack[vsp++] = argList[arg-1];

    /* DyC annotations for polyvariant specialization on byteCodes and pc
       at every downstream use */
    makeStatic(byteCodes : transitive);
    makeStatic(pc);

    for (;;) {
        opcode = byteCodes[pc];
        switch (opcode) {
        case ADD:
            valueStack[vsp-1] = valueStack[vsp] + valueStack[vsp-1];
            vsp--;
            pc++;
            break;
        case NEQ:
    pp0:    if (valueStack[vsp] != 0)
                pc = byteCodes[pc+1];
            else
                pc += 2;
    pp1:    break;
        case IGOTO:
    pp2:    pc = valueStack[vsp--];
            break;
        // This is the new case added... the opcode for function invocation
        case FUNCALL:
            pc = byteCodes[pc+1];   //Get the function to jump to
            incrementNumCalls(pc);  //Keep track of how many times this fn has been invoked
            if (NumCalls(pc) < MIN_NUM_CALLS)  //Don't specialize infrequent functions
                makeDynamic(pc);
            break;
        /* ... */
        case RETURN:
            return valueStack[vsp];
        }
    }
}
-----------------
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: "incremental" specialization
Date: Wed, 26 Mar 1997 13:59:37 PST
From: Brian Grant

By "incremental", Consel meant at several stages in an OS.  For example:

-- compile time
-- link time
-- load time
-- boot time
-- run time

So, this is a little different from our lazy specialization, which is all at
run time.  (from his `93 PEPM paper)

--Brian
********************************************************************************
********************************************************************************
********************************************************************************
To: "Charles Consel"
cc: spin-comp@thistle.cs.washington.edu
Subject: Tempo's BTA and PEPM `97 deadline
Date: Wed, 26 Mar 1997 15:09:14 PST
From: Brian Grant

Charles,

We're writing the related-work section for our PEPM paper, and just wanted to
confirm a few details of Tempo's BTA and action analysis.  We picked up some
understanding of what Tempo does during your visit, but your visit was too
short.  :-)
We would appreciate short answers to the following questions.  Not all of
these issues may come up in the final version of our paper, but we would like
to include some kind of concrete comparison between Tempo and our system.  We
think we understand the alias analysis, handling of partially static
structures, etc., but there are a few general points on which we're still a
little uncertain.

-- What kind of polyvariant division does Tempo perform?
   -- Multiple divisions per function?
      My understanding: possibly for those functions annotated as entry points
   -- Multiple divisions per program point?
      My understanding: only at joins following conditionals with static
      tests; is this done for switch statements as well as if statements?

-- What kind of polyvariant specialization does Tempo perform?
   -- How does Tempo pick up run-time values and use them for specialization?
      My understanding: from globals or parameters of functions annotated as
      entry points; are the values of static globals read when an associated
      function (an entry point) is specialized?
   -- Multiple specializations per function?
      My understanding: only for functions not annotated as entry points?
   -- Multiple specializations per program point?
      My understanding: only simple loop unrolling, not for arbitrary
      control-flow merges; e.g., can a loop such as the following be reduced?
      (actions: red = reduce, ev = evaluate, id = identity, reb = rebuild)

          red        ev  ev       id  ev     reb reb     ev  ev
      for ((i = 0) ; (i < size) ; v[i] ? (i++) : (i+=2)) { ... }

      (Basically, the static update of the loop induction variable is guarded
      by a dynamic conditional.)  Also, what do you do about loops with both
      static and dynamic exit tests (multiple loop exits)?

Also, at what time next Thursday are the papers due?  Can we send the
copyright form early and submit the final version of the paper electronically
on April 3rd, or should that type of thing only be done as a last resort?

Thank you for your help.  I look forward to seeing you in Amsterdam.

--Brian
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: GEgen (section 6)
Date: Thu, 27 Mar 1997 16:51:21 PST
From: Brian Grant

I've written up some details about units and unit identification.  Tonight,
I'll work more on writing up the production of generating extensions.  This
is what I have so far.  It is also on our PEPM97 web page.

Section 6: Title (GEgen)?

Steps we perform:

1) Generate new functions for different divisions of specialized functions
   (and create Function structs) and split merges with multiple divisions,
   yielding monovariant divisions
2) Unit identification
3) Separate setup and template code
4) Insert calls to the specializer at dynamic-to-static promotion points on
   region entries from static code
5) Produce optimized generating extensions

Unit Identification
----------------------------------------------------------------------

What are specialization units?

-- Specialization units are the largest subgraphs that the specialization
   algorithm can manipulate.
-- The specializer makes decisions about which code to specialize at the
   granularity of units, and consequently cache insertions, deletions, and
   lookups also operate on units.
-- Cache lookups ensure that specialized code is reused/shared if it exists
   (important at promotions, demotions, and discordant merges)

What do specialization units look like?
-- Roughly, a unit begins at a specialization point (this is a term used in
   the literature, e.g., by Bulyonkov and Kochetov `96; they cite the JGS93
   book), which is a point of which the specializer should be aware (to look
   up and/or begin specialization)
-- Units differ from basic blocks in that they can contain calls and multiple
   exits.
-- A unit must have a single entry
   -- If a predecessor to a node has a different context (and, hence, must be
      from a different unit), a cache lookup will be required either at run
      time (if there are promotions) or at specialization time (if there are
      only demotions), and so the node must start a new unit even if it has
      other predecessors with the same context
   -- Even if all predecessors to a node have the same context as the node,
      but they don't all belong to the same unit (e.g., due to "voluntary"
      laziness), then the node must start a new unit because it must be
      possible to generate code for that node if it is reached by any of its
      predecessors (remember that the specializer only generates code for
      whole units)
   -- Even if you know that a particular predecessor will be specialized
      before the node is reached, when the other predecessors are
      specialized, you need to be able to find the node in order to link
      (backpatch) the branches to it.  Thus, some kind of context-dependent
      lookup is required, which is essentially a cache lookup.

      For example: If some node d in the unit of one of the predecessors of
      node n dominates all of the predecessors and there exist downward paths
      (not traversing back edges) from d to each of the predecessors that do
      not contain lazy edges, then n _could_ be included in the same unit
      as d.

      A concrete instance of this:

          make_static(x);
      L1: ...x...
          if (d1) {
              ...x...
          } else {
              if (d2) {
                  make_static(y);
      L2:         ...y...
                  make_dynamic(y);
              }
      L3:     ...x...
          }
      L4: ...x...
          make_dynamic(x);

      The node marked L4 could be included in the same unit as the node
      marked L1.  But, without a cache lookup, how would the specializer link
      the bottom of L3 to L4?

Simple unit identification

-- A node begins a new unit if any of the following is true (a rough sketch
   of this test appears after the clustering notes below):
   -- At least one incoming edge has dynamic-to-static promotions (explicit
      in BTA output)
   -- At least one incoming edge has static-to-dynamic "demotions"
      (difference of context sets)
   -- At least one incoming edge is lazy (explicit in BTA output)
   -- It is a discordant merge (explicit in BTA output)
   -- It has predecessors from different units (for single-entry)
-- Build unit structs (input to specializer), including id, context
   variables, other live variables, all static variables, cache policy,
   successor edge information (dest. unit, successor number, eager/lazy,
   promotions), and call information (in the interprocedural version), and
   make demotions explicit as well for clustering

Clustering

Matthai is writing up the algorithm.  Here is the basic idea.  Clustering
attempts to reduce the number of units by moving, merging, and/or eliminating
unit boundaries.  It does so by moving the promotions and demotions within
the CFG.  If promotions and demotions are made to coincide on the same edges
with each other or with lazy edges, or if promotions and demotions are moved
to edges entering discordant merges or multi-unit merges, then the number of
unit boundaries is reduced.  Discordant merges and multi-unit merges can be
eliminated as well by pushing promotions or demotions past such merges.
Clustering must update the information in the unit structs.
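To make the unit-boundary test above concrete, here is a rough sketch of the
simple identification rules in the style of the specializer pseudo-code.  It
is only illustrative: the node and edge field names (IncomingEdges,
HasPromotions, IsDiscordantMerge, etc.) are hypothetical, not our actual IR.

    /* Hypothetical sketch of the simple unit-identification test.  A node
     * starts a new unit if any incoming edge carries promotions, demotions,
     * or laziness, if the node is a discordant merge, or if its
     * predecessors come from different units. */
    Function StartsNewUnit
      Input:  Node N;
      Output: Bool StartsUnit;
    {
        If (N.IsDiscordantMerge)            /* explicit in BTA output */
            Return True;
        FirstUnit = -1;
        Foreach Edge In N.IncomingEdges {
            If (Edge.HasPromotions ||       /* D->S promotions (BTA output) */
                Edge.HasDemotions ||        /* S->D demotions (context-set difference) */
                Edge.IsLazy)                /* lazy edge (BTA output) */
                Return True;
            If (FirstUnit == -1)
                FirstUnit = Edge.Source.UnitId;
            Else If (Edge.Source.UnitId != FirstUnit)
                Return True;                /* predecessors from different units */
        }
        Return False;
    };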
Produce optimized generating extensions ---------------------------------------------------------------------- -- DAG construction via splitting & merging -- Want to specialize all blocks of a unit that could potentially be reached => pursue all paths under dynamic control -- Want to know order of traversal of the blocks at compile time -- Better optimization of static code -- Eliminate run-time work-list overhead -- So, build a DAG of blocks reached -- Difficult because of -- Unstructured code -- Combinations of static and dynamic branches -- Loops / back edges -- Solution -- Pursue all static paths (static paths are determined by which static branches are taken) -- Copy blocks when they are reached in different orders on different static paths -- Algorithm - should we put the full algorithms in the appendix given that they are untested? how much detail in the body? -- Splitting -- Split all static paths -- Add nodes to a given static path as they are reached; topologically order nodes on different dynamic paths -- Merging Merge the tails of branches of the tree created by splitting -- Create a dummy node and link it to all of the leaves of the unit's tree -- Add the sink to a work list -- Until the work list is empty, remove the head of the list. Compare the sources of all incoming edges into the node. If two source nodes represent the same block and they have the same successors in this graph, then merge the two nodes and add the merged node to the work list if it isn't there already. -- Eliminate unit boundaries for eager successors with single run-time predecessors, and for those with multiple predecessors but the replicate policy, statically copying code for those with multiple compile-time predecessors -- Eliminate cache lookups for lazy successors with no dynamic-to-static promotions and single run-time predecessors or multiple predecessors and the replicate policy -- What about unchecked eager promotions for static pointer derefs? -- Further optimizations to the run-time specializer -- In-line and specialize the run-time primitives -- Everything in the Unit structures is known at compile time -- Recursion is implemented with a stack and jump (tail recursion!) -- Eliminate stack manipulation for all first eager successors -- Saving/restoring of state and argument passing can be specialized in many cases as well (e.g., at lazy callbacks) -- We don't explicitly pass around a set of all live static variables; we load/store them from/to the cache -- Hybrid caching scheme? -- Use hierarchy when adding to the context -- Reset to top of hierarchy when remove from context ******************************************************************************** ******************************************************************************** ******************************************************************************** To: chambers@klamath (Craig Chambers) cc: spin-comp@cs Subject: Re: annotations section Date: Thu, 27 Mar 1997 22:56:03 PST From: Brian Grant >> OK, I've written one more section, section 4 on the annotations. It might be a good idea to have some simple examples (e.g., reiterate that in order to unroll a loop, the induction variable should be annotated with make_static). In the PLDI submission draft and in various talks, people have had difficulty developing an intuition for how to use the annotations. Presumably, the PEPM crowd should do better, but a trivial example or two couldn't hurt. You've done a pretty good job spelling out all of the policies. 
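For instance, something as small as this might be enough to give that
intuition (a made-up fragment; I'm assuming the default polyvariant
specialization of annotated variables, so the loop head becomes a discordant
merge and the loop unrolls):

    /* Annotating the loop induction variable static causes the loop to be
     * completely unrolled once exponent's run-time value is known. */
    int power(int base, int exponent)
    {
        int result = 1;
        make_static(exponent);       /* exponent and the loop test become static */
        while (exponent > 0) {       /* static test => unrolled at specialize time */
            result = result * base;  /* only this multiply is residualized */
            exponent = exponent - 1; /* static update, evaluated by the specializer */
        }
        make_dynamic(exponent);
        return result;
    }

The residual code for exponent == 3 is then just three multiplies and a
return.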
The promote_one_unchecked_eager policy is a good idea. However, if the annotation is on a variable other than the pointer itself, it isn't clear to me how it would work. Other promotions work by inserting a lazy point after dynamic computations are performed. This one needs to instead make certain dynamic computations (i.e., the load) and values (the result of the load) static, and the computations don't involve the annotated variable. Am I missing something? Also, IMO, we shouldn't make it sound like we think alias analysis is a bad idea. I suggest we tone down the discussion about alias analysis a little, though we can certainly say that the static* annotations will still be necessary even with better alias analysis. Another idea for future work: optimization policies controlling, for example, run-time register allocation or register actions. Also, somewhere we need to mention what the default policies are for derived variables (mono_everything and no_promote) and why (to prevent all loops counting up from a constant from being unrolled, etc.). Will that go in the BTA section or the annotations section? There is a question in the text about what the difference is between using specialize and make_static. For concreteness: specialize f(x,y,z) on (x,y); vs. f(x,y,z) { make_static(x,y); ... } 1. specialize only specializes f on x and y if they are static at the call site. A make_static inside f would cause all arguments passed for x and y to be promoted. 2. specialize permits x and y to begin as static in f, without the potentially redundant promotion at the beginning of f. Thus, f may be eagerly specialized after specializing the caller, possibly eliminating a run-time cache check. 3. specialize doesn't have to pass x and y to the specialized version of f. Any other differences? Perhaps we should at least mention something about specialize and function pointers (stubs, calling general versions, topic of future work, whatever). --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: more on section 6 Date: Fri, 28 Mar 1997 09:43:26 PST From: Brian Grant I made a few changes to the text I sent out yesterday, and added more on producing the generating extensions. I included the code I wrote for the splitting algorithm, but I haven't looked at it lately, so I'm not yet convinced that it is safe to include. :-) --Brian Section 6: Title (GEgen)? ====================================================================== Steps we perform ---------------- 1) Generate new functions for different divisions of specialized functions (and create Function structs) and split merges with multiple divisions, yielding monovariant divisions 2) Unit identification 3) Separate setup and template code 4) Insert calls to the specializer at dynamic-to-static promotion points on region entries from static code 5) Produce optimized generating extensions Unit Identification ---------------------------------------------------------------------- What are specialization units? ------------------------------ -- Specialization units are the largest subgraphs that the specialization algorithm can manipulate. -- The specializer makes decisions about which code to specialize at the granularity of units, and consequently cache insertions, deletions, and lookups also operate on units. 
-- Cache lookups ensure that specialized code is reused/shared if it exists
   (important at promotions, demotions, and discordant merges)

What do specialization units look like?
---------------------------------------
-- Roughly, a unit begins at a specialization point (this is a term used in
   the literature, e.g., by Bulyonkov and Kochetov `96; they cite the JGS93
   book), which is a point of which the specializer should be aware (to look
   up and/or begin specialization)
-- Units differ from basic blocks in that they can contain calls and multiple
   exits.
-- A unit must have a single entry
   -- If a predecessor to a node has a different context (and, hence, must be
      from a different unit), a cache lookup will be required either at run
      time (if there are promotions) or at specialization time (if there are
      only demotions), and so the node must start a new unit even if it has
      other predecessors with the same context
   -- Even if all predecessors to a node have the same context as the node,
      but they don't all belong to the same unit (e.g., due to "voluntary"
      laziness), then the node must start a new unit because it must be
      possible to generate code for that node if it is reached by any of its
      predecessors (remember that the specializer only generates code for
      whole units)
   -- Even if you know that a particular predecessor will be specialized
      before the node is reached, when the other predecessors are
      specialized, you need to be able to find the node in order to link
      (backpatch) the branches to it.  Thus, some kind of context-dependent
      lookup is required, which is essentially a cache lookup.

      For example: If some node d in the unit of one of the predecessors of
      node n dominates all of the predecessors and there exist downward paths
      (not traversing back edges) from d to each of the predecessors that do
      not contain lazy edges, then n _could_ be included in the same unit
      as d.

      A concrete instance of this:

          make_static(x);
      L1: ...x...
          if (d1) {
              ...x...
          } else {
              if (d2) {
                  make_static(y);
      L2:         ...y...
                  make_dynamic(y);
              }
      L3:     ...x...
          }
      L4: ...x...
          make_dynamic(x);

      The node marked L4 could be included in the same unit as the node
      marked L1.  But, without a cache lookup, how would the specializer link
      the bottom of L3 to L4?

Single-Run-Time-Predecessor Optimizations
-----------------------------------------

We can relax the compile-time single-entry requirement for units if we know
that a node will have only a single predecessor _at_run_time_.  We can
eliminate unit boundaries for eager successors with single run-time
predecessors (e.g., at a static multi-unit merge where all predecessors have
the same context), and for those with multiple predecessors but the replicate
policy.  Thus, it is possible for multiple units to share the same code.
Similarly, we can eliminate cache lookups for lazy successors with no
dynamic-to-static promotions and single run-time predecessors (e.g., at lazy
successors to dynamic branches) or multiple predecessors and the replicate
policy.
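For intuition, this is roughly the code emitted on a lazy edge (a sketch
only; the names are hypothetical, and Context/OtherLiveVariables are assumed
to have been captured when the stub was emitted).  When a lazy successor has
a single run-time predecessor and no promotions, the cache lookup inside
Specialize can be skipped: the stub executes at most once, so the unit cannot
already be in the cache.

    /* Hypothetical lazy-edge callback stub.  Specialize generates the
     * destination unit, back-patches the pending branch so that later
     * executions bypass this stub, and returns the entry point. */
    NextPC = Specialize(DestUnit,
                        True,                /* patch: splice this stub out */
                        BranchAddr, SuccessorNum,
                        Context,             /* captured at emit time */
                        OtherLiveVariables);
    Jump(NextPC);                            /* taken only on this first execution */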
Simple unit identification
--------------------------
-- A node begins a new unit if any of the following is true:
   -- At least one incoming edge has dynamic-to-static promotions (explicit
      in BTA output)
   -- At least one incoming edge has static-to-dynamic "demotions"
      (difference of context sets)
   -- At least one incoming edge is lazy (explicit in BTA output)
   -- It is a discordant merge (explicit in BTA output)
   -- It has immediate predecessors (predecessors found on back edges are
      ignored) from different units and it is a non-constant merge and the
      cache policy is not replicate (for single-entry, called a multi-unit
      merge)
-- Unchecked eager promotions for static dereferences of static pointers do
   not create new unit boundaries
-- Build unit structs (input to specializer), including id, context
   variables, other live static variables, all known static variables, cache
   policy, successor edge information (dest. unit, successor number,
   eager/lazy, promotions), and call information (in the interprocedural
   version), and make demotions explicit as well for clustering; only the CFG
   information need not be constructed at this point

Clustering
----------

Matthai is writing up the algorithm.  Here is the basic idea.  Clustering
attempts to reduce the number of units by moving, merging, and/or eliminating
unit boundaries.  It does so by moving the promotions and demotions within
the CFG.  If promotions and demotions are made to coincide on the same edges
with each other or with lazy edges, or if promotions and demotions are moved
to edges entering discordant merges or multi-unit merges, then the number of
unit boundaries is reduced.  Discordant merges and multi-unit merges can be
eliminated as well by pushing promotions or demotions past such merges.
Clustering must update the information in the unit structs.

Produce optimized generating extensions
----------------------------------------------------------------------

In order to make run-time generation of dynamic code as efficient as
possible, the Specialize function is manually specialized for each
specialization unit, creating an optimized generating extension for each
unit.  All of the information in the unit structs is known at static compile
time, and so can be used in specializing the specializer.  All of the
run-time primitives called by the specializer (see section 2) can be inlined
into the body of the generating extension.  The bulk of the generating
extension is, of course, embodied in the specialized ReduceAndResidualize
primitive.  The other primitives, when inlined and specialized, incur little
additional overhead.

Other optimizations
-------------------

Recursive calls to the specializer are implemented with a manually
manipulated stack of units to specialize and jumps (see the sketch below).
In a few cases, this stack manipulation may be avoided.  For example, when a
dynamic branch to another unit appears on all static paths through a unit,
the specializer may proceed directly to that successor unit after
specializing the first one.

We don't explicitly pass around a set of all live static variables; we
load/store them from/to the cache as needed.  We don't include all live
static variables in the context used for cache lookups, but rather only
those that trigger polyvariant specialization (promoted and discordant
variables).  Because derived static variables are monovariantly specialized
and not promoted, they never appear in the context.

Variables promoted from dynamic to static may be passed to the generating
extension in registers.
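A rough sketch of that iterative driver (hypothetical names:
SpecializeOneUnit stands for the cache-lookup/ReduceAndResidualize/back-patch
sequence of Specialize, and BuildContext for the ValueListBuild calls; the
real system specializes this loop itself for each unit):

    /* Recursion replaced by an explicit stack of pending units. */
    UnitStack = CreateEmptyStack(2);
    PushOntoStack(UnitStack, (RootUnit, InitialContext));
    Until EmptyStack(UnitStack) {
        (U, Ctx) = PopOffStack(UnitStack);
        Addr = SpecializeOneUnit(U, Ctx);   /* lookup, residualize, patch */
        Foreach Edge In EagerSuccessors(U) {
            /* The first eager successor could instead continue directly at
             * the top of this loop, avoiding the push/pop (the
             * tail-recursion case noted above). */
            PushOntoStack(UnitStack, (Edge.Destination, BuildContext(Ctx, Edge)));
        }
    }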
A hybrid hierarchical caching scheme is possible, where a level is added to
the hierarchy when adding to the context (promotions), but the level is reset
to the top of the hierarchy when removing from the context (demotions).  The
hierarchy allows checking only the new variables in the context.  Resetting
the level to the top at demotions avoids the cost of traversing back up the
tree and possibly down another branch (e.g., at a merge of contexts {x,y} and
{x,z}).

DAG conversion
--------------

At specialize time, we must reduce and residualize all blocks of a unit that
could potentially be reached at run time, so we must pursue all paths under
dynamic control.  So that ReduceAndResidualize does not have to traverse the
unit subgraph using a work list and visited flags, the unit subgraph is
converted to a DAG that can be simply traversed top to bottom.  This process
goes beyond simple unrolling of the traversal loop.  Because converting the
unit subgraph to a DAG also lets us know the order of traversal of the blocks
at compile time, we can better optimize the generating extension using
standard compiler optimizations.

Building a DAG of the basic blocks reached is difficult in the presence of
unstructured code, combinations of static and dynamic branches, and loops or
other back edges.  Our solution is to pursue all static paths (static paths
are determined by which static branches are taken), copying blocks when they
are reached in different orders on different static paths.  Thus, the order
of traversal of the basic blocks will be known for each static path.

Our algorithm first splits all static paths, creating a tree of basic
blocks.  Nodes under dynamic control are linearized in a topological order.
Nodes in the tree with fanout greater than one correspond to static branches.
Each node can appear in a given static path at most one time.  Then, all
leaves of the tree are linked to a sink node, which is added to a work list.
Until the work list is empty, remove a node from the work list and unify all
its identical predecessors with the same successors, adding unified nodes to
the work list.  The merging algorithm can also be used to merge common tails
of different units.

The Sorting & Splitting Algorithm

Pseudocode Notation

RC(Node) returns the reachability conditions for Node
UnitHead(Node) returns the head of the unit to which Node belongs
IsLazyUnit(Node) returns true if the unit to which Node belongs must be
    stitched lazily, false if the unit can be stitched eagerly
Zero(Node) returns 0
Topo#(Node) returns the topological number of Node
IsPredecessor(Node1, Node2) is true if Node1 is a predecessor of Node2 or
    Node1 == Node2, false if Node1 succeeds Node2 or Node1 neither precedes
    nor succeeds Node2
NewNode = CopyNode(OldNode) creates a copy NewNode of node OldNode (not
    including connecting edges)

Edges are ordered by the topological numbers of their destinations;
Node->Edge[i] is the ith edge of Node; Node->Edge[0] is the first edge of Node
Edge->Destination is the destination node of Edge
Edge->Condition is the condition (NoCondition, True, False, 1, 100, etc.)
    of Edge

(F0, F1, ..., FN) is an (N+1)-tuple of objects F0, F1, ..., FN; F0 is in
    field 0, F1 is in field 1, etc.
Field(Tuple, N) returns field N of Tuple Stacks: stacks contain fixed-length tuples of pointers or scalars Stack = CreateEmptyStack(Size) Creates the empty stack Stack of tuples of size Size DestroyStack(Stack) Frees Stack EmptyStack(Stack) Returns true if Stack is empty, false otherwise PushOntoStack(Stack, Tuple) Pushes Tuple onto Stack Tuple = PopOffStack(Stack) Pops and returns the top of Stack Tuple = PeekAtStack(Stack) Returns the top of Stack without popping Lists: sorted lists contain fixed-length tuples of pointers or scalars and no duplicate tuples List = CreateEmptyList(Size, SortField, ValueFunction) Creates a list of tuples of size Size that is sorted in ascending order by the value returned by invoking ValueFunction on field SortField of each tuple DestroyList(List) Frees List EmptyList(List) Returns true if List is empty, false otherwise InsertIntoList(List, Tuple) Inserts Tuple into List after all other tuples of lesser or equal rank RemoveFromList(List, Tuple) Removes Tuple from List; a wildcard (*) may be used in any tuple field IsInList(List, Tuple) Returns true if Tuple is in List, false otherwise Tuple = HeadOfList(List) Returns the head of List (least tuple by the ValueFunction) Tuple = TailOfList(List) Returns the tail of List (greatest tuple by the ValueFunction) NewList = CopyList(OldList) Creates a copy NewList of list OldList Description This algorithm creates a list of lists of blocks for each unit. These lists are then passed to the merging algorithm for merging. The blocks in each list are in topological order. Each list corresponds to one combination of true and false paths from static branches. So, the first list contains the blocks reached when all static branches in the unit evaluate to true, and the last list contains the blocks reached when they all evaluate to false. As presented, this algorithm has a problem with back edges from one side of a static if into another static if body. This situation can create a difference between lists (i.e., blocks in one list but not the other) before the static branch causing the difference appears in the lists. The two possible solutions are to include the blocks in question in both lists, or to move the blocks below the static branch. The first solution has the disadvantage that some blocks will be unnecessarily stitched on some paths. The second solution inhibits merging. I plan to go with the second solution because its run-time efficiency is higher (no blocks unnecessarily stitched) and because it is easier to implement. Note that this algorithm assumes that units cannot contain disconnected blocks. The merging algorithm requires a tree rather than a list of lists. That tree can be reconstructed from the lists generated by this algorithm, or this algorithm can be modified to generate the tree directly. 
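A tiny worked example of the splitting output (block names are made up).
Suppose a unit's head A ends in a static branch to B or C, B flows into E,
and C ends in a dynamic branch reaching E and F:

            A           A ends in a static branch (successors B, C)
           / \
          B   C         C ends in a dynamic branch (successors E, F)
           \ / \
            E   F

    Splitting produces one topologically ordered list of blocks per
    combination of static-branch outcomes:

        static branch true:   (A, B, E)
        static branch false:  (A, C, E, F)   /* both E and F are pursued
                                                under dynamic control, in
                                                topological order */

Block E appears in both lists; reconstructing the tree and running the
merging pass can then unify such shared tails.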
Variables UnitRootList is a list of nodes sorted in ascending order by topological number; it is a list of nodes that are the roots of specialization units PathStack is a stack of 2-tuples: (List, List); the tuples contain information about paths that need to be completed NextNodeList is a list of nodes of the graph sorted in ascending order by topological number; it is a list of nodes that need to be stitched, but haven't been copied to the duplicate graph yet VisitedNodeList is a list of 2-tuples: (Node, EdgeCondition); where Node is a node of the graph and EdgeCondition indictates which path was taken from a static branch (NoCondition for nodes that aren't static branches); the list is sorted in ascending order by topological number of the nodes; it is a list of nodes along one path down a unit UnitPathList is a list of all VisitedNodeLists, and is what will be handed to the tail-merging algorithm Pseudocode procedure SortAndSplitUnits(UnitRootList) { UnitPathList = CreateEmptyList(1, 0, Zero) until EmptyList(UnitRootList) { /* Get the root of this specialization unit */ (CurrentUnit) = RemoveHeadOfList(UnitRootList) PathStack = CreateEmptyStack(2) VisitedNodeList = CreateEmptyList(2, 0, Topo#) InsertIntoList(UnitPathList, (VisitedNodeList)) NextNodeList = CreateEmptyList(1, 0, Topo#) InsertIntoList(NextNodeList, (CurrentUnit)) PushOntoStack(PathStack, (VisitedNodeList, NextNodeList)) until EmptyStack(PathStack) { /* Start on a new path through the unit */ (VisitedNodeList, NextNodeList) = PopOffStack(PathStack) /* Add to the path until it is completed */ until EmptyList(NextNodeList) { (NextNode) = RemoveHeadOfList(NextNodeList) if EndsInStaticBranch(NextNode) { /* Static branches spawn new paths through the unit. The current path follows the first edge. */ for each outgoing edge NextNode->Edge[i] except the first (in reverse topo. order) { /* Create a new path */ NewVisitedList = CopyList(VisitedNodeList) InsertIntoList(UnitPathList, (NewVisitedList)) /* Keep track of which path we took */ InsertIntoList(NewVisitedList, (NextNode, NextNode->Edge[i]->Condition)) /* Work on each new path begins at a different destination. Stay within the unit and don't revisit visited nodes (must check because we follow back edges). */ NewNextList = CopyList(NextNodeList) Destination = NextNode->Edge[i]->Destination if ((UnitHead(Destination) == CurrentUnit) && (! IsInList(NewVisitedList, (Destination, *)))) { InsertIntoList(NewNextList, (Destination)) } PushOntoStack(PathStack, (NewVisitedList, NewNextList)) } InsertIntoList(VisitedNodeList, (NextNode, NextNode->Edge[0]->Condition)) Destination = NextNode->Edge[0]->Destination if ((UnitHead(Destination) == CurrentUnit) && (! IsInList(VisitedNodeList, (Destination, *)))) { InsertIntoList(NextNodeList, (Destination)) } } else { InsertIntoList(VisitedNodeList, (NextNode, NoCondition)) /* Visit all children of this node that are in this unit and that have not already been visited. */ for each outgoing edge NextNode->Edge[i] { Destination = NextNode->Edge[i]->Destination if ((UnitHead(Destination) == CurrentUnit) && (! 
IsInList(VisitedNodeList, (Destination, *)))) { InsertIntoList(NextNodeList, (Destination)) } } } } DestroyList(NextNodeList) } DestroyStack(PathStack) } DestroyList(UnitRootList) return UnitPathList } The Actual Code STATIC VOID dyc_SortAndSplitUnits( DYC_UNIT_INFO UnitList ) { DYC_UNIT_INFO CurrentUnit; BBLOCK CurrentUnitHead; DYC_STACK PathStack; DYC_LIST VisitedNodeList, NewVisitedList; DYC_LIST NextNodeList, NewNextList; BBLOCK NextNode, NNSuccessor; BBLOCK_LIST SuccessorList, ReverseSuccessorList; DYC_GRAPH Graph; DYC_GRAPH_NODE ParentGraphNode, GraphNode; CurrentUnit = UnitList; WHILE (CurrentUnit != NULL) { /* Skip units outside of dynamic regions */ IF (! dyc_UNIT_insideDynamicRegion(CurrentUnit)) { DYC_UNIT_INFO_UnitGraph(CurrentUnit) = (DYC_GRAPH) NULL; CurrentUnit = DYC_UNIT_INFO_NextUnit(CurrentUnit); CONTINUE; } /* Get the root of this specialization unit */ CurrentUnitHead = DYC_UNIT_INFO_UnitHead(CurrentUnit); /* Create the graph we build to restructure the unit. The graph node * for each basic block is created when the node is inserted into * the NextNodeList. */ DYC_UNIT_INFO_UnitGraph(CurrentUnit) = Graph = DYC_GRAPH_Create(1, dyc_Allocate, dyc_Deallocate); /* Create the list of nodes to visit. dyc_TopologicallyPreceeds should * work even though it only expects pointers to scalars instead of * pointers to arrays. It just uses the first element of each array. */ NextNodeList = DYC_LIST_Create(1, dyc_Allocate, dyc_Deallocate, dyc_TopologicallyPreceeds); DYC_LIST_Insert(NextNodeList, CurrentUnitHead); /* Create the stack used to keep track of split paths */ PathStack = DYC_STACK_Create(3, dyc_Allocate, dyc_Deallocate); DYC_STACK_Push(PathStack, VisitedNodeList, NextNodeList, (void *) NULL); WHILE (! DYC_STACK_EmptyP(PathStack)) { /* Start on a new path through the unit */ DYC_STACK_Pop(PathStack, &VisitedNodeList, &NextNodeList, &ParentGraphNode); /* Add to the path until it is completed */ WHILE (! DYC_LIST_EmptyP(NextNodeList)) { /* Get the next node */ DYC_LIST_RemoveHead(NextNodeList, &NextNode); /* Mark the node as visited on this path */ DYC_LIST_Insert(VisitedNodeList, NextNode); /* Add the node to the graph. Does the edge-order matter? */ GraphNode = DYC_GRAPH_CreateNode(Graph, NextNode); IF (ParentGraphNode == (DYC_GRAPH_NODE) NULL) DYC_GRAPH_AddRoot(Graph, GraphNode); ELSE DYC_GRAPH_AddEdge(Graph, ParentGraphNode, GraphNode); IF (dyc_StatIsAStaticBranch(BBLOCK_last_stat(NextNode))) { /* Static branches spawn new paths through the * unit. The current path follows the first edge. */ /* Create a list of successors in the reverse order, * skipping the first successor. */ SuccessorList = BBLOCK_succ_list(NextNode); assert(SuccessorList); ReverseSuccessorList = BBLOCK_List_Reverse_List(BBLOCK_list_tail(SuccessorList)); /* NB: If the branch is an IGOTO, the successors are tag * anchors, which must always be the immediate successors * of the block. How can we guarantee this? 
*/ /* Push the successors onto the PathStack in reverse order */ LOOP { FOR_EACH_BBLOCK_LIST_BBLOCK(ReverseSuccessorList,NNSuccessor) DO { /* Create a new path */ NewVisitedList = DYC_LIST_Copy(VisitedNodeList); NewNextList = DYC_LIST_Copy(NextNodeList); dyc_AddBBlockToNextNodeList(NNSuccessor, NewNextList, NewVisitedList, CurrentUnit); DYC_STACK_Push(PathStack, NewVisitedList, NewNextList, GraphNode); } } /* The first successor is on the current path */ NNSuccessor = BBLOCK_list_bblock(SuccessorList); dyc_AddBBlockToNextNodeList(NNSuccessor, NextNodeList, VisitedNodeList, CurrentUnit); BBLOCK_List_Free_List(ReverseSuccessorList); } ELSE { /* Visit all children of this node that are in * this unit and that have not already been visited. */ LOOP { FOR_EACH_BBLOCK_SUCC_BBLOCK(NextNode,NNSuccessor) DO { dyc_AddBBlockToNextNodeList(NNSuccessor, NextNodeList, VisitedNodeList, CurrentUnit); } } } ParentGraphNode = GraphNode; } DYC_LIST_Destroy(NextNodeList); DYC_LIST_Destroy(VisitedNodeList); } DYC_STACK_Destroy(PathStack); CurrentUnit = DYC_UNIT_INFO_NextUnit(CurrentUnit); } } ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: bibliography Date: Fri, 28 Mar 1997 13:54:43 PST From: Brian Grant Here is a list of references for the paper. Most are in bibtex format. Do we need a reference for Java? Others? --Brian @InProceedings{Andersen:1992:Self-ApplicableC, author = "L.O. Andersen", title = "Self-Applicable {C} Program Specialization", booktitle = PEPM92, publisher = Yale, year = "1992", pages = "54--61", month = "June", note = ""} @InProceedings{Andersen:1993:Binding-TimeAnalysis, author = "L.O. Andersen", title = "Binding-Time Analysis and the Taming of {C} Pointers", booktitle = PEPM93, year = "1993", pages = "47-58", publisher = ACM, OPTnote = ""} Author: Auslander-J. Philipose-M. Chambers-C. Eggers-S-J. Bershad-B-N. Title: Fast, effective dynamic compilation. Source: SIGPLAN Notices. vol.31, no.5. pp. 149-59. May 1996. Conf. Title: ACM SIGPLAN '96: Programming Language Design and Implementation. Philadelphia, PA, USA. ACM. 21-24 May 1996. Year: 1996. @InProceedings{Bondorf:1992:ImprovingBinding, author = "A. Bondorf", title = "Improving Binding Times without Explicit CPS-Conversion", booktitle = "1992 ACM Conference in Lisp and Functional Programming, San Francisco, California (Lisp Pointers, vol. V, no. 1, 1992)", year = "1992", pages = "1-10", publisher = ACM, OPTnote = ""} @InProceedings{Bulyonkov:1996:PracticalAspects, author = "M.A. Bulyonkov and D.V. Kochetov", title = "Practical Aspects of Specialization of {Algol}-like Programs", crossref = "Danvy:1996:PartialEvaluation", pages = "17-32"} @INPROCEEDINGS{ Consel:1988:NewInsights, AUTHOR = "C. Consel", TITLE = "New Insights into Partial Evaluation: The {S}chism Experiment", BOOKTITLE = "ESOP '88, 2nd European Symposium on Programming, Nancy, France, March 1988 (Lecture Notes in Computer Science, vol. 300)", EDITOR = "H. Ganzinger", PUBLISHER = S-V, PAGES = "236-246", YEAR = "1988", ANNOTE = "A self-applicable partial evaluator called Schism is presented. It is written in a first-order subset of Scheme, with syntactic extensions and an extensible set of primitives, and can handle certain forms of side-effects. User-defined annotations control unfolding and specialization during partial evaluation. 
C. Chambers, S. J. Eggers, J. Auslander, M. Philipose, M. Mock, and P. Pardyak. Automatic dynamic compilation support for event dispatching in extensible systems. In Workshop on Compiler Support for Systems Software, Feb. 1996.

@InProceedings{Consel:1990:FromInterpreting:ESOP, author = "C. Consel and O. Danvy", title = "From Interpreting to Compiling Binding Times", booktitle = "ESOP '90. 3rd European Symposium on Programming, Copenhagen, Denmark, May 1990 (Lecture Notes in Computer Science, vol. 432)", year = "1990", editor = "N. Jones", pages = "88-105", publisher = S-V, OPTnote = ""}

@Manual{Consel:1990:TheSchism, author = "Charles Consel", title = "The {S}chism Manual, Version 1.0", organization = "Yale University", address = "New Haven, Connecticut", month = "December", year = "1990", note = ""}

@InProceedings{Consel:1991:ForABetter, author = "C. Consel and O. Danvy", title = "For a Better Support of Static Data Flow", booktitle = "Functional Programming Languages and Computer Architecture, Cambridge, Massachusetts, August 1991 (Lecture Notes in Computer Science, vol. 523)", year = "1991", editor = "J. Hughes", pages = "496-519", organization = "ACM", publisher = S-V, note = ""}

@InProceedings{Consel:1993:ATour, author = "C. Consel", title = "A Tour of Schism: A Partial Evaluation System for Higher-Order Applicative Languages", booktitle = PEPM93, year = "1993", pages = "145-154", publisher = ACM, OPTnote = ""}

@InProceedings{Consel:1993:IncrementalPartial, author = "C. Consel and C. Pu and J. Walpole", title = "Incremental Partial Evaluation: The Key to High Performance, Modularity and Portability in Operating Systems", booktitle = PEPM93, year = "1993", pages = "44-46", publisher = ACM, OPTnote = ""}

@InProceedings{Consel:1993:Tutorial, author = "C. Consel and O. Danvy", title = "Tutorial Notes on Partial Evaluation", booktitle = "Twentieth ACM Symposium on Principles of Programming Languages, Charleston, South Carolina, January 1993", year = "1993", pages = "493-501", organization = "ACM", publisher = ACM, note = ""}

Author: Consel-C. Noel-F. Title: A general approach for run-time specialization and its application to C. Conf. Title: Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. St. Petersburg Beach, FL, USA. pp. 145-56. ACM. 21-24 Jan. 1996. Year: 1996.

@InProceedings{Consel:1996:AUniform, author = "C. Consel and others", title = "A Uniform Approach for Compile-Time and Run-Time Specialization", crossref = "Danvy:1996:PartialEvaluation", pages = "54-72"}

@Proceedings{Danvy:1996:PartialEvaluation, title = "Partial Evaluation. Dagstuhl Castle, Germany, February 1996", year = "1996", editor = "O. Danvy and R. Gl{\"u}ck and P. Thiemann", volume = "1110", series = "Lecture Notes in Computer Science", publisher = S-V, OPTannote = "", OPTnote = ""}

@inproceedings{EnglerDaws1994a, author ="Engler, Dawson R. and Proebsting, Todd A.", title ="DCG: An Efficient, Retargetable Dynamic Code Generator", pages ="263-273", booktitle ="Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI)", month ="October", year ="1994", scope ="rcg", abstractURL ="http://www.pdos.lcs.mit.edu/~engler/papers.html", documentURL ="http://www.pdos.lcs.mit.edu/~engler/dcg.ps" }

Author: Engler-D-R. Title: VCODE: a retargetable, extensible, very fast dynamic code generation system. Source: SIGPLAN Notices. vol.31, no.5. pp. 160-70. May 1996. Conf. Title: ACM SIGPLAN '96: Programming Language Design and Implementation. Philadelphia, PA, USA. ACM. 21-24 May 1996. Year: 1996.

Author: Engler-D-R. Hsieh-W-C. Kaashoek-M-F. Title: `C: a language for high-level, efficient, and machine-independent dynamic code generation. Conf. Title: Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. St. Petersburg Beach, FL, USA. pp. 131-44. ACM. 21-24 Jan. 1996. Year: 1996.

tcc: A System for Fast, Flexible, and High-level Dynamic Code Generation, by Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. This paper will appear in PLDI 1997.

tcc: A Template-Based Compiler for `C, by Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. This paper appeared in WCSSS 1996.

Author: Engler-D-R. Proebsting-T-A. Title: DCG: an efficient, retargetable dynamic code generation system. Source: SIGPLAN Notices. vol.29, no.11. pp. 263-72. Nov. 1994. Conf. Title: Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI). San Jose, CA, USA. ACM. IEEE Comput. Soc. 4-7 Oct. 1994. Year: 1994.

@ARTICLE{Ershov:1982:MixedComputation, AUTHOR = "A.P. Ershov", TITLE = "Mixed Computation: Potential Applications and Problems for Study", JOURNAL = "Theoretical Computer Science", YEAR = "1982", VOLUME = "18", PAGES = "41-67", ANNOTE = "English version of \cite{Ershov:1980:MixedComputation:Russian}."}

@Book{Jones:1993:PartialEvaluation, author = "N.D. Jones and C.K. Gomard and P. Sestoft", title = "Partial Evaluation and Automatic Program Generation", publisher = P-H, year = "1993", isbn = "0-13-020249-5", note = ""}

@article{Keppel:91a, author={David Keppel}, title={A Portable Interface for On-The-Fly Instruction Space Modification}, journal={Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV)}, month={April}, year={1991}, pages={86-95} }

B. W. Kernighan and D. M. Ritchie. The C Programming Language, Second Edition. Prentice Hall, 1988.

Deferred Compilation: The Automation of Run-Time Code Generation. Mark Leone and Peter Lee. Technical Report CMU-CS-93-225, School of Computer Science, Carnegie Mellon University, December 1993.

@InProceedings{Leone:1994:DeferredCompilation, author = "M. Leone and P. Lee", title = "Lightweight Run-Time Code Generation", booktitle ="PEPM'94 Workshop on Partial Evaluation and Semantics-Based Program Manipulation", month ="June", year = "1994", pages = "97-106", address ="Orlando, Florida" }

@techreport{LeoneMarka1995b, author ="Leone, Mark and Lee, Peter", title ="Optimizing ML with Run-Time Code Generation", type ="Technical report", address ="Pittsburgh, Pennsylvania", number ="CMU-CS-95-205", institution ="School of Computer Science, Carnegie Mellon University", month ="December", year ="1995", scope ="implemen", abstractURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/ml-rtcg.abstract", documentURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/ml-rtcg.ps", keywords ="Run-time code generation, dynamic optimization, partial evaluation, specialization, staged computation", note ="In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 137-148, Philadelphia, May 1996." }
@inproceedings{LeoneMarka1996a, author ="Leone, Mark and Lee, Peter", title ="A Declarative Approach to Run-Time Code Generation", booktitle ="Workshop on Compiler Support for System Software (WCSSS)", month ="February", year ="1996", scope ="implemen", documentURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/declarative-rtcg.ps", keywords ="run-time code generation, dynamic optimization, specialization, partial evaluation", note ="In Workshop Record of WCSSS'96: The Inaugural Workshop on Compiler Support for System Software, pp. 8-17, Tucson, February 1996." }

Author: Lowney-P-G. Freudenberger-S-M. Karzes-T-J. Lichtenstein-W-D. Nix-R-P. ODonnell-J-S. Ruttenberg-J-C. Title: The multiflow trace scheduling compiler. Source: Journal of Supercomputing. vol.7, no.1-2. pp. 51-142. May 1993.

@InProceedings{Meyer:1991:Techniques, author = "U. Meyer", title = "Techniques for Partial Evaluation of Imperative Languages", booktitle = "Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '91), SIGPLAN Notices, vol. 26, no. 9", month = "September", year = "1991", pages = "94-105", publisher = ACM, OPTnote = ""}

@InProceedings{Sakharov:1996:Specialization, author = "A. Sakharov", title = "Specialization of Imperative Programs through Analysis of Relational Expressions", crossref = "Danvy:1996:PartialEvaluation", pages = "430-445"}

@inproceedings{WeiseDanie1991a, author ="Weise, Daniel and Conybeare, Roland and Ruf, Erik and Seligman, Scott", title ="Automatic online partial evaluation", booktitle ="Functional Programming & Computer Architecture", month ="June", year ="1991", scope ="partial", abstractURL ="ftp://quilty.stanford.edu/pub/fuse-papers/README", documentURL ="ftp://quilty.stanford.edu/pub/fuse-papers/FUSE-MEMO-91-6.ps" }
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: related work draft
Date: Sun, 30 Mar 1997 23:23:42 PST
From: Brian Grant

Here is the aforementioned draft of the related work section, followed by
some of my notes.  I'll include the same disclaimer that Craig did: the
text is not great.  However, I think it covers most of the points we want
to hit (and probably more).

--Brian

Related Work

Tempo [Consel:1996:POPL], a compile-time and run-time specialization
system for C, is most similar to DyC, but there are numerous and
substantial differences.  Tempo is a mature system that provides
polyvariant function specialization and division, performs alias and
side-effect analysis, performs interprocedural analysis, and deals with
partially static structures.  The two systems differ in the degree of
programmer control offered, in annotation strategies, in the granularity
of specialization and division, in support for caching specialized code,
in support for lazy specialization, in the way aliases and partially
static structures are handled, in the way unstructured code is handled, in
support for interprocedural analysis and separate compilation, in
optimization of static and dynamic code, and in implementation strategies.

In Tempo, the programmer annotates which parameters and global variables
to make static at run time when certain entry-point functions are called.
These entry points are listed in a file provided as input to the BTA in
addition to the source program, rather than inlined with the program text
as in DyC.
The programmer also must provide information about aliasing outside of the
region of specialization.  DyC's annotations support the equivalent of
entry points at any program point.  DyC requires no information about code
outside the region of specialization, but requires annotations to make
pointer dereferences static.  Tempo's alias analysis can be unsound, but
it does much more than ours.  On the other hand, our annotations permit
specialization of pointer data structures without accurate alias
information.  DyC's function annotations provide an analogous solution to
the absence of interprocedural analysis in our system: they permit
interprocedural specialization and also work with separate compilation.
We plan to add alias analysis and interprocedural analysis to DyC, but the
annotations will continue to be useful in cases where the analyses fail to
collect sufficiently accurate information.

Because of the disadvantages of CPS conversion
[Jones:1993:PartialEvaluation], and based on study of our target
applications, we feel that program-point specialization is particularly
important for C programs.  Beyond its ability to specialize whole
functions, Tempo can only unroll simple loops.  Our first-generation
system also implemented this capability in an ad-hoc manner [APCEB96].
Cmix was a partial-evaluation system that provided polyvariant
program-point specialization for C [Andersen:1992:Self-ApplicableC].  Cmix
handled unstructured code, but it seems to have lacked reachability
analysis [Andersen:1994:PhDThesis].  Tempo applies Erosa's and Hendren's
goto-elimination algorithm to convert unstructured code to a structured
equivalent [ErosaHendren94,Consel:1996:AUniform].  This approach
introduces a number of additional tests and may also introduce loops.  We
feel that direct support of unstructured control flow is more appropriate
for C.

We also feel that polyvariant program-point division is important,
especially when combined with our program-point annotations make_static
and make_dynamic to effect conditional specialization (see the sketch
below).  Others [Consel:1996:AUniform,Meyer:1991:Techniques] worried that
polyvariant program-point specialization and division could cause code
explosion, but rather than omit these powerful features, we put them under
the programmer's control.

Schism's filters permitted choices about whether to unfold or residualize
a function and which arguments to generalize, given binding times for the
function's parameters [Consel:1993:ATour].  Because filters are executed
by the BTA, only binding-time information can be used to make decisions,
not the values of static or dynamic variables, which can be used in
specialization decisions in DyC.  Filters can be used, for example, to
prevent unbounded unfolding and unbounded specialization.  Both off-line
partial evaluators such as Schism and on-line specializers such as Fuse
[WeiseDanie1991a] look for dynamic conditionals as a signal that unbounded
unfolding or specialization could result, and must then halt.  Run-time
specializers have an additional option: they can temporarily halt
specialization when dynamic conditionals are found in potential cycles and
insert lazy callbacks to the specializer, which is what DyC does.

Tempo does not permit caching of multiple specialized versions of code.
Fabius, a compiler and run-time specializer for a pure subset of ML,
performs caching of multiple versions, but does not provide control over
caching policies [LeoneMarka1995b].
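To make conditional specialization concrete, here is a small source-level
sketch.  make_static and make_dynamic are our annotations, but the exact
argument syntax shown is illustrative only, and is_hot() is a hypothetical
application-supplied predicate, not part of DyC:

    /* Sketch: conditional specialization at an arbitrary program point.
     * is_hot() is a hypothetical application predicate; the annotation
     * syntax is illustrative only. */
    void dispatch(int op, int *args, int nargs)
    {
        if (is_hot(op)) {
            /* From here to the matching make_dynamic, op is treated as
             * a run-time constant: the specializer generates (and
             * caches) one version of the region per distinct value of
             * op. */
            make_static(op);
        }

        /* ... code that branches on op: folded away when op is static,
         * residualized when op is dynamic ... */

        make_dynamic(op);    /* close the dynamic region for op */
    }

Only calls for which is_hot(op) is true pay the cost of specialization;
the rest execute the ordinary, unspecialized code.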
Like DyC, Tempo constructs and optimizes templates for dynamically
generated code and uses them to produce generating extensions that, when
executed, produce specialized code at run time.  However, Tempo blocks
many inter-template optimizations to preserve template integrity.  We need
only restrict template instructions with holes to their units, and prevent
their speculation above static branches.  Fabius does not optimize
templates separately.  We have expended much more effort optimizing the
specialized run-time specializers.

`C extends the ANSI C language to support dynamic code generation in an
imperative style [EnHsKa96].  The programmer must specify the code to be
generated at run time, substitute run-time values and combine code
fragments, perform certain optimizations, invoke the run-time compiler,
manage code reuse and code-space reclamation, and ensure correctness
[LeoneMarka1996a].  They claim that their flexibility is unique in giving
the programmer the ability to specify and compose arbitrary expressions
and statements at run time; however, it is not.  A declarative system such
as DyC can do so as well, requiring much less effort from the programmer,
while also facilitating debugging and ensuring correctness (if the safe
subset of the annotations is used).

Tempo uses `C as one of its back-end code generators [Consel97], while DyC
could serve as a back end to `C by specializing an ICODE interpreter
[PoEnKa97].  Generation of new function signatures is handled by
specializing functions with variable numbers of arguments.  A more direct
emulation of `C would be possible with the addition to our system of
run-time inlining through function pointers, but we stress that, in
practice, only a few annotations are necessary to achieve an effect
comparable to common `C idioms.  For example, multi-way loop unrolling can
be achieved with a single annotation on the loop induction variable.

In fact, DyC has several capabilities that `C does not.  `C cannot
multi-way unroll loops with dynamic exit tests, because jumps to labels in
other tick expressions are not permitted.  Also, because portions of
dynamically generated functions cannot be recompiled in `C, we cannot see
how it could unroll the interpreter supporting the computed goto shown in
section 2.  Moreover, DyC's nested and overlapping dynamic regions are not
matched by nested tick expressions in `C.  Furthermore, DyC permits better
static optimization of dynamic code, because control flow is more easily
determined.  This could be one reason why `C now applies limited partial
evaluation to tick expressions, including dead-code elimination and
unrolling of simple static loops [PoEnKa97].  Its authors even plan to add
a binding-time analysis, which would yield a hybrid declarative/imperative
run-time code-generation system.

We could implement any of the proposed run-time register allocation
strategies (e.g., ICODE [PoEnKa97], Fabius [LeoneMarka1995b], or register
actions [Wall86,APCEB96]), and select among them, or choose not to perform
run-time register allocation at all, through new policies in the
annotations.

The bottom line, however, is that DyC and other declarative systems can be
applied to existing programs in a straightforward manner, whereas
imperative systems such as `C require substantial rewriting of the code.
We therefore conclude, as did Ershov 15 years ago
[Ershov:1982:MixedComputation], that the declarative approach to code
generation is more general than the imperative approach.
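As an illustration of the single-annotation claim above, here is a sketch
of multi-way loop unrolling.  Only make_static is a DyC annotation; the
FIR-filter code around it is an invented example, and the exact annotation
syntax is assumed:

    /* Sketch: multi-way loop unrolling from one annotation.  Treating
     * len as a run-time constant completely unrolls the loop for each
     * distinct value of len, while x[] and w[] remain dynamic. */
    double fir(double *x, double *w, int len)
    {
        double sum = 0.0;
        int i;

        make_static(len);           /* len (and hence i) becomes static */
        for (i = 0; i < len; i++)   /* unrolled len times per version */
            sum += x[i] * w[i];
        return sum;
    }

Because specialized code is cached and lazy edges are backpatched, the
same mechanism extends to loops with dynamic exit tests, which is the case
`C cannot handle.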
**********************************************************************
Old text and notes:

While the ability to specialize at compile time is an advantage over our
system, Tempo's goal of treating compile-time and run-time specialization
uniformly restricts its treatment of run-time specialization.  For
example, arbitrary dynamic-to-static promotion of variables, lazy
specialization, and dynamic caching strategies are not compatible with
compile-time specialization, and so cannot be exposed to the programmer.

Dead-variables problem: Specializing on variables that are no longer used
[Jones:1993:PartialEvaluation].

Unstructured control flow: Tempo applies the algorithm from
[ErosaHendren94].  Could the algorithm presented in
[Sakharov:1996:Specialization] be used to do reachability analysis?

Polyvariant specialization: Worries about code explosion due to
program-point polyvariant specialization in [Meyer:1991:Techniques].  Cmix
does polyvariant program-point specialization.  Tempo does not.

Polyvariant division: Worries about code explosion due to program-point
polyvariant division in [Consel:1996:AUniform].  Mix did polyvariant
program-point division.  Tempo, Cmix, Fabius, etc. do not.

CPS: [Consel:1991:ForABetter].

Specialization units: Specialization points are points reached by several
static memory states [Jones:1993:PartialEvaluation,
Bulyonkov:1996:PracticalAspects].  Our unit partitioning is vaguely
similar to block specialization [Bulyonkov:1996:PracticalAspects].  They
insert explicit block statements around statement sequences with the same
"static memory state" (think context).  Hence, their blocks begin at
specialization points.  Their approach is more restricted because of their
breadth-first "linear traversal" specialization algorithm.  They insert
specialization points when they need to delay specialization of some
statements to satisfy their (self-induced) constraints.  They also
disallow gotos into compound statements.

Configuration analysis?

Control over specialization:

Termination: "Conditional markers" on their specializer's stack of calls
being specialized indicate "speculative" specialization.  "If there is a
conditional marker before the most recent call to the same closure, it
means that the call about to be made is speculative, and that unfolding it
may lead to specializer loops that don't correspond to runtime loops.
When such a call is detected, Fuse elects to specialize rather than
unfold.  Abstraction of arguments occurs at this point..."
[WeiseDanie1991a].

Tempo
- polyvariant function division
- polyvariant function specialization
- special hack to unroll simple loops
- can't deal with both static and dynamic loop exits
- promotions of parameters or globals only at certain function entries
  listed as entry points in an input file to the BTA; variables to be
  promoted are annotated as static
- first converts to structured code
- also extracts templates
- also produces generating extensions
- performs compile-time specialization as well
- interprocedural analysis
- partially static structures
- alias and side-effect analysis
- static return values of residual calls

Schism

Fabius

Cmix
- polyvariant specialization of C functions
- polyvariance at program points: at control-flow merges, splitting is
  done unless C-Mix can determine that the static store is "similar"; that
  allows polyvariance after dynamic ifs.  There is no mention of a
  reachability analysis, so I am not sure how this works in detail.
  There's an example where C-Mix multi-way unrolls a binary search
  function, though, so it seems to work.
- no polyvariant division
- no partially static data structures (structs must be completely dynamic
  or completely static)
- what about array splitting?

`C
- requires use of a new language; can't be used to optimize existing C
  programs with run-time code generation
- difficult to debug
- difficult to statically optimize across tick expressions
- cannot unroll loops with early exits (breaks)
- `C programs manipulate code; DyC programs manipulate data
- we can generate arbitrary code graphs (even ones with merges and loops,
  thanks to our cache lookups and backpatching) by unrolling an
  interpreter loop for a language with branches
- with specialized varargs, we can do anything they can do and more by
  specializing a simulator of their system
  [Ershov:1982:MixedComputation]
- run-time inlining through function pointers would lend a more
  straightforward emulation
- in our target applications, our system is more straightforward to use
  (just a few make_static annotations)
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: conclusion draft
Date: Sun, 30 Mar 1997 23:24:23 PST
From: Brian Grant

Here is a draft of the conclusion.

--Brian

Conclusion

We are building DyC by modifying the Multiflow compiler [LFKLNOR93].  We
began with a detailed design covering every aspect of the system; the
implementation is proceeding in stages.  We currently have a restricted,
monovariant version of the BTA working that computes all of the outputs
required by the run-time specializer.  We should soon be able to generate
code for simple dynamic regions.  Over the next few months, we hope to
complete the system.

In the near future, we plan to work out the annotations for static fields
of data structures and for pointers, and add them to our system.  We would
also like to add alias and side-effect analysis, as well as
interprocedural analysis.  Beyond those analyses, there are a number of
analyses we could add, such as termination and exception analysis to
eliminate unnecessary lazy specialization, analysis of relational
expressions [Sakharov:1996:Specialization], or even theorem proving.  We
also would like a better solution to the problem of clustering
specialization-unit boundaries.  However, at this time we do not know
whether these analyses will be useful.

In addition to more aggressive analyses, we would like to add more
run-time optimizations, controlled by the annotations through policies.
In particular, we would like to implement run-time register allocation and
run-time inlining.

Our annotations' design resulted from a search for a small set of
flexible, primitive directives to govern dynamic compilation, suitable for
use by both human programmers and tools (such as a semi-automatic
dynamic-compilation front end).  With the exception of support for static
data structures, we believe that our make_static annotation provides the
flexibility we require in a concise, even elegant manner.  Furthermore
(and to our surprise), most of the run-time optimizations (e.g., multi-way
loop unrolling) and other features we desired (e.g., conditional
specialization) fit neatly into the framework of partial evaluation.
We feel that DyC's main contribution lies in the flexibility achieved by
its blend of features:

- the ability to promote dynamic variables to static, or demote static
  variables to dynamic, at arbitrary program points by use of simple
  annotations inserted into the program text, permitting nested and
  overlapping dynamic regions,
- automatic dynamic-to-static promotion of specialization variables,
  substantially reducing the number of annotations necessary to continue
  specialization and permitting the programmer to be ignorant of the
  accuracy of the system's analyses,
- polyvariant program-point specialization and division, permitting, in
  particular, multi-way loop unrolling and conditional specialization,
- automatic caching and reuse of specialized code,
- on-demand as well as speculative specialization, and
- programmer control over these features through straightforward policies
  on specialization variables included in the annotations.

The result is a system far easier to use than a code-generation system
like `C, and one that can be easily applied to existing programs.
Furthermore, the annotations and their corresponding array of policies
provide a convenient mechanism for experimenting with dynamic compilation,
and a potential target for future tools or a fully automatic system.
********************************************************************************
********************************************************************************
********************************************************************************
To: spin-comp@thistle.cs.washington.edu
Subject: updated bibliography
Date: Sun, 30 Mar 1997 23:24:57 PST
From: Brian Grant

@InProceedings{Andersen:1992:Self-ApplicableC, author = "L.O. Andersen", title = "Self-Applicable {C} Program Specialization", booktitle = PEPM92, publisher = Yale, year = "1992", pages = "54--61", month = "June", note = ""}

@InProceedings{Andersen:1993:Binding-TimeAnalysis, author = "L.O. Andersen", title = "Binding-Time Analysis and the Taming of {C} Pointers", booktitle = PEPM93, year = "1993", pages = "47-58", publisher = ACM, OPTnote = ""}

[APCEB96] Author: Auslander-J. Philipose-M. Chambers-C. Eggers-S-J. Bershad-B-N. Title: Fast, effective dynamic compilation. Source: SIGPLAN Notices. vol.31, no.5. pp. 149-59. May 1996. Conf. Title: ACM SIGPLAN '96: Programming Language Design and Implementation. Philadelphia, PA, USA. ACM. 21-24 May 1996. Year: 1996.

@InProceedings{Bondorf:1992:ImprovingBinding, author = "A. Bondorf", title = "Improving Binding Times without Explicit CPS-Conversion", booktitle = "1992 ACM Conference on Lisp and Functional Programming, San Francisco, California (Lisp Pointers, vol. V, no. 1, 1992)", year = "1992", pages = "1-10", publisher = ACM, OPTnote = ""}

@InProceedings{Bulyonkov:1996:PracticalAspects, author = "M.A. Bulyonkov and D.V. Kochetov", title = "Practical Aspects of Specialization of {Algol}-like Programs", crossref = "Danvy:1996:PartialEvaluation", pages = "17-32"}

@INPROCEEDINGS{Consel:1988:NewInsights, AUTHOR = "C. Consel", TITLE = "New Insights into Partial Evaluation: The {S}chism Experiment", BOOKTITLE = "ESOP '88, 2nd European Symposium on Programming, Nancy, France, March 1988 (Lecture Notes in Computer Science, vol. 300)", EDITOR = "H. Ganzinger", PUBLISHER = S-V, PAGES = "236-246", YEAR = "1988", ANNOTE = "A self-applicable partial evaluator called Schism is presented. It is written in a first-order subset of Scheme, with syntactic extensions and an extensible set of primitives, and can handle certain forms of side-effects. User-defined annotations control unfolding and specialization during partial evaluation. Work towards automating the annotations is presented."}
C. Chambers, S. J. Eggers, J. Auslander, M. Philipose, M. Mock, and P. Pardyak. Automatic dynamic compilation support for event dispatching in extensible systems. In Workshop on Compiler Support for Systems Software, Feb. 1996.

@InProceedings{Consel:1990:FromInterpreting:ESOP, author = "C. Consel and O. Danvy", title = "From Interpreting to Compiling Binding Times", booktitle = "ESOP '90. 3rd European Symposium on Programming, Copenhagen, Denmark, May 1990 (Lecture Notes in Computer Science, vol. 432)", year = "1990", editor = "N. Jones", pages = "88-105", publisher = S-V, OPTnote = ""}

@Manual{Consel:1990:TheSchism, author = "Charles Consel", title = "The {S}chism Manual, Version 1.0", organization = "Yale University", address = "New Haven, Connecticut", month = "December", year = "1990", note = ""}

@InProceedings{Consel:1991:ForABetter, author = "C. Consel and O. Danvy", title = "For a Better Support of Static Data Flow", booktitle = "Functional Programming Languages and Computer Architecture, Cambridge, Massachusetts, August 1991 (Lecture Notes in Computer Science, vol. 523)", year = "1991", editor = "J. Hughes", pages = "496-519", organization = "ACM", publisher = S-V, note = ""}

@InProceedings{Consel:1993:ATour, author = "C. Consel", title = "A Tour of Schism: A Partial Evaluation System for Higher-Order Applicative Languages", booktitle = PEPM93, year = "1993", pages = "145-154", publisher = ACM, OPTnote = ""}

@InProceedings{Consel:1993:IncrementalPartial, author = "C. Consel and C. Pu and J. Walpole", title = "Incremental Partial Evaluation: The Key to High Performance, Modularity and Portability in Operating Systems", booktitle = PEPM93, year = "1993", pages = "44-46", publisher = ACM, OPTnote = ""}

@InProceedings{Consel:1993:Tutorial, author = "C. Consel and O. Danvy", title = "Tutorial Notes on Partial Evaluation", booktitle = "Twentieth ACM Symposium on Principles of Programming Languages, Charleston, South Carolina, January 1993", year = "1993", pages = "493-501", organization = "ACM", publisher = ACM, note = ""}

Author: Consel-C. Noel-F. Title: A general approach for run-time specialization and its application to C. Conf. Title: Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. St. Petersburg Beach, FL, USA. pp. 145-56. ACM. 21-24 Jan. 1996. Year: 1996.

@InProceedings{Consel:1996:AUniform, author = "C. Consel and others", title = "A Uniform Approach for Compile-Time and Run-Time Specialization", crossref = "Danvy:1996:PartialEvaluation", pages = "54-72"}

@Proceedings{Danvy:1996:PartialEvaluation, title = "Partial Evaluation. Dagstuhl Castle, Germany, February 1996", year = "1996", editor = "O. Danvy and R. Gl{\"u}ck and P. Thiemann", volume = "1110", series = "Lecture Notes in Computer Science", publisher = S-V, OPTannote = "", OPTnote = ""}

@inproceedings{EnglerDaws1994a, author ="Engler, Dawson R. and Proebsting, Todd A.", title ="DCG: An Efficient, Retargetable Dynamic Code Generator", pages ="263-273", booktitle ="Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI)", month ="October", year ="1994", scope ="rcg", abstractURL ="http://www.pdos.lcs.mit.edu/~engler/papers.html", documentURL ="http://www.pdos.lcs.mit.edu/~engler/dcg.ps" }

Author: Engler-D-R. Proebsting-T-A. Title: DCG: an efficient, retargetable dynamic code generation system. Source: SIGPLAN Notices. vol.29, no.11. pp. 263-72. Nov. 1994. Conf. Title: Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI). San Jose, CA, USA. ACM. IEEE Comput. Soc. 4-7 Oct. 1994. Year: 1994.

Author: Engler-D-R. Title: VCODE: a retargetable, extensible, very fast dynamic code generation system. Source: SIGPLAN Notices. vol.31, no.5. pp. 160-70. May 1996. Conf. Title: ACM SIGPLAN '96: Programming Language Design and Implementation. Philadelphia, PA, USA. ACM. 21-24 May 1996. Year: 1996.

[EnHsKa96] Author: Engler-D-R. Hsieh-W-C. Kaashoek-M-F. Title: `C: a language for high-level, efficient, and machine-independent dynamic code generation. Conf. Title: Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. St. Petersburg Beach, FL, USA. pp. 131-44. ACM. 21-24 Jan. 1996. Year: 1996.

[PoEnKa97] tcc: A System for Fast, Flexible, and High-level Dynamic Code Generation, by Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. This paper will appear in PLDI 1997.

tcc: A Template-Based Compiler for `C, by Massimiliano Poletto, Dawson R. Engler, and M. Frans Kaashoek. This paper appeared in WCSSS 1996.

ErosaHendren94: A.M. Erosa and L.J. Hendren. Taming control flow: a structured approach to eliminating goto statements. Proceedings of the IEEE 1994 International Conference on Computer Languages, May 1994, pp. 229-40.

@ARTICLE{Ershov:1982:MixedComputation, AUTHOR = "A.P. Ershov", TITLE = "Mixed Computation: Potential Applications and Problems for Study", JOURNAL = "Theoretical Computer Science", YEAR = "1982", VOLUME = "18", PAGES = "41-67", ANNOTE = "English version of \cite{Ershov:1980:MixedComputation:Russian}."}

@Book{Jones:1993:PartialEvaluation, author = "N.D. Jones and C.K. Gomard and P. Sestoft", title = "Partial Evaluation and Automatic Program Generation", publisher = P-H, year = "1993", isbn = "0-13-020249-5", note = ""}

@article{Keppel:91a, author={David Keppel}, title={A Portable Interface for On-The-Fly Instruction Space Modification}, journal={Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV)}, month={April}, year={1991}, pages={86-95} }

B. W. Kernighan and D. M. Ritchie. The C Programming Language, Second Edition. Prentice Hall, 1988.

Deferred Compilation: The Automation of Run-Time Code Generation. Mark Leone and Peter Lee. Technical Report CMU-CS-93-225, School of Computer Science, Carnegie Mellon University, December 1993.

@InProceedings{Leone:1994:DeferredCompilation, author = "M. Leone and P. Lee", title = "Lightweight Run-Time Code Generation", booktitle ="PEPM'94 Workshop on Partial Evaluation and Semantics-Based Program Manipulation", month ="June", year = "1994", pages = "97-106", address ="Orlando, Florida" }

@techreport{LeoneMarka1995b, author ="Leone, Mark and Lee, Peter", title ="Optimizing ML with Run-Time Code Generation", type ="Technical report", address ="Pittsburgh, Pennsylvania", number ="CMU-CS-95-205", institution ="School of Computer Science, Carnegie Mellon University", month ="December", year ="1995", scope ="implemen", abstractURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/ml-rtcg.abstract", documentURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/ml-rtcg.ps", keywords ="Run-time code generation, dynamic optimization, partial evaluation, specialization, staged computation", note ="In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 137-148, Philadelphia, May 1996." }

@inproceedings{LeoneMarka1996a, author ="Leone, Mark and Lee, Peter", title ="A Declarative Approach to Run-Time Code Generation", booktitle ="Workshop on Compiler Support for System Software (WCSSS)", month ="February", year ="1996", scope ="implemen", documentURL ="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/papers/declarative-rtcg.ps", keywords ="run-time code generation, dynamic optimization, specialization, partial evaluation", note ="In Workshop Record of WCSSS'96: The Inaugural Workshop on Compiler Support for System Software, pp. 8-17, Tucson, February 1996." }

[LFKLNOR93] Author: Lowney-P-G. Freudenberger-S-M. Karzes-T-J. Lichtenstein-W-D. Nix-R-P. ODonnell-J-S. Ruttenberg-J-C. Title: The multiflow trace scheduling compiler. Source: Journal of Supercomputing. vol.7, no.1-2. pp. 51-142. May 1993.

@InProceedings{Meyer:1991:Techniques, author = "U. Meyer", title = "Techniques for Partial Evaluation of Imperative Languages", booktitle = "Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '91), SIGPLAN Notices, vol. 26, no. 9", month = "September", year = "1991", pages = "94-105", publisher = ACM, OPTnote = ""}

@InProceedings{Sakharov:1996:Specialization, author = "A. Sakharov", title = "Specialization of Imperative Programs through Analysis of Relational Expressions", crossref = "Danvy:1996:PartialEvaluation", pages = "430-445"}
Sakharov", title = "Specialization of Imperative Programs through Analysis of Relational Expressions", crossref = "Danvy:1996:PartialEvaluation", pages = "430-445"} @inproceedings{WeiseDanie1991a, author ="Weise, Daniel and Conybeare, Roland and Ruf, Erik and Seligman, Scott", title ="Automatic online partial evaluation", booktitle ="Functional Programming & Computer Architecture", month ="June", year ="1991", scope ="partial", abstractURL ="ftp://quilty.stanford.edu/pub/fuse-papers/README", documentURL ="ftp://quilty.stanford.edu/pub/fuse-papers/FUSE-MEMO-91-6.ps" } ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: another bibliography update Date: Mon, 31 Mar 1997 10:02:28 PST From: Brian Grant The bibliography can now be found on our PEPM97 web page, as can the other text I've mailed out: http://www.cs.washington.edu/research/projects/unisw/DynComp/www/Internal/Documentation/DyC/Papers/PEPM97/ --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: running list of outstanding paper issues Date: Mon, 31 Mar 1997 13:46:50 PST From: Brian Grant I'm building a running list of outstanding issues wrt the paper. It includes our discussions from this morning and most of the mail I've sent. It can be found on our PEPM97 web page under "Outstanding Issues". --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: BTA comments Date: Wed, 02 Apr 1997 00:03:47 PST From: Brian Grant My belated comments on the BTA are finally in my list of outstanding issues. The only major flaw I found was indeed in removal of dead variables from the Division. It seems we can't remove a dead variable from the Division until its last use, or not at all if we want to take the easier route. There could be a problem with the Division meet operator, or otherwise I just don't fully understand it. What happens at a control-flow merge if a variable on one path has the mono_divide policy but is dynamic on another path? Other problems were minor. A few clarifications are needed in the text, IMO. Anyway, take a look at my full comments. In the morning, I'll try to sweep through my comments from other sections to update them for the latest version of the paper. --Brian