********************************************************************************
********************************************************************************
********************************************************************************

To: spin-comp@thistle.cs.washington.edu
Subject: Expanded PEPM Paper
Date: Thu, 01 Jan 1998 13:03:37 PST
From: Brian Grant <grant@thistle.cs.washington.edu>

I found and updated our notes for the planned expansion of the PEPM
paper for the TR.  Seems like a good starting place for the JTCS submission.
It looks like a significant amount of work.

Title?

Introduction
------------

-- Mention name: DyC.
-- Use mipsi as an example of power of annotations? (#lines added, effect of
   annotations)

Example
-------

-- Change stack interpreter to PLDI submission register-based interpreter
	-- Indicate static/dynamic as in PLDI submission
-- Steal from PLDI submission:
	-- Interpreted program
	-- Generated code
	-- Generated code after register actions

Run-Time Specializer
--------------------

-- Need relative caching?

"ms" is one example:

Currently, when r is given the policy cache_all_unchecked, the
specializer enters an infinite loop because the inner loop is unrolled
with no cache check but it does not have a static exit condition.  The
back edge of the inner loop is an inter-unit edge because the dynamic
exit test produces a unit boundary in absence of linearization.  If
there were a diamond in the loop, there would be a similar problem of
needlessly duplicating the code following the diamond's merge.

void scalematrix(int **m, int s, int k) {
     int r,c;
     make_static(s,k);
     r=0;
     make_static(r:cache_all, eager); /* would like to be cache_all_unchecked */
     for (;r<s;r++) {
	  int *row = m[r];
	  for (c=0;c<s;c++) /* not unrolled, so dynamic exit => unit bndry */
	       row[c] *= k;
	  /* back edge of inner loop goes to another unit */
     }
}

"binary" is an example where a SRTP opt. does what we want:

We would like to make binary both loop_specialize_lazy and
cache_all_unchecked, but doing so currently causes new code to be
generated every time the lazy point is encountered in the loop.  We
would like a `one-time' lazy point.

/* For lazy point with single pred. (e.g., branch laziness) */
If (EmptyP(Promotions) &&
   ((EmptyP(Demotions) && (!merge || static merge)) ||
    (CachePolicy == cache_all_unchecked)))
    then use relative `cache_one_unchecked' (use a hole) and patch branch

Annotations
-----------

-- @

Analysis of the Annotations
---------------------------

-- Bug fixes in equations
	-- Problem of keeping roots in StaticVarInfo if derived
	   variables are still live

-- Demotions

-- Change `discordant' terminology?

-- Reachability conditions
	-- Equivalence classes of predecessors, defined by def points
	   of discordant variables, must be mutually exclusive in
	   order for a merge to be static
	-- By this definition, discordant loop heads will never be
	   static.
		-- How to prevent dervied static values from being
		   killed?
			-- Don't have to kill values derived only from
			   polyvariantly specialized variables,
			   loop-invariant variables, or constants?
			-- Any requirements for reachability
			   conditions (e.g., edges from inside loop
			   must be mutually exclusive)?
		-- How to eliminate cache lookup?
			-- Loop induction variable must progress
			   monotonically toward static loop bound?

-- Further explanation
	-- Division vs. StaticVarInfo
	-- Partition & MergePartitions
	-- FilterStaticVars
	-- SourceRoots

-- Examples
	-- Annotated variable becoming derived
	-- Annotated derived variable becoming root variable
	-- Dead variables removed from StaticVarInfo but not the Division

-- Naming problem (discuss it, don't solve it)
	-- Source-level variable / internal representation
	   correspondence following optimizations such as copy
	   propagation, induction-variable simplification, loop
	   unrolling, and other ranamings (e.g., for SSA)
	-- UsedVars was overly aggressive
	-- Policies for derived variables; we probably want to
	   attribute stronger policies to them
	-- Present IVS example?

-- Other problems
	-- Local structures (want to cache by-value, but cached by-address)

Generating the Run-Time Specializer
-----------------------------------

-- Unit identification
	-- Algorithm is iterative because of single-entry requirement
	-- More rationale for unit boundaries
		-- New variant (specialization)
		-- Sharing/reuse
	-- Relax demotion boundaries

-- Clustering
	-- Emphasize importance of control equivalence in the
	   presented algorithm
	-- Example
	-- Important not to cluster or reorder different types of
	   boundaries (e.g., promotions and discordant merges, or 
	   cache_one_unchecked and cache_all_unchecked)

-- Linearization
	-- Describe splitting and merging (but don't provide detailed algorithm)

Comparison to Related Work
--------------------------

-- More comparison with `C and Tempo?

-- Mention run-time inlining

-- In discussion of Schism's filters describe how it is possible for
   us to also use binding-time information (is_static(x)/is_dynamic(x))

Conclusions
-----------

-- Implementation status

-- Application of policies is difficult, but useful

--Brian


********************************************************************************
********************************************************************************
********************************************************************************

To: spin-comp@thistle.cs.washington.edu
Subject: expanded PEPM paper (SHORT e-mail)
Date: Tue, 06 Jan 1998 16:11:19 PST
From: Brian Grant <grant@thistle.cs.washington.edu>

Since nobody read my long e-mail on the journal paper, here is a short one.
The expansion will consist of:

1) Corrections
2) Additional or expanded examples
3) Experience

Even the minimum revisions (for example, corrections alone) will require a
substantial amount of work.  That is, this won't be "something for nothing."
Given that, people should probably start working on it in their "spare time."
Here is a high-level breakdown of what needs to be done and preliminary task
assignments (also, roughly ordered by priority):

Minimal set of tasks:

Person	Section	Task
------	-------	----
MM	5 	BTA corrections and addition of demotions
MP	2 	Expanded main example (by borrowing from PLDI submission)
MM/BKG	5	Discussion of troubles with BTA (naming, policies)
BKG	6	Implementation corrections
BKG	6	Discussion of troubles with caching
MP	?	Experience with mipsi (from annotations/expressiveness viewpt.)
BKG	All	Misc. corrections and adding @ to s.2

Other possible tasks:
	5	Explanations of a few tricky parts of BTA equations
	5	A couple examples illustrating tricky cases handled by BTA
	6	Clustering example
	6	Proof that clustering algorithm is optimal in restricted case
		(this IS a theory journal)
	6	Rough description of linearization algorithm (previously
		promised to be in an expanded form of the paper)
	7	More comparison with `C & Tempo (perhaps borrowed from PLDI sub)

We'll also need to suitably update the conclusion to summarize our
experiences and state the progress of our implementation.

Further details (e.g., on BTA examples/clarifications) can be found in my
previous e-mail.

--Brian

********************************************************************************
********************************************************************************
********************************************************************************

To: spin-comp@thistle.cs.washington.edu
Subject: working draft of expanded PEPM paper
Date: Sun, 11 Jan 1998 09:38:59 PST
From: Brian Grant <grant@thistle.cs.washington.edu>

I have created a working draft of the expanded PEPM paper by copying the
PEPM paper and making a few corrections.  The path is:

/afs/cs/project/dyncomp/grant/Papers/jtcs98/dc.doc

What format does our submission need to be in, anyway?

--Brian