******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: Expanded PEPM Paper Date: Thu, 01 Jan 1998 13:03:37 PST From: Brian Grant I found and updated our notes for the planned expansion of the PEPM paper for the TR. Seems like a good starting place for the JTCS submission. It looks like a significant amount of work. Title? Introduction ------------ -- Mention name: DyC. -- Use mipsi as an example of power of annotations? (#lines added, effect of annotations) Example ------- -- Change stack interpreter to PLDI submission register-based interpreter -- Indicate static/dynamic as in PLDI submission -- Steal from PLDI submission: -- Interpreted program -- Generated code -- Generated code after register actions Run-Time Specializer -------------------- -- Need relative caching? "ms" is one example: Currently, when r is given the policy cache_all_unchecked, the specializer enters an infinite loop because the inner loop is unrolled with no cache check but it does not have a static exit condition. The back edge of the inner loop is an inter-unit edge because the dynamic exit test produces a unit boundary in absence of linearization. If there were a diamond in the loop, there would be a similar problem of needlessly duplicating the code following the diamond's merge. void scalematrix(int **m, int s, int k) { int r,c; make_static(s,k); r=0; make_static(r:cache_all, eager); /* would like to be cache_all_unchecked */ for (;r unit bndry */ row[c] *= k; /* back edge of inner loop goes to another unit */ } } "binary" is an example where a SRTP opt. does what we want: We would like to make binary both loop_specialize_lazy and cache_all_unchecked, but doing so currently causes new code to be generated every time the lazy point is encountered in the loop. We would like a `one-time' lazy point. /* For lazy point with single pred. (e.g., branch laziness) */ If (EmptyP(Promotions) && ((EmptyP(Demotions) && (!merge || static merge)) || (CachePolicy == cache_all_unchecked))) then use relative `cache_one_unchecked' (use a hole) and patch branch Annotations ----------- -- @ Analysis of the Annotations --------------------------- -- Bug fixes in equations -- Problem of keeping roots in StaticVarInfo if derived variables are still live -- Demotions -- Change `discordant' terminology? -- Reachability conditions -- Equivalence classes of predecessors, defined by def points of discordant variables, must be mutually exclusive in order for a merge to be static -- By this definition, discordant loop heads will never be static. -- How to prevent dervied static values from being killed? -- Don't have to kill values derived only from polyvariantly specialized variables, loop-invariant variables, or constants? -- Any requirements for reachability conditions (e.g., edges from inside loop must be mutually exclusive)? -- How to eliminate cache lookup? -- Loop induction variable must progress monotonically toward static loop bound? -- Further explanation -- Division vs. StaticVarInfo -- Partition & MergePartitions -- FilterStaticVars -- SourceRoots -- Examples -- Annotated variable becoming derived -- Annotated derived variable becoming root variable -- Dead variables removed from StaticVarInfo but not the Division -- Naming problem (discuss it, don't solve it) -- Source-level variable / internal representation correspondence following optimizations such as copy propagation, induction-variable simplification, loop unrolling, and other ranamings (e.g., for SSA) -- UsedVars was overly aggressive -- Policies for derived variables; we probably want to attribute stronger policies to them -- Present IVS example? -- Other problems -- Local structures (want to cache by-value, but cached by-address) Generating the Run-Time Specializer ----------------------------------- -- Unit identification -- Algorithm is iterative because of single-entry requirement -- More rationale for unit boundaries -- New variant (specialization) -- Sharing/reuse -- Relax demotion boundaries -- Clustering -- Emphasize importance of control equivalence in the presented algorithm -- Example -- Important not to cluster or reorder different types of boundaries (e.g., promotions and discordant merges, or cache_one_unchecked and cache_all_unchecked) -- Linearization -- Describe splitting and merging (but don't provide detailed algorithm) Comparison to Related Work -------------------------- -- More comparison with `C and Tempo? -- Mention run-time inlining -- In discussion of Schism's filters describe how it is possible for us to also use binding-time information (is_static(x)/is_dynamic(x)) Conclusions ----------- -- Implementation status -- Application of policies is difficult, but useful --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: expanded PEPM paper (SHORT e-mail) Date: Tue, 06 Jan 1998 16:11:19 PST From: Brian Grant Since nobody read my long e-mail on the journal paper, here is a short one. The expansion will consist of: 1) Corrections 2) Additional or expanded examples 3) Experience Even the minimum revisions (for example, corrections alone) will require a substantial amount of work. That is, this won't be "something for nothing." Given that, people should probably start working on it in their "spare time." Here is a high-level breakdown of what needs to be done and preliminary task assignments (also, roughly ordered by priority): Minimal set of tasks: Person Section Task ------ ------- ---- MM 5 BTA corrections and addition of demotions MP 2 Expanded main example (by borrowing from PLDI submission) MM/BKG 5 Discussion of troubles with BTA (naming, policies) BKG 6 Implementation corrections BKG 6 Discussion of troubles with caching MP ? Experience with mipsi (from annotations/expressiveness viewpt.) BKG All Misc. corrections and adding @ to s.2 Other possible tasks: 5 Explanations of a few tricky parts of BTA equations 5 A couple examples illustrating tricky cases handled by BTA 6 Clustering example 6 Proof that clustering algorithm is optimal in restricted case (this IS a theory journal) 6 Rough description of linearization algorithm (previously promised to be in an expanded form of the paper) 7 More comparison with `C & Tempo (perhaps borrowed from PLDI sub) We'll also need to suitably update the conclusion to summarize our experiences and state the progress of our implementation. Further details (e.g., on BTA examples/clarifications) can be found in my previous e-mail. --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: working draft of expanded PEPM paper Date: Sun, 11 Jan 1998 09:38:59 PST From: Brian Grant I have created a working draft of the expanded PEPM paper by copying the PEPM paper and making a few corrections. The path is: /afs/cs/project/dyncomp/grant/Papers/jtcs98/dc.doc What format does our submission need to be in, anyway? --Brian