********************************************************************************

To: spin-comp
From: mock

Here's a proposal for some new annotations that will be powerful enough
to deal with multiway loop unrolling and slowly changing constants. At
the same time, they should be less ad hoc and still implementable.
Please take some time to read it before the meeting so we can have a
good discussion.

-- Markus

====================================================================

Annotations for Quasi-Constants and Multi-way Loop Unrolling

0. Introduction

In the following, some simple new annotations will be proposed. They
will allow us to deal with "slowly-changing constants" (or
quasi-constants) and with multi-way loop unrolling. They also allow us
to express "traditional" loop unrolling as we did before, and they
subsume the concept of keys. They are more general and simpler than the
annotations we had before. The programming-language-level annotations
induce annotations at the IR level, which suggests a direct way of
implementing them.

Here's an example of why we want the new annotations:

Example 1:

  D:  if (d)            /* d a dynamic condition */
        x := 1;
      else
        x := 2;
  M:
  B:  [some code using x]

After the value of x becomes known at run time, B could be stitched
using either x=1 or x=2. In the current framework there is no way to
make x a constant, although we would like to treat it as a constant
in B.

There are two interesting points here:

(i)  At D the branch outcome becomes known. At this point we can decide
     whether we want to stitch eagerly or lazily.

(ii) At the merge point M the current BTA would label x as dynamic, as
     the reachability conditions are not mutually exclusive. There are
     two ways to "keep x constant" at this point:

     a) Create a new stitched version (replicate or specialize)

        We simply replicate the template with the appropriate values
        stitched in. No check is done whether we previously stitched a
        version that could be reused now.

     b) Use a cache

        If, say, in a previous execution of the region x was set to 1
        and now it is set to 2, we check whether code for this value
        has already been produced. If so, we look it up. Otherwise we
        produce it and cache it.

        N.B. The check whether there already is a stitched version will
        usually be quite complex and have to take into account more
        context than just one variable. We call the context (the set of
        variables) that has to be checked to do this lookup the
        "signature". What will be needed for such a signature will be
        discussed in section 3.

(i) and (ii) together give us two independent "knobs" to control
dynamic compilation.

1. Annotations

First, we will look at the annotations at the intermediate
representation level that give us fine-grained control over the dynamic
compilation process. As this level of detail is neither available to
nor desirable for the programmer, we will then present
programming-language-level annotations that translate into a desirable
subset of the possibilities that the IR-level annotations provide.

1.1 Annotations at the IR level

Let's go back to a somewhat modified example 1. Suppose we want to keep
x constant.

Example 2:

  D:     test d    EAGER|LAZY
         /    \
    x = 1      x = 2
    y = 2      y = 1
         \    /
          \  /
  M:   ...   (replicate|cache|cache_one|share|share_const) x
  B:

D is the point where, at stitch time, the decision has to be made
whether to pursue both sides of the branch eagerly or just one branch
lazily. So this is the right point for an annotation guiding that
decision. Based on this annotation, the statically compiled code will
tell the stitcher what to do at run time at this point.

At M a decision has to be made how to handle the new value of x. These
are the possibilities:

o don't use a cache for x: annotate as "replicate".
  Without checking any context, a new version of B will be stitched.

o cache one value for x: annotate as "cache_one".
  One cached version for x is kept around. This will be profitable if x
  keeps a particular value for quite a while, changes, keeps the new
  value for some time, and so forth. A new version replaces an old one
  in the cache.

  [We could also use a cache of size n, 1 < n < infinity. But this
  brings up the issue of a cache replacement policy and a bunch of
  other things. Some musings on that are in the appendix.]

o cache as many values for x as needed: annotate as "cache".
  For each new value of x a new version of B is produced and cached.
  N.B. this corresponds to our current KEY annotation.

o x is treated like a variable: annotate as "share".
  One version is shared for all instances of B, i.e. B is not
  specialized on the value of x.

o treat variable x as a constant: annotate as "share_const".
  A specialized version of B will be produced. The programmer
  guarantees that the value of variable x will not change.

These annotations at merges are on a per-variable basis. In example 2,
to specialize on y too, the merge would be annotated: cache x, cache y.

[skip to 1.2 if you like]

In the example it happens to be the case that we could treat y as a
constant too, as its value can be derived from x's value. This is not
always the case, however:

Example 3: (a dynamic multi-way branch)

  D:       test d
          /   |   \
     x = 1  x = 2  x = 1
     y = 2  y = 1  y = 4
          \   |   /
           \  |  /
  M:   ...   cache x
  B:

Here, the value of y cannot be determined from x at the merge point.
Hence, y cannot be considered constant in B. We could do an analysis to
figure out the cases in which such "incidental quasi-constants" could
be derived from an annotated constant, but for now we simply ignore
them.

1.2 Annotations at the programming language level

1.2.1 Syntactic Form and Semantics

We don't want the programmer to annotate all merges and branches at the
IR level.
The proposed language-level annotations, however, will allow us to
express the cases we are interested in, and as the IR-level annotations
can be derived from them, they are implementable. Moreover, they
require much less effort.

The following annotation shall be used:

  keep_const(<var-list>) [eager|lazy|looplazy]

  <var-list> := <var-spec>+
  <var-spec> := <var> [cache|cache_one|replicate]

where <var-list> is a list of variables. keep_const tells the compiler
to treat the variables listed in <var-list> as constants. The
[eager|lazy|looplazy] specifier has the following semantics:

o eager:    treat all dynamic branches eagerly.

o lazy:     treat all dynamic branches that reach a merge where a
            variable has to be kept constant lazily.

o looplazy: same as above, except that only merges at loop heads are
            considered for laziness; all others are annotated as eager.

The cache-size specifier for each variable that is to be kept constant
translates directly into annotations at the merge points at which that
particular variable gets merged (cf. BTA, section 2).

1.2.2 Scoping

Instead of allowing the keep_const annotation only at the beginning of
each dynamic region, we allow it to be placed at any point in the
dynamic region. Thus, we can refer to variables at a point where they
are already defined. The annotation is valid from the point of the
annotation downstream in the region (ignoring back edges that go back
beyond the point of the annotation). To allow arguments to the dynamic
region that are to be treated as constants, the keep_const annotation
can be part of the begin declaration of a dynamic region.

1.2.3 Default values

A safe default for [eager|lazy|looplazy] is looplazy. The default for
the cache size is infinity.

1.3 How are these annotations used?

1.3.1 We used to write:

  dynamicRegion(cache) {
    ...
  }

Now we write:

  dynamicRegion keep_const(cache) {
    ... uses of cache ...
  }

1.3.2 We used to write:

  dynamicRegion(cache) {
    ...
    unrolled for (set = 0; set < assoc; set++) {
      if (setArray[set]dynamic->tag == tag)
        return CacheHit;
    }
    return CacheMiss;
  }

Now we write:

  dynamicRegion keep_const(cache) {
    ...
    keep_const(set) eager;
    for (set = 0; set < assoc; set++) {
      if (setArray[set]dynamic->tag == tag)
        return CacheHit;
    }
    return CacheMiss;
  }

1.3.3 We couldn't write:

  dynamicRegion(bytecodes, pc) {
    unrolled do {
      switch (bytecodes[pc]) {
        case ADD:  .. pc++;
        case GOTO: pc = bytecodes[pc]; ...
      }
    } while (1);
  }

But now we can write:

  dynamicRegion keep_const(bytecodes) {
    keep_const(pc)
    do {
      switch (bytecodes[pc]) {
        case ADD:  .. pc++;
        case GOTO: pc = bytecodes[pc]; ...
      }
    } while (1);
  }

and get multi-way loop unrolling.

====================================================================

APPENDIX:
=========

2. BTA Modifications

For variables annotated to be made constant by keep_const, the only
change that needs to be made compared to the current BTA is that the
reachability conditions will be ignored at merges. Only the fact that a
dynamic condition is on the reachability path will be remembered. Such
merges will be annotated with the cache size and share policy that
reaches that merge point. Dynamic branches will be annotated with
eager | lazy based on the annotation that reaches that point. For
looplazy, only dynamic branches that reach merges at loop heads will be
annotated as lazy; others will be annotated as eager.

  C(p0) = set of variables annotated by keep_const

Flow functions:

  x := k              S = S U {x}, where k is a compile-time constant

  x := y op z         S = S U {x}   if {y,z} <= S and op is idempotent
                                    and side-effect free
                      S - {x}       otherwise

  (the equations for function calls and pointer assignments are
  analogous)

  x := phi(x1,..,xn)  S - {x1,..,xn} U {x}   if {x1,..,xn} <= S
                      S - {x,x1,..,xn}       otherwise

  branches have no effect

Meet function:

  meet(S1,S2) = S1 U S2          for a meet which is not reached by a
                                 dynamic condition
              = (S1 U S2) ^ Cp   otherwise, where Cp is the set of all
                                 variables annotated to be made
                                 constant at p
                                 (^ = intersection (ouch))

----------------------------------------------------------------------

More musings:

3. Signature / Cache Lookup

3.1 Signature

If we choose not to treat "incidental constants" (such as in example 3)
in the same way as annotated constants, the full state necessary to
determine whether a stitched version for the current context was
already stitched is given by:

  S(p) = Q(p) ^ Live(p)

where

  Q(p)    is the set of variables declared to be constant whose
          keep_const declaration reaches point p
  Live(p) is the set of live variables at point p

3.2 Cache lookup

3.2.1 Specialization on annotated values

A future extension could allow the programmer to talk not only about
the size of the cache but also about particular values for which
specialization should or should not be done. At the IR level, an
annotation for that could look something like this:

Example 4:

  D:      test d    LAZY
          /    \
    x = dx1     x = dx2
    y = dy1     y = dy2
          \    /
           \  /
  M:   ...   cache x if (x = -1 or x > 100), cache y
  B:

Now the generation of specialized versions is determined not only by
the size of the cache but also by certain run-time-evaluated
conditions. This would require the generation of different specialized
versions of B: one for x treated as a constant and one for x treated as
a variable.

3.2.2 Interpretation of cache size annotation

[weird stuff follows] Using cache sizes n, 1 < n < infinity ...

A. keepConst(<var-list> [<cache-spec>]) [<laziness>]

   where
     <var-list>   ::= list of variables (possibly empty)
     <cache-spec> ::= :cache | :cache1 | :unchecked | :replicate
     <laziness>   ::= eager | lazy | looplazy

   Defaults: keepConst(<var-list> :cache) looplazy

   [NB unchecked was called shareConst in the previous draft]

   keepConst annotations are only allowed inside of dynamic regions.

B. The following annotation can be used to "turn keepConst off":

     unkeepConst(<var-list>)

   which is also only allowed inside of a dynamic region.

C.
   keepConst(<var-list> [:cache | :cache1 | :unchecked]) {
     /* the dynamic region */
   }

   Default: cache

The keepConst annotation with a block is used to mark the beginning and
extent of a dynamic region. It is also used to specify the set of
variables, given in <var-list>, that are assumed to be constant
throughout the region. The cache specifier is used to specify how to
deal with multiple versions, analogous to the keys in our previous
dynamic compilation design. Using "cache" specifies an infinite-size
cache of versions, "cache1" a cache of size 1 (i.e. a new version
replaces an existing old one). At the start of the region a check of
the cache needs to be made when using cache or cache1. Cache1 will be
useful if the incoming values exhibit temporal locality, i.e. the same
set of values enters the region for some time, then changes, and then
stays the same again for some time. If that is not the case, "cache"
should be used, which is also the default. "unchecked" is like cache1
except that no cache check is done; the programmer asserts that the
variables will only have one value and won't change.

For variables in <var-list> that are actually changed in the region,
the behavior is like putting a keepConst inside the region, i.e.
multiple versions will be created and cached using the specified cache
policies.

Nesting:
--------

keepConsts with regions can be nested. For example:

  keepConst(x) {
    ...                  /* region R1 */
    keepConst(y) {
      ...                /* nested region R2 */
    }
    ...                  /* R1 */
  }

In region R1 only x will be kept constant (assuming no other keepConst
annotations within R1). In a nested region, all variables of the
surrounding regions are kept constant in addition to the variables
declared for the nested region itself; in this case, both x and y are
kept constant inside R2.

2. IR-level annotations
=======================

2.1 Computation of IR level annotations
---------------------------------------

The annotations at the programming language level are converted to
annotations at the IR level in the following way:

1. Annotations are propagated along non-backedges only.

2. Propagation is done according to the following rules, where :C and L
   are the caching and laziness specifications, i.e.
   C \in {cache, cache1, replicate, unchecked} and
   L \in {eager, lazy, looplazy}:

     keepConst(x1,..,xn :C) L:     Out = In + {(x1,C,L), .., (xn,C,L)}
     unkeepConst(x1,..,xn :C) L:   Out = In - {(x1,C,L), .., (xn,C,L)}
     keepConst(x1,..,xn :C) {..}:  Out = {(x1,C,eager), .., (xn,C,eager)}
     other:                        Out = In

     At merges:  Out = U In(p) for all predecessors p

   where U is ordinary set union.

Call the information computed by these rules Kconst(p) for program
point p. For a (dynamic) merge point m, the overall caching and
laziness policies C(m) and L(m) are additionally computed as follows:

  L({(x1,C1,L1), .., (xn,Cn,Ln)}) = min(L1, .., Ln)
    where min is taken with respect to the order
      lazy < looplazy < eager

  C({(x1,C1,L1), .., (xn,Cn,Ln)}) = min(C1, .., Cn)
    where min is taken with respect to the order
      replicate < cache < cache1 < unchecked

Giving different annotations for the same variable ((x,C,L) and
(x,C',L') where (C,L) != (C',L')) is not allowed and is flagged as an
error.

2.2 Laziness annotations at the IR level
----------------------------------------

For lazy stitching we must compute which dynamic branches we want to
treat lazily and which eagerly. In the previous section we computed
annotations for dynamic merge points based on the programmer's
annotations for the variables. From these (dynamic) merge point
annotations we compute which dynamic branches are treated lazily:

1. All dynamic merges annotated as looplazy are changed to
     eager iff they are not loop head merges,
     lazy  iff they are loop head merges.

2. For each dynamic merge M annotated lazy: compute the earliest
   post-dominating edges E of M and mark them as lazy.

3. For each dynamic branch D: if any outgoing edge of D is marked lazy,
   mark D as lazy, otherwise as eager.

When edge E post-dominates edge E' (or node p), we are sure that when
we follow edge E we are guaranteed to traverse E' (pass through p)
later.

[A vertex v' post-dominates a vertex v in a graph G iff v dominates v'
in G', where G' is obtained from G by reversing the edges. The
dominates relation for edges is obtained by introducing a new vertex
v12 for each edge (v1,v2) and replacing that edge in the graph with the
edges (v1,v12) and (v12,v2). An edge (v1,v2) dominates (v3,v4) iff v12
dominates v34 in the new graph.]

Example:

          d1?
        l/    \
      d2?      B1
      /  \      |
    B2    B3   d3?
      \   /   l/  \______>
       \ /    /
  M:   -------

If M is annotated as lazy, the left edges of d1 and d3 each
post-dominate M and hence are annotated as lazy (marked "l"). d1 and d3
will be stitched lazily, d2 eagerly.

3. BTA
======

The BTA needs the following modifications for the new annotations:

1. Initial set

     C(p0) = set of variables annotated as constant in the
             keepConst(){} region annotation

2. Flow functions

   Each flow function is modified (f(ins) denotes the previously used
   flow function for instruction ins):

     Cflow [[ins]] cs = f(ins)(cs) U Kconst(p)

   i.e. the set of variables to be kept constant at program point p is
   always considered constant at p.

3. Merge function

     Mconstant(cs1, cs2) =
       cs1 U cs2                          if the reachability
                                          conditions cn1 and cn2 are
                                          mutually exclusive
                                          (static merge)
       (cs1 \intersect cs2) U Kconst(p)   otherwise

   i.e. static merges (based on the unmodified reachability analysis)
   are treated in the same way as before. For dynamic merges all
   inferred constants based on variables annotated to be kept constant
   are forgotten; only the variables in Kconst(p) are considered
   constant on top of the "traditional" run time constants.
Example: Assume the programmer has annotated keepConst(x,y).

  (SSA form)
  if (d)              ; d a dynamic condition
    t1 = x * y        ; set of constants: {x, y, t1}
  else
    t2 = x + y        ; set of constants: {x, y, t2}
  M:

At M the merge is a dynamic merge; hence the rule gives:

  ({x, y, t1} \intersect {x, y, t2}) U {x, y} = {x, y}

For the loop unrolling example in the paper this works out as follows:

  keepConst(p)
      p1 = 1st
       |
  --> M:  p2 = Phi(p1,p3)      ; {p1,p3}
  |       t = (p2 != NULL)
  |       t?  ---------->
  |       ...
  |       p3 = p2->next
  |        |
  ----------

M is a non-static merge. For loop unrolling, this loop head merge was
therefore marked as a constant merge to be able to derive the loop
induction variable as a run time constant. Using the keepConst
annotation for variable p (the induction variable) achieves the same
effect here, by unioning in {p} at M again. Hence, no special marking
of the loop head merge as a constant merge is necessary; it is
sufficient to use keepConst on the variable "along which" the loop is
to be unrolled. Multi-way loop unrolling is achieved in the same way,
by annotating the induction variable along which to unroll.

Example:

  pc = 0
  keepConst(pc)
  while (1) {
  M:  op = codes[pc]
      switch (op) {
        case ADD:  ... pc++; break;
        case GOTO: pc = codes[pc+1]; break;
        ...
      }
  }

The keepConst(pc) annotation will keep pc constant at the merge point M
and hence in all cases of the switch, which in turn allows codes[pc+1]
to be derived as a constant.

----------------------------------------------------------------

4. Support for dynamic compilation of function calls

In the present framework, results of function calls cannot be treated
as run time constants, and functions will not be specialized for
certain arguments. The following is a proposal for adding some support
for procedure specialization and memoization of results short of
inter-procedural analysis.
4.1 Call site annotations

Form:

  a = [sefree|idempotent] f(<args>) [specialize on <key-args>]

Intended use:

sefree (side-effect free): asserts that the function f (at this call
site) has no side effects that the programmer is interested in.
Example: a call to random will change the state of the random number
generator, but we might not care.

idempotent: asserts that f (at this call site) is idempotent, i.e. for
the same arguments it will return the same results. Idempotent subsumes
sefree. Example: sin(x).

Special case: all arguments are run time constants at the call site.
Then "a" becomes a run time constant too; the function f will be called
only once during setup and the result will be cached.

specialize on <key-args>: specifies that the function f is to be
specialized at this call site. It is to be specialized only on the
arguments given in <key-args>, which must be a subset of the actual
arguments <args>.

Example:

  a = idempotent f(x,y) specialize on (x)

What happens at run time?

                   sefree                     idempotent

  specialize       specialized versions of    like sefree; additionally,
                   f are created, compiling   if all arguments are used
                   f assuming the arguments   for specialization, the
                   being specialized on are   result can be cached and f
                   constant                   needs to be called only
                                              once for each argument
                                              tuple

  don't            ---                        if all arguments are rtc:
  specialize                                  call f only once during
                                              setup and cache the result

4.2 Callee Annotation

Another annotation is useful for specializing f independently of the
call site. It is then important to specify which arguments are
interesting for specialization.

Form:

  specialize [idempotent|sefree] f(x1,..,xn) on (<key-list>)

where <key-list> is a subset of the argument list.

Intended use: create as many specialized versions of f as needed, where
specialization is done exactly on the arguments listed in the key.
Additionally, f can be asserted to be side-effect free or idempotent.
If there are both callee and call-site annotations for f, the call-site
annotation overrides the callee annotation, and a specialized version
for the particular call site is created independently of the other
specialized version(s) of f.

Implementation:
---------------

BTA: All cases leave the set of computed run time constants unaffected,
except for

  a = idempotent f(x1, .., xn)

where all of x1, .., xn are rtc's. The BTA is changed to make "a" a
derived rtc in that case.

(Static) compile time: Each specialized function becomes an implicit
dynamic region and gets compiled as such. For each call site a
different template will be produced. If all arguments are run time
constants, no function call will need to be compiled: a general version
of f will be kept, called at setup, and the result will be used as the
value of the run-time-constant function result.

Run time: At the call site, the produced template gets stitched with
the appropriate run time constants (the arguments being specialized
on). A cache of versions of f is checked first, to find out whether the
required version has already been produced.

There seems to be no other magic involved. Hence, these annotations
would be implementable and useful, enlarging the range of programs that
can be profitably compiled, even without doing inter-procedural
analysis.

================================================================

Design of New Annotations
As of $Date: 1996/07/03 22:08:03 $