******************************************************************************** ******************************************************************************** ******************************************************************************** To: eggers@thistle.cs.washington.edu cc: grant@thistle.cs.washington.edu, spin-comp@thistle.cs.washington.edu Subject: eager stitching Date: Fri, 07 Jun 1996 14:01:37 PDT From: Brian Grant | | S Z / \ / A B \ / M | Assume S is a static branch, and Z is "the sky" (an edge from some unspecified place in the CFG). In our previous discussions, you wanted to leave edges AM and BM intact and place a test (does B need to be stitched?) on edge ZB. I failed to make an argument about why that couldn't be done in general. I think now I have one. The main impediment to leaving those edges intact is identifying that there is something special about the (nonconstant) merge M. If ZB is a cross edge (B is neither a successor or predecessor of Z in the depth-first-search tree created by traversing the CFG), it can be identified and (possibly) removed. If we recompute reachability conditions with cross edges removed, we eliminate the "most unstructured" instances of this example and we can conservatively just place tests on the cross edges. Unfortunately, most common cases of unstructure introduce forward edges, which cannot be eliminated from the graph. Forward edges are everywhere. A forward edge is an edge to a successor in the CFG. For example, edge BM in the picture above is a forward edge. If ZB is a forward edge, edge AM must be changed. Here's a more concrete example: if (D && S) {A} else {B} {M} This code translates to the following CFG: | D / \ S | B and M are nonconstant merges / \ / A B \ / M | Edges DB and BM are forward edges. We always want to eagerly stitch B, but only want to stitch A if S is true. So, we want to create the following CFG: | D | S /| A | \| B | M | What we have done is redirected edge AM to point to the destination of edge DB (the next edge from a dynamic branch that we backtrack to in the traversal), creating edge AB. So, we don't touch edges from static branches, but I don't see a way to avoid potentially changing edges at a nonconstant merge below a static branch. Here are some similar examples (S denotes static branch, D dynamic): | | S S / \ / \ D | D | / \ / transformed -> | / B is a constant merge A B A / M is a nonconstant merge \ / |/ M B | | M | | D1 / \ D2 | / \ / transformed -> -D1-D2-A-B-M- A B B and M are nonconstant \ / merges with identical M reachability conditions | on all incoming edges | S1 / \ S2 | / \ / transformed -> unchanged A B B and M are constant merges \ / M | Craig suggested a new way of looking at the problem by grouping blocks into nested equivalence classes. Blocks with the same reachability conditions would be put in the same equivalence class. New nested equivalence classes would only be created at static branches and weird merges (merges with edges from the sky). Subgraphs within equivalence classes would only contain dynamic branches and could be linearized by topological sort. So, by starting at the innermost nested equivalence classes and working out, the whole graph could be linearized. Static branches would choose between linearized subgraphs at setup/stitch time. The main trickiness is in dealing with equivalence classes created at weird merges. A check would have to be inserted somewhere to make sure the subgraph would be stitched if it needed to be. This is equivalent to my original problem of figuring out where to put in checks to go back and stitch some blocks. It is just a new way of looking at it. What I need to think about: 1) What happens if I remove cross edges, recompute reachability conditions, then reinsert the edges with checks on them (does this block need to be stitched?)? Does it mess anything up? Does it simplify the rest of the algorithm or improve the quality of our code layout? 2) Does this notion of equivalence classes help? Where do we check to go back and stitch subgraphs started at weird merges? Will there be too many of these weird merges with small amounts of unstructure, creating a lot checks? How does this relate to control-dependence trees? 3) What exactly is the algorithm for doing the topological sort? I have a start. At a static branch, do nothing special. At a dynamic branch, push all edges but the first on a stack and remove them from the graph. At a constant merge, do nothing. On any edge but the last at a nonconstant merge, redirect the edge to point to the destination of the first edge popped off the stack that doesn't lead to a nonconstant merge (or is the last edge into a nonconstant merge). I think there may also be a special case when all edges into a merge have the same reachability conditions. 4) What happens if the CFG has backedges (unrolled loops, esp. multiway)? Backedges for loops that are not unrolled can be removed if other edges into the loop head have the same (or less restrictive) reachability conditions as the backedge. That is, we're sure the destination of the backedge will be stitched. Otherwise, we have to insert a check on the backedge. --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: algorithm for eager stitching Date: Mon, 01 Jul 1996 15:07:49 PDT From: Brian Grant I think I have the algorithm for eager stitching mostly done, except for two things (one thing?): 1) How to combine eager stitching with lazy stitching, if we're going to implement lazy. 2) How to handle multi-way unrolling if we're not. I realize it's too late for people to read this before the meeting. It's also too late for me to prepare a good presentation of it, so I suggest we postpone the full discussion until next week. --Brian Eager Stitching in DyC Brian K. Grant ******************************************************************************** The Problem ******************************************************************************** We would like to statically transform the CFG for the combined setup and stitching of the dynamic region (I'll call this the SCFG, for static CFG) such that basic blocks corresponding to specialized basic blocks that are to be eagerly stitched (I'll call this the DCFG, for dynamic CFG) will be reached, without maintaining a stack or worklist at run-time (except in the case of multi-way loop unrolling). The problem has two levels: 1) The order that the setup/stitcher blocks (of the SCFG) are ordered statically, and 2) The order in which specialized blocks (of the DCFG) will be stitched at run-time. We would like both to correspond roughly to a topological traversal of the basic blocks, for several reasons (first three are pretty standard): 1) Shorter live ranges, solving our problem with Multiflow and hopefully producing fewer spills, 2) Minimization of the number of branches, 3) Spacial locality, and (at run-time) 4) Minimization of checks for whether blocks of the DCFG have already been stitched or not. What makes this problem difficult is that in a topological traversal, a merge is not visited except until it is reached by its last incoming edge. However, at setup/stitch time we might not traverse the last incoming edge due to the presence of static branches on the path to that edge. To preserve the topological ordering of blocks we must wait until that edge is either traversed or passed by before we visit its destination. So, the tricky part is figuring out where to check whether that edge was traversed or not and to go back and stitch the unstitched blocks. An alternative to this checking is splitting the CFG at the merges, duplicating the code. Optimizing the resulting graph to eliminate unnecessary duplicates (tail merging) looks as difficult as inserting checks, so I'm going with inserting checks. At the moment, this discussion assumes that the SCFG initially looks like the original CFG, back edges and all, and that there is more or less a one-to-one correspondence between blocks in the SCFG and blocks in the DCFG. However, this may not be the case. This brings up two questions: 1) How do blocks in the DCFG correspond to blocks in the SCFG? We need to determine whether global optimizations on the templates moving instructions across basic-block boundaries is a problem or not. This problem also exists with lazy stitching. 2) Should we perform the CFG transformation before or after merging the templates with the setup code? If we do the transformation before, we have to make sure that no new basic blocks are added to the SCFG during the merge. If we do it after, we have to be sure that we don't modify the DCFG, which is embedded in the SCFG that we are transforming. ******************************************************************************** Structured vs. unstructured code ******************************************************************************** One possible definition of structured code is code containing only looping and decision constructs that are strictly nested, with no short cuts or gotos. However, most C code contains simple forms of unstructure: -- short-circuiting Boolean expressions -- break -- continue -- fall-through cases -- goto bottom of function Ideally, we should be able to handle these simple forms of unstructure as efficiently as perfectly structured code. Can we come up with more lenient definitions of structure that include these cases? Can we identify the underlying structure in a CFG with a few stray "unstructured edges"? If we could determine which edges were unstructured and which ones were structured, we could possibly eliminate the unstructured ones when transforming the CFG in order to preserve its structure. -- OR in reachability conditions: introduced at merges that are the destinations of unstructured control flow from static branches Doesn't capture unstructuredness when dynamic branches are involved. Also includes the simple forms of unstructure. -- What about cross edges? What about forward edges? See below. ******************************************************************************** Classification of edges ******************************************************************************** I made some mistakes in the labeling of cross edges in my previous writeup. I'll try to clarify that here. The definitions of tree, forward, and cross edges of the CFG come from the DFS tree built when traversing the CFG. In the DFS traversal, a merge node is traversed the first time it is reached and the algorithm must backtrack on all subsequent visits. This is the opposite of the depth-first topological traversal, in which a merge node is only traversed once it is reached from the last parent to be traversed. Nodes are added to the DFS tree as the CFG is traversed. A tree edge is an edge to a node not yet in the DFS tree (the first edge to a block). A forward edge is an edge to a successor in the DFS tree (an nth edge into a merge). A cross edge is an edge to a node that is neither a successor or predecessor in the DFS tree. Any right-hand edge entering a merge is a cross edge, even the edge from the else clause of a simple if statement. Cross edges that don't fall into this category generally point backwards in the program text, but are not loop edges. One example is a goto from an else clause to a then clause of the same if statement. Equivalent unstructured constructs in the other direction produce forward edges. In fact, an unstructured edge can be a tree edge of the DFS tree, making the structured edge a forward edge. This happens with fall-through cases in switch statements. So, it is impossible to classify an edge as structured or unstructured based purely on its DFS classification. ******************************************************************************** Should we remove any edges? ******************************************************************************** If we remove any (e.g., forward, cross, back) edges, we may violate the topological ordering of blocks. For every such violation, we have to insert checks to stop stitching. This introduces two problems: 1) Where do we insert the checks? 2) How do we know where we should continue stitching if we decide to stop stitching along one path? A function-call-like mechanism would necessitate saving and restoring state at different points in the CFG, similar to what is required for lazy stitching. A long list of explicit checks (if we came from X, goto to Y) might be just as inefficient. By removing edges, we may also create dead-ends in the CFG. Back edges are a special case. They can be removed without creating dead-ends. They generally have to be removed (or at least ignored) when performing a topological traversal. We still may get in trouble if the last non-back edge into the destination of a back edge has more restrictive or different reachability conditions than the back edge. ******************************************************************************** Why can't all subgraphs corresponding to static conditionals be left unchanged? ******************************************************************************** Study the following diagram of a CFG. Assume S is a static branch, and Z is "the sky" (an edge from some unspecified place in the CFG). | | S Z / \ / A B \ / M | Why can't we leave edges SB, AM, and BM intact and place a test (does B need to be stitched?) on edge ZB? The main impediment to leaving those edges intact is identifying that there is something special about the (nonconstant) merges B and M. Edges SA, AM, and SB in the picture above are tree edges, BM is a cross edge, and ZB could be a forward or cross edge (assume for now that it is not a back edge). If we could identify that there were something special about the edge ZB, we could eliminate it from the graph and recompute the reachability conditions. That would eliminate the merge at B and make M a constant merge. Unfortunately, I don't know how to identify that edge. Here is a more concrete example (D dynamic, S static): if (D && S) {A} else {B} {M} This code translates to the following CFG: | D / \ S | B and M are nonconstant merges / \ / A B \ / M | Edges DB and BM are forward edges. We always want to eagerly stitch B, but only want to stitch A if S is true. So, we want to create the following CFG: | D | S /| A | \| B | M | What we have done is redirected edge AM to point to the destination of edge DB (the next edge from a dynamic branch that we backtrack to in the traversal), creating edge AB. Edge SB is not really left untouched. It is also redirected to the destination of edge DB, which happens to be to the same block, B. Here are some similar examples (S denotes static branch, D dynamic): | | S S / \ / \ D | D | / \ / transformed -> | / B is a constant merge A B A / M is a nonconstant merge \ / |/ M B | | M | | D1 / \ D2 | / \ / transformed -> -D1-D2-A-B-M- A B B and M are nonconstant \ / merges with identical M reachability conditions | on all incoming edges | S1 / \ S2 | / \ / transformed -> unchanged A B B and M are constant merges \ / M | ******************************************************************************** Equivalence classes and control-dependence trees ******************************************************************************** One way of tackling the transformation problem is to identify subgraphs of the SCFG that are to be linearized, linearize those subgraphs, and connect them so that the appropriate subgraphs are stitched. To do this, we group blocks in nested equivalence classes. Blocks with the same reachability conditions are put in the same equivalence class. New nested equivalence classes are created at static branches and weird merges. A weird merge is one producing an OR in the reachability conditions. Subgraphs within equivalence classes by definition only contain dynamic branches and (possibly) nested equivalence classes. The blocks in the innermost equivalence classes contain only dynamic branches and can be linearized by topological sort. Once that is completed, the surrounding equivalence class can be linearized. Static branches would choose between linearized subgraphs at setup/stitch time. Subgraphs corresponding to equivalence classes created at weird merges would just be concatenated and appended to the subgraph of the parent equivalence class. So, by starting at the innermost nested equivalence classes and working out, the whole graph could be linearized. I'm not yet sure how back edges should be handled. The following diagram shows a CFG with two weird merges. ENTRY | 1 | S1 / \ 2 3 | | S2 S3 / \ / \ 4 5 6 7 \ \ / / \ 8 / \ / / 9 / \ / 10 | EXIT The following tree shows the nested equivalence classes for this CFG. Blocks within parentheses are in the same equivalence class, and children in the tree are in nested equivalence classes. (ENTRY,1,10,EXIT) | \_ \_ S1 \_ \_ t/ \f \w \w (2) (3) (8)(9) | | S2 S3 t/ \f t/ \f (4) (5) (6) (7) From this equivalence-class tree, we construct the following CFG. ENTRY | 1 | S1 / \ 2 3 | | S2 S3 / \ / \ 4 5 6 7 | | | | 9NS 8NS 8NS | \ / \ / \/ \/ \ / \ / \ / 10 | 8NS? /| 8 | \| 9NS? /| 9 | \ | \| EXIT So, all blocks are stitched that need to be. Since we don't want blocks 8 and 9 to be stitched unnecessarily, we set need-to-be-stitched flags for the blocks on the edges from blocks 4, 5, and 6 and check those flags before the blocks are stitched. How does this relate to control-dependence trees? Reachability conditions are a type of control-dependence information, where we are only concerned about dependence upon static branches. Building the nested equivalence classes based on reachability conditions is very similar to insertion of region nodes into the control-dependence subgraph. The region nodes become the parents of nodes with the same control dependencies and children of nodes with fewer dependencies. The main difference in the two approaches is in the handling of weird merges and back edges. Does this notion of equivalence classes make eager stitching: -- Easier to implement? -- Easier to understand? -- More or less efficient? I think the equivalence-class method is easy to understand, but building up the equivalence classes would be more work than it is worth. I also think that weird merges occur quite frequently and just appending the blocks and their tests in a linear fashion is less efficient than the algorithm I propose. It should be possible to insert subgraphs for equivalence classes for weird merges back into the CFG more intelligently than appending them to the end, but I haven't yet thought of one. However, I haven't yet worked on this approach for as long as I worked on the algorithm that follows, which I had nearly worked out before Craig came up with this idea. ******************************************************************************** Identifying predecessors in the CFG ******************************************************************************** Remove all back edges from the CFG. Perform a depth-first topological traversal and count the nodes as they are visited. These counts are the begin times. Then, reverse all edges (with back edges removed from the original graph), and do the same again. These counts are the end times. If the (begin, end) range of one node is completely contained within the range of another node, the first node is a successor of the second in the original CFG. ******************************************************************************** The Algorithm ******************************************************************************** Since we're traversing the SCFG and making modifications as we go along, we maintain two graphs for sanity (traverse the original and build a duplicate). By passing another parameter to transform(), we could eliminate the BACK_STACK, but it would complicate the code. This algorithm is quite short, but complicated, and therefore difficult to understand. I am nearly convinced that is correct, but I have been thinking about it for more than a month. Understandably, it may be more difficult to convince you that it is correct, and it would be nice to have a proof. I have included several comments to try to describe the rationale for each piece of the algorithm. Notation -------- RC is reachability conditions (modified) ME is mutually exclusive is_predecessor(A, B) is true if A is a predecessor of B SCFG2 is a duplicate CFG, initially containing duplicates of all the nodes, but no edges. The Algorithm ------------- 1) Peel off the first iteration of every one-way unrolled loop. This makes stitching of unrolled loops with multiple entries more efficient and any unnecessarily generated code will be eliminated. 2) Find all loops and annotate each loop-entry edge with the head of the outermost loop it enters. 3) Remove back edges from SCFG. 4) Redirect each loop-entry edge to point to the head of the outermost loop entered. 5) Insert empty nodes into the SCFG on all non-last edges into nonconstant merges. This prevents redirection to edges that will later themselves be redirected (edges from dynamic branches into nonconstant merges). 6) Recompute reachability conditions. 7) Compute SCFG predecessors. 8) Copy the nodes of SCFG to SCFG2. 9) Start the recursive transformation and topological traversal by calling transform(SCFG->entrynode). 10) Add back edges corresponding to loops to be unrolled to SCFG2. transform(node) { /* if we're done, quit */ if node == SCFG->exitnode { return } if ends_in_dynamic_branch(node) { /* Push the destinations of dynamic edges onto the stack so that we can redirect edges into nonconstant merges to them. We want the edges to be popped from BACK_STACK in the same order in which they are traversed. We don't redirect to edges emanating from static branches because: -- If we're redirecting, the RC of the edge are not ME with those of the last edge into a nonconstant merge. -- So, there must be an edge from a dynamic branch that leads to the path from which the last edge comes. -- The RC of the next edge from a static branch to be traversed must be ME with those of the redirected edge, and only one of the two paths should be stitched. */ for each outgoing edge node->e[i] (in reverse order) { push(BACK_STACK, node->e[i]->dest) } /* Add only the first edge to the new graph. The other branch destinations will be strung together. */ Add edge (node -> peek(BACK_STACK)) to SCFG2 } /* In the case of a static branch or single edge, copy all edges to SCFG2. */ else { for each outgoing edge node->e[i] { Add edge node->e[i] to SCFG2 } } /* Recursively invoke transform() on graph successors, backtracking at non-last edges into merges. */ for each outgoing edge node->e[i] { /* Remove the current edge from the stack so the top of the stack is the next dynamic edge we're going to handle. */ if ends_in_dynamic_branch(node) { assert(node->e[i]->dest == peek(BACK_STACK)) pop(BACK_STACK) } /* Case 1: edge is only incoming edge (not a merge) Simply move down the graph. */ /* If the edge destination is not a merge, then go to that node. */ if (not is_merge(node->e[i]->dest) { transform(node->e[i]->dest) } /* Case 2: last edge into a merge, an edge into a constant merge, or an edge into a nonconstant merge where the RC of the current edge are ME with those of the last edge Insert checks to go back if necessary and move down the graph if it's the last edge. */ /* we know the edge destination is a merge */ /* If the edge is the last edge or its reachability conditions are mutually exclusive with those of the last edge, we're going to stitch the merge next. So, if the merge is a nonconstant merge, check if we need to insert checks to go back and stitch any blocks to preserve topological order. At a constant merge (this case catches all constant merges), there won't be any blocks to go back and stitch. */ else if (node->e[i] == node->e[i]->dest->last_in_edge) or ( RC(node->e[i]) ME RC(node->e[i]->dest->last_in_edge) ) { if is_nonconstant_merge(node->e[i]->dest) { /* Merges in the CHECK_SET are tested in ascending order by their begin times. This ensures that no merge will be checked before a predecessor. That is necessary to ensure that we don't stitch the same part of the graph twice. */ for every node n in CHECK_SET { /* We need to insert a check to go back to a merge if the destination of the current edge is a successor of a merge in CHECK_SET, but the source of the edge is not. If the source is also a successor, we are guaranteed the merge to check has already been stitched due to the strict topological ordering, making the check unnecessary. */ if is_predecessor(n, node->e[i]->dest) && (not is_predecessor(n, node)) { Insert "is NTBS flag set for n?" on edge node->e[i] } } } /* If the edge is the last edge into a merge, then go to that node. If the edge is not the last edge into the merge, then we need to backtrack. */ if (node->e[i] == node->e[i]->dest->last_in_edge) { transform(node->e[i]->dest) } } /* Case 3: non-last edge into a nonconstant merge Redirect the edge and determine whether we might need to come back to the merge or not. */ /* we know the edge destination is a nonconstant merge */ /* we know the edge is not the last edge into the merge, so we're going to backtrack */ /* we know the reachability conditions of the edge are not mutually exclusive with those of the last edge, so we're going to redirect */ /* If the reachability conditions of the current edge are not mutually exclusive with those of the last edge (includes most edges at nonconstant merges), then redirect the edge and check whether the last edge might not be traversed. */ else { /* If reachability conditions of the current edge imply those of all remaining incoming edges (the last edge and all edges whose reachability conditions are mutually exclusive with those of the last edge), then we know we will reach the merge again and no check is needed. Otherwise, insert the check. */ rc_remaining_edges = RC(node->e[i]->dest->last_in_edge) for each incoming edge node->e[i]->dest->ie[j] { if ( RC(node->e[i]->dest->ie[j]) ME RC(node->e[i]->dest->last_in_edge) ) { rc_remaining_edges = rc_remaining_edges or RC(node->e[i]->dest->ie[j]) } } if ( rc_remaining_edges && RC(node->e[i]) ) != RC(node->e[i]) { /* The remaining edges might not be traversed, so insert a need-to- be-stitched flag and add the merge to the set of nodes to go back and check. */ assert(is_a_cross_edge( node->e[i]->dest->last_in_edge)) Insert "set NTBS flag for node->e[i]->dest" on edge node->e[i] Add node->e[i]->dest to CHECK_SET (begin times in ascending order) } Redirect edge node->e[i] in SCFG2 to peek(BACK_STACK), or add the edge if it is not present } } return } This algorithm correctly transforms the following CFG. | D1 / \ D2 E / \ \ A S | | / \ | \ B C | \ \ / / \ | / M1 / |/ M2 | To: | D1 | D2 | A | S / \ B C \ / M1 | E | M2 | ******************************************************************************** Does this algorithm correctly handle back edges? ******************************************************************************** What happens if the SCFG has back edges? Backedges for loops that are not unrolled can be ignored (not copied to the duplicate graph). Loops with multiple entries are handled correctly because all entering edges are redirected to the loop head, ensuring that the whole loop will be stitched and that no successor will be stitched before a predecessor. The following CFG is a good example of this situation. | S __ /|\ / | A | H | | \| | | Q | \ / | D | / \ | B T | | |__| With one-way unrolled loops, we should be able to just add the back edge to the duplicate graph after the transformation algorithm runs. Multi-way unrolled loops are a problem. If done eagerly, they require a stitch-time stack. I've only just begun thinking about how they affect this algorithm. ******************************************************************************** Integration with lazy stitching ******************************************************************************** How do we integrate this algorithm with the lazy stitching algorithm? Subgraphs of the SCFG to be stitched lazily should not be modified by this transformation, and movement between eager and lazy portions of the CFGs should be well-defined. Lazily stitched branches are similar, but not identical, to static branches. For example, we need to save some state before a lazy branch. Also, how do we identify merges below lazy branches so that we don't redirect edges? Do we need to identify lazy subgraphs in advance? ******************************************************************************** ******************************************************************************** ******************************************************************************** To: spin-comp@thistle.cs.washington.edu Subject: looks like we need lazy stitching Date: Mon, 15 Jul 1996 12:30:13 PDT From: Brian Grant I've discovered two problems with stitching only eagerly: 1) I don't think shareConst, cacheOne, and replicate make sense if we only stitch eagerly. 2) We never know the values of rtconsts at at start of a dynamic region until it is reached, and there are situations where we can encounter this problem inside the region as well. If we don't know the values, we can't stitch. See the following examples. DynRegion { ... x = foo(x, y, z); keepConst(x); if (d) x += 1; else x += 2; ... } Since we don't know the initial value of x, we can't compute its value on the two paths created at the merge of the if statement. Alternatively, we might know the initial value, but not the value on all paths. Here is an example of multi-way unrolling of a byte-code interpreter for a language that allows indirect jumps on run-time-computed values (e.g., a hardware simulator): keepConst(pc); pc = 0; while (opcode(pc) != RETURN) { ... switch opcode(pc) { case BR: pc += secondOperand(pc); break; case IJMP: pc = registerFile[firstOperand(pc)]; break; /* no way to compute this at stitch time! */ ... default: pc += 4; break; } } So, the question is: how do we implement lazy? If we only use lazy for specialization, we can't execute at the same time as stitching because the control flow of the stitcher is different from that of the original code. So, we need to alternate stitching and executing, but at what granularity? My first thought is to stitch until we hit a place where we need to specialize, then execute. Once the end of the stitched code is reached, go back and stitch some more (potentially skipping by the path that requires specialization). If we do lazy, we also need to relax the topological-ordering constraint because some blocks will not be stitched eagerly. So, the eager-stitching algorithm will have to be tweaked. As discussed before, the stitcher will need to run with a different stack than the stitched code, and we need to save and restore state when we switch between the stitcher and the stitched code. We'll need space to store state for every specialized path through the CFG that we don't complete. --Brian ******************************************************************************** ******************************************************************************** ******************************************************************************** To: mock Subject: new algorithm w/ splitting and specialization Date: Mon, 12 Jul 1996 From: Brian Grant Eager Stitching in DyC Brian K. Grant ******************************************************************************** Summary ******************************************************************************** The Problem -- Statically transform the CFG for eager setup & stitching -- Ensure all branch destinations are stitched (tricky: loops, cross edges) -- Stitch each block only once -- Minimize stitch-time overhead (statcks, work lists, checks, etc.) Assumptions (for now) -- No specialization other than simple unrolled loops -- Overlapping loops are disallowed (to ensure a well-defined outermost loop and unique loop head) The Solution -- Linearize the graph with a modified topological sort -- Don't serialize mutually exclusive paths -- Redirect loop entries to loop heads -- If a node might not be stitched, go back to it where the predecessor before successor rules would otherwise be violated -- Include back edges for unrolled loops The Algorithm 1) Peel 1st iteration off unrolled loops 2) Compute for each loop-entry edge the head of the outermost loop entered 3) Remove back edges from the graph 4) Insert empty nodes on all non-last edges into nonconstant merges 5) Redirect each loop-entry to the head of the outermost loop entered 6) Recompute reachability conditions 7) Compute graph predecessors 8) Copy the nodes to a duplicate graph 9) Invoke the recursive transformation routine on the entry node of the graph 10) Add the back edges corresponding to unrolled loops to the duplicate graph See the bottom of this writeup for the recursive transformation routine. ******************************************************************************** The Problem ******************************************************************************** We would like to statically transform the CFG for the combined setup and stitching of the dynamic region (I'll call this the SCFG, for static CFG) such that basic blocks corresponding to specialized basic blocks that are to be eagerly stitched (I'll call this the DCFG, for dynamic CFG) will be reached, without maintaining a stack or worklist at run-time (except in the case of multi-way loop unrolling). The problem has two levels: 1) The order that the setup/stitcher blocks (of the SCFG) are ordered statically, and 2) The order in which specialized blocks (of the DCFG) will be stitched at run-time. We would like both to correspond roughly to a topological traversal of the basic blocks, for several reasons (first three are pretty standard): 1) Shorter live ranges, solving our problem with Multiflow and hopefully producing fewer spills, 2) Minimization of the number of branches, 3) Spacial locality, and (at run-time) 4) Minimization of checks for whether blocks of the DCFG have already been stitched or not. What makes this problem difficult is that in a topological traversal, a merge is not visited except until it is reached by its last incoming edge. However, at setup/stitch time we might not traverse the last incoming edge due to the presence of static branches on the path to that edge. To preserve the topological ordering of blocks we must wait until that edge is either traversed or passed by before we visit its destination. So, the tricky part is figuring out where to check whether that edge was traversed or not and to go back and stitch the unstitched blocks. At the moment, this discussion assumes that the SCFG initially looks like the original CFG, back edges and all, and that there is more or less a one-to-one correspondence between blocks in the SCFG and blocks in the DCFG. However, this may not be the case. This brings up two questions: 1) How do blocks in the DCFG correspond to blocks in the SCFG? We need to determine whether global optimizations on the templates moving instructions across basic-block boundaries is a problem or not. This problem also exists with lazy stitching. 2) Should we perform the CFG transformation before or after merging the templates with the setup code? If we do the transformation before, we have to make sure that no new basic blocks are added to the SCFG during the merge. If we do it after, we have to be sure that we don't modify the DCFG, which is embedded in the SCFG that we are transforming. ******************************************************************************** Structured vs. unstructured code ******************************************************************************** One possible definition of structured code is code containing only looping and decision constructs that are strictly nested, with no short cuts or gotos. However, most C code contains simple forms of unstructure: -- short-circuiting Boolean expressions -- break -- continue -- fall-through cases -- early return -- goto bottom of function Ideally, we should be able to handle these simple forms of unstructure as efficiently as perfectly structured code. Can we come up with more lenient definitions of structure that include these cases? Can we identify the underlying structure in a CFG with a few stray "unstructured edges"? If we could determine which edges were unstructured and which ones were structured, we could possibly eliminate the unstructured ones when transforming the CFG in order to preserve its structure. -- OR in reachability conditions: introduced at merges that are the destinations of unstructured control flow from static branches Doesn't capture unstructuredness when dynamic branches are involved. Also includes the simple forms of unstructure. -- What about cross edges? What about forward edges? See below. ******************************************************************************** Classification of edges ******************************************************************************** The definitions of tree, forward, and cross edges of the CFG come from the DFS tree built when traversing the CFG. In the DFS traversal, a merge node is traversed the first time it is reached and the algorithm must backtrack on all subsequent visits. This is the opposite of the depth-first topological traversal, in which a merge node is only traversed once it is reached from the last parent to be traversed. Nodes are added to the DFS tree as the CFG is traversed. A tree edge is an edge to a node not yet in the DFS tree (the first edge to a block). A forward edge is an edge to a successor in the DFS tree (an nth edge into a merge). A cross edge is an edge to a node that is neither a successor or predecessor in the DFS tree. Any right-hand edge entering a merge is a cross edge, even the edge from the else clause of a simple if statement. Cross edges that don't fall into this category generally point backwards in the program text, but are not loop edges. One example is a goto from an else clause to a then clause of the same if statement. Equivalent unstructured constructs in the other direction produce forward edges. In fact, an unstructured edge can be a tree edge of the DFS tree, making the structured edge a forward edge. This happens with fall-through cases in switch statements. So, it is impossible to classify an edge as structured or unstructured based purely on its DFS classification. ******************************************************************************** Should we remove any edges? ******************************************************************************** If we remove any (e.g., forward, cross, back) edges, we may violate the topological ordering of blocks. For every such violation, we have to insert checks to stop stitching. This introduces two problems: 1) Where do we insert the checks? 2) How do we know where we should continue stitching if we decide to stop stitching along one path? A function-call-like mechanism would necessitate saving and restoring state at different points in the CFG, similar to what is required for lazy stitching. A long list of explicit checks (if we came from X, goto to Y) might be just as inefficient. By removing edges, we may also create dead-ends in the CFG or nodes with no predecessors. Back edges are a special case. They can be removed without orphaning nodes. They generally have to be removed (or at least ignored) when performing a topological traversal. We still may get in trouble if the loop has multiple entries. The solution is to redirect all entries to the loop head, ensuring that the entire loop is stitched. ******************************************************************************** Why can't all subgraphs corresponding to static conditionals be left unchanged? ******************************************************************************** Study the following diagram of a CFG. Assume S is a static branch, and Z is "the sky" (an edge from some unspecified place in the CFG). | | S Z / \ / A B \ / M | Why can't we leave edges SB, AM, and BM intact and place a test (does B need to be stitched?) on edge ZB? The main impediment to leaving those edges intact is identifying that there is something special about the (nonconstant) merges B and M. Edges SA, AM, and SB in the picture above are tree edges, BM is a cross edge, and ZB could be a forward or cross edge (assume for now that it is not a back edge). If we could identify that there were something special about the edge ZB, we could eliminate it from the graph and recompute the reachability conditions. That would eliminate the merge at B and make M a constant merge. Unfortunately, I don't know how to identify that edge. Here is a more concrete example (D dynamic, S static): if (D && S) {A} else {B} {M} This code translates to the following CFG: | D / \ S | B and M are nonconstant merges / \ / A B \ / M | Edges DB and BM are forward edges. We always want to eagerly stitch B, but only want to stitch A if S is true. So, we want to create the following CFG: | D | S /| A | \| B | M | What we have done is redirected edge AM to point to the destination of edge DB (the next edge from a dynamic branch that we backtrack to in the traversal), creating edge AB. Edge SB is not really left untouched. It is also redirected to the destination of edge DB, which happens to be to the same block, B. Here are some similar examples (S denotes static branch, D dynamic): | | S S / \ / \ D | D | / \ / transformed -> | / B is a constant merge A B A / M is a nonconstant merge \ / |/ M B | | M | | D1 / \ D2 | / \ / transformed -> -D1-D2-A-B-M- A B B and M are nonconstant \ / merges with identical M reachability conditions | on all incoming edges | S1 / \ S2 | / \ / transformed -> unchanged A B B and M are constant merges \ / M | ******************************************************************************** Identifying predecessors in the CFG ******************************************************************************** Remove all back edges from the CFG. Perform a depth-first topological traversal and count the nodes as they are visited. These counts are the begin times. Then, reverse all edges (with back edges removed from the original graph), and do the same again. These counts are the end times. If the (begin, end) range of one node is completely contained within the range of another node, the first node is a successor of the second in the original CFG. ******************************************************************************** Equivalence classes and control-dependence trees ******************************************************************************** One way of tackling the transformation problem is to identify subgraphs of the SCFG that are to be linearized, linearize those subgraphs, and connect them so that the appropriate subgraphs are stitched. To do this, we group blocks in nested equivalence classes. Blocks with the same reachability conditions are put in the same equivalence class. New nested equivalence classes are created at static branches and weird merges. A weird merge is one producing an OR in the reachability conditions. Subgraphs within equivalence classes by definition only contain dynamic branches and (possibly) nested equivalence classes. The blocks in the innermost equivalence classes contain only dynamic branches and can be linearized by topological sort. Once that is completed, the surrounding equivalence class can be linearized. Static branches would choose between linearized subgraphs at setup/stitch time. Subgraphs corresponding to equivalence classes created at weird merges would just be concatenated and appended to the subgraph of the parent equivalence class. So, by starting at the innermost nested equivalence classes and working out, the whole graph could be linearized. I'm not yet sure how back edges should be handled. The following diagram shows a CFG with two weird merges. ENTRY | 1 | S1 / \ 2 3 | | S2 S3 / \ / \ 4 5 6 7 \ \ / / \ 8 / \ / / 9 / \ / 10 | EXIT The following tree shows the nested equivalence classes for this CFG. Blocks within parentheses are in the same equivalence class, and children in the tree are in nested equivalence classes. (ENTRY,1,10,EXIT) | \_ \_ S1 \_ \_ t/ \f \w \w (2) (3) (8)(9) | | S2 S3 t/ \f t/ \f (4) (5) (6) (7) From this equivalence-class tree, we construct the following CFG. ENTRY | 1 | S1 / \ 2 3 | | S2 S3 / \ / \ 4 5 6 7 | | | | 9NS 8NS 8NS | \ / \ / \/ \/ \ / \ / \ / 10 | 8NS? /| 8 | \| 9NS? /| 9 | \ | \| EXIT So, all blocks are stitched that need to be. Since we don't want blocks 8 and 9 to be stitched unnecessarily, we set need-to-be-stitched flags for the blocks on the edges from blocks 4, 5, and 6 and check those flags before the blocks are stitched. How does this relate to control-dependence trees? Reachability conditions are a type of control-dependence information, where we are only concerned about dependence upon static branches. Building the nested equivalence classes based on reachability conditions is very similar to insertion of region nodes into the control-dependence subgraph. The region nodes become the parents of nodes with the same control dependencies and children of nodes with fewer dependencies. The main difference in the two approaches is in the handling of weird merges and back edges. Does this notion of equivalence classes make eager stitching: -- Easier to implement? -- Easier to understand? -- More or less efficient? I think the equivalence-class method is easy to understand, but building up the equivalence classes would be more work than it is worth. I also think that weird merges occur quite frequently and just appending the blocks and their tests in a linear fashion is less efficient than the algorithm I propose (which doesn't change the graph in the example above). It should be possible to insert subgraphs for equivalence classes for weird merges back into the CFG more intelligently than appending them to the end, but I haven't yet thought of one. However, I haven't yet worked on this approach for as long as I worked on the algorithm that follows, which I had nearly worked out before Craig came up with this idea. ******************************************************************************** Node splitting and tail merging ******************************************************************************** The concept for this algorithm is based on splitting at merges and stitching a subset of the graph in topological order, the subset being determined by which static branches are taken at stitch-time. NB: I'm not thinking about specialization yet. The duplicate CFG built by the following algorithm is a tree, with new branches of the tree created at static branches. After the algorithm runs, another pass would attempt to merge the tails (quadratic in number of tails? exponential in number of static branches?). Without specialization (including loop unrolling), no stitch-time checks or data structures (stacks, queues, etc.) are necessary. Before we include each block in the tree, we check whether we might reach a loop tail (block ending in a back edge) but not the loop head (destination of the back edge). If this is possible, we insert the loop head in the sorted list of nodes to stitch. The test I use to check this is whether the reachability conditions of the next node in the sorted list imply the intersection of the reachability conditions of each tail and its corresponding head. Assumptions and notation ------------------------ -- Edges are ordered by the topological numbers of their destinations -- Edge->Dest is the destination node of Edge -- Node->Edge[0] is the first edge of Node -- SCFG->Start has a single successor -- BranchStack is a stack of branches of the tree (SCFG2) that need to be completed -- NextNodeList is a list of nodes of SCFG sorted in ascending order by topological number; it is a list of nodes that need to be stitched, but haven't been copied to the duplicate graph yet; no node can be inserted into the list more than once (the list contains no duplicates) -- LoopList is a list of node pairs sorted in ascending order by the topological number of the first node in the pair; it is a list of loop heads paired with their tails; a loop head may appear more than once if it has multiple tails; each tail appears only once in the list, since a back edge can only point to one head -- CopyList(L2, L1) creates a copy L2 of list L1 -- CopyNode(N2, N1) creates a copy N2 of node N1 (not including connecting edges) -- RC is reachability conditions -- is_predecessor(N1, N2) is true if N1 is a predecessor of N2 or N1 == N2, false if N1 succeeds N2 or N1 neither preceeds nor succeeds N2 Algorithm --------- CreateEmptyList(LoopList) FindLoopHeadsAndTails(LoopList, SCFG->Start) RemoveBackEdges(SCFG->Start) ComputeTopologicalNumbers(SCFG->Start) CopyNode(SCFG2->Start, SCFG->Start) CreateEmptyList(NextNodeList) InsertIntoList(NextNodeList, SCFG->Start->Edge[0]->Dest) CreateEmptyStack(BranchStack) PushOntoStack(BranchStack, (SCFG2->Start, NoCondition, NextNodeList, LoopList)) until EmptyStack(BranchStack) { /* Start on a branch of the tree */ (PreviousDupNode, EdgeCondition, NextNodeList, LoopList) = PopOffStack(BranchStack) /* Add to the branch until it is completed */ until EmptyList(NextNodeList) { /* Check if there is a loop that we need to stitch before the next node scheduled to be stitched. */ NextNode = HeadOfList(NextNodeList) for each node pair (Head, Tail) in LoopList { /* If the next node to be stitched is the loop head or if we already passed by the tail, we don't need to worry about this loop, so remove it from the list. */ if (NextNode == Head) or (Topo#(Tail) < Topo#(NextNode)) { RemoveFromList(LoopList, (Head, Tail)) } else { /* If the reachability conditions of the next node to be stitched imply the intersection of the reachability conditions of the head and the tail, then we might reach the tail from NextNode but not the head, so we had better ensure that we stitch the head. */ if RC(NextNode) ==> (RC(Head) && RC(Tail)) { InsertIntoList(NextNodeList, Head) RemoveFromList(LoopList, (Head, *)) } } } /* Copy the next node in the topological order that needs to be stitched. */ ThisOrigNode = RemoveHeadOfList(NextNodeList) CopyNode(ThisDupNode, ThisOrigNode) CreateEdge(PreviousDupNode, ThisDupNode, EdgeCondition) /* Static branches spawn new branches in the SCFG2 tree. The current branch follows the first edge. */ if EndsInStaticBranch(ThisOrigNode) { for each outgoing edge ThisOrigNode->Edge[i] except the first (in reverse topo. order) { CopyList(NewNodeList, NextNodeList) CopyList(NewLoopList, LoopList) InsertIntoList(NewNodeList, ThisOrigNode->Edge[i]->Dest) PushOntoStack(BranchStack, (ThisDupNode, Condition(ThisOrigNode->Edge[i]), NewNodeList, NewLoopList)) } InsertIntoList(NextNodeList, ThisOrigNode->Edge[0]->Dest) EdgeCondition = Condition(ThisOrigNode->Edge[0]) } else { /* Ensure all the children of this dynamic branch or singleton edge will be stitched in the proper order by inserting them into the sorted list. */ for each outgoing edge ThisOrigNode->Edge[i] { InsertIntoList(NextNodeList, ThisOrigNode->Edge[i]->Dest) } EdgeCondition = NoCondition } PreviousDupNode = ThisDupNode } } ******************************************************************************** The Algorithm ******************************************************************************** Since we're traversing the SCFG and making modifications as we go along, we maintain two graphs for sanity (traverse the original and build a duplicate). By passing another parameter to transform(), we could eliminate the BACK_STACK, but it would complicate the code. The topological traversal defines a unique order on the edges (by the topological numbers of the edge destinations). Notation -------- NTBS_LIST is a sorted list of nodes NTBS is need to be stitched HBS is has been stitched RC is reachability conditions (modified) ME is mutually exclusive is_predecessor(A, B) is true if A is a predecessor of B or A == B SCFG2 is a duplicate CFG, initially containing duplicates of all the nodes, but no edges. The Algorithm ------------- 1) Peel off the first iteration of every one-way unrolled loop. This makes stitching of unrolled loops with multiple entries more efficient and any unnecessarily generated code will be eliminated. 2) Find all loops and annotate each loop-entry edge with the head of the outermost loop it enters. 3) Remove back edges from SCFG. 4) Redirect each loop-entry edge to point to the head of the outermost loop entered. 5) Insert empty nodes into the SCFG on all edges from dynamic branches. This prevents redirection of the first edge from a dynamic branch, and prevents redirection to edges that will later themselves be redirected (edges from dynamic branches into nonconstant merges). It also ensures that the destination of each edge from a dynamic branch has only one incoming edge. 6) Recompute reachability conditions. 7) Compute SCFG predecessors. 8) Copy the nodes of SCFG to SCFG2. 9) Start the recursive transformation and topological traversal by calling transform(SCFG->entrynode). 10) Add back edges corresponding to loops to be unrolled to SCFG2. Invariants on the SCFG when transform is called: -- No back edges -- Edges from dynamic branches don't go into merges transform(node) { /* if we're done, quit */ if node == SCFG->exitnode { return } if ends_in_dynamic_branch(node) { /* Push the destinations of dynamic edges onto the stack so that we can redirect edges into nonconstant merges to them. We want the edges to be popped from BACK_STACK in the same order in which they are traversed. We don't redirect to edges emanating from static branches because: -- If we're redirecting, the RC of the edge are not ME with those of the last edge into a nonconstant merge. -- So, there must be an edge from a dynamic branch that leads to the path from which the last edge comes. -- The RC of the next edge from a static branch to be traversed must be ME with those of the redirected edge, and only one of the two paths should be stitched. */ for each outgoing edge node->e[i] (in reverse order) { push(BACK_STACK, node->e[i]->dest) } } /* Recursively invoke transform() on graph successors, backtracking at non-last edges into merges. */ for each outgoing edge node->e[i] { /* Remove the current edge from the stack so the top of the stack is the next dynamic edge we're going to handle. */ if ends_in_dynamic_branch(node) { assert(node->e[i]->dest == peek(BACK_STACK)) pop(BACK_STACK) } /* Case 1: edge is only incoming edge (not a merge) Simply move down the graph. */ /* If the edge destination is not a merge, then go to that node. */ if (not is_merge(node->e[i]->dest) { /* For dynamic branches, add only the first edge to the new graph. The other branch destinations will be strung together. All singleton edges and edges from static branches may be added. */ if (not ends_in_dynamic_branch(node)) or (ends_in_dynamic_branch(node) and node->e[i] is the first edge from node) { Add edge node->e[i] to SCFG2 } transform(node->e[i]->dest) } /* Case 2: last edge into a merge, an edge into a constant merge, or an edge into a nonconstant merge where the RC of the current edge are ME with those of the last edge Insert checks to go back if necessary and move down the graph if it's the last edge. */ /* we know the edge destination is a merge */ /* If the edge is the last edge or its reachability conditions are mutually exclusive with those of the last edge, we're going to stitch the merge next. So, if the merge is a nonconstant merge, check if we need to insert checks to go back and stitch any blocks to preserve topological order. At a constant merge (this case catches all constant merges), there won't be any blocks to go back and stitch. */ else if (node->e[i] == node->e[i]->dest->last_in_edge) or ( RC(node->e[i]) ME RC(node->e[i]->dest->last_in_edge) ) { if is_nonconstant_merge(node->e[i]->dest) { /* Merges in the NTBS_LIST will be tested in ascending order by their begin times. This ensures that no merge will be checked before a predecessor. That is necessary to ensure that we don't stitch the same part of the graph twice. The NTBS_LIST is actually in descending order and we insert the checks backwards. We may also be able to use the ordering to optimize this loop. */ v = node->e[i]->dest for every node n in NTBS_LIST { /* We need to insert a check to go back to a merge if the destination of the current edge is a successor of a merge in NTBS_LIST, but the source of the edge is not. If the source is also a successor, we are guaranteed the merge to check has already been stitched due to the strict topological ordering, making the check unnecessary. */ if is_predecessor(n, node->e[i]->dest) && (not is_predecessor(n, node)) { Add subgraph u "if (n NTBS && !(n HBS)), then goto n, else goto v" to SCFG2 v = u } } /* Connect node to the string of checks anchored to node->e[i]->dest. */ Add edge (node, v) to SCFG2 } else { /* No checks are necessary, so just add the edge. */ Add edge node->e[i] to SCFG2 } /* If the edge is the last edge into a merge, then go to that node. If the edge is not the last edge into the merge, then we need to backtrack. */ if (node->e[i] == node->e[i]->dest->last_in_edge) { transform(node->e[i]->dest) } } /* Case 3: non-last edge into a nonconstant merge Redirect the edge and determine whether we might need to come back to the merge or not. */ /* we know the edge destination is a nonconstant merge */ /* we know the edge is not the last edge into the merge, so we're going to backtrack */ /* we know the reachability conditions of the edge are not mutually exclusive with those of the last edge, so we're going to redirect */ /* we know the edge is a singleton edge or an edge from a static branch */ /* If the reachability conditions of the current edge are not mutually exclusive with those of the last edge (includes most edges at nonconstant merges), then redirect the edge and check whether the last edge might not be traversed. */ else { /* If reachability conditions of the current edge imply those of all remaining incoming edges (the last edge and all edges whose reachability conditions are mutually exclusive with those of the last edge), then we know we will reach the merge again and no check is needed. Otherwise, insert the check. */ rc_remaining_edges = RC(node->e[i]->dest->last_in_edge) for each incoming edge node->e[i]->dest->ie[j] { if ( RC(node->e[i]->dest->ie[j]) ME RC(node->e[i]->dest->last_in_edge) ) { rc_remaining_edges = rc_remaining_edges or RC(node->e[i]->dest->ie[j]) } } if ( rc_remaining_edges && RC(node->e[i]) ) != RC(node->e[i]) { /* The remaining edges might not be traversed, so insert a need-to- be-stitched flag and add the merge to the set of nodes to go back and check. */ assert(is_a_cross_edge( node->e[i]->dest->last_in_edge)) Add node n "set NTBS flag for node->e[i]->dest" to SCFG2 Add edge (node, n) to SCFG2 v = n Insert "set HBS flag for node node->e[i]->dest" in node node->e[i]->dest, if it is not there already Insert node->e[i]->dest into NTBS_LIST (begin times in descending order) } else { v = node } /* If we could have backtracked from the merge, we're going to want to skip back to it. */ backtrack = false for every node n in NTBS_LIST { if is_predecessor(n, node) && (not is_predecessor(n, node->e[i]->dest->last_in_edge)) { backtrack = true break } } /* If we don't want to go straight back to the merge, go to the redirected location. */ if backtrack { Add subgraph u "if peek(BACK_STACK) HBS, then goto node->e[i]->dest, else goto peek(BACK_STACK)" to SCFG2 Add edge (v, u) to SCFG2 Insert "set HBS flag for node peek(BACK_STACK)" in node peek(BACK_STACK), if it is not there already } else { Add edge (v, peek(BACK_STACK)) to SCFG2 } } } return } This algorithm correctly transforms the following CFG. | D1 / \ D2 E / \ \ A S | | / \ | \ B C | \ \ / / \ | / M1 / |/ M2 | To: | D1 | D2 | A | S / \ B C \ / M1 | E | M2 | ******************************************************************************** Does this algorithm correctly handle back edges? ******************************************************************************** What happens if the SCFG has back edges? Backedges for loops that are not unrolled can be ignored (not copied to the duplicate graph). Loops with multiple entries are handled correctly because all entering edges are redirected to the loop head, ensuring that the whole loop will be stitched and that no successor will be stitched before a predecessor. The following CFG is a good example of this situation. | S __ /|\ / | A | H | | \| | | Q | \ / | D | / \ | B T | | |__| With one-way unrolled loops, we should be able to just add the back edge to the duplicate graph after the transformation algorithm runs. We also need to ensure that NTBS and HBS flags for nodes within the unrolled loop are duplicated for each copy of the loop body. Multi-way unrolled loops are a problem. If done eagerly, they require a stitch-time stack. I've only just begun thinking about how they affect this algorithm. ******************************************************************************** Integration with lazy stitching, multi-way unrolling, and other specialization ******************************************************************************** How do we integrate this algorithm with the lazy stitching algorithm? Subgraphs of the SCFG to be stitched lazily should not be modified by this transformation, and movement between eager and lazy portions of the CFGs should be well-defined. Lazily stitched branches are similar, but not identical, to static branches. For example, we need to save some state before a lazy branch. Also, how do we identify merges below lazy branches so that we don't redirect edges? Do we need to identify lazy subgraphs in advance? Even without lazy, how do we handle multi-way unrolling and other specialization? ******************************************************************************** ******************************************************************************** ******************************************************************************** To: mock Subject: latest stitching writeup w/ splitting and specialization Date: Mon, 20 Jul 1996 From: Brian Grant Stitching in DyC Brian K. Grant ******************************************************************************** Summary ******************************************************************************** I will generally refer to the combined task of setup and stitching simply as stitching. The Problem ----------- -- Statically transform the CFG for combo eager & lazy setup & stitching by using a modified topological sort; this produces code the stitcher simply follows, eliminating the need for the stitcher to manipulate or traverse the templates Why Perform A Static Transformation? ------------------------------------ -- Minimize stitch-time overhead by -- Solving our problem with Multiflow, allowing setup computations to be performed using temps -- Eliminating the need for stacks and work lists to keep track of setup computations to perform and templates that need to be stitched -- Minimizing checks of whether blocks need to be stitched or not -- Reducing context-switching overhead (between stitched code and the stitcher) by eagerly stitching as much as possible and by generating optimized fixup code -- Fits with the idea of specializing the stitcher to the dynamic region Why Use A Topological Sort? --------------------------- -- It is _the_ traversal of the CFG -- Both depth-first and bread-first topological traversals are possible, but DF is easier to implement -- No successor is traversed before a predecessor, so graph invariants are easier to maintain (especially if information is propagated along edges) -- Preserves data-flow dependencies -- For structured programs, the code generated is in has the same layout as the source program -- It is easy to understand -- Why not DFS? -- Graph invariants are difficult or impossible to maintain because merges are traversed before all predecessors have been traversed -- Why not something else (like BFS)? -- Length of live ranges (short with TS, DFS) -- Spacial locality (good with TS, DFS) -- Number of branches (low with TS, DFS) -- Doesn't preserve data dependencies What's So Difficult About That? ------------------------------- -- We don't want to stitch all blocks, just all blocks we need to stitch, and we don't know which blocks those are statically -- We must ensure all branch destinations are stitched This is tricky because of -- Unstructured code -- Combinations of static and dynamic branches -- Replicating specialization (keepConst) -- Loops, especially overlapping ones, are the biggest problem Competing Strategies -------------------- 1) Edge Redirection -- Perform a topological traversal of the graph -- Include only the first edge from a dynamic branch, all edges from static branches -- Redirect certain edges at nonconstant merges -- Non-last edges whose reachability conditions are not mutually exclusive with those of the last edge -- Redirect to the next edge from a dynamic branch that we will traverse -- Blocks that need to be stitched could be bypassed because only the last edge into a merge is not redirected, so if we could miss some blocks, insert them with NTBS checks The remaining questions are: -- Elimination of redundant NTBS checks? -- Loops? -- Specialization? 2) Equivalence Classes on Reachability Conditions and Specialized Variables -- Compute nested equivalence classes of nodes with the same reachability conditions -- New nested equivalence classes begin at the destinations of static branches and at "weird merges", merges that introduce OR's into the reachability conditions (note that this includes some constant merges) -- Innermost equivalence classes are topologically sorted, then connected to their parent class, which can then sorted, and so on -- Equivalence classes started at the children of static branches are simply connected to the static branches; constant weird merges can also be connected by their original incoming edges; nonconstant weird merges must be appended to the end of the parent graph The remaining questions are: -- Is there a more efficient way to connect equivalence classes started by nonconstant weird merges? -- How do loops affect this algorithm? -- We must be careful that the NTBS flags for the heads of loops entered are set -- We must also ensure that the flags are set before they are tested (can't just set them on back edges) -- How does specialization affect this algorithm? -- We need to intersect RC equiv. classes with Kconst equiv. classes -- A nested Kconst equiv. class should be stitched eagerly if the new variables kept constant were already rtconsts -- We need a stack for eagerly stitching multiple versions 3) Splitting and Merging Plus Topological Sort Plus Equivalence Classes on Specialized Variables ******************************************************************************** Classification of edges ******************************************************************************** I will sometimes use DFS edge-classification terminology when describing certain types of edges. The edge classifications tree, forward, and cross have no particular significance in a CFG. Back edges, however, do, and they must be eliminated before performing a topological traversal. The definitions of tree, forward, and cross edges of the CFG come from the DFS tree built when traversing the CFG. In the DFS traversal, a merge node is traversed the first time it is reached and the algorithm must backtrack on all subsequent visits. This is, in a sense, the opposite of the depth-first topological traversal, in which a merge node is only traversed once it is reached from the last parent to be traversed. Nodes are added to the DFS tree as the CFG is traversed. A tree edge is an edge to a node not yet in the DFS tree (the first edge to a block). A forward edge is an edge to a successor in the DFS tree. A cross edge is an edge to a node that is neither a successor or predecessor in the DFS tree. ******************************************************************************** Identifying predecessors in the CFG ******************************************************************************** Predecessor information is used in the algorithms. 1) Identify back edges using DFS. 2) Remove all back edges from the CFG. 3) Perform a depth-first topological traversal and count the nodes as they are visited. These counts are the topological numbers and will also serve as begin times. 4) Reverse all edges and continue counting from the bottom of the CFG to the top. These counts are the end times. The bottom of the CFG will have end_time = begin_time + 1. If the (begin, end) range of one node is completely contained within the range of another node, the first node is a successor of the second in the original CFG. ******************************************************************************** Can we preserve structured subgraphs? Can we preserve subgraphs beginning at static branches? ******************************************************************************** Suppose we want to preserve "structured subgraphs" during the graph transformation. Can we identify the underlying structure in a CFG with a few stray "unstructured edges"? To do that, we would need definitions of "structured" and "unstructured". One possible definition of structured code is code containing only looping and decision constructs that are strictly nested, with no short cuts or gotos. However, most C code contains simple forms of unstructure: -- short-circuiting Boolean expressions -- break -- continue -- fall-through cases -- early return -- goto bottom of function Ideally, we should be able to handle these common forms of unstructure as efficiently as perfectly structured code. Can we come up with more lenient definitions of "structured subgraphs" that include these cases? Unfortunately, isolating the "unstructuredness" of a CFG is very difficult. One characteristic of a node in the graph we could use to locate "unstructuredness" is whether a new OR is introduced into the reachability conditions at a merge. We'll call merges with new ORs introduced into the reachability conditions weird merges. It turns out we'll use the concept of weird merges in one of the algorithms. Let's look at how we would want to transform some simple graphs with a little unstructuredness. Study the following diagram of a CFG. Assume S is a static branch, and Z is "the sky" (an edge from some unspecified place in the CFG). | | S Z / \ / A B \ / M | Why can't we leave edges SB, AM, and BM intact and place a test (does B need to be stitched?) on edge ZB? The main impediment to leaving those edges intact is identifying that there is something special about the nonconstant merges B and M, that treatment of the edge ZB should be delayed. The merge at B may not even be a weird merge, such as in the following example. Here is a more concrete example (D dynamic, S static): if (D && S) {A} else {B} {M} This code translates to the following CFG: | D / \ S | B and M are nonconstant merges, but not weird / \ / A B \ / M | The reachability conditions at B and M are the same as those at D, so we always want to eagerly stitch B, but only want to stitch A if S is true. So, we want to create the following CFG: | D | S /| A | \| B | M | If we look at this from the edge-redirection perspective, then we have redirected edge AM to point to the destination of edge DB (the next edge from a dynamic branch that we backtrack to in the traversal), creating edge AB. Edge SB is not really left untouched. It is also redirected to the destination of edge DB, which happens to be to the same block, B. From the RC-equivalence-class perspective, blocks D, S, B, and M are in the same equivalence class, with A being in a nested class. A is made conditional on the test at S being true and the other blocks are topologically sorted. From the splitting and merging perspective, we connect D to S, then split at S, creating two tails ABM and BM. We obtain the above graph by merging the BM suffixes. Here are some similar examples (S denotes static branch, D dynamic): | | S S / \ / \ D | D | / \ / transformed -> | / B is a constant merge A B A / M is a nonconstant merge \ / |/ M B | | M | | D1 / \ D2 | / \ / transformed -> -D1-D2-A-B-M- A B B and M are nonconstant \ / merges with identical M reachability conditions | on all incoming edges | S1 / \ S2 | / \ / transformed -> unchanged A B B and M are constant merges \ / B is a weird merge M | ******************************************************************************** Equivalence Classes based on Reachability Conditions ******************************************************************************** One way of tackling the transformation problem is to identify subgraphs of the SCFG that are to be linearized, linearize those subgraphs, and connect them so that the appropriate subgraphs are stitched. To do this, we group blocks in nested equivalence classes. Blocks with the same reachability conditions are put in the same equivalence class. Thus, new nested equivalence classes are created at static branches and weird merges. A weird merge is one introducing an OR into the reachability conditions. The blocks in the innermost equivalence classes contain no static branches and so can be linearized by topological sort. Once that is completed, the surrounding equivalence class can be linearized. So, by starting at the innermost nested equivalence classes and working out, the whole graph can be linearized. A subgraph of a nested equivalence class spawned by a static branch is connected to the subgraph of its parent by the original static branch. Static branches, then, choose between linearized subgraphs at stitch-time. Subgraphs of weird-merge equivalence classes are connected to the parent graph in different ways, depending on whether the merge is constant or nonconstant. can be connected to their parents in the original CFG. set flag on edge from original parent concatenated and appended to the subgraph of the parent equivalence class. The following diagram shows a CFG with two constant weird merges. ENTRY | 1 | S1 / \ 2 3 | | S2 S3 / \ / \ 4 5 6 7 \ \ / / \ 8 / \ / / 9 / \ / 10 | EXIT The following tree shows the nested equivalence classes for this CFG. Blocks within parentheses are in the same equivalence class, and children in the tree are in nested equivalence classes. (ENTRY,1,10,EXIT) | \_ \_ S1 \_ \_ t/ \f \w \w (2) (3) (8)(9) | | S2 S3 t/ \f t/ \f (4) (5) (6) (7) From this equivalence-class tree, we construct the following CFG. ENTRY | 1 | S1 / \ 2 3 | | S2 S3 / \ / \ 4 5 6 7 | | | | 9NS 8NS 8NS | \ / \ / \/ \/ \ / \ / \ / 10 | 8NS? /| 8 | | | 9NS | \| 9NS? /| 9 | \ | \| EXIT So, all blocks are stitched that need to be. Since we don't want blocks 8 and 9 to be stitched unnecessarily, we set need-to-be-stitched flags for the blocks on the edges from blocks 4, 5, and 6 and check those flags before the blocks are stitched. How does this relate to control-dependence trees? Reachability conditions are a type of control-dependence information, where we are only concerned about dependence upon static branches. Building the nested equivalence classes based on reachability conditions is very similar to insertion of region nodes into the control-dependence subgraph. The region nodes become the parents of nodes with the same control dependencies and children of nodes with fewer dependencies. The main difference in the two approaches is in the handling of weird merges and back edges. ******************************************************************************** Splitting and Merging + Topological Sort + Equivalence Classes on Kconst ******************************************************************************** The concept for this algorithm is based on splitting at merges and stitching a subset of the graph in topological order, the subset being determined by which static branches are taken at stitch-time. The duplicate CFG built by the following algorithm is a forest, with new roots created at special merges and new tree branches created at static branches. After the algorithm runs, another pass will attempt to merge the tails (quadratic in number of tails? exponential in number of static branches?). Before we include each block in the tree, we check whether we might reach a loop tail (block ending in a back edge) but not the loop head (destination of the back edge). If this is possible, we insert the loop head in the sorted list of nodes to stitch. The test I use to check this is whether the reachability conditions of the next node in the sorted list imply the intersection of the reachability conditions of each tail and its corresponding head. Each tree created corresponds to a group of blocks that will be stitched eagerly. The tree rooted at the start of the original CFG is stitched first, then the stitcher will jump to the stitched code. Other trees will be stitched lazily. In this version of the writeup, I haven't included the details about how to link the trees together. We need to ensure that the values of live rtconsts are carried to the roots of the trees. We also need to ensure that the stitcher returns to the right root when more code needs to be stitched lazily. This is not as straightforward as it seems because there can be multiple entries into unrolled loops. I also haven't worked out all of the details of the stitcher, such as branch patching. Because many copies and versions of the same block can be stitched, patching correctly will be nontrivial. Assumptions and notation ------------------------ -- The BTA runs before this algorithm, computing the reachability conditions and marking merges as special where specialization must begin -- RC is reachability conditions -- IsSpecial(N) is true if N is a merge where we need to begin specializing; that is, N is a merge with multiple reaching definitions for some variable in the set Kconst(N) -- MakeUnspecial(N) causes IsSpecial(N) to return false -- Zero(N) returns 0 -- Topo#(N) returns the topological number of node N -- IsPredecessor(N1, N2) is true if N1 is a predecessor of N2 or N1 == N2, false if N1 succeeds N2 or N1 neither preceeds nor succeeds N2 -- CopyNode(N2, N1) creates a copy N2 of node N1 (not including connecting edges) -- Edges are ordered by the topological numbers of their destinations -- Edge->Dest is the destination node of Edge -- Node->Edge[0] is the first edge of Node -- SCFG->Start has a single successor -- (F0, F1, ..., FN) is an (N+1)-tuple of objects F0, F1, F2, ..., FN; F0 is in field 0, F1 is in field 1, etc. -- Field(Tuple, N) returns field N of Tuple -- Stacks: stacks contain fixed-length tuples of pointers or scalars CreateEmptyStack(Stack, Size) creates the empty stack Stack of tuples of size Size DestroyStack(Stack) frees Stack EmptyStack(Stack) returns true if Stack is empty, false otherwise PushOntoStack(Stack, Tuple) pushes Tuple onto Stack Tuple = PopOffStack(Stack) pops and returns the top of Stack Tuple = PeekAtStack(Stack) returns the top of Stack without popping -- BranchStack is a stack of 5-tuples: (Node, EdgeCondition, List, List, List); the tuples contain information about branches of the tree (SCFG2) that need to be completed -- Lists: sorted lists contain fixed-length tuples of pointers or scalars and no duplicate tuples CreateEmptyList(List, Size, SortField, ValueFunction) creates a list of tuples of size Size that is sorted in ascending order by the value returned by invoking ValueFunction on field SortField of each tuple DestroyList(List) frees List EmptyList(List) returns true if List is empty, false otherwise InsertIntoList(List, Tuple) inserts Tuple into List in the proper position RemoveFromList(List, Tuple) removes Tuple from List; a wildcard (*) may be used in any tuple field IsInList(List, Tuple) returns true if Tuple is in List, false otherwise Tuple = HeadOfList(List) returns the head of the list (least tuple by the ValueFunction) Tuple = TailOfList(List) returns the tail of the list (greatest tuple by the ValueFunction) CopyList(NewList, OldList) creates a copy NewList of list OldList -- SpecializationRootList is a list of nodes in the SCFG sorted in ascending order by topological number; it is a list of roots of specialized subgraphs that must be created -- NextNodeList is a list of nodes of SCFG sorted in ascending order by topological number; it is a list of nodes that need to be stitched, but haven't been copied to the duplicate graph yet -- LoopHeadList is a list of (LoopHead, LoopTail) node pairs sorted in ascending order by the topological number of the first node in the pair; a loop head may appear more than once if it has multiple tails; each tail appears only once in the list, since each back edge can only point to one head -- LoopTailList contains the same node pairs as LoopHeadList, but is sorted by the second node in the pair Algorithm --------- CreateEmptyList(OriginalHeadList, 2, 0, Topo#) CreateEmptyList(OriginalTailList, 2, 1, Topo#) FindLoopHeadsAndTails(OriginalHeadList, OriginalTailList, SCFG->Start) RemoveBackEdges(SCFG->Start) ComputeTopologicalNumbers(SCFG->Start) CreateEmptyList(SpecializationRootList, 1, 0, Topo#) InsertIntoList(SpecializationRootList, (SCFG->Start)) CreateEmptyList(RootsDone, 1, 0, Topo#) /* This list doesn't need to be sorted */ CreateEmptyList(GraphsDone, 1, 0, Zero) until EmptyList(SpecializationRootList) { /* Create a dummy node from which to hang this subgraph */ CreateEmptyNode(DummyNode) InsertIntoList(GraphsDone, (DummyNode)) /* Get the root of this subgraph */ (ThisRoot) = RemoveHeadOfList(SpecializationRootList) InsertIntoList(RootsDone, (ThisRoot)) CreateEmptyList(NextNodeList, 1, 0, Topo#) InsertIntoList(NextNodeList, (ThisRoot)) /* We don't want to stop at the root again */ MakeUnspecial(ThisRoot) /* Initialize the loop lists and prune them by eliminating tails that are not reachable. We could compute the transitive closure instead. */ CopyList(TmpTailList, OriginalTailList) until EmptyList(TmpTailList) { (Head, Tail) = TailOfList(TmpTailList) /* If the root or another loop head precedes the tail, then it is reachable. */ if (IsPredecessor(ThisRoot, Tail)) { InsertIntoList(LoopHeadList, (Head, Tail)) InsertIntoList(LoopTailList, (Head, Tail)) } else if (not EmptyList(LoopHeadList)) { CopyList(TmpHeadList, LoopHeadList) until EmptyList(TmpHeadList) { (ReachedHead, ReachedTail) = HeadOfList(TmpHeadList) if IsPredecessor(ReachedHead, Tail) { InsertIntoList(LoopHeadList, (Head, Tail)) InsertIntoList(LoopTailList, (Head, Tail)) break } RemoveFromList(TmpHeadList, (ReachedHead, ReachedTail)) } DestroyList(TmpHeadList) } RemoveFromList(TmpTailList, (Head, Tail)) } /* Add all heads preceding the root to the NextNodeList. */ while ((not EmptyList(LoopHeadList)) and (Topo#(Field(HeadOfList(LoopHeadList), 0)) < Topo#(ThisRoot)) { (Head, Tail) = HeadOfList(LoopHeadList) InsertIntoList(NextNodeList, (Head)) RemoveFromList(LoopHeadList, (Head, *)) RemoveFromList(LoopTailList, (Head, *)) } CreateEmptyStack(BranchStack, 5) PushOntoStack(BranchStack, (DummyNode, NoCondition, NextNodeList, LoopHeadList, LoopTailList)) until EmptyStack(BranchStack) { /* Start on a branch of the tree */ (PreviousDupNode, EdgeCondition, NextNodeList, LoopHeadList, LoopTailList) = PopOffStack(BranchStack) /* Add to the branch until it is completed */ until EmptyList(NextNodeList) { (NextNode) = HeadOfList(NextNodeList) /* Prune the loop lists */ if (not EmptyList(LoopTailList)) { /* Move all (head, tail) pairs with tails with topological numbers less than that of NextNode to a new list. */ (Head, Tail) = HeadOfList(LoopTailList) while (Topo#(Tail) < Topo#(NextNode)) { RemoveFromList(LoopHeadList, (Head, Tail)) RemoveFromList(LoopTailList, (Head, Tail)) InsertIntoList(TmpTailList, (Head, Tail)) if (not EmptyList(LoopTailList)) { (Head, Tail) = HeadOfList(LoopTailList) } else { break } } /* We may reach the tails remaining in the loop lists. If there is a corresponding head whose topological number is less than or equal to the tail we're checking, then we must add that (head, tail) pair back to the loop lists. */ until EmptyList(TmpTailList) { (Head, Tail) = TailOfList(TmpTailList) if ((not EmptyList(LoopHeadList)) and ((Topo#(Field(HeadOfList(LoopHeadList), 0)) <= Topo#(Tail)))) { InsertIntoList(LoopHeadList, (Head, Tail)) InsertIntoList(LoopTailList, (Head, Tail)) } RemoveFromList(TmpTailList, (Head, Tail)) } } Assert(Topo#(Field(HeadOfList(LoopHeadList), 0)) > Topo#(NextNode)) /* Check if there is a loop that we need to stitch before the next node scheduled to be stitched. */ CopyList(TmpHeadList, LoopHeadList) until EmptyList(TmpHeadList) { (Head, Tail) = HeadOfList(TmpHeadList) /* If the next node to be stitched is the loop head, we don't need to worry about the loop, so remove it from the loop lists. */ if (NextNode == Head) { RemoveFromList(LoopHeadList, (Head, *)) RemoveFromList(LoopTailList, (Head, *)) RemoveFromList(TmpHeadList, (Head, *)) } /* If the reachability conditions of the next node to be stitched imply the intersection of the reachability conditions of the head and the tail, then we might reach the tail from NextNode but not the head, so we had better ensure that we stitch the head. */ else if RC(NextNode) ==> (RC(Head) && RC(Tail)) { InsertIntoList(NextNodeList, (Head)) RemoveFromList(LoopHeadList, (Head, *)) RemoveFromList(LoopTailList, (Head, *)) RemoveFromList(TmpHeadList, (Head, *)) } else { RemoveFromList(TmpHeadList, (Head, Tail)) } } DestroyList(TmpHeadList) /* If the next node is a merge where we need to begin specializing, then it becomes a root of its own subgraph and we will come back to it. */ while IsSpecial(Field(HeadOfList(NextNodeList), 0)) { (NewRoot) = RemoveHeadOfList(NextNodeList) if (not IsInList(RootsDone, (NewRoot))) { InsertIntoList(SpecializationRootList, (NewRoot)) } } /* Copy the next node in the topological order that needs to be stitched. */ (ThisOrigNode) = RemoveHeadOfList(NextNodeList) CopyNode(ThisDupNode, ThisOrigNode) CreateEdge(PreviousDupNode, ThisDupNode, EdgeCondition) /* Static branches spawn new branches in the SCFG2 tree. The current branch follows the first edge. */ if EndsInStaticBranch(ThisOrigNode) { for each outgoing edge ThisOrigNode->Edge[i] except the first (in reverse topo. order) { CopyList(NewNodeList, NextNodeList) CopyList(NewLoopList, LoopList) InsertIntoList(NewNodeList, (ThisOrigNode->Edge[i]->Dest)) PushOntoStack(BranchStack, (ThisDupNode, Condition(ThisOrigNode->Edge[i]), NewNodeList, NewLoopList)) } InsertIntoList(NextNodeList, (ThisOrigNode->Edge[0]->Dest)) EdgeCondition = Condition(ThisOrigNode->Edge[0]) } else { /* Ensure all the children of this dynamic branch or singleton edge will be stitched in the proper order by inserting them into the sorted list. */ for each outgoing edge ThisOrigNode->Edge[i] { InsertIntoList(NextNodeList, (ThisOrigNode->Edge[i]->Dest)) } EdgeCondition = NoCondition } PreviousDupNode = ThisDupNode } DestroyList(NextNodeList) DestroyList(LoopHeadList) DestroyList(LoopTailList) } DestroyStack(BranchStack) DestroyList(TmpTailList) } DestroyList(TmpTailList) DestroyList(OriginalHeadList) DestroyList(OriginalTailList) DestroyList(SpecializationRootList) DestroyList(GraphsToBuild) MergeTails(GraphsDone) ******************************************************************************** Specialization and lazy stitching ******************************************************************************** Specialization will require insertion of cache checks at merges where variables to be kept constant would have different incoming values. We'll need to build a new tree containing blocks with uses of rtconsts derived from the rtconsts that have changed and the intervening blocks. The new tree extends from the cache check to the unkeepConst of the variables (or bottom of the nested dynamic region, such as for unrolled loops). The original tree (well, the graph, after tail merging) will be specialized and stitched first. The new tree (again, graph after merging) will be stitched each time we respecialize for different values. It looks like specialization will have to be handled lazily. I've discovered two problems with stitching only eagerly: 1) I don't think shareConst, cacheOne, and replicate make sense if we only stitch eagerly. 2) We never know the values of rtconsts at at start of a dynamic region until it is reached, and there are situations where we can encounter this problem inside the region as well. If we don't know the values, we can't stitch. See the following examples. DynRegion { ... x = foo(x, y, z); keepConst(x); if (d) x += 1; else x += 2; ... } Since we don't know the initial value of x, we can't compute its value on the two paths created at the merge of the if statement. So, if any of the arguments of a keepConst or isConst is not already an rtconst, it creates a region that cannot be stitched until it is reached and so it must be stitched lazily (e.g., the outermost dynamic region is not stitched until it is reached). So, the question is: how do we implement lazy? If we only use lazy for specialization, we can't execute at the same time as stitching because the control flow of the stitcher is different from that of the original code. So, we need to alternate stitching and executing, but at what granularity? My first thought is to stitch until we hit a place where we need to specialize, then execute. Once the end of the stitched code is reached, go back and stitch some more (potentially skipping by the path that requires specialization). If we do lazy, we also need to relax the topological-ordering constraint because some blocks will not be stitched eagerly. So, the eager-stitching algorithm will have to be tweaked. As discussed before, the stitcher will need to run with a different stack than the stitched code, and we need to save and restore state when we switch between the stitcher and the stitched code. We'll need space to store state for every specialized path through the CFG that we don't complete. ******************************************************************************** When replicating specialization stops ******************************************************************************** 1) At unkeepConst or end of enclosing dynamic region 2) When the BTA can no longer determine the variable's value based on other rtconsts. Here is an example of multi-way unrolling of a byte-code interpreter for a language that allows indirect jumps on run-time-computed values (e.g., a hardware simulator): keepConst(pc); pc = 0; while (opcode(pc) != RETURN) { ... switch opcode(pc) { case BR: pc += secondOperand(pc); break; case IJMP: pc = registerFile[firstOperand(pc)]; break; /* no way to compute this at stitch time! */ ... default: pc += 4; break; } } In this case, the pc can no longer be kept constant after the merge because it takes on a totally dynamic value, not one based on other rtconsts. ******************************************************************************** CFG Test Suite ******************************************************************************** Simple case #1: | D / \ S C / \ | A B | \ / | M | \ / Z | Edge ME with last edge: | D / \ A S | / \ | B C \ | / \ \ / \|/ Z | Several constant weird merges: ENTRY | 1 | S1 / \ 2 3 | | S2 S3 / \ / \ 4 5 6 7 \ \ / / \ 8 / \ / / 9 / \ / 10 | EXIT Some nonconstant weird merges: ENTRY | 1 | D1 / \ 2 3 | | S2 S3 / \ / \ 4 5 6 7 \ \ / / \ 8 / \ / / 9 / \ / 10 | EXIT A graph with edges that cut across both left and right: | D1 / \ S1 D2 / \ /|\ A | S3 E H | |/ \| | | B F | | | | | | S2 / | \ /\___ | C / \ / \ / I G / \ / M | Lotsa weird merges: | D1 / \ / \ D2 S1 / \ / \ S2 A D3 B /||\ | /\ / / || | _/ \ / D4 / | | / I /\/ | |/ | | C E F G | | \ | / / | | \ || / | / \||/ / / J / / \ / / \ / / K / \ / Z | The following loop is an example of one with multiple entries: | S __ /|\ / | A | H | | \| | | Q | \ / | D | / \ | B T | | |__| Ensure loop is stitched iff it needs to be: | S1 / \ / S2 __ . / \ / \ . | H | . | | | | S3 | | / \ | | A B | | \ / | | M1 | \ | | \ / | \/ | X1 | / \ | / S4 | / / \ | . . T | . . |__| . . Simple, straight-line, overlapping loops: | / \ H1 _|__ | / | | H2 | | | | | T1 | | /|\__| | / T2 | / /|_____| . . . . . . Loops can be much more convoluted, with multiple entries, multiple tails, and overlapping loops with heads and tails on different paths: | _ S1 _ / \ / \ / \ | H1 H2 | | | | | | \ / | | M | | | | | S2 | | / \ | | T1 T2 | \_/| |\_/ | | \ / Z | Another one: ____ | / \| | H1 ______ | | / \ | H2 | | | | | S1 | | / \ | | D1 S2 | |_/ \ / \ | D2 D3 | / \ / \_| | \/ | M1 \ / \ / M2 |