Matthai; April 15, 1996 On Craig's advice, I put together the following example of what we need to do, to get the lazy stitching approach to work. I've tried to include the issues brought up in last Monday's meeting. One difference from Monday's plan is that we're back to having template code merged with setup/stitch code, so that we don't keep jumping between merged and stitched code at every dynamic branch. Here's the original piece of code in C(the si's are runtime constants, the di's are non-rtconsts; also I'm assuming everything's in registers for now): ... s1:= s2+s3 BBLOCK 1 d1:= d0+s1 if(d1!=0) s4:= s1+s2 BBLOCK 2 d2:= d1+s4 else s4:= s1+s3 BBLOCK 3 d2:= d1+s4 s5:=s4-1 BBLOCK 4 d3:=d2+s5 ... The template assembler we expect from this eg is: T1: t1:= t0 + T2: beq t1, L1 T3: t2:= t1 + L2:t3:= t2 + ... L1: t2:= t1 + T4: br L2 The merged setup/stitcher code would look something like (only the merged code for BBLOCK 1 given below); note that the Ti below refer to the labels to the left of the template assembler above: s1:= s2 + s3 stitch T1 d1:= d0 + s1 if(d1!=0) stitch T2, pointing to the fixup code at D1.T (see below) if bblock d1.T is stitched jump to stitched code after remapping** registers (remapping == all di registers are moved to ti) else record the location to which d1.T will be stitched else stitch T2, pointing to the fixup code at D1.F (see below) if bblock d1.F is stitched jump to stitched code remapping registers else record the location to which d1.T will be stitched beq d1, L1 ...The rest of the setup is similar... [The above code brings to light at least one datastructure shared between the stitcher fragments in the merged code, a table mapping each bblock to the location it is stitched.] The (statically produced) fixup code at D1.T would look like (D1.F is similar): D1.T: ;;;remap registers live @ BBLOCK 2 entrance mv d1,t1 ;;mv t1 into d1 mv d2,t2 load s1 load s2 ;;;patch the branch to fixup ;;;(use unused register/save and restore a live one) stl ... ;;;jump to merged code br BBLOCK2 Now we need to address the question of what happens when we have spills to stack, and not enough registers. There are at least two problems that can crop up: 1)The remapping above is done so that the template code at the end of the merged block we're jumping to has the right stuff in its registers. However, the setup or stitcher code that executes before the template code may clobber these registers (note when we move from merged to stitched code, we don't have this problem). 2)The stack frames for the stitched code and the merged codes may not be aligned (i.e. spilled template code variable may not be located in the same place on the stack frame in the two cases). The first problems shows up essentially because the compiler generated the code for the merged version and the template without taking into account the possible jumps between the two sides at stitch time. One solution (Craig was pushing this one at the meeting) to these problems is to produce merged, fixup and template code at the IR level, and show explicitly to the compiler all the extra branches between merged code, fixup code and template code. We then have to constrain the scheduler/register allocator to register allocate/schedule the template code exactly as it did the first time around; all compensating code (spills/restores) should appear in the fixup fragments (this because the stitcher code in the merged version already assumes a certain layout for the template... all compensating code should work with that layout). If we do this right, the second problem should go away, since both pieces of code should use the same stack frame. This is about as far as I got. Please get back to me to point out points that I've left out, flaws with the above system, etc. If the system works along the lines outlined above, even the cost of the lazy approach may not be as prohibitive as we feared (mainly because remapping live variables is cheaper than doing context switches, especially if we set things up so minimum remapping is needed). Matthai ps: I have not thought about how the above scheme interacts with dynamic loop unrolling... anyone see obvious problems that crop up? ******************************************************************************** ******************************************************************************** ******************************************************************************** ----------An example with the results of the discussions: Matthai; Mar. 8, 1996 Here's an example of a multi-way unrolled loop and the merged setup/stitcher code. The new design addresses many of the issues that came up in the last whole group meeting: i)It includes temporaries used by the stitcher code (cur_stitch_loc_temp below) ii)What is the "context" on which we decide whether to re-stitch, and how often do we have to recalc it? The context at a program point is every keep_const variable reaching that program point. It needs to be recalculated at every merge reached by keep_const variables. iii)How do we restore the runtime constants when in the fixup code ? --There were some suggestions of building the fixup code (that built up the rtcs) dynamically, but this method seems expensive (building a 64-bit value can take a lot of instructions, and we have to store each of these instructions into the new fixup-code we're building, so we don't save on storing overhead), and it's not clear how we can cheaply avoid clobbering the values that were in the registers used for the build. --The method suggested below basically uses the fact that the runtime constants that need to be restored in the fixup for a branch Bij are those that are live across the branch. We store these values in a table indexed by the current context when the other side of the branch is taken (and we stitch the branch to fixup code), and stitch an instruction just before the branch that builds the current context into a scratch register. The fixup code then just needs to load off the table (using the context from the scratch register) to restore runtime constant context. iv)How does the code that stitches in the branch to fixup know where the fixup code is? --We keep around fixup code locations in a fixup_table (see below) keep_const(s0 cache,s2) lazy while(s0 != 0){ s1:= s0+s2 BBLOCK 1 d1:= d0+s1 if(d1!=0) s0:= s0+s2 BBLOCK 2 d2:= d1+s1 else s0:= s0-s2 BBLOCK 3 d2:= d1+s2 s5:=s1-1 BBLOCK 4 d3:=d2+s5 } MERGED_BBLOCK1: /*Load the location of the current location being stitched from the global*/ cur_stitch_loc_temp := cur_stitch_loc /*Need to recalculate context at every merge reached by keep const variables*/ current_context = find_context(s0) /*save runtime-constants that may need to be restored*/ rtconst_table[current_context][0]:=s0 rtconst_table[current_context][2]:=s2 s1:= s0 + s2 stitch(r1 := r0 + s1) d1:= d0 + s1 beq d1,MERGED_BBLOCK3 MERGED_BBLOCK2: /*Stitch a branch to the entry 0 in the fixup table*/ stitch(build scratch_reg, current_context) /*Note that we do not have to check if the other side of the branch (BBLOCK3) has been stitched because the current bblock (BBLOCK 2) has a common dominating basic block with BBLOCK 3*/ stitch(beq d1, fixup_table,0) /*save any rtcs FIXUP_13 needs to restore; need to save all rtcs live across br13*/ rtconst_table[current_context][0]:=s0 rtconst_table[current_context][1]:=s1 rtconst_table[current_context][2]:=s2 /*Note that we do not have to check if the current basic block has already been stitched because its predecessor, which dominates it, should have done this check*/ record_stitched_loc( current_context,BBLOCK_2,cur_stitch_loc_temp) s0 := s0 + s2; stitch(r2 := r1 + s1); d2 := d1 + s1 goto MERGED_BBLOCK4 MERGED_BBLOCK3: stitch(build scratch_reg, current_context) stitch(bne d1, fixup_table,1) rtconst_table[current_context][0]:=s0 rtconst_table[current_context][1]:=s1 rtconst_table[current_context][2]:=s2 record_stitched_loc( current_context,BBLOCK_3,cur_stitch_loc_temp) s0 := s0 - s2; stitch(r2 := r1 + s2); d2 := d1 + s2 goto MERGED_BBLOCK4 MERGED_BBLOCK4: /*context-changing merge: need to recalculate context*/ current_context = find_context(s0) /*merge: need to check if bblock has already been stitched*/ if (stitched_loc:=already_stitched(BBLOCK_4,current_context)) /*Communicate context to fixup code*/ stitch(build scratch_reg, current_context) /*remap template live on entry too BBLOCK 4*/ r2 := d2 r0:= d0 /*jump to stitched version*/ (*stitched_loc)() else /*record where the bblock will be stitched*/ record_already_stitched( current_context,BBLOCK_2,cur_stitch_loc_temp, current_stitch_loc) s5:=s1-1 stitch(r5:=r1-1) d3:=d2+s5 bne s0, MERGED_BBLOCK1 ;;;fixup code emitted statically... FIXUP_13: ;;;restore temps used in the template ;;;portion of the merged code d1:= r1 ;;;restore temps used in setup portion c_context:= scratch s1:= rtconst_table[c_context][1] ;;;restore temps used in the stitcher part current_stitch_loc_temp:=current_stitch_loc ;;;patch the branch to fixup stitch(beqfixup_branch_loc_table,context,BBLOCK_3) goto MERGED_BBLOCK3 ;;;fixup code emitted statically... FIXUP_12: ;;;restore temps used in the template ;;;portion of the merged code d1:= r1 ;;;restore temps used in setup portion c_context:= scratch s0:= rtconst_table[c_context][0] s1:= rtconst_table[c_context][1] s2:= rtconst_table[c_context][2] ;;;restore temps used in the stitcher part current_stitch_loc_temp:=current_stitch_loc ;;;patch the branch to fixup stitch(beqfixup_branch_loc_table,context,BBLOCK_3) goto MERGED_BBLOCK2 ;;;somewhere in the data segment.... .globl fixup_table fixup_table: .long FIXUP_13 .long FIXUP_12 ... ****************************************************************************** ****************************************************************************** ****************************************************************************** From: matthai@peony Date: Wed, 17 Jul 1996 08:40:01 -0700 Apparently-To: grant@peony Original C source: ------------------ keep_const(s0 cache,s2) lazy <--(how does lazy differ from looplazy?) while(s0 != 0){ s1:= s0+s2 BBLOCK 1 d1:= d0+s1 if(d1!=0) s0:= s0+s2 BBLOCK 2 d2:= d1+s1 else s0:= s0-s2 BBLOCK 3 d2:= d1+s2 s5:=s1-1 BBLOCK 4 d3:=d2+s5 } Merged setup/stitcher/template: ------------------------------- MERGED_BBLOCK0: /*Load the location of the current location being stitched from the global*/ cur_stitch_loc_temp := cur_stitch_loc /*remapping template code*/ d1:=rw d2:=rx /*remapping rtconsts*/ s0:=ry s2:=rz MERGED_BBLOCK1: /*Need to recalculate context at every merge reached by keep const variables*/ current_context = find_context(s0) /*merge: need to check if bblock has already been stitched*/ if (stitched_loc:=already_stitched(BBLOCK_1,current_context)) /*remap template live on entry too BBLOCK 1*/ r0:= d0 /*jump to stitched version*/ (*stitched_loc)() else /*record where the bblock will be stitched*/ record_already_stitched( current_context,BBLOCK_1,cur_stitch_loc_temp, current_stitch_loc) s1:= s0 + s2 stitch(r1 := r0 + s1) d1:= d0 + s1 beq d1,MERGED_BBLOCK3 MERGED_BBLOCK2: /*Stitch a branch to the entry 0 in the fixup table*/ stitch(build scratch_reg, current_context) /*Note that we do not have to check if the other side of the branch (BBLOCK3) has been stitched because the current bblock (BBLOCK 2) has a common dominating basic block with BBLOCK 3*/ stitch(beq d1, fixup_table,0) /*save any rtcs FIXUP_13 needs to restore; need to save all rtcs live across br13*/ rtconst_table[current_context][0]:=s0 rtconst_table[current_context][1]:=s1 rtconst_table[current_context][2]:=s2 /*Note that we do not have to check if the current basic block has already been stitched because its predecessor, which dominates it, should have done this check*/ record_stitched_loc( current_context,BBLOCK_2,cur_stitch_loc_temp) s0 := s0 + s2; stitch(r2 := r1 + s1); d2 := d1 + s1 goto MERGED_BBLOCK4 MERGED_BBLOCK3: stitch(build scratch_reg, current_context) stitch(bne d1, fixup_table,1) rtconst_table[current_context][0]:=s0 rtconst_table[current_context][1]:=s1 rtconst_table[current_context][2]:=s2 record_stitched_loc( current_context,BBLOCK_3,cur_stitch_loc_temp) s0 := s0 - s2; stitch(r2 := r1 + s2); d2 := d1 + s2 goto MERGED_BBLOCK4 MERGED_BBLOCK4: /*context-changing merge: need to recalculate context*/ current_context = find_context(s0) /*merge: need to check if bblock has already been stitched*/ if (stitched_loc:=already_stitched(BBLOCK_4,current_context)) /*remap template live on entry too BBLOCK 4*/ r2 := d2 r0:= d0 /*jump to stitched version*/ (*stitched_loc)() else /*record where the bblock will be stitched*/ record_already_stitched( current_context,BBLOCK_4,cur_stitch_loc_temp, current_stitch_loc) s5:=s1-1 stitch(r5:=r1-1) d3:=d2+s5 bne s0, MERGED_BBLOCK1 ;;;fixup code emitted statically... FIXUP_13: ;;;restore temps used in the template ;;;portion of the merged code d1:= r1 ;;;restore temps used in setup portion c_context:= scratch s1:= rtconst_table[c_context][1] ;;;restore temps used in the stitcher part current_stitch_loc_temp:=current_stitch_loc ;;;patch the branch to fixup stitch(fixup_branch_loc_table,context,BBLOCK_3) goto MERGED_BBLOCK3 ;;;fixup code emitted statically... FIXUP_12: ;;;restore temps used in the template ;;;portion of the merged code d1:= r1 ;;;restore temps used in setup portion c_context:= scratch s0:= rtconst_table[c_context][0] s1:= rtconst_table[c_context][1] s2:= rtconst_table[c_context][2] ;;;restore temps used in the stitcher part current_stitch_loc_temp:=current_stitch_loc ;;;patch the branch to fixup stitch(fixup_branch_loc_table,context,BBLOCK_2) goto MERGED_BBLOCK2 ;;;somewhere in the data segment.... .globl fixup_table fixup_table: .long FIXUP_13 .long FIXUP_12 ... ------------- How to make the merged code lazy: 0)At every entry point (note: fixup code is an entry point) into the merged section, i)Load from the current_stitch_loc global into temp ii)Map from registers to temps, for every temp live downstream of the entry point tempi := ri (may also need to map from stack locations to temps) first entry point is different from fixup because in fixup, runtime constants have to be loaded from table. 1)At every merge where two possibly different values of a keepconst variable meet: i)recompute context ii)check if the bblock after the merge has been stitched in the context: --if so, map temps to registers and jump --else, record where the is being stitched 2)At every dynamic branch target: i)stitch in a build of current context into scratch reg ii)stitch in a branch to fixup leading to other side of branch iii)save all rtconsts live on entry to other side of branch iv)record where the is being stitched.