Restrictions
Apart from the restriction that the register-allocated array only be accessed at run-time constant offsets, there are no other restrictions, for instance, array elements may be freely assigned in the dynamic region. The current BTA is, however, not capable of identifying that values at certain indices are run-time constant, but we may be able to extend it for that purpose if that appears profitable. To be able to access the correct values of array elements after the dynamic region, code needs to be generated at the end of a region that stores the values of register-allocated array elements back to the array. For local arrays a simple use-analysis can determine if this is necessary, a conservative default is to always write back the values from the registers to memory at the end of a region.
Generalization to C-Structs
The algorithm for register actions to allocate array variables for DyC can be generalized to C-structs. Instead of indices the field names could be used to specify what parts of the struct should be allocated to registers, and a small modification of the algorithm could compute registers actions for structs as well (cf. Section on computation of register actions).
.
e.g., dyc_allocate(reg,1,3), to specify
that reg[1] .. reg[3] are eligible for register
allocation through register actions.
An alternative or additional annotation that could be used, could specify
a function called at specialization time to decide which array elements
are to be candidates for register allocation: dyc_allocate(VAR,
fn) where fn: Int ®
Bool a function that is called at specialization
time.
E.g., dyc_allocate(reg, f) reg[i] is a
candidate for register allocation iff f(i) = true.
In an interpreter, for instance, the user-supplied function could check
the bytecodes to determine which registers are used in the interpreted
program or it could be an interface function that consults profiling data
of previous runs to determine the most frequently accessed array offsets,
etc. Multiple dyc_allocate annotations for
the same array variable can be used to specify non-consecutive array indices
as candidates for register allocation.
3. Computation of Register Actions
The actions can be computed by essentially one modification to Wall's algorithm [WALL86]. Wall's algorithm computes the actions for each basic block. Each instruction in the basic block is either
1.REMOVE, OP, and RESULT actions
r1 := load y REMOVE.y
r2 := load x REMOVE.x
r3 := r1 + r2 OP1.y OP2.z RESULT.x
store x := r3 REMOVE,x
REMOVE.v specifies that the load/store can be removed if variable v is allocated to a register, OPi.v specifies that operand i is replaced by the register to which variable v is allocated (if it is), and RESULT.v says that the result register needs to be replaced by the register to which variable v is alloated.
2.Load Actions
If a variable is evaluated (loaded), then changed and later the original value is used, a remove action cannot be generated. For instance, for the C-statement x = y++ + z the following code would be produced:
r1 := load y LOAD.y
r2 := r1 + 1 RESULT.y
store y := r2 REMOVE.y
r2 := load z REMOVE.z
r3 := r1 + r2 OP2.z RESULT.x
store x := r3 REMOVE.x
The LOAD action LOAD.y says that the instruction says that the load will be changed to a register move from the register y's allocated to. Since the loaded value of y is used after it is changed, the load cannot be removed since the original value of y has to be available for the add operation. Similarly, the first operand of the add operation cannot be annotated OP1.y since the original value of y (preserved in register r1) must be used. Load operations whose current value is used after the variable is changed, are called time-critical by Wall. Instead of a remove action, a load action is generated for them. In addition, operands that use the loaded value in operations are not annotated with an OP-action.
3.STORE Actions
If a computed value is used as an operand in more than one command, using the RESULT action becomes impossible. For instance, for the C-statement x = y = a + b the following code would be produced:
r1 := load a REMOVE.a
r2 := load b REMOVE.b
r3 := r1 + r2 OP1.a OP2.b
store y:= r3 STORE.y
store x := r3 STORE.x
So, instead of using a RESULT action, for each use of the computed value in an assignment to a variable, a STORE action is produced that will be replaced by a register move in case y or x, respectively, are allocated to registers.
One degenerate case arises as well: naively generating register actions for x = y might produce:
r1 := load y REMOVE.y
store x := r1 OP1.y REMOVE.x
in which case both commands would be removed if both x and y were allocated to registers. Instead,
r1 := load y REMOVE.y
store x := r1 OP1.y STORE.x
has to be generated. The rule here is, if the operand of a store is a leaf command (i.e., a load) then generate a STORE action.
Computing Time-Criticality
Time-critical leaves are identified in a backwards pass over the commands of a basic block, which identifies where the values of commands are used. If it turns out that a leaf is used after its variable is assigned, it has to be marked as time critical. Here's an example:
r1 := load y #1 leaf y
r2 := r1 + 1 #2 operate #1 + 1
store y := r2 #3 assign y := #2
r2 := load z #4 leaf z
r3 := r1 + r2 #5 operate #1 + #4
store x := r3 #6 assign x := #5
The first column shows the original code for x = y++ + z, the second column a three-address representation where loads are designated as leaf, stores as assign, and operations producing values in registers as operate. Leaf command #1 is used as an operand in command #5 and the destination of the assignment in command #3 is the same as variable loaded by it, leaf #1 is marked as time-critical.
To decide if an assign command is critical, Wall's algorithm checks whether the destination of the assign (command #3 in the example) is a live leaf, that is a leaf that is used after the assign (in the example leaf #1 is used in command #5 after command #3). Since we allocate parts of arrays whose indices we don't know until specialization time, we don't know until then if the destination of an assign is equal to a particular leaf, for instance if reg[rt] refers to the same location as reg[rs]. Therefore, instead of computing for each leaf it is time-critical or noe, we generate a condition marker for each leaf that is initialized to false. Whenever the algorithm checks for potential criticality we OR the condition that determines if the leaf will be critical. For instance, if the load is from offset n and the store is to offset m, the condition that is added to the leaf is ` OR(n == m). The back quote is used here to indicate that the condition is typically not evaluated at compile time when the action is pre-computed since n and m are typically run-time and not compile-time constants. At specialization time the disjunction constructed for each leaf is evaluated to decide if the leaf is in fact time critical. Loads of and stores to time critical leaves will not be removed, instead they will be replaced by register moves. So here's Wall's modified algorithm:
For each leaf LEAF do:
initialize LEAF.TimeCritical := false
Register action types:
The conditions that determine which action is generated for a particular command, are similar to Wall's:
An Example
Here's the code we generate for our interpreter branch that does the multiply:
ldl $27,interpreter.rtconst_5_1($24)
ldl $25,interpreter.rtconst_5_3($24)
mull $27,$25,$25
stl $25,interpreter.rtconst_5_5($24)
at the BTA stage this looks as follows:
536 <S> SHIFT.ASHL<>t474
@<_rt$1.2>t457 2(I64)
537 <D> ALD<>t349 @<>t474<>t0
_reg$1.2
538 <S> SHIFT.ASHL<>t475
@<_rs$1.2>t458 2(I64)
602 <D> ALD<>t351 @<>t475<>t0
_reg$1.2
539 <D> GIMUL<>t352 <>t349<>t351]
540 <S> SHIFT.ASHL<>t476
@<_rd$1.2>t456 2(I64)
541 <D> AST<>t352 @<>t476<>t0
_reg$1.2
Computing the leaf information for the dynamic (<D>) operations gives:
#1 leaf reg[t474]
#2 leaf reg[t475]
#3 oper #1 * #2
#4 assign reg[t476] = #3
where t474, t475, and t476 are all run-time constants.
None of the leaves has a time critical condition, so for all of them timeCritical = `false.
So the following actions would be generated:
TC_LOAD(reg, t474, false)
TC_LOAD(reg, t475, false)
RESULT(reg, t476) OP1(reg, t474,
false), OP2(reg, t475, false)
REMOVE(reg, t476)
At specialization time we check the values of the run-time constant offsets and can decide if we rewrite the code. Assuming, for instance, that t474=1, t475=2, t476=3 and that these offsets are allocated to registers r1, r2, and r3, respectively, the dynamic code would be rewritten to:
mull r1, r2, r3
getting rid of the loads and the store.
References
[WALL86] David D. Wall: Global Register Allocation at Link Time. In SIGPLAN 86 Symposium on Compiler Construction, Palo Alto, California, 1986, pp. 264-275. Also as Technical Report URL: ftp://gatekeeper.dec.com/pub/DEC/WRL/research-reports/WRL-TR-86.3.ps