Monday, November 25, 2013





Q.5.19 and 5.20: Register Renaming
Given the DAXPY kernel shown in Figure 5.31 and the IBM RS/6000 (RIOS-I) floating-point load renaming scheme also discussed in class (both are shown in the following figure), simulate the execution of two iterations of the DAXPY loop and show the state of the floating-point map table, the pending target return queue, and the free list.
• Assume the initial state shown in the table for Problem 5.19.
• Note the table only contains columns for the registers that are referenced in the DAXPY loop.
• As in the RS/6000 implementation discussed, assume only a single load instruction is renamed per cycle and that only a single floating-point instruction can complete per cycle.
• Only floating-point load, multiply, and add instructions are shown in the table, since only these are relevant to the renaming scheme.
• Remember that only load destination registers are renamed.
• The first load from the loop prologue is filled in for you.



Q.5.19:  Fill in the remaining rows in the following table with the map table state and pending target return queue state after the instruction is renamed, and the free list state after the instruction completes.

Sol: 






Register Renaming

     In pipelining, Register renaming is a form of task that deals with data dependences between instructions by renaming their register operands. An assembly language programmer or a compiler specifies these operands using architectural registers - the registers that are explicit in the instruction set architecture. Renaming replaces architectural register names by, in effect, value names, with a new value name for each instruction destination operand. In a register machine, programs are composed of instructions which operate on values. The instructions must name these values in order to distinguish them from one another. A typical instruction might say, add X and Y and put the result in Z. In this instruction, X, Y, and Z are the names of storage locations. This eliminates the name dependences (output dependences and antidependences) between instructions and automatically recognizes true dependences.
      The recognition of true data dependences between instructions permits a more flexible life cycle for instructions. By maintaining a status bit for each value indicating whether or not it has been computed yet, it allows the execution phase of two instruction operations to be performed out of order when there are no true data dependences between them. This is called out-of-order execution. 
      We can understand data dependences by being more explicit about the action specified by an instruction. The action specified by an instruction is clearer if we describe instructions in terms of values rather than registers. We need to name the values in a way that captures changes in register contents over time.
We can capture the intent of a sequence of instructions by replacing register names in all operands with value names. To do this, we use a table that records the value names assigned to each register name. Then we apply the following algorithm.
  1. Replace each source operand by the most recent value name in the designated register column.
  2. Replace the destination operand by a new name and place the new name in the designated register column.
       It is important that step 1 is done first. Otherwise, when the same register is used both as a source operand and a destination operand we are indicating that the instruction execution phase cannot begin until its result is ready. This makes it impossible for the execution phase to begin.

Benefits of Renaming

       The instructions with value names capture the intent of a sequence of instructions by specifying relationships between register values, rather than just registers. This simplifies determining when the execution of an instruction can begin. We only need to check if its source operand values have been computed. The name dependences no longer complicate the picture.
         To determine when source operands have been computed, the value registers contain a status bit in addition to a data value. The status bit is initialized to 0 (not ready) when the value register is allocated for an instruction. It is set to 1 when a functional unit writes a result.



Next Topic:
Q.5.21: Simulate the execution of the following code snippet using Tomasulo’s algorithm. Show the contents of the reservation station entries, register file busy, tag (the tag is the RS ID number), and data fields for each cycle (make a copy of the table below for each cycle that you simulate). Indicate which instruction is executing in  each functional unit in each cycle. Also indicate any result forwarding across a common data bus by circling the producer and consumer and connecting them with an arrow.
i: R4  <-  R0  +  R8
j: R2  <-  R0  *  R4
k: R4  <-  R4  +  R8
l: R8   <-  R4  *  R2
Assume dual dispatch and dual CDB (common data bus). Add latency is two cycles, and multiply latency is 3 cycles. An instruction can begin execution in the same cycle that it is dispatched, assuming all dependencies are satisfied. 
Q.5.22: Determine whether or not the code executes at the dataflow limit for Problem 1. Explain why or why not. Show your work.
SOLUTION

Previous Topic:
Q.5.14: Below is the control flow graph of a simple program. The CFG is annotated with three different execution trace paths. For each execution trace circle which branch predictor (bimodal, local, or Gselect) will best predict the branching behavior of the given trace. More than one predictor may perform equally well on a particular trace. However, you are to use each of the three predictors exactly once in choosing the best predictors for the three traces. Circle your choice for each of the three traces and add. (Assume each trace is executed many times and every node in the CFG is a conditional branch. The branch history register for the local, global, and Gselect predictors is limited to 4 bits.)  


No comments:

Post a Comment