Q.5.7 through Problem 5.13: Consider the following code segment within a loop body for problems 5:
if (x is even) then (branch b1)
increment a (b1 taken)
if (x is a multiple of 10) then (branch b2)
increment b (b2 taken)
Assume that the following list of 9 values of x is to be processed by 9 iterations of
this loop.
Note: assume that predictor entries are updated by each dynamic branch before
the next dynamic branch accesses the predictor (i.e., there is no update delay).
Q.5.7: Assume that an one-bit (history bit) state machine (see above) is used as the prediction algorithm for predicting the execution of the two branches in this loop. Indicate the predicted and actual branch directions of the b1 and b2 branch instructions for each iteration of this loop. Assume initial state of 0, i.e., NT, for the predictor.
Sol:
8 9 10 11 12 20 29 30 31
b1 predicted: __N____T____ N____ T____ N____ T____ T____ N____ T__
b1 actual : __T____N____T____ N____ T____ T____ N____ T____ N__
b2 predicted: __N____N____N____ T____ N____ N____ T____ N____ T__
b2 actual : __N____ N____T____ N____ N____ T____ N____ T____ N__
Q.5.8: What are the prediction accuracies for b1 and b2?
Sol:
Q.5.9: What is the overall prediction accuracy?
Sol:
8 9 10 11 12 20 29 30 31
For g=0
b1 predicted: __ N____ T____N___ ___ __ T____ T___ __ __ T ______
b1 actual : __ T____ N____T____ N____ T____ T____ N____ T____ N__
b2 predicted: ____ __ N______ __ N____ __ ___ _ __ N____ __ __ N__
b2 actual : __ N____ N____T____ N____ N____ T____ N____ T____ N__
For g=1
b1 predicted: ____ ____ ____ __ N______ _ ___ __ N___ ___ __ N__
b1 actual : __ T____N____ T____ N____ T____ T____ N____ T____ N__
b2 predicted: __ N______ __ N______ __ T____ N____ __ __ T____ __
b2 actual : __ N____N____ T____ N____N____ T____ N____ T____ N__
Q.5.11: What are the prediction accuracies for b1 and b2?
Sol:
Q.5.12: What is the overall prediction accuracy?
Sol:
Q.5.13: What is the prediction accuracy of b2 when g = 0? Explain why.
if (x is even) then (branch b1)
increment a (b1 taken)
if (x is a multiple of 10) then (branch b2)
increment b (b2 taken)
Assume that the following list of 9 values of x is to be processed by 9 iterations of
this loop.
8, 9, 10, 11, 12, 20, 29, 30, 31
the next dynamic branch accesses the predictor (i.e., there is no update delay).
Q.5.7: Assume that an one-bit (history bit) state machine (see above) is used as the prediction algorithm for predicting the execution of the two branches in this loop. Indicate the predicted and actual branch directions of the b1 and b2 branch instructions for each iteration of this loop. Assume initial state of 0, i.e., NT, for the predictor.
Sol:
8 9 10 11 12 20 29 30 31
b1 predicted: __N____T____ N____ T____ N____ T____ T____ N____ T__
b1 actual : __T____N____T____ N____ T____ T____ N____ T____ N__
b2 predicted: __N____N____N____ T____ N____ N____ T____ N____ T__
b2 actual : __N____ N____T____ N____ N____ T____ N____ T____ N__
Q.5.8: What are the prediction accuracies for b1 and b2?
Sol:
b1: 1/9 = 11%
b2: 3/9 = 33%
Sol:
Overall: 4/18 = 22%
Q.5.10: Assume a two-level branch prediction scheme is used. In addition to the one-bit predictor, a one bit global register (g) is used. Register g stores the direction of the last branch executed (which may not be the same branch as the branch currently being predicted) and is used to index into two separate one-bit branch history tables (BHTs) as shown below. Depending on the value of g, one of the two BHTs is selected and used to do the normal one-bit prediction. Again, fill in the predicted and actual branch directions of b1 and b2 for nine iterations of the loop. Assume the initial value of g = 0, i.e., NT. For each prediction, depending on the current value of g, only one of the two BHTs is accessed and updated. Hence, some of the entries below should be empty.
Note: assume that predictor entries are updated by each dynamic branch before the next dynamic branch accesses the predictor (i.e. there is no update delay).
Sol:
8 9 10 11 12 20 29 30 31
For g=0
b1 predicted: __ N____ T____N___ ___ __ T____ T___ __ __ T ______
b1 actual : __ T____ N____T____ N____ T____ T____ N____ T____ N__
b2 predicted: ____ __ N______ __ N____ __ ___ _ __ N____ __ __ N__
b2 actual : __ N____ N____T____ N____ N____ T____ N____ T____ N__
For g=1
b1 predicted: ____ ____ ____ __ N______ _ ___ __ N___ ___ __ N__
b1 actual : __ T____N____ T____ N____ T____ T____ N____ T____ N__
b2 predicted: __ N______ __ N______ __ T____ N____ __ __ T____ __
b2 actual : __ N____N____ T____ N____N____ T____ N____ T____ N__
Q.5.11: What are the prediction accuracies for b1 and b2?
Sol:
b1 : 6/9 = 67%
b2 : 6/9 = 67%
Sol:
Overall: 67%
Q.5.13: What is the prediction accuracy of b2 when g = 0? Explain why.
Sol: 100%. Whenever b1 is not taken (i.e. g=0), the number being checked is odd (not even). It follows that the number is also not evenly divisible by ten. Hence, in these cases, b2 is always not taken and the predictor is able to predict b2 with high accuracy in this global context.
Next Topic:
Previous Topic:
Q.5.14: Below is the control flow graph of a simple program. The CFG is annotated with three different execution trace paths. For each execution trace circle which branch predictor (bimodal, local, or Gselect) will best predict the branching behavior of the given trace. More than one predictor may perform equally well on a particular trace. However, you are to use each of the three predictors exactly once in choosing the best predictors for the three traces. Circle your choice for each of the three traces and add. (Assume each trace is executed many times and every node in the CFG is a conditional branch. The branch history register for the local, global, and Gselect predictors is limited to 4 bits.)
Q.5.1: The displayed code that follows steps through the elements of two arrays (A[] and B[]) concurrently, and for each element, it puts the larger of the two values into the corresponding element of a third array (C[]). The three arrays are of length N.
The instruction set used for Problems 5.1 through 5.6 is as follows:
Identify the basic blocks of this benchmark code by listing the static instructions belonging to each basic block in the following table. Number the basic blocks based on the lexical ordering of the code.
Note: There may be more boxes than there are basic blocks.
Q.5.2: Draw the control flow graph for this benchmark.
No comments:
Post a Comment