Friday, November 22, 2013

Ex 5.1 and 5.2 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual

Q.5.1: The displayed code that follows steps through the elements of two arrays (A[] and B[]) concurrently, and for each element, it puts the larger of the two values into the corresponding element of a third array (C[]). The three arrays are of length N.

The instruction set used for Problems 5.1 through 5.6 is as follows:


Identify the basic blocks of this benchmark code by listing the static instructions belonging to each basic block in the following table. Number the basic blocks based on the lexical ordering of the code.
Note: There may be more boxes than there are basic blocks.

Sol: 



Q.5.2: Draw the control flow graph for this benchmark.

Sol: 





Instruction Flow Techniques:

      In any processor, instructions and its flow techniques deals with early stages which includes fetching and decoding stages. Hence, instruction flow techniques are always considered first. The throughput of all subsequent stages is always imposed by throughput of the early pipeline stages.  For complementary pipeline processor, the traditional partitioning of a processor into control path and the data path is no longer clear or effective. The early pipeline stages along with the branch execution unit can be viewed as corresponding to the traditional control path whose primary function is to enforce the control flow semantic of a program. The primary goal for all instruction flow techniques is to maximize the supply of instructions of the superscalar pipeline subject to the requirements of the control flow semantics of a program.   
       The control flow semantic of a program are specified in the form of the control flow graph. In the control flow graph, the nodes represent basic blocks and the edges represent the transfer of control flow between basic blocks. The directed edges represent control flow between basic blocks. These edges are induced by conditional branch instructions. The run time execution of a program entails the dynamic traversal of the nodes and edges of its control flow graph. The actual path of traversal is dictated by the branch instructions and their branch conditions which can be dependent on run time data.




Next Topic:
Q.5.7 through Problem 5.13: Consider the following code segment within a loop body for problems 5:
        if (x is even) then                                     (branch b1) 
                     increment a                                 (b1 taken)
        if (x is a multiple of 10) then                     (branch b2) 
                     increment b                                 (b2 taken)
Assume that the following list of 9 values of x is to be processed by 9 iterations of
this loop.  8, 9, 10, 11, 12, 20, 29, 30, 31 
Note: assume that predictor entries are updated by each dynamic branch before 
the next dynamic branch accesses the predictor (i.e., there is no update delay).
Q.5.7: Assume that an one-bit (history bit) state machine (see above) is used as the prediction algorithm for predicting the execution of the two branches in this loop. Indicate the predicted and actual branch directions of the b1 and b2 branch instructions for each iteration of this loop. Assume initial state of 0, i.e., NT, for the predictor. 
Q.5.8: What are the prediction accuracies for b1 and b2?  
Q.5.9: What is the overall prediction accuracy?   
Q.5.10: Assume a two-level branch prediction scheme is used. In addition to the one-bit predictor, a one bit global register (g) is used. Register g stores the direction of the last branch executed (which may not be the same branch as the branch currently being predicted) and is used to index into two separate one-bit branch history tables (BHTs) as shown below. Depending on the value of g, one of the two BHTs is selected and used to do the normal one-bit prediction. Again, fill in the predicted and actual branch directions of b1 and b2 for nine iterations of the loop. Assume the initial value of g = 0, i.e., NT. For each prediction, depending on the current value of g, only one of the two BHTs is accessed and updated. Hence, some of the entries below should be empty. 
Note: assume that predictor entries are updated by each dynamic branch before the next dynamic branch accesses the predictor (i.e. there is no update delay). 
Q.5.11: What are the prediction accuracies for b1 and b2? 
Q.5.12: What is the overall prediction accuracy?  
Q.5.13: What is the prediction accuracy of b2 when g = 0? Explain why. 

Previous Topic:
Q.4.8: In an in-order pipelined processor, pipeline latches are used to hold result operands from the time an execution unit computes them until they are written back to the register file during the writeback stage. In an out-of-order processor, rename registers are used for the same purpose. Given four-wide out of order processor TYPpipeline, compute the minimum number of rename registers needed to prevent rename register starvation from limiting concurrency. What happens to this number if frequency demands force a designer to add five extra pipeline stages between dispatch and execute, and five more stages between execute and retire/writeback?

No comments:

Post a Comment