Wednesday, November 13, 2013

Ex 2.8, 2.9 & 2.15 Solutions: Modern Processor Design by John Paul Shen and Mikko H. Lipasti

Q.2.8: Consider adding a store instruction with indexed addressing mode to the TYP pipeline. This store differs from the existing store with register+immediate addressing mode by computing its effective address as the sum of two source registers, that is, stx r3, r4, r5 performs MEM[r4+r5]<-r3. Describe the additional pipeline resources needed to support such an instruction in the TYP pipeline. Discuss the advantages and disadvantages of such an instruction.

Solution: This instruction must read three operands from the register file: the store data (r3) and the two address registers (r4 and r5). Hence a third read port must be added, along with hazard detection logic and bypass networks for this third operand. Alternatively, an underpipelined implementation that stalls the pipeline for this instruction can be used. Such a stalling implementation occupies the pipeline for at least two cycles, so it will never outperform an “add, st” pair in which a separate add performs the indexed address computation.
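The read-port argument can be illustrated with a small Python sketch. It is only a toy model: the Instr class, the read_cycles helper, and the assumption that the baseline register file has two read ports are illustrative choices, not something taken from the book's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: only the register operands matter here."""
    name: str
    reads: tuple   # source registers read in the register-read stage
    writes: tuple  # destination registers written in write-back


# Assumption: the baseline register file provides two read ports.
BASELINE_READ_PORTS = 2


def read_cycles(instr, read_ports=BASELINE_READ_PORTS):
    """Cycles spent reading operands: ceil(number of reads / read ports), minimum 1."""
    return max(1, -(-len(instr.reads) // read_ports))


# Existing store: data register plus one base register.
st = Instr("st r3, 8(r4)", reads=("r3", "r4"), writes=())
# Indexed store: data register plus two address registers.
stx = Instr("stx r3, r4, r5", reads=("r3", "r4", "r5"), writes=())

print(read_cycles(st))                 # register reads with two ports
print(read_cycles(stx))                # stx with two ports (underpipelined)
print(read_cycles(stx, read_ports=3))  # stx with a third read port
```

With two ports the indexed store needs a second register-read cycle, which is the stall of the underpipelined option; with a third port it reads all operands in one cycle.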




Q.2.9: Consider adding a load-update instruction with register+immediate and postupdate addressing mode. In this addressing mode, the effective address for the load is computed as register+immediate, and the resulting address is written back into the base register. That is, lwu r3,8(r4) performs r3<-MEM[r4+8]; r4<-r4+8. Describe the additional pipeline resources needed to support such an instruction in the TYP pipeline.


Solution: This instruction performs two register writes. It can either be underpipelined, by forcing the second write to stall the pipeline, or, to maintain a fully pipelined implementation, a second write port must be added to the register file. In addition, the hazard detection logic and bypass network must be augmented to handle this special case of a second register write.
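As a rough illustration of why two writes are involved, here is a minimal Python sketch of the lwu semantics given in the question. The function names, the dictionary-based register file and memory, and the single-write-port baseline are assumptions made for the example. The case where the destination and base register are the same is not modelled.

```python
def execute_lwu(regs, mem, rd, base, imm):
    """Return the register writes produced by a hypothetical `lwu rd, imm(base)`.

    regs maps register names to values, mem maps addresses to values.
    """
    addr = regs[base] + imm
    # First write: the loaded value. Second write: the updated base address.
    return [(rd, mem[addr]), (base, addr)]


def writeback_cycles(writes, write_ports=1):
    """Cycles needed to retire the writes: ceil(number of writes / write ports), minimum 1."""
    return max(1, -(-len(writes) // write_ports))


regs = {"r3": 0, "r4": 100}
mem = {108: 42}

writes = execute_lwu(regs, mem, rd="r3", base="r4", imm=8)
print(writes)                                   # the two register writes
print(writeback_cycles(writes))                 # one write port: needs a stall
print(writeback_cycles(writes, write_ports=2))  # two write ports: fully pipelined
```

The second write is what forces either the extra write-back cycle or the extra port, and it is also a second destination that the hazard detection and bypass logic has to compare against.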




Q.2.15: The IBM study of pipelined processor performance assumed an instruction mix based on popular C programs in use in the 1980s. Since then, object-oriented languages like C++ and Java have become much more common. One of the effects of these languages is that object inheritance and polymorphism can be used to replace conditional branches with virtual function calls. Given the IBM instruction mix and CPI shown in the following table, perform the following transformations to reflect the use of C++/Java, and recompute the overall CPI and speedup or slowdown due to this change:
• Replace 50% of taken conditional branches with a load instruction followed by a jump register instruction (the load and jump register implement a virtual function call).
• Replace 25% of not-taken branches with a load instruction followed by a jump register instruction.


Solution: The table below shows how the changes described in the question affect the CPI and slow execution down.
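The recomputation itself can be sketched in Python. The instruction mix and per-class cycle counts below are placeholder values assumed purely for illustration, and the category names and helper functions are hypothetical; substitute the figures from the book's table to reproduce the actual answer.

```python
# ASSUMED placeholder mix: fraction of the original dynamic instruction count.
OLD_MIX = {
    "load": 0.25,
    "store": 0.15,
    "alu": 0.40,
    "branch_taken": 0.08,
    "branch_not_taken": 0.06,
    "jump": 0.05,
    "jump_register": 0.01,
}

# ASSUMED placeholder cycles per instruction for each class.
CYCLES = {
    "load": 2,
    "store": 1,
    "alu": 1,
    "branch_taken": 3,
    "branch_not_taken": 2,
    "jump": 2,
    "jump_register": 3,
}


def apply_virtual_calls(mix, taken_frac=0.50, not_taken_frac=0.25):
    """Replace a fraction of branches with a load followed by a jump register."""
    replaced_taken = taken_frac * mix["branch_taken"]
    replaced_not_taken = not_taken_frac * mix["branch_not_taken"]
    replaced = replaced_taken + replaced_not_taken
    new_mix = dict(mix)
    new_mix["branch_taken"] -= replaced_taken
    new_mix["branch_not_taken"] -= replaced_not_taken
    # Each replaced branch becomes two instructions: one load and one jump register.
    new_mix["load"] += replaced
    new_mix["jump_register"] += replaced
    return new_mix


def total_cycles(mix, cycles):
    """Cycles per original instruction for the given mix."""
    return sum(mix[k] * cycles[k] for k in mix)


def cpi(mix, cycles):
    """Cycles divided by the (possibly grown) instruction count."""
    return total_cycles(mix, cycles) / sum(mix.values())


NEW_MIX = apply_virtual_calls(OLD_MIX)
slowdown = total_cycles(NEW_MIX, CYCLES) / total_cycles(OLD_MIX, CYCLES)

print("old CPI:", cpi(OLD_MIX, CYCLES))
print("new CPI:", cpi(NEW_MIX, CYCLES))
print("slowdown:", slowdown)
```

Because each replaced branch turns into two instructions, the new mix sums to more than 1. The slowdown is therefore taken as the ratio of total cycles rather than the ratio of CPIs, assuming the cycle time is unchanged.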




Next Topic
Q.2.16:  In a TYP-based pipeline design with a data cache, load instructions check the tag array for a cache hit in parallel with accessing the data array to read the corresponding memory location. Pipelining stores to such a cache is more difficult, since the processor must check the tag first, before it overwrites the data array. Otherwise, in the case of a cache miss, the wrong memory location may be overwritten by the store. Design a solution to this problem that does not require sending the store down the pipe twice, or stalling the pipe for every store instruction. Referring to Figure 2-15, are there any new RAW, WAR, and/or WAW memory hazards? 
Q.2.17: The MIPS pipeline shown in Table 2-7 employs a two-phase clocking scheme that makes efficient use of a shared TLB, since instruction fetch accesses the TLB in phase one and data fetch accesses in phase two. However, when resolving a conditional branch, both the branch target address and the branch fall-through address need to be translated during phase one--in parallel with the branch condition check in phase one of the ALU stage--to enable instruction fetch from either the target or the fall-through during phase two. This seems to imply a dual-ported TLB. Suggest an architected solution to this problem that avoids dual-porting the TLB.

Previous Topic
Q.2.4: Consider that you would like to add a load-immediate instruction to the TYP instruction set and pipeline. This instruction extracts a 16-bit immediate value from the instruction word, sign-extends the immediate value to 32 bits, and stores the result in the destination register specified in the instruction word. Since the extraction and sign-extension can be accomplished without the ALU, your colleague suggests that such instructions be able to write their results into the register in the decode (ID) stage. Using the hazard detection algorithm described in Figure 2-15, identify what additional hazards such a change might introduce.
Q.2.5: Ignoring pipeline interlock hardware (discussed in Problem 6), what additional pipeline resources does the change outlined in Problem 4 require? Discuss these resources and their cost.
Q.2.6: Considering the change outlined in Problem 4, redraw the pipeline interlock hardware shown in Figure 2-18 to correctly handle the load-immediate instructions.


