target IES: Solution Manual : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Exercise 7.10, 7.11 & 7.12

Q.7.10: If the P6 microarchitecture had to support an instruction set that included predication, what effect would that have on the register renaming process?

Sol: Predicated instructions complicate renaming, because a false predicate nullifies the register write of an instruction that would otherwise write a register. Until the predicate is resolved, the renamer does not know whether subsequent instructions should read the previous or the new definition of a register written by a predicated instruction. There are two options. The renamer could stall until the predicate is determined. Alternatively, it could insert a move operation after the predicated op that reads both the old and the new definition of the predicated instruction’s destination register and then copies one or the other, according to the predicate, to its own (nonarchitected) destination. All subsequent readers are then renamed to the output of the move operation.
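
The second option can be illustrated with a toy rename table. This is only a minimal sketch under stated assumptions, not the P6 renamer: the class name Renamer, the "p<n>" physical tags, and the "select" micro-op name are all hypothetical, and free-list management and recovery are omitted.

```python
class Renamer:
    """Toy register alias table (RAT): architected name -> physical tag.
    All names here are hypothetical; this is not the real P6 structure."""

    def __init__(self, arch_regs):
        self.next_tag = 0
        # Give every architected register an initial physical tag.
        self.rat = {reg: self._alloc() for reg in arch_regs}

    def _alloc(self):
        tag = f"p{self.next_tag}"
        self.next_tag += 1
        return tag

    def rename(self, dest, srcs, pred=None):
        """Rename one instruction and return the micro-ops it produces."""
        src_tags = [self.rat[s] for s in srcs]
        if pred is None:
            new = self._alloc()
            self.rat[dest] = new
            return [("op", new, src_tags)]
        # Predicated case: keep the old definition, produce the new one,
        # then add a select (move) that picks between them once the
        # predicate is known.
        old = self.rat[dest]
        pred_tag = self.rat[pred]
        new = self._alloc()
        sel = self._alloc()
        # Later readers of dest see the select's output, never old or new.
        self.rat[dest] = sel
        return [("op", new, src_tags),
                ("select", sel, [pred_tag, new, old])]


renamer = Renamer(["r1", "r2", "r3", "c0"])
# (c0) r1 = r2 op r3, followed by an unpredicated reader of r1.
predicated_uops = renamer.rename("r1", ["r2", "r3"], pred="c0")
reader_uops = renamer.rename("r2", ["r1"])
```

In this sketch the reader of r1 is renamed to the tag written by the select, so it does not need to know the predicate outcome at rename time. The cost is one extra micro-op and one extra physical register per predicated instruction.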

Q.7.11: As described in the text, the P6 microarchitecture splits store operations into a STA and STD pair for handling address generation and data movement. Explain why this makes sense from a microarchitectural implementation perspective.

Sol: Logically, the STA and STD perform two different operations that interact with different control portions of the microarchitecture. The STA uses an AGEN unit to generate the address and then resides in the MOB (memory order buffer), where it is used to resolve memory dependences against newer loads. The STD simply transfers data from the register file to the store port at commit. Because the two operations have different source operands and can become ready at different times, it makes sense to split them. Note that the newer Banias (Pentium M, Centrino) designs based on the P6 core no longer split STA/STD but treat them as a single micro-op, which increases effective decode bandwidth and reduces ROB and RS occupancy.
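
The benefit of letting the address half run ahead of the data half can be sketched with a toy store buffer entry. This is an illustrative model only: StoreBufferEntry, load_may_bypass, and the conservative bypass rule are assumptions made for this example, not a description of the actual P6 MOB logic.

```python
class StoreBufferEntry:
    """Toy store buffer entry filled in by two independent micro-ops."""

    def __init__(self):
        self.addr = None   # written by the STA micro-op (address generation)
        self.data = None   # written by the STD micro-op (data movement)

    def sta(self, base, offset):
        # Address generation needs only the address operands.
        self.addr = base + offset

    def std(self, value):
        # Data movement needs only the store data operand.
        self.data = value

    @property
    def ready_to_commit(self):
        return self.addr is not None and self.data is not None


def load_may_bypass(load_addr, older_stores):
    """Conservative rule assumed here: a load may pass older stores only
    if every older store address is known and differs from the load's."""
    return all(s.addr is not None and s.addr != load_addr
               for s in older_stores)


store = StoreBufferEntry()
blocked_before_sta = not load_may_bypass(0x100, [store])
store.sta(0x200, 8)          # address known, data still outstanding
free_after_sta = load_may_bypass(0x100, [store])
```

Once the STA has executed, a newer load to a different address can proceed even though the store data has not arrived yet. If the store were a single micro-op, the address would not be available until the data operand was also ready.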

Q.7.12: Following up on Problem 7.11, would there be a performance benefit (measured in instructions per cycle) if stores were not split? Explain why or why not.

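Sol: Yes, most likely. Each store would consume one decode slot, one ROB entry, and one RS entry instead of two, so effective front-end decode bandwidth would increase while ROB and RS occupancy would decrease, permitting an effectively larger instruction window. It is also possible that commit bandwidth would increase, since a store would retire as one micro-op rather than two.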
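
The window effect can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that every non-store instruction is a single micro-op and that 25% of instructions are stores; the function name effective_window and the store fraction are hypothetical, while the 40-entry ROB matches the P6 design.

```python
def effective_window(rob_entries, store_fraction, uops_per_store):
    """Number of instructions that fit in the ROB, assuming every
    non-store instruction occupies exactly one entry (an assumption)."""
    uops_per_instr = (1 - store_fraction) + store_fraction * uops_per_store
    return rob_entries / uops_per_instr


split_window = effective_window(40, 0.25, 2)   # STA + STD per store
fused_window = effective_window(40, 0.25, 1)   # one micro-op per store
print(split_window, fused_window)              # prints 32.0 40.0
```

Under these assumptions the unsplit design holds 40 instructions in flight instead of 32. Whether that translates into higher IPC depends on how often the window, rather than some other resource, is the bottleneck.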

Friday, December 6, 2013
