target IES: Ex 4.8 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual

Q.4.8: In an in-order pipelined processor, pipeline latches are used to hold result operands from the time an execution unit computes them until they are written back to the register file during the writeback stage. In an out-of-order processor, rename registers are used for the same purpose. Given a four-wide out-of-order processor TYP pipeline, compute the minimum number of rename registers needed to prevent rename register starvation from limiting concurrency. What happens to this number if frequency demands force a designer to add five extra pipeline stages between dispatch and execute, and five more stages between execute and retire/writeback?

Sol: For maximum throughput, each pipeline stage will contain four inflight instructions. Since registers are allocated at decode, and freed at retire, each instruction holds a rename register for a minimum of five cycles. Hence 4x5 = 20 rename registers are needed at minimum, assuming no data dependences or cache misses that would cause instructions to stall. Adding five extra stages would increase the minimum to 40 registers.

Of course, the students should understand that this is a minimum that assumes no data dependences or cache misses. At the same time, it also assumes throughput of 4 IPC. Since few processors achieve 4 IPC on real programs due to data dependences, control dependences, and cache misses, this “minimum” may in fact be sufficient. The only reliable way to determine the right number of rename registers is a sensitivity study using detailed simulation of a range of rename registers to find the knee in the performance curve.

Core Designs for Chip Multiprocessors

The present era in which billion transistor chips is fast approaching, and the emerging trend is to use these transistors to integrate multiple processors onto a single chip. It is found that out-of-order cores provide better absolute performance than in-order cores, both for commercial and scientific workloads. In general, it takes twice as many in-order cores to equal the performance of out-of-order cores. However, when we consider the die area required for each type of core, it is find that in-order cores perform twice as well as out-of-order cores in half the space. Therefore, we believe future CMPs should use in-order cores.

For further detail follow the link: http://pages.cs.wisc.edu/~david/courses/cs838/projects/ahollowa.pdf

Next Topic:

Q.5.1: The displayed code that follows steps through the elements of two arrays (A[] and B[]) concurrently, and for each element, it puts the larger of the two values into the corresponding element of a third array (C[]). The three arrays are of length N.

The instruction set used for Problems 5.1 through 5.6 is as follows:

Identify the basic blocks of this benchmark code by listing the static instructions belonging to each basic block in the following table. Number the basic blocks based on the lexical ordering of the code.

Note: There may be more boxes than there are basic blocks.

Q.5.2: Draw the control flow graph for this benchmark.

SOLUTION

Previous Topic:

Q.3.28: Assume a synchronous front-side processor-memory bus that operates at 100 Hz and has an 8-byte data bus. Arbitration for the bus takes one bus cycle (10 ns), issuing a cache line read command for 64 bytes of data takes one cycle, memory controller latency (including DRAM access) is 60 ns, after which data double words are returned in back-to back cycles. Further assume the bus is blocking or circuit-switched. Compute the latency to fill a single 64-byte cache line. Then compute the peak read bandwidth for this processor-memory bus, assuming the processor arbitrates for the bus for a new read in the bus cycle following completion of the last read.

Q.3.34: Assume a single-platter disk drive with an average seek time of 4.5 ms, rotation speed of 7200 rpm, data transfer rate of 10 Mbytes/s per head, and controller overhead and queueing of 1 ms. What is the average access latency for a 4096-byte read?

Q.3.35: Recompute the average access latency for Problem 34 assuming a rotation speed of 15 K rpm, two platters, and an average seek time of 4.0 ms.

SOLUTION

target IES

Thursday, November 21, 2013

Ex 4.8 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual

Core Designs for Chip Multiprocessors

No comments:

Post a Comment