Q.3.28: Assume a synchronous front-side processor-memory bus that operates at 100 MHz and has an 8-byte data bus. Arbitration for the bus takes one bus cycle (10 ns), issuing a cache line read command for 64 bytes of data takes one cycle, memory controller latency (including DRAM access) is 60 ns, after which data double words are returned in back-to-back cycles. Further assume the bus is blocking or circuit-switched. Compute the latency to fill a single 64-byte cache line. Then compute the peak read bandwidth for this processor-memory bus, assuming the processor arbitrates for the bus for a new read in the bus cycle following completion of the last read.
Sol: Arbitration: 1 cycle = 10 ns
Command issue: 1 cycle = 10 ns
Controller latency: 60 ns
Data transfer: 64 bytes / 8 bytes per cycle = 8 cycles = 80 ns
Total latency to fill a single 64-byte cache line: 10 + 10 + 60 + 80 = 160 ns
Bandwidth: since the bus is circuit-switched, a new read can begin only in the cycle after the previous one completes, so peak read bandwidth = 64 bytes / 160 ns = 0.4 bytes/ns = 400 MB/s
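The arithmetic above can be checked with a short script. This is a minimal sketch; the variable names (CYCLE_NS, BUS_WIDTH_BYTES, and so on) are our own labels for the quantities given in Q.3.28, not anything from the original text.

```python
# Hypothetical variable names; all values come from Q.3.28.
CYCLE_NS = 10          # 100 MHz bus -> 10 ns per bus cycle
BUS_WIDTH_BYTES = 8    # 8-byte data bus
LINE_BYTES = 64        # cache line size
CONTROLLER_NS = 60     # memory controller + DRAM latency

arbitration_ns = 1 * CYCLE_NS                   # 10 ns
issue_ns = 1 * CYCLE_NS                         # 10 ns
transfer_cycles = LINE_BYTES // BUS_WIDTH_BYTES # 8 cycles
transfer_ns = transfer_cycles * CYCLE_NS        # 80 ns

latency_ns = arbitration_ns + issue_ns + CONTROLLER_NS + transfer_ns
print(latency_ns)                               # 160 ns

# Circuit-switched bus: the next read starts only after this one finishes.
bandwidth_mb_s = LINE_BYTES / latency_ns * 1e9 / 1e6
print(bandwidth_mb_s)                           # 400.0 MB/s
```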
Q.3.31: Consider finite DRAM bandwidth at a memory controller, as follows. Assume double-data-rate DRAM operating at 100 MHz in a parallel non-interleaved organization, with an 8-byte interface to the DRAM chips. Further assume that each cache line read results in a DRAM row miss, requiring a precharge and RAS cycle, followed by row-hit CAS cycles for each of the double words in the cache line. Assuming memory controller overhead of one cycle (10 ns) to initiate a read operation, and one cycle latency to transfer data from the DRAM data bus to the processor-memory bus, compute the latency for reading one 64-byte cache block. Now compute the peak data bandwidth for the memory interface, ignoring DRAM refresh cycles.
Sol: Memory latency is measured from when the memory controller sees the command to when it places the last doubleword on the processor bus:
Latency = precharge + RAS + overhead + 64 B × (1 transfer / 8 B) × (1 cycle / 2 transfers)
= 1 + 1 + 1 + 4 = 7 cycles = 70 ns
Peak data bandwidth = 64 B / 70 ns = 0.914 B/ns ≈ 914 MB/s
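The same calculation in script form, as a sketch; here the double-data-rate transfer is modeled by XFERS_PER_CYCLE = 2, and the names are again our own labels for the Q.3.31 quantities.

```python
# Hypothetical variable names; all values come from Q.3.31.
CYCLE_NS = 10            # 100 MHz DRAM clock -> 10 ns per cycle
IFACE_BYTES = 8          # 8-byte interface to the DRAM chips
XFERS_PER_CYCLE = 2      # double data rate: two transfers per clock
LINE_BYTES = 64

precharge = 1            # row-miss precharge cycle
ras = 1                  # RAS cycle
overhead = 1             # controller cycle to initiate the read
data_cycles = LINE_BYTES // (IFACE_BYTES * XFERS_PER_CYCLE)  # 4 cycles

latency_cycles = precharge + ras + overhead + data_cycles    # 7 cycles
latency_ns = latency_cycles * CYCLE_NS
print(latency_ns)                                            # 70 ns

bandwidth_b_per_ns = LINE_BYTES / latency_ns
print(bandwidth_b_per_ns)                                    # ~0.914 B/ns ≈ 914 MB/s
```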
Dynamic Random Access Memories (DRAM)
Dynamic Random Access Memories (DRAM) are the dominant solid-state memory devices and serve as the primary memory in modern microprocessor systems. In recent years, processor frequencies have grown at roughly 80% per year, while DRAM latencies have improved at only about 7% per year; this widening gap has been called the "Memory Wall." DRAM architectures have therefore been evolving rapidly to reduce its performance impact. Both the DRAM architecture and the memory controller policy have a significant effect on the execution of representative benchmarks: a cache-enhanced DDR2 architecture yields a 75% reduction in access latency (for a 128-byte L2 line) and a 34% reduction in execution time relative to a PC100 architecture. Several aspects of the DRAM contribute to this performance gain; bus utilization, effective cache hit rate, the frequency of adjacent accesses mapping into a common bank, controller policy, and access latency can all be examined for their impact on execution time. Hence the DRAM architecture has considerable capacity to improve the performance of present-day processors.
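To make the "Memory Wall" figures concrete, the sketch below compounds the two quoted growth rates over ten years. The 80% and 7% annual rates come from the paragraph above; the ten-year horizon and starting ratio of 1 are illustrative assumptions, not figures from the text.

```python
# Illustrative only: compound the quoted annual improvement rates.
proc_rate = 0.80   # processor speed improvement per year (from the text)
dram_rate = 0.07   # DRAM latency improvement per year (from the text)

gap = 1.0          # assumed starting ratio (assumption, not from the text)
for year in range(1, 11):
    gap *= (1 + proc_rate) / (1 + dram_rate)
    print(f"year {year:2d}: processor/DRAM speed gap ~ {gap:6.1f}x")
# After 10 years the gap is roughly 180x, illustrating the Memory Wall.
```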