Tuesday, November 26, 2013

Ex 5.21 and 5.22 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual



Q.5.21: Simulate the execution of the following code snippet using Tomasulo’s algorithm. Show the contents of the reservation station entries, register file busy, tag (the tag is the RS ID number), and data fields for each cycle (make a copy of the table below for each cycle that you simulate). Indicate which instruction is executing in each functional unit in each cycle. Also indicate any result forwarding across a common data bus by circling the producer and consumer and connecting them with an arrow.
i: R4  <-  R0  +  R8
j: R2  <-  R0  *  R4
k: R4  <-  R4  +  R8
l: R8   <-  R4  *  R2
Assume dual dispatch and dual CDB (common data bus). Add latency is two cycles, and multiply latency is 3 cycles. An instruction can begin execution in the same cycle that it is dispatched, assuming all dependencies are satisfied. 

Sol: 









Q.5.22: Determine whether or not the code executes at the data flow limit for Problem 5.21. Explain why or why not. Show your work.

Sol: 

       Critical path is 1+3+3=7 cycles
       Problem 5.21 executes in 7 cycles.
       Therefore, Tomasulo executes this at the dataflow limit.
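
As a rough cross-check of the critical path, the sketch below computes the dataflow limit from true (read-after-write) dependences only. The function name dataflow_limit, the tuple layout, and the latency dictionaries are assumptions made for this illustration, not part of the textbook solution. Latencies are kept as parameters: a one-cycle add reproduces the 1+3+3 = 7 sum above, while the two-cycle add latency stated in Q.5.21 would give 2+3+3 = 8 along the same path i -> j -> l.

```python
def dataflow_limit(instrs, latency):
    """Length of the longest chain of true dependences, in cycles.

    instrs:  list of (name, op, dest, sources) in program order.
    latency: dict mapping op name to its execution latency in cycles.
    """
    finish = {}    # instruction name -> earliest completion cycle
    producer = {}  # register -> most recent earlier instruction writing it
    for name, op, dest, sources in instrs:
        # An instruction may start once all of its producers have finished.
        start = max(
            (finish[producer[r]] for r in sources if r in producer),
            default=0,
        )
        finish[name] = start + latency[op]
        producer[dest] = name
    return max(finish.values())


# The four instructions from Q.5.21.
CODE = [
    ("i", "add", "R4", ["R0", "R8"]),
    ("j", "mul", "R2", ["R0", "R4"]),
    ("k", "add", "R4", ["R4", "R8"]),
    ("l", "mul", "R8", ["R4", "R2"]),
]

print(dataflow_limit(CODE, {"add": 1, "mul": 3}))  # 7
print(dataflow_limit(CODE, {"add": 2, "mul": 3}))  # 8
```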



Monday, November 25, 2013





Ex 5.19 and 5.20 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution Manual





Q.5.19 and 5.20: Register Renaming
Given the DAXPY kernel shown in Figure 5.31 and the IBM RS/6000 (RIOS-I) floating-point load renaming scheme also discussed in class (both are shown in the following figure), simulate the execution of two iterations of the DAXPY loop and show the state of the floating-point map table, the pending target return queue, and the free list.
• Assume the initial state shown in the table for Problem 5.19.
• Note the table only contains columns for the registers that are referenced in the DAXPY loop.
• As in the RS/6000 implementation discussed, assume only a single load instruction is renamed per cycle and that only a single floating-point instruction can complete per cycle.
• Only floating-point load, multiply, and add instructions are shown in the table, since only these are relevant to the renaming scheme.
• Remember that only load destination registers are renamed.
• The first load from the loop prologue is filled in for you.




Q.5.19:  Fill in the remaining rows in the following table with the map table state and pending target return queue state after the instruction is renamed, and the free list state after the instruction completes.

Sol: 



Sunday, November 24, 2013

Ex 5.14 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution Manual




Q.5.14: Below is the control flow graph of a simple program. The CFG is annotated with three different execution trace paths. For each execution trace circle which branch predictor (bimodal, local, or Gselect) will best predict the branching behavior of the given trace. More than one predictor may perform equally well on a particular trace. However, you are to use each of the three predictors exactly once in choosing the best predictors for the three traces. Circle your choice for each of the three traces and add. (Assume each trace is executed many times and every node in the CFG is a conditional branch. The branch history register for the local, global, and Gselect predictors is limited to 4 bits.)  



Sol: Gselect
      Identical global history at b13 and b15, so the PC is needed to differentiate them.



Sol: Local
     Identical global history at b1, so global history doesn’t work. The local history of b1 shows it alternates taken and not taken.




Sol: Bimodal
      All the branches in this trace have a constant behavior, so bimodal predicts well.
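
To illustrate the point made for the alternating trace, here is a minimal sketch comparing a single two-bit counter with a local-history predictor on a strictly alternating branch. The function names, the initial counter value (weakly not-taken), and the 100-outcome trace are assumptions chosen for the illustration; the CFG traces from the problem are not reproduced here.

```python
def bimodal_correct(outcomes, counter=1):
    """One 2-bit saturating counter (0..3); predicts taken when counter >= 2."""
    correct = 0
    for taken in outcomes:
        correct += (counter >= 2) == taken
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct


def local_correct(outcomes, history_bits=4):
    """Per-branch history register indexing a table of 2-bit counters."""
    table = [1] * (1 << history_bits)  # every counter starts weakly not-taken
    mask = (1 << history_bits) - 1
    history = 0
    correct = 0
    for taken in outcomes:
        counter = table[history]
        correct += (counter >= 2) == taken
        table[history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the newest outcome into the 4-bit local history.
        history = ((history << 1) | int(taken)) & mask
    return correct


# Assumed trace: one branch alternating taken, not taken, taken, ...
trace = [i % 2 == 0 for i in range(100)]
print("bimodal:", bimodal_correct(trace), "of", len(trace))
print("local  :", local_correct(trace), "of", len(trace))
```

After a short warm-up the local predictor sees only two history patterns and learns the outcome that follows each one, whereas the single counter keeps oscillating around its threshold.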








Saturday, November 23, 2013

IES Electronics : Day 1

Year 1997 : Paper 1


1. If the energy gap of a semiconductor is 1.1 eV, then it would be -
a. Opaque to the visible light
b. Transparent to the visible light
c. Transparent to the ultraviolet radiation
d. Opaque to the infrared radiation

2. The skin depth of copper is found to be 66 μm at 1 MHz at a certain temperature. At the same temperature and at 2 MHz, the skin depth would be approximately -
a. 47 μm
b. 33 μm
c. 92 μm
d. 122 μm

3. With increasing temperature, the electrical conductivity would
a. Increase in metals as well as in intrinsic semiconductors
b. Increase in metals but decrease in intrinsic semiconductors
c. Decrease in metals but increase in intrinsic semiconductors
d. Decrease in metals as well as in intrinsic semiconductors

4. Which one of the following statements is correct?
a. All electrostrictive materials are piezoelectric, and all piezoelectric materials are electrostrictive
b. Piezoelectric materials are a subset of electrostrictive materials
c. Electrostrictive materials are a subset of piezoelectric materials
d. Piezoelectricity and electrostriction are two totally independent properties of materials

5. Ferrites have
a. Low copper loss
b. Low eddy current loss
c. Low resistivity
d. Higher specific gravity compared to iron

Year 1997 : Paper 2


6. If α = 0.995, Ie = 10mA and Icbo = 0.5μA then Iceo will be -
a. 25 μA
b. 100 μA
c. 10.1 μA
d. 10.5 μA

7. The approximate value of input impedance of a common emitter amplifier with emitter resistance Re is given by
a. β + ARe
b. (β + 1)Re + hie
c. hie
d. (β + 1)Re

8. The circuit diagram shown in the figure consists of transistors in:

a. Parallel connection
b. Cascode connection
c. Darlington connection
d. Cascade connection

9. If an amplifier with gain of -1000 and feedback of β = -0.1 had a gain change of 20% due to temperature, the change in gain of the feedback amplifier would be -
a. 10%
b. 5%
c. 0.2%
d. 0.01%

10. In the case of the circuit shown in the figure, Vio = 10 mV dc maximum, the maximum possible output offset voltage Voo caused by the input offset voltage Vio with respect to ground is -

a. 60 mV dc
b. 110 mV dc
c. 130 mV dc
d. 150 mV dc

General Ability Test


11. During whose tenure was the first session of the Indian National Congress held?
a. Lord Curzon
b. Lord Dufferin
c. Lord Lytton
d. Lord Ripon

12. Which one of the following is the correct chronological order?
a. Champaran Satyagraha - Moplah Rebellion - Jallianwala Bagh Massacre
b. Champaran Satyagraha - Jallianwala Bagh Massacre - Moplah Rebellion
c. Jallianwala Bagh Massacre - Champaran Satyagraha - Moplah Rebellion
d. Jallianwala Bagh Massacre - Moplah Rebellion - Champaran Satyagraha

13. Who founded the India House in England during the Indian freedom struggle?
a. Bhikaji Cama
b. Dadabhai Naoroji
c. Ras Bihari Bose
d. Shyamji Krishna Varma

14. Which one of the following is the correct chronological order?
a. Nasik Conspiracy - Kakori Conspiracy - Lahore Conspiracy
b. Lahore Conspiracy - Nasik Conspiracy - Kakori Conspiracy
c. Nasik Conspiracy - Lahore Conspiracy - Kakori Conspiracy
d. Lahore Conspiracy - Kakori Conspiracy - Nasik Conspiracy

15. Prime Minister Manmohan Singh attended the BIMST-EC Meeting in Bangkok in July 2004 in his maiden overseas engagement after assuming the office of Prime Minister of India. Which of the following is not a member of BIMST-EC?
a. Bangladesh
b. Maldives
c. Thailand
d. Sri Lanka


http://targetiesnow.blogspot.in/

Ex 5.7 through 5.13 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual

Q.5.7 through Q.5.13: Consider the following code segment within a loop body for Problems 5.7 through 5.13:
        if (x is even) then                                      (branch b1) 
                     increment a                                  (b1 taken)
        if (x is a multiple of 10) then                      (branch b2) 
                     increment b                                  (b2 taken)

Assume that the following list of 9 values of x is to be processed by 9 iterations of
this loop. 
8, 9, 10, 11, 12, 20, 29, 30, 31 

Note: assume that predictor entries are updated by each dynamic branch before 
the next dynamic branch accesses the predictor (i.e., there is no update delay).

Q.5.7: Assume that a one-bit (history bit) state machine (see above) is used as the prediction algorithm for predicting the execution of the two branches in this loop. Indicate the predicted and actual branch directions of the b1 and b2 branch instructions for each iteration of this loop. Assume initial state of 0, i.e., NT, for the predictor.

Sol:
x           :    8    9   10   11   12   20   29   30   31
b1 predicted:    N    T    N    T    N    T    T    N    T
b1 actual   :    T    N    T    N    T    T    N    T    N
b2 predicted:    N    N    N    T    N    N    T    N    T
b2 actual   :    N    N    T    N    N    T    N    T    N


Q.5.8: What are the prediction accuracies for b1 and b2?  

Sol: 
b1: 1/9 = 11%
b2: 3/9 = 33%


Q.5.9: What is the overall prediction accuracy?   

Sol:                                           
Overall: 4/18 = 22%
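
The counts in Q.5.7 through Q.5.9 can be reproduced with a small simulation. This is a minimal sketch, assuming one private history bit per branch and immediate update as the problem states; the function and variable names are made up for the illustration.

```python
XS = [8, 9, 10, 11, 12, 20, 29, 30, 31]


def one_bit_correct(outcomes, initial=False):
    """Count correct predictions of a one-bit (last-outcome) predictor.

    False stands for not taken (NT), True for taken (T).
    """
    state = initial
    correct = 0
    for taken in outcomes:
        if state == taken:
            correct += 1
        state = taken  # updated before the next access (no update delay)
    return correct


b1 = [x % 2 == 0 for x in XS]   # b1 taken when x is even
b2 = [x % 10 == 0 for x in XS]  # b2 taken when x is a multiple of 10

c1, c2 = one_bit_correct(b1), one_bit_correct(b2)
print(f"b1: {c1}/9  b2: {c2}/9  overall: {c1 + c2}/18")
# b1: 1/9  b2: 3/9  overall: 4/18
```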



Q.5.10: Assume a two-level branch prediction scheme is used. In addition to the one-bit predictor, a one bit global register (g) is used. Register g stores the direction of the last branch executed (which may not be the same branch as the branch currently being predicted) and is used to index into two separate one-bit branch history tables (BHTs) as shown below. Depending on the value of g, one of the two BHTs is selected and used to do the normal one-bit prediction. Again, fill in the predicted and actual branch directions of b1 and b2 for nine iterations of the loop. Assume the initial value of g = 0, i.e., NT. For each prediction, depending on the current value of g, only one of the two BHTs is accessed and updated. Hence, some of the entries below should be empty. 

Note: assume that predictor entries are updated by each dynamic branch before the next dynamic branch accesses the predictor (i.e. there is no update delay). 

Sol:

x           :    8    9   10   11   12   20   29   30   31

(A "-" marks an iteration in which that BHT is not accessed.)

For g=0
b1 predicted:    N    T    N    -    T    T    -    T    -
b1 actual   :    T    N    T    N    T    T    N    T    N
b2 predicted:    -    N    -    N    -    -    N    -    N
b2 actual   :    N    N    T    N    N    T    N    T    N

For g=1
b1 predicted:    -    -    -    N    -    -    N    -    N
b1 actual   :    T    N    T    N    T    T    N    T    N
b2 predicted:    N    -    N    -    T    N    -    T    -
b2 actual   :    N    N    T    N    N    T    N    T    N


Q.5.11: What are the prediction accuracies for b1 and b2? 

Sol:
b1: 6/9 = 67%
b2: 6/9 = 67%

Q.5.12: What is the overall prediction accuracy?  

Sol:                                             
Overall: 12/18 = 67%


Q.5.13: What is the prediction accuracy of b2 when g = 0? Explain why. 


Sol: 100%. Whenever b1 is not taken (i.e. g=0), the number being checked is odd (not even). It follows that the number is also not evenly divisible by ten. Hence, in these cases, b2 is always not taken and the predictor is able to predict b2 with high accuracy in this global context.
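
The two-level results in Q.5.10 through Q.5.13 can be checked the same way. The sketch below assumes that each of the two BHTs holds one history bit per branch and that g is updated by every dynamic branch, as described in Q.5.10; the function name and data layout are assumptions for the illustration.

```python
XS = [8, 9, 10, 11, 12, 20, 29, 30, 31]


def two_level(xs):
    """Simulate two one-bit BHTs selected by a one-bit global register g."""
    # bht[g][branch] is the last outcome seen by that branch under that g.
    bht = [{"b1": False, "b2": False}, {"b1": False, "b2": False}]
    g = 0  # direction of the last executed branch, initially NT
    correct = {"b1": 0, "b2": 0}
    b2_g0_correct = 0
    b2_g0_total = 0
    for x in xs:
        for name, taken in (("b1", x % 2 == 0), ("b2", x % 10 == 0)):
            hit = bht[g][name] == taken
            correct[name] += int(hit)
            if name == "b2" and g == 0:
                b2_g0_correct += int(hit)
                b2_g0_total += 1
            bht[g][name] = taken  # only the selected BHT is updated
            g = int(taken)
    return correct, b2_g0_correct, b2_g0_total


correct, hits_g0, total_g0 = two_level(XS)
print(correct)            # {'b1': 6, 'b2': 6}
print(hits_g0, total_g0)  # 4 4
```

The last line shows that b2 is accessed four times with g = 0 and is predicted correctly every time, matching the 100% figure in Q.5.13.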



Friday, November 22, 2013

Ex 5.1 and 5.2 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual

Q.5.1: The displayed code that follows steps through the elements of two arrays (A[] and B[]) concurrently, and for each element, it puts the larger of the two values into the corresponding element of a third array (C[]). The three arrays are of length N.

The instruction set used for Problems 5.1 through 5.6 is as follows:


Identify the basic blocks of this benchmark code by listing the static instructions belonging to each basic block in the following table. Number the basic blocks based on the lexical ordering of the code.
Note: There may be more boxes than there are basic blocks.

Sol: 



Q.5.2: Draw the control flow graph for this benchmark.

Sol: 


Thursday, November 21, 2013

Ex 4.8 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual

Q.4.8: In an in-order pipelined processor, pipeline latches are used to hold result operands from the time an execution unit computes them until they are written back to the register file during the writeback stage. In an out-of-order processor, rename registers are used for the same purpose. Given a four-wide out-of-order processor TYP pipeline, compute the minimum number of rename registers needed to prevent rename register starvation from limiting concurrency. What happens to this number if frequency demands force a designer to add five extra pipeline stages between dispatch and execute, and five more stages between execute and retire/writeback?


Sol: For maximum throughput, each pipeline stage will contain four inflight instructions.  Since registers are allocated at decode, and freed at retire, each instruction holds a rename register for a minimum of five cycles.  Hence 4x5 = 20 rename registers are needed at minimum, assuming no data dependences or cache misses that would cause instructions to stall. Adding five extra stages would increase the minimum to 40 registers.
Of course, the students should understand that this is a minimum that assumes no data dependences or cache misses. At the same time, it also assumes throughput of 4 IPC. Since few processors achieve 4 IPC on real programs due to data dependences, control dependences, and cache misses, this “minimum” may in fact be sufficient. The only reliable way to determine the right number of rename registers is a sensitivity study using detailed simulation of a range of rename registers to find the knee in the performance curve.



Wednesday, November 20, 2013

Ex. 3.28, 3.34 and 3.35 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual

Q.3.28: Assume a synchronous front-side processor-memory bus that operates at 100 MHz and has an 8-byte data bus. Arbitration for the bus takes one bus cycle (10 ns), issuing a cache line read command for 64 bytes of data takes one cycle, memory controller latency (including DRAM access) is 60 ns, after which data double words are returned in back-to-back cycles. Further assume the bus is blocking or circuit-switched. Compute the latency to fill a single 64-byte cache line. Then compute the peak read bandwidth for this processor-memory bus, assuming the processor arbitrates for the bus for a new read in the bus cycle following completion of the last read.

Sol:   Arbitration                                   : 1 cycle = 10 ns
       Issuing the read command                      : 1 cycle = 10 ns
       Controller latency                            : 60 ns
       Transmission                                  : 64 bytes / 8 bytes per cycle = 8 cycles = 80 ns
       Total time needed to fill a single cache line : 10 + 10 + 60 + 80 = 160 ns
       Peak read bandwidth                           : 64 bytes / 160 ns = 400 MB/s
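
The same arithmetic as a short sketch, with the bus parameters as arguments so other configurations can be tried. The function name and parameter names are assumptions for the illustration, and 1 MB is taken as 10^6 bytes.

```python
def cache_line_fill(bus_hz, bus_bytes, line_bytes,
                    arb_cycles, cmd_cycles, mem_latency_ns):
    """Return (fill latency in ns, peak read bandwidth in MB/s) for a blocking bus."""
    cycle_ns = 1e9 / bus_hz
    transfer_cycles = line_bytes // bus_bytes
    latency_ns = (arb_cycles + cmd_cycles + transfer_cycles) * cycle_ns + mem_latency_ns
    # One line is delivered per fill period; bytes/ns * 1e3 gives MB/s.
    bandwidth_mb_s = line_bytes / latency_ns * 1e3
    return latency_ns, bandwidth_mb_s


latency, bandwidth = cache_line_fill(100e6, 8, 64, 1, 1, 60)
print(latency, "ns,", bandwidth, "MB/s")
```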


Q.3.34: Assume a single-platter disk drive with an average seek time of 4.5 ms, rotation speed of 7200 rpm, data transfer rate of 10 Mbytes/s per head, and controller overhead and queueing of 1 ms. What is the average access latency for a 4096-byte read?

Sol: Assume the block size is 512 bytes.
     Transfer time for one block        : 512 / 10 M = 51.2 μs
     Transfer time for a 4096-byte read : 4096 / 10 M = 0.4096 ms
     Rotational latency                 : (60 / 7200) / 2 s = (1/120) / 2 s = 4.17 ms
     Latency for a 4096-byte read       : 4.17 + 4.5 + 0.4096 + 1 = 10.0796 ms


Q.3.35: Recompute the average access latency for Problem 3.34 assuming a rotation speed of 15 K rpm, two platters, and an average seek time of 4.0 ms.

Sol: Assume the block size is 512 bytes.
     Transfer time for one block        : 512 / 10 M = 51.2 μs
     Transfer time for a 4096-byte read : 4096 / 10 M = 0.4096 ms
     Rotational latency                 : (60 / 15000) / 2 s = (1/250) / 2 s = 2 ms
     Latency for a 4096-byte read       : 2 + 4 + 0.4096 + 1 = 7.4096 ms

Note: The read heads on multiple platters read data in serial, not in parallel.
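
Both disk answers follow the same formula: seek time plus half a rotation plus transfer time plus controller overhead. A minimal sketch, assuming 10 Mbytes/s means 10^7 bytes per second and using a hypothetical helper name:

```python
def disk_access_ms(seek_ms, rpm, read_bytes, rate_bytes_per_s, overhead_ms):
    """Average access latency in ms: seek + half rotation + transfer + overhead."""
    rotational_ms = 0.5 * 60_000 / rpm  # half of one revolution, in ms
    transfer_ms = read_bytes / rate_bytes_per_s * 1e3
    return seek_ms + rotational_ms + transfer_ms + overhead_ms


q334 = disk_access_ms(4.5, 7200, 4096, 10e6, 1.0)
q335 = disk_access_ms(4.0, 15000, 4096, 10e6, 1.0)
print(round(q334, 4), round(q335, 4))
```

Without rounding the rotational latency to 4.17 ms, the first result comes out near 10.076 ms rather than 10.0796 ms; the second matches 7.4096 ms. The platter count does not enter the formula, in line with the note above.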