Performance of Single Cycle Machines
To help in understanding the "critical path" concept
from P&H, 3rd edition, revised, pp. 315ff
Example
Performance of Single-Cycle Machines
Assume that the operation times for the major functional units in this implementation are the following:- Memory units: 200 picoseconds (ps)
- ALU and adders: 100 ps
- Register file (read or write):o 50 ps
- An implemenation in which every instruction operates in 1 clock cycle
of a fixed length.
- An implementation where every instruction executes in 1 clock cycle using a variable length clock, which for each instruction is only as long as it needs to be. (Such an approach is not terribly practical, but it will allow us to see what is being sacrificed when all the instructions must execute in a single clock of the same length.)
To compare the performance, assume the following instruction mix: 25% loads, 10% stores, 45% ALU instructions, 15% branches, and 5% jumps.
Answer
Let's start by comparing the CPU execution times. Recall from Chapter 4 (Sections 1.4 and 1.5 in P&H, 4th edition) that
Since CPI must be 1, we can simplify this to
We need only find the clock cycle time for the two implementations, since the instruction count and CPI are the same for both implementations. The critical path for the different instruction classes is as follows:
Instruction class | Functional units used by the instuction class | ||||
---|---|---|---|---|---|
R-type | Instruction fetch | Register access | ALU | Register access | |
Load word | Instruction fetch | Register access | ALU | Memory access | Register access |
Store word | Instruction fetch | Register access | ALU | Memory access | |
Branch | Instruction fetch | Register access | ALU | ||
Jump | Instruction fetch | ||||
Using these critical paths, we can compute the required length for each instruction class:
Instruction class |
Instruction memory |
Register read |
ALU operation |
Data memory |
Register write |
Total |
---|---|---|---|---|---|---|
R-type | 200 | 50 | 100 | 0 | 50 | 400 ps |
Load word | 200 | 50 | 100 | 200 | 50 | 600 ps |
Store word | 200 | 50 | 100 | 200 | 550 ps | |
Branch | 200 | 50 | 100 | 0 | 350 ps | |
Jump | 200 | 200 ps | ||||
The clock cycle for a machine with a single clock for all instructions will be determined by the longest instruction, which is 600 ps. (This timing is approximate, since our timing model is quite simplistic. In reality, the timing of modern digital systems is complex.)
Thus, the average time per instruction with a variable clock is
= 447.5 ps
Since the variable clock implementation has a shorter average clock cycle, it is clearly faster. Let's find the performance ratio:
CPU performancevariable clock CPU execution timesingle clock --------------------------- = ----------------------------- CPU performancesingle clock CPU execution timevariable clockIC × CPU clock cyclesingle clock = -------------------------------- IC × CPU clock cyclevariable clock
CPU clock cyclesingle clock = --------------------------- CPU clock cyclevariable clock
= 600/447.5 = 1.34
The variable clock implementation would be 1.34 times faster. Unfortunately, implementing a variable-speed clock for each instruction class is extremely difficult, and the overhead for such an approach could be larger than any advantage gained. As we will see in the next section, an alternative is to use a shorter clock cycle that does less work and then vary the number of clock cycles for the different instruction classes.
Comments from the instructor:
The example above is taken directly from Patterson & Hennessy, COD, 3rd edition, revised, pp. 315 ff, even though I omitted the appropriate quotation marks. There appears to be no such example in the 4th edition, which is unfortunate, because this example clearly shows what is involved in a critical path.Also see the first paragraph on page 323 of the 4th edition:
"Figure 4.19 shows the operation of the datapath for an R-type instruction, such as add $t1, $t2, $t3. Although everything occurs in one clock cycle, we can think of four steps to execute the instruction; these steps are ordered by the flow of information ...."
Similarly, you may think of a store word instruction as having five steps in its critical path, while a jump instruction has only one step. Etc.
Please use these steps for determining the critical path on any exam question I may ask.
Please let me know if you find any typos in this writeup.