CS270 Fall 2008 Homework 1 Due November 11, 2008 by 11:59pm Submit hw1_username.txt file by using the following command: % ~cs270/bin/checkin HW1 hw1_username.txt ----------------------------------------------------------------------------- 1) Memory rate question The same rate issues that arise with I/O are relevant to memory as well. This question looks at the rate at which things can be fetched from memory as compared with the rate at which the CPU can perform floating point operations. On a forum, someone posted the following: "I ran McCalpin's stream benchmark on the 3.4 GHZ Xeon and got 4759 MB/s for Triad." http://software.intel.com/en-us/forums/intel-c-compiler/topic/61313/ October 17, 2008 posting Keep in mind that 1 GHz = 10^{9} = 1,000,000,000 cycles/sec, 1 MB = 2^{20} bytes = 1,048,576 bytes, and 1 byte = 8 bits. The stream benchmark measures achievable bandwidth from memory. Assume that the Xeon can execute 2 floating point operations (flop) per cycle. Each floating point operation requires two floating point inputs. Also assume the floating point operations are on doubles, which are 64 bits. a) What is the peak rate of floating point operations per second (flop/sec) assuming infinite bandwidth from memory? [flop/sec = 2 flop / cycle * 3.4*10^{9} / sec = 6.8*10^{9} flop/sec ] b) What is the rate of floating point operations per second as dictated by the achievable memory bandwidth? (Hint: no matter how fast the CPU can go, it can only operate on floating point numbers as fast as they can be pulled from memory). Note: (1 flop)/(2 doubles) = (1 flop)/(16 bytes) [flop/sec = (1 flop)/(16 bytes) * (4759*2^{20} bytes)/sec = 4759*2^{16} flop / sec = 311,885,824 flop / sec ] c) What percentage of peak is the example machine able to achieve assuming that it must pull all of its operands from memory? (Hint: divide the answer from (b) by the answer from (a) and gasp at the ridiculously small number). [ (311,885,824 flop / sec ) / (6.8*10^{9} flop/sec) = 4.6 % ] ----------------------------------------------------------------------------- 2) Interrupt-Driven vs. Polling question (similar to Example 8.1, which we went over in class) Assume we need to process 1000 sequences of 100 character genome sequences. The sequences are typed in at a rate of 1 character every 0.2 seconds. Assume that it takes 10 seconds to process a 100-character sequence. a) Assuming an average of 5 characters per word, what is the words per minute rate for the typist? [words/minute = 1 char / 0.2 sec * 1 word / 5 char * 60 sec /minute = 60 words/minute] b) Assuming polling I/O, how long in hours does it take for all 1000 sequences to be typed in and processed? [time to read one sequence = 0.2 * 100 = 20 secs ] [total time = (20 + 10) * 1000 = 30,000 secs = 8.33 hours ] c) Assuming interrupt-driven I/O where each character requires 0.0001 seconds to read in after an interupt, how long does it take for all 1000 sequences to be typed in and processed? (Hint: keep in mind that a string must be typed in before it can be processed). [read first sequence = 20 secs ] [read 2nd, process 1st = max(100*0.0001 + 10 secs = 10.01 secs, 20) = 20 secs ] [total = 1001 * 20 = 20,020 sec = 5.56 hours ] ----------------------------------------------------------------------------- 3) Interrupt-Driven I/O Why must the keyboard and display status registers be writable to support interrupt-driven I/O? (Answer using 2 sentences max). [ Interrupt-driven I/O requires that the CPU be able to set bit 14 in the KBSR and DSR to indicate that those I/O devices have the right to interrupt the processor.] ----------------------------------------------------------------------------- 4) What does the following LC-3 program do? (Note: You cannot run this program in the reference simulator because memory-mapped I/O has not been implemented.) .ORIG x3000 LD R0,ASCII AND R1,R1,#0 ADD R1,R1,#10 LOOP LDI R2, DSR BRzp LOOP STI R0, DDR ADD R0, R0, #1 ADD R1, R1, #-1 BRnp LOOP HALT ASCII .FILL x0030 DSR .FILL xFE04 DDR .FILL xFE06 .END [ Prints 0123456789 to standard out. ] ----------------------------------------------------------------------------- 5) Using one sentence describe what the following LC3 assembly program does to the data stored at the DATA label. The idea is to think about what would happen to the numbers in the 10 data slots no matter what their value. What is the goal of the computation? An example goal (not the answer for this question though) is "This assembly program counts all of the instances of 2's complement, 16-bit even numbers in the 10 data locations". .ORIG x3000 LEA R0, DATA AND R1, R1, #0 ADD R1, R1, #9 LOOP1 ADD R2, R0, #0 ADD R3, R1, #0 LOOP2 JSR SUB1 ADD R4, R4, #0 BRnz LABEL JSR SUB2 LABEL ADD R2, R2, #1 ADD R3, R3, #-1 BRp LOOP2 ADD R1, R1, #-1 BRp LOOP1 HALT DATA .BLKW 10 x0000 SUB1 LDR R5, R2, #0 NOT R5, R5 ADD R5, R5, #1 LDR R6, R2, #1 ADD R4, R5, R6 RET SUB2 LDR R4, R2, #0 LDR R5, R2, #1 STR R4, R2, #1 STR R5, R2, #0 RET .END [The program sorts the numbers in descending order.] ----------------------------------------------------------------------------- 6) Why doesn't this program print the #7? .ORIG x3000 JSR FOO OUT BRnzp DONE FOO AND R0, R0, #0 ADD R0, R0, #7 JSR BAR RET DONE HALT ASCII .FILL x0030 ; '0' BAR LD R1, ASCII ADD R0, R0, R1 RET .END [The JSR BAR instruction overwrites R7, which was storing the linkage back to main.] ----------------------------------------------------------------------------- 7) Answer the questions below about this C example. See http://www.gnu.org/software/libtool/manual/libc/Example-of-Getopt.html for an example of getopt() usage. Also type "man 3 getopt" on a linux box to help you with this problem. #include #include #include #include int main (int argc, char **argv) { printf("argc=%d\n"); int i; for (i=0; i int v = 42; int foo(int v); int main () { printf("main A: v = %d \n", v); int v = 3; printf("main B: v = %d \n", v); { int v = 7; printf("main C: v = %d \n", v); } v = foo(v); printf("main D: v = %d \n", v); } int foo(int v) { printf("main E: v = %d \n", v); return v+1; } [ main A: v = 42 main B: v = 3 main C: v = 7 main E: v = 3 main D: v = 4 ] ----------------------------------------------------------------------------- 9) Problem 14.15 in text describes the problem setup. Instead of drawing the run-time stack when the program is about to return from the call to f, show the stack when the function is about to return from the call to h. (NOTE: The publicly available answer for 14.15 has a typo in the comments. main pushes the parameters to f in reverse order. The numbers are correct and the comments are incorrect. ) [ xEFF0 | xEFF5 | // frame pointer for f xEFF1 | x3305 | // return address to f after call to h xEFF2 | 0 | // h's return value xEFF3 | 5 | // first actual parameter to h xEFF4 | 6 | // second actual parameter to h xEFF5 | 16 | // f's only local variable, int x1 xEFF6 | xEFFE | // frame pointer for main xEFF7 | x3103 | // return address to main xEFF8 | | // f's return value xEFF9 | 4 | // first actual parameter to f xEFFA | 5 | // second actual parameter to f xEFFB | 6 | // third actual parameter to f xEFFC | 6 | // main's third local variable, int c xEFFD | 5 | // main's second local variable, int b xEFFE | 4 | // main's first local variable, int a xEFFF | | ] ----------------------------------------------------------------------------- 10) The following is a picture of the run-time stack after main calls a function called foo and foo calls a function called bar. bar has set up its activation record, but has not done any more computation. xEFF1 | bl2 | // bar's second local variable xEFF2 | bl1 | // bar's first local variable xEFF3 | 0xEFF7 | // frame pointer for foo xEFF4 | | // return address for foo after call to bar xEFF5 | | // return value from bar xEFF6 | fl1 + 1 | // first arg to bar xEFF7 | fl1 | // foo's only local variable xEFF8 | 0xEFFE | // frame pointer for main xEFF9 | | // return address for main after call to foo xEFFA | | // return value from foo xEFFB | l1 | // first arg to foo xEFFC | l1+l2 | // second arg to foo xEFFD | l2 | // main's second local variable xEFFE | l1 | // main's first local variable xEFFF | | a) Write LC3 code for main(), foo(), and bar() that sets up the shown stack. Assume that upon entry into main, the stack pointer (R6) contains the address xEFFF. You do not have to set the local variables to an initial value, but you do have to access the local variables to perform computations specified for parameter passing (e.g. l1+l2). Put a comment for each line of assembly. [ main ADD R5, R6, #-1 ; set up main frame pointer ADD R6, R6, #-2 ; allocate space for main's locals ... LDR R0, R5, #0 ; R0 <- l1 LDR R1, R5, #-1 ; R1 <- l2 ADD R1, R0, R1 ; R1 <- l1 + l2 ADD R6, R6, #-1 STR R1, R6, #0 ; push l1 + l2 onto stack ADD R6, R6, #-1 STR R0, R6, #0 ; push l1 onto stack JSR foo foo ADD R6, R6, #-1 ; allocate spot for return value ADD R6, R6, #-1 ; STR R7, R6, #0 ; push return address onto stack ADD R6, R6, #-1 ; STR R5, R6, #0 ; push caller's frame pointer onto stack ADD R5, R6, #-1 ; set up foo frame pointer ADD R6, R6, #-1 ; allocate space for foo's local ... LDR R0, R5, #0 ; R0 <- fl1 ADD R1, R0, #1 ; R1 <- fl1+1 ADD R6, R6, #-1 STR R1, R6, #0 ; push fl1+1 onto stack JSR BAR bar ADD R6, R6, #-1 ; allocate spot for return value ADD R6, R6, #-1 ; STR R7, R6, #0 ; push return address onto stack ADD R6, R6, #-1 ; STR R5, R6, #0 ; push caller's frame pointer onto stack ADD R5, R6, #-1 ; set up bar frame pointer ADD R6, R6, #-2 ; allocate space for foo's local ] b) Now show the code in each of the functions that will clean up their activation records. [ main ADD R5, R6, #-1 ; set up main frame pointer ADD R6, R6, #-2 ; allocate space for main's locals ... LDR R0, R5, #0 ; R0 <- l1 LDR R1, R5, #-1 ; R1 <- l2 ADD R1, R0, R1 ; R1 <- l1 + l2 ADD R6, R6, #-1 STR R1, R6, #0 ; push l1 + l2 onto stack ADD R6, R6, #-1 STR R0, R6, #0 ; push l1 onto stack JSR foo ; CLEANUP starts here LDR R0, R6, #0 ; pop return value from foo ADD R6, R6, #1 ; ... ADD R6, R5, #2 ; pop local variables HALT foo ADD R6, R6, #-1 ; allocate spot for return value ADD R6, R6, #-1 ; STR R7, R6, #0 ; push return address onto stack ADD R6, R6, #-1 ; STR R5, R6, #0 ; push caller's frame pointer onto stack ADD R5, R6, #-1 ; set up foo frame pointer ADD R6, R6, #-1 ; allocate space for foo's local ... LDR R0, R5, #0 ; R0 <- fl1 ADD R1, R0, #1 ; R1 <- fl1+1 ADD R6, R6, #-1 STR R1, R6, #0 ; push fl1+1 onto stack JSR bar ; CLEANUP starts here LDR R0, R6, #0 ; pop return value from bar ADD R6, R6, #1 ; ... ADD R6, R5, #1 ; pop local variables LDR R5, R6, #0 ; pop caller frame pointer ADD R6, R6, #1 ; LDR R7, R6, #0 ; pop the return address ADD R6, R6, #1 ; RET bar ADD R6, R6, #-1 ; allocate spot for return value ADD R6, R6, #-1 ; STR R7, R6, #0 ; push return address onto stack ADD R6, R6, #-1 ; STR R5, R6, #0 ; push caller's frame pointer onto stack ADD R5, R6, #-1 ; set up bar frame pointer ADD R6, R6, #-2 ; allocate space for foo's local ; CLEANUP starts here ... ADD R6, R5, #2 ; pop local variables LDR R5, R6, #0 ; pop caller frame pointer ADD R6, R6, #1 ; LDR R7, R6, #0 ; pop the return address ADD R6, R6, #1 ; RET ]