CS270 Fall 2008 Homework 1 Due November 11, 2008 by 11:59pm Submit hw1_username.txt file by using the following command: % ~cs270/bin/checkin HW1 hw1_username.txt ----------------------------------------------------------------------------- 1) Memory rate question The same rate issues that arise with I/O are relevant to memory as well. This question looks at the rate at which things can be fetched from memory as compared with the rate at which the CPU can perform floating point operations. On a forum, someone posted the following: "I ran McCalpin's stream benchmark on the 3.4 GHZ Xeon and got 4759 MB/s for Triad." http://software.intel.com/en-us/forums/intel-c-compiler/topic/61313/ October 17, 2008 posting Keep in mind that 1 GHz = 10^{9} = 1,000,000,000 cycles/sec, 1 MB = 2^{20} bytes = 1,048,576 bytes, and 1 byte = 8 bits. The stream benchmark measures achievable bandwidth from memory. Assume that the Xeon can execute 2 floating point operations (flop) per cycle. Each floating point operation requires two floating point inputs. Also assume the floating point operations are on doubles, which are 64 bits. Using the benchmark results quoted above, answer the following questions. a) What is the peak rate of floating point operations per second (flop/sec) assuming infinite bandwidth from memory? b) What is the rate of floating point operations per second as dictated by the achievable memory bandwidth? (Hint: no matter how fast the CPU can go, it can only operate on floating point numbers as fast as they can be pulled from memory). Note: (1 flop)/(2 doubles) = (1 flop)/(16 bytes) c) What percentage of peak is the example machine able to achieve assuming that it must pull all of its operands from memory? (Hint: divide the answer from (b) by the answer from (a) and gasp at the ridiculously small number). ----------------------------------------------------------------------------- 2) Interrupt-Driven vs. Polling question (similar to Example 8.1, which we went over in class) Assume we need to process 1000 sequences of 100 character genome sequences. The sequences are typed in at a rate of 1 character every 0.2 seconds. Assume that it takes 10 seconds to process a 100-character sequence. a) Assuming an average of 5 characters per word, what is the words per minute rate for the typist? b) Assuming polling I/O, how long in hours does it take for all 1000 sequences to be typed in and processed? c) Assuming interrupt-driven I/O where each character requires 0.0001 seconds to read in after an interupt, how long does it take for all 1000 sequences to be typed in and processed? (Hint: keep in mind that a string must be typed in before it can be processed). ----------------------------------------------------------------------------- 3) Interrupt-Driven I/O Why must the keyboard and display status registers be writable to support interrupt-driven I/O? (Answer using 2 sentences max). ----------------------------------------------------------------------------- 4) What does the following LC-3 program do? (Note: You cannot run this program in the reference simulator because memory-mapped I/O has not been implemented.) .ORIG x3000 LD R0,ASCII AND R1,R1,#0 ADD R1,R1,#10 LOOP LDI R2, DSR BRzp LOOP STI R0, DDR ADD R0, R0, #1 ADD R1, R1, #-1 BRnp LOOP HALT ASCII .FILL x0030 DSR .FILL xFE04 DDR .FILL xFE06 .END ----------------------------------------------------------------------------- 5) Using one sentence describe what the following LC3 assembly program does to the data stored at the DATA label. The idea is to think about what would happen to the numbers in the 10 data slots no matter what their value. What is the goal of the computation? An example goal (not the answer for this question though) is "This assembly program counts all of the instances of 2's complement, 16-bit even numbers in the 10 data locations". .ORIG x3000 LEA R0, DATA AND R1, R1, #0 ADD R1, R1, #9 LOOP1 ADD R2, R0, #0 ADD R3, R1, #0 LOOP2 JSR SUB1 ADD R4, R4, #0 BRnz LABEL JSR SUB2 LABEL ADD R2, R2, #1 ADD R3, R3, #-1 BRp LOOP2 ADD R1, R1, #-1 BRp LOOP1 HALT DATA .BLKW 10 x0000 SUB1 LDR R5, R2, #0 NOT R5, R5 ADD R5, R5, #1 LDR R6, R2, #1 ADD R4, R5, R6 RET SUB2 LDR R4, R2, #0 LDR R5, R2, #1 STR R4, R2, #1 STR R5, R2, #0 RET .END ----------------------------------------------------------------------------- 6) Why doesn't this program print the #7? .ORIG x3000 JSR FOO OUT BRnzp DONE FOO AND R0, R0, #0 ADD R0, R0, #7 JSR BAR RET DONE HALT ASCII .FILL x0030 ; '0' BAR LD R1, ASCII ADD R0, R0, R1 RET .END ----------------------------------------------------------------------------- 7) Answer the questions below about this C example. See http://www.gnu.org/software/libtool/manual/libc/Example-of-Getopt.html for an example of getopt() usage. Also type "man 3 getopt" on a linux box to help you with this problem. #include #include #include #include int main (int argc, char **argv) { printf("argc=%d\n", argc); int i; for (i=0; i int v = 42; int foo(int v); int main () { printf("main A: v = %d \n", v); int v = 3; printf("main B: v = %d \n", v); { int v = 7; printf("main C: v = %d \n", v); } v = foo(v); printf("main D: v = %d \n", v); } int foo(int v) { printf("main E: v = %d \n", v); return v+1; } ----------------------------------------------------------------------------- 9) Problem 14.15 in text describes the problem setup. Instead of drawing the run-time stack when the program is about to return from the call to f, show the stack when the function is about to return from the call to h. (NOTE: The publicly available answer for 14.15 has a typo in the comments. main pushes the parameters to f in reverse order. The numbers are correct and the comments are incorrect. ) ----------------------------------------------------------------------------- 10) The following is a picture of the run-time stack after main calls a function called foo and foo calls a function called bar. bar has set up its activation record, but has not done any more computation. xEFF1 | bl2 | // bar's second local variable xEFF2 | bl1 | // bar's first local variable xEFF3 | 0xEFF7 | // frame pointer for foo xEFF4 | | // return address for foo after call to bar xEFF5 | | // return value from bar xEFF6 | fl1 + 1 | // first arg to bar xEFF7 | fl1 | // foo's only local variable xEFF8 | 0xEFFE | // frame pointer for main xEFF9 | | // return address for main after call to foo xEFFA | | // return value from foo xEFFB | l1 | // first arg to foo xEFFC | l1+l2 | // second arg to foo xEFFD | l2 | // main's second local variable xEFFE | l1 | // main's first local variable xEFFF | | a) Write LC3 code for main(), foo(), and bar() that sets up the shown stack. Assume that upon entry into main, the stack pointer (R6) contains the address xEFFF. You do not have to set the local variables to an initial value, but you do have to access the local variables to perform computations specified for parameter passing (e.g. l1+l2). Put a comment for each line of assembly. b) Now show the code in each of the functions that will clean up their activation records.