1. a. In proc P3, Buffer B: [39 41 43 45 47 10 5 0] b. In all procs, Buffer C: [39 41 ...] ... indicates the remainder of the buffer will be unchanged. c. Buffer C: P0: [0 2 ...] P1: [4 6 ...] P2: [8 10 ...] P3: [12 14 ...] d. All procs, Buffer D: [0 2 1 3 3 6 25 30] e. Trick question, no such MPI communication as MPI_AllScatter. 2. a. By eliminating multiples of 3, we reduce the number of integers by 1/3, however some of these were already eliminated by removing the evens. Taking account of this, the improvement is 1/3*f_2(n). You can also think of this by imagining a block of 6 numbers from 0 to 5, only 1 and 5 need be inspected which is 2/3 of the block, the running time would then be 1/3*f_2(n). b. In this case the evens are not eliminated, we reduce the number of integers to be inspected by 1/3 so 2/3*f_2(n) c. We can use two arrays to mark off the primes. Again, remember that in each block of 6 numbers the only integers we need to worry about are those in the form of 6i+5 and 6i+1. We let one array represent the integers of the form 6i+1 (i.e. 1, 7, 13, 19, ...) and the other array be the integers of the form 6i+5 (i.e. 5, 11, 17, 23, 29, ...). We now find the two lowest mutiples of the current prime p in each array, one will be in the form 6i+1 and the other 6i+5. We can now step along each array with a stride of p starting at their repective starting starting points. An implementation would compute an offset of the index between the starting point in the 6i+1 array and the 6i+5 array, allowing use one loop of the form for(i = start; i < highvalue; i += prime) marked[i] = marked[i + offset] = 1 3. a. The sequential version takes O(nX) time. output[n-1] = input[n-1] for(i = n-2; i >= 0; i==) Output[i] = Output[i+1] + Input[i] b. if(id == p-1) Output[n-1] = Input[n-1] MPI_SEND(Output[n-1], 1, MPI_INT, p-1 0, MPI_COMMWRLD) } else if (id == 0) { MPI_RECV(tmp, 1, MPI_INT, id+1, 0, MPI_COMMWRLD); Output[0] += tmp + Input[0]; } else { MPI_RECV(tmp, 1, MPI_INT, id+1, 0, MPI_COMMWRLD); Output[id] = tmp + Input[id]; MPI_SEND(Output[i], MPI_INT, id-1, 0, MPI_COMMWORLD); } c. The running time is n(l + v/b) + nX, the speedup is the running time of the sequential algorithm over this, so: nX / (n(l + v/b) + nX) = X / ((l + v/b) + X) d. We'll assume you do a block distribution, computing the values low which is the index of the low value and high which is the index of the high value for each block. Output[high] = Input[high]; /* Each processor computes the suffix sum of it's own block */ /* Note: Each block has incorrect answers except for id = p-1 */ for(i = high-1, i >= low; i--) { Output[i] = Ouput[i+1] + Input[i]; } /* Each processor sends the first element of its block to the processor * preceding it. */ if(id == p-1) MPI_SEND(Output[low], 1, MPI_INT, p-1 0, MPI_COMMWRLD) } else if (id == 0) { MPI_RECV(tmp, 1, MPI_INT, id+1, 0, MPI_COMMWRLD); Output[low] += tmp; } else { MPI_RECV(tmp, 1, MPI_INT, id+1, 0, MPI_COMMWRLD); Output[low] += tmp; MPI_SEND(low, MPI_INT, id-1, 0, MPI_COMMWORLD); } /* Now each process has the correct value in Output[low] but we still * need to add the value passed from the next processor to each other * element in the block */ if(id != p-1) { for(i = low+1; i < high; i++) { Output[i] += tmp; } } e. The running time is (2 * nX/p + p(l + v/b)), the speedup is the running time of the sequential algorithm over this, so: nX / (2 * nX/p + p(l + v/b))