CS575 Parallel Programming warm up exercise: Block Matrix Multiply
==================================================================

Consider a block matrix multiply C = A * B, where both A and B are
partitioned into four blocks:

    A = A00 A01        B = B00 B01
        A10 A11            B10 B11

Multiplying A and B can be done by multiplying and adding blocks:

    C00 = A00*B00 + A01*B10        C01 = A00*B01 + A01*B11
    C10 = A10*B00 + A11*B10        C11 = A10*B01 + A11*B11

Write an MPI program for four processors (PE0 .. PE3) that performs the
block matrix multiply. Initially

    PE0 creates A00 and B00,       PE1 creates A01 and B01,
    PE2 creates A10 and B10,       PE3 creates A11 and B11.

A has elements aij = i + j (i = 0..15, j = 0..15).
B has elements bij = i - j (i = 0..15, j = 0..15).
So, e.g., block A01 contains elements aij (i = 0..7, j = 8..15).

Your program will consist of the following stages:

1. Init:         Each PE creates its A and B blocks.
2. Row exchange: PE0 and PE1 exchange their A blocks; both PEs now have A00 and A01.
                 PE2 and PE3 exchange their A blocks; both PEs now have A10 and A11.
3. Col exchange: PE0 and PE2 exchange their B blocks; both PEs now have B00 and B10.
                 PE1 and PE3 exchange their B blocks; both PEs now have B01 and B11.
4. Multiply:     PE0 computes C00, PE1 computes C01,
                 PE2 computes C10, PE3 computes C11.
5. Gather:       PE1, PE2, and PE3 send their C blocks to PE0.
6. Print:        PE0 prints C.

Clearly mark the stages in your code and add comments stating your
assumptions clearly. Turn in your C code and the Makefile used for
compilation.

For MPI help, check the CS475 assignments page, lab 5: getting started
with MPI. Use either the department veggie machines (as was done in
CS475) or the department fruit machines (dates, figs, grapes,
huckleberries, kiwis, lemons, melons, nectarines, peaches, pears,
raspberries, pomegranates, kumquats, bananas, coconuts, apples, and
oranges). These are the machines we will use later on in the CUDA
experiments.
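The six stages above can be sketched in MPI roughly as follows. This is a
non-authoritative sketch, not the required solution: the names (N, BS,
rowmate, colmate), the rank-to-block mapping rank = 2*row + col, and the
use of MPI_Sendrecv for the pairwise exchanges are all my own assumptions;
any equivalent Send/Recv staging is equally valid.

```c
/* Sketch only: assumes N = 16, 8x8 blocks, exactly 4 ranks,
 * and rank = 2*blockrow + blockcol (PE0->(0,0), PE1->(0,1), ...). */
#include <stdio.h>
#include <mpi.h>

#define N  16
#define BS (N / 2)                       /* block size: 8 */

int main(int argc, char *argv[])
{
    int rank, size;
    double A[2][BS][BS], B[2][BS][BS], C[BS][BS];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 4) {
        if (rank == 0) fprintf(stderr, "run with mpirun -np 4\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    int br = rank / 2, bc = rank % 2;    /* my block row and column */

    /* Stage 1: Init -- each PE creates its own A and B block. */
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++) {
            int gi = br * BS + i, gj = bc * BS + j;   /* global indices */
            A[bc][i][j] = gi + gj;                    /* aij = i + j   */
            B[br][i][j] = gi - gj;                    /* bij = i - j   */
            C[i][j] = 0.0;
        }

    /* Stage 2: Row exchange -- swap A blocks with the row partner,
     * so A[0] and A[1] hold both A blocks of my block row. */
    int rowmate = rank ^ 1;                           /* 0<->1, 2<->3 */
    MPI_Sendrecv(A[bc],     BS * BS, MPI_DOUBLE, rowmate, 0,
                 A[1 - bc], BS * BS, MPI_DOUBLE, rowmate, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Stage 3: Col exchange -- swap B blocks with the column partner. */
    int colmate = rank ^ 2;                           /* 0<->2, 1<->3 */
    MPI_Sendrecv(B[br],     BS * BS, MPI_DOUBLE, colmate, 1,
                 B[1 - br], BS * BS, MPI_DOUBLE, colmate, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Stage 4: Multiply -- C(br,bc) = sum over k2 of A(br,k2)*B(k2,bc). */
    for (int k2 = 0; k2 < 2; k2++)
        for (int i = 0; i < BS; i++)
            for (int k = 0; k < BS; k++)
                for (int j = 0; j < BS; j++)
                    C[i][j] += A[k2][i][k] * B[k2][k][j];

    /* Stage 5: Gather -- PE1..PE3 send their C blocks to PE0. */
    if (rank != 0) {
        MPI_Send(C, BS * BS, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
    } else {
        double Cfull[N][N], Cblk[BS][BS];
        for (int i = 0; i < BS; i++)
            for (int j = 0; j < BS; j++)
                Cfull[i][j] = C[i][j];                /* PE0's own block */
        for (int src = 1; src < 4; src++) {
            MPI_Recv(Cblk, BS * BS, MPI_DOUBLE, src, 2,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            int sr = src / 2, sc = src % 2;
            for (int i = 0; i < BS; i++)
                for (int j = 0; j < BS; j++)
                    Cfull[sr * BS + i][sc * BS + j] = Cblk[i][j];
        }
        /* Stage 6: Print -- PE0 prints the assembled 16x16 result. */
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++)
                printf("%8.0f", Cfull[i][j]);
            printf("\n");
        }
    }

    MPI_Finalize();
    return 0;
}
```

Note the XOR trick: rank ^ 1 flips the low bit (pairing 0-1 and 2-3 for
the row exchange) and rank ^ 2 flips the next bit (pairing 0-2 and 1-3
for the column exchange). Compile with mpicc and run with mpirun -np 4.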
This is not a performance exercise, just a programming exercise: it makes
sure you understand how to write a blocked matrix multiply in parallel.
You will need this for your CUDA work later in this course.

Except as otherwise noted, the content of this presentation is licensed
under the Creative Commons Attribution 2.5 license.