Here are a makefile and a vanilla C matrix multiply code.
Use the sub-O optimizations discussed in lecture 6 to improve the performance. Do not parallelize the code (no SSE, no multithreading, no MPI). Just improve the sequential performance of the C code. We are doing this in C because it gives us explicit control over the memory management model.
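As an illustration of what a purely sequential optimization can look like, here is a minimal sketch of loop interchange (i-k-j order). Whether this transformation is among those covered in lecture 6 is an assumption, and so are the function name mm_ikj, the element type double, and the flat row-major layout, none of which are taken from the provided code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: C = A * B for n x n row-major matrices of doubles.
 * The i-k-j loop order makes the innermost loop walk B and C along rows
 * (unit stride), instead of walking B down a column as i-j-k does. */
void mm_ikj(size_t n, const double *A, const double *B, double *C)
{
    /* The output must not alias either input, because C is zeroed first. */
    assert(C != A && C != B);

    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            const double a = A[i * n + k];   /* reused across the whole j loop */
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

This version performs the same number of multiplications and additions as the vanilla loop nest; only the memory access order changes, so any speedup comes from cache behavior and has to be measured rather than assumed.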
Each version of your code should contain the original MM code so that you can verify your results against it (i.e., check that your multiply returns the correct answers). Be sure not to include this verification in the code you time.
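One way to keep verification out of the timing is sketched below. All names here (mm_reference, mm_matches, time_mm, the mm_fn typedef) are hypothetical, the element type double and the flat row-major layout are assumptions, and clock() is just one possible timer; the provided code may use different conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Reference (unoptimized) multiply: C = A * B, all n x n, row-major. */
void mm_reference(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Returns 1 if X and Y agree element by element within a relative
 * tolerance, 0 otherwise. A tolerance is used because reordering the
 * floating-point additions can change the last few bits of each sum. */
int mm_matches(size_t n, const double *X, const double *Y, double tol)
{
    for (size_t i = 0; i < n * n; i++) {
        double scale = 1.0;
        if (abs_d(X[i]) > scale) scale = abs_d(X[i]);
        if (abs_d(Y[i]) > scale) scale = abs_d(Y[i]);
        if (abs_d(X[i] - Y[i]) > tol * scale)
            return 0;
    }
    return 1;
}

typedef void (*mm_fn)(size_t, const double *, const double *, double *);

/* Times exactly one call of mm and returns CPU seconds.
 * Nothing else (allocation, initialization, checking) is inside the
 * timed region. */
double time_mm(mm_fn mm, size_t n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    clock_t start = clock();
    mm(n, A, B, C);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

In use, the optimized multiply would be passed to time_mm first, and only after the timer has stopped would mm_reference be run into a separate output array and compared with mm_matches.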
Once you have implemented your optimizations, run them against problem sizes of 256, 512, 1024, and 2048. Record your results for your homework report. (This will be for multiple versions, so your results should be in the form of an N x M matrix, where N is the optimization and M is the problem size.)
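The N x M layout can be produced directly by the program. The following is only a sketch: the helper names (speedup, print_header, print_row) and the column widths are hypothetical, and the measured times have to come from your own runs.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_SIZES 4
static const int SIZES[NUM_SIZES] = {256, 512, 1024, 2048};

/* Speedup of an optimized version relative to the original code:
 * values above 1.0 mean the optimized version is faster. */
double speedup(double baseline_seconds, double optimized_seconds)
{
    assert(optimized_seconds > 0.0);
    return baseline_seconds / optimized_seconds;
}

/* Prints the column headings: one column per problem size. */
void print_header(FILE *out)
{
    fprintf(out, "%-16s", "version");
    for (int m = 0; m < NUM_SIZES; m++)
        fprintf(out, " %10d", SIZES[m]);
    fputc('\n', out);
}

/* Prints one row: the label of an optimization (or combination of
 * optimizations) followed by its time in seconds for each size. */
void print_row(FILE *out, const char *label, const double seconds[NUM_SIZES])
{
    fprintf(out, "%-16s", label);
    for (int m = 0; m < NUM_SIZES; m++)
        fprintf(out, " %10.3f", seconds[m]);
    fputc('\n', out);
}
```

Calling print_header once and print_row once per version gives one row per optimization and one column per problem size; a second table of speedup values can be built the same way.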
Document your optimizations, and document which (combinations of) optimizations provide which speedup. Document which system you used for your experiments: OS, compiler and optimization flags, CPU type, speed, cache size, memory size. In Linux you can find information about your machine in /proc/cpuinfo and /proc/meminfo. In Windows this information is available in System Properties under My Computer. On a Mac, check About This Mac under the Apple menu.
Check in your report and your code.
Copyright © Colorado State University. All rights reserved.