CS475 Lab 3: Locality

Objectives

Loop permutations and loop tiling are powerful program transformations that can be used to exploit data locality and achieve better performance. This lab is intended to introduce you to the idea of loop permutations. The next programming assignment will give you experience working with loop tiling, and you are encouraged to do the loop tiling sections of the link below if/when you have time.
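
As a rough illustration of what a loop permutation looks like, here is a minimal sketch on a plain matrix multiply. This is not the lab's syr2k kernel; the function names matmul_ijk and matmul_ikj and the flat row-major layout are assumptions made for the example. Both functions compute the same result, and only the order of the j and k loops differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not the lab's syr2k):
 * C += A * B for n x n matrices stored row-major in flat arrays. */

/* i-j-k order: the innermost loop walks B down a column,
 * so consecutive accesses to B are n elements apart. */
void matmul_ijk(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* i-k-j order: the same statement with the j and k loops interchanged.
 * The innermost loop now walks B and C along a row (stride 1). */
void matmul_ikj(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

Calling both functions on the same inputs (with C zero-initialized) should produce identical output arrays. Any difference in running time comes from the order in which memory is touched, which is the effect this lab asks you to measure.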

Description

Refer to this lab exercise crafted by Tomofumi Yuki, given as a short, hands-on introduction to new Ph.D. students (not necessarily in HPC, Embedded Systems, or Compilers). The goal was to give them an idea of the research issues in this domain. Download EJCP-Labs.tar and extract the contents. Read the Setup section of the linked exercise to learn about the contents of this tar file. We have already done Section 0 and provided a makefile.

Do the following:

  1. Make and run the syr2k program to collect a baseline performance measurement:
    syr2k 1500 1500 2>/dev/null

    The contents of the matrices are printed to stderr, so if you remove the 2>/dev/null redirection you will see them displayed on screen.
  2. Complete Section 4 of the linked exercise, entitled "Apply loop permutation to take advantage of hardware prefetching".
  3. Compare the performance of syr2k-perm with the baseline syr2k.
  4. Parallelize both syr2k and syr2k-perm by adding the following line above the outermost loop in the kernel function:
    #pragma omp parallel for private(i,j,k)

    Don't forget to adjust the makefile so that gcc is invoked with the -fopenmp flag, and add the line #include <omp.h> to each source file that uses OpenMP.
  5. What do you observe about the speedup of each program? How does the performance of syr2k compare with syr2k-perm for both the sequential and parallel versions?
  6. Why do you think one loop permutation may or may not lead to better performance than another?
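
The pragma placement described in step 4 can be sketched as follows. This is a minimal, hypothetical kernel, not the lab's syr2k source: the names kernel_abt and available_threads are assumptions made for the example. The directive and its clause sit on a single line, directly above the outermost loop. The loop indices are declared outside the loops, which is why j and k need to be listed as private.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* only needed for OpenMP runtime calls such as omp_get_max_threads */
#endif

/* Hypothetical kernel: C += alpha * A * B^T for n x n row-major matrices.
 * Each iteration of the i loop writes a distinct row of C, so the
 * iterations of the outermost loop can safely run in parallel. */
void kernel_abt(int n, double alpha, const double *A, const double *B, double *C)
{
    int i, j, k;   /* shared by default unless listed in private(...) */
    assert(n >= 0);

    /* The whole directive must be on one line (or use a trailing backslash). */
    #pragma omp parallel for private(i,j,k)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                C[i * n + j] += alpha * A[i * n + k] * B[j * n + k];
}

/* Reports how many threads a parallel region may use;
 * returns 1 when the file is compiled without -fopenmp. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Compiled without -fopenmp, gcc ignores the pragma and the kernel runs sequentially, so the same source can serve for both the sequential and the parallel measurement. With -fopenmp, the number of threads can typically be controlled through the OMP_NUM_THREADS environment variable.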

(Optional) Do the other exercises on loop tiling in order and comment on your observations. This is not required for this lab, but it may be helpful to revisit it as you work through PA3.