In this assignment, you will modify a CUDA kernel and empirically compare its performance with that of the provided code. The provided program performs vector addition, but it does not coalesce its memory accesses. You will modify it so that its memory accesses are coalesced.
Resources that may be useful for this assignment include:
You will need to use a Linux machine with an NVIDIA graphics card. The machines in the CSU 325 lab have such a card. They are named after fish (anchovy .. ); to find them, see these machines and grep for 325.
*** Exception: Do not use wahoo for your CUDA experiments.***
To use CUDA on the lab machines at CSU, you will need to set the right environment variables. It's convenient to edit your .cshrc or your .profile file (depending on whether you are using tcsh or bash) to set them when you log in. You should add
Download and untar the CUDAL8.tar file. To compile a CUDA program, use the CUDA compiler nvcc. You should use the provided Makefile as a starting point for this lab and for your CUDA assignments. The Makefile shows you which gcc is to be used.
As discussed in class, vecadd is a microbenchmark for determining the effectiveness of coalescing. You are provided with a non-coalescing version; your job is to create a coalescing version and to measure the difference in performance between the two.
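The difference between the two access patterns can be sketched as follows. This is a minimal illustration, not the provided vecadd00 source: the kernel names, the helpers chunkedIndex and stridedIndex, and the small problem size are all hypothetical. In the chunked version, each thread walks its own contiguous run of values, so neighbouring threads in a warp touch addresses perThread elements apart on every iteration. In the strided version, neighbouring threads touch neighbouring addresses, which the hardware can combine into fewer memory transactions.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: element visited by global thread `tid` on iteration `k`
// when each thread owns a contiguous chunk of `perThread` values (not coalesced).
__host__ __device__ inline int chunkedIndex(int tid, int k, int perThread) {
    return tid * perThread + k;
}

// Hypothetical helper: element visited when threads interleave, so that on
// iteration `k` threads tid and tid+1 touch adjacent elements (coalesced).
__host__ __device__ inline int stridedIndex(int tid, int k, int totalThreads) {
    return k * totalThreads + tid;
}

// Each thread adds `perThread` consecutive elements.
__global__ void addChunked(const float* a, const float* b, float* c, int perThread) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int k = 0; k < perThread; ++k) {
        int i = chunkedIndex(tid, k, perThread);
        c[i] = a[i] + b[i];
    }
}

// Each thread adds `perThread` elements spaced one grid-width apart.
__global__ void addCoalesced(const float* a, const float* b, float* c, int perThread) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int totalThreads = gridDim.x * blockDim.x;
    for (int k = 0; k < perThread; ++k) {
        int i = stridedIndex(tid, k, totalThreads);
        c[i] = a[i] + b[i];
    }
}

int main() {
    // Small sizes chosen for illustration only; the vector length is an
    // exact multiple of the thread count, so no bounds check is needed.
    const int blocks = 4, threads = 128, perThread = 64;
    const int n = blocks * threads * perThread;

    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = (float)i; b[i] = 2.0f * i; c[i] = 0.0f; }

    addChunked<<<blocks, threads>>>(a, b, c, perThread);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) assert(c[i] == a[i] + b[i]);

    for (int i = 0; i < n; ++i) c[i] = 0.0f;
    addCoalesced<<<blocks, threads>>>(a, b, c, perThread);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) assert(c[i] == a[i] + b[i]);

    printf("both kernels produced the same sums\n");
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Both kernels do the same amount of arithmetic and cover every element exactly once; only the mapping from (thread, iteration) to array index changes, which is what makes the pair a fair comparison.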
Here is a set of files provided in PA5.tar for Vecadd:
Compile and run the provided program vecadd00 and record its run time with 60,000 values per thread. Then test your new program and compare its run time with the original's for the same amount of work. Discuss your results in discussion D8.
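If you want a measurement that is independent of whatever timer the provided code uses, one option is to time the kernel with CUDA events. The sketch below assumes a placeholder kernel (addOnePerThread) and a hypothetical helper (effectiveBandwidthGBs); neither comes from the provided files, and the vector length is arbitrary. It performs one warm-up launch first so that one-time startup costs are not included in the measured interval.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: effective bandwidth in GB/s for a vector add over
// n floats, counting two loads and one store of 4 bytes per element.
inline double effectiveBandwidthGBs(long long n, double ms) {
    double bytes = 3.0 * sizeof(float) * (double)n;
    return bytes / (ms * 1.0e6);  // bytes per ms divided by 1e6 gives GB/s
}

// Placeholder kernel: one element per thread, stands in for the kernel under test.
__global__ void addOnePerThread(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20, threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; c[i] = 0.0f; }

    // Warm-up launch: excludes one-time costs from the timed run.
    addOnePerThread<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    addOnePerThread<<<blocks, threads>>>(a, b, c, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until the kernel and the stop event finish

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);
    assert(err == cudaSuccess);

    printf("kernel time: %.3f ms, effective bandwidth: %.2f GB/s\n",
           ms, effectiveBandwidthGBs(n, ms));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Run each version several times and report the spread as well as the typical time, since single runs on a shared lab machine can vary.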