## CUDA Exercise One: Add Vector and Coalescing

The goal of this exercise is to build on your last touch vector program
(the one with a 2D grid of 2D thread blocks) to add two vectors
element-wise (for all i in 0 to N-1, C[i] = A[i] + B[i]) in two ways:
- 1. Have each thread read and write a contiguous block of A, B, and C
- 2. Have the threads in a row of a thread block read interleaved values

Then compare the performance of the two approaches.
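As a rough sketch of what the two approaches might look like (this is not the course's starter code): the kernels below flatten the 2D grid of 2D thread blocks into one global thread index, then either give each thread one contiguous chunk or let neighboring threads touch neighboring elements. The names `globalThreadId`, `totalThreads`, `addContiguous`, `addInterleaved`, and `runAndCheck`, as well as the launch shape, are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Flatten the 2D grid of 2D blocks into a single global thread index.
__device__ int globalThreadId() {
    int blockId       = blockIdx.y * gridDim.x + blockIdx.x;
    int threadInBlock = threadIdx.y * blockDim.x + threadIdx.x;
    return blockId * (blockDim.x * blockDim.y) + threadInBlock;
}

// Total number of threads in the launch.
__device__ int totalThreads() {
    return gridDim.x * gridDim.y * blockDim.x * blockDim.y;
}

// Approach 1: each thread owns one contiguous chunk of A, B, and C.
__global__ void addContiguous(const float *A, const float *B, float *C, int N) {
    int chunk = (N + totalThreads() - 1) / totalThreads();  // ceil(N / threads)
    int begin = globalThreadId() * chunk;
    int end   = min(begin + chunk, N);
    for (int i = begin; i < end; ++i) C[i] = A[i] + B[i];
}

// Approach 2: neighboring threads touch neighboring elements, then every
// thread jumps ahead by the total thread count.
__global__ void addInterleaved(const float *A, const float *B, float *C, int N) {
    for (int i = globalThreadId(); i < N; i += totalThreads()) C[i] = A[i] + B[i];
}

// Hypothetical host helper: runs one of the kernels and verifies the result.
bool runAndCheck(int N, bool interleaved) {
    std::vector<float> hA(N), hB(N), hC(N, 0.0f);
    for (int i = 0; i < N; ++i) { hA[i] = float(i); hB[i] = 2.0f * i; }

    size_t bytes = N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 grid(4, 2), block(16, 8);  // arbitrary launch shape, for illustration only
    if (interleaved) addInterleaved<<<grid, block>>>(dA, dB, dC, N);
    else             addContiguous<<<grid, block>>>(dA, dB, dC, N);

    // The blocking copy waits for the kernel to finish before reading C.
    cudaError_t err = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (hC[i] != hA[i] + hB[i]) return false;
    return true;
}
```

Both kernels compute the same result; only the assignment of elements to threads differs. To compare performance, one option is to bracket each launch with `cudaEventRecord` calls and read the elapsed time with `cudaEventElapsedTime`, using a vector large enough that the kernel time dominates the launch overhead.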
Let's discuss:

- 1. the difference in code structure between touch vector and add vector.
- 2. which elements a thread adds and stores in the two approaches.
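To make the second discussion point concrete, here is a small host-side sketch that lists the element indices a given thread would handle under each approach. It assumes the threads have already been flattened to a single index `tid` out of `nThreads`, and that the contiguous approach uses chunks of size ceil(N / nThreads); the function names are hypothetical. It is plain host code, so it compiles with `nvcc` as well as with an ordinary C++ compiler.

```cuda
#include <vector>

// Approach 1: thread `tid` handles one contiguous chunk of indices.
std::vector<int> contiguousElements(int tid, int nThreads, int N) {
    std::vector<int> idx;
    int chunk = (N + nThreads - 1) / nThreads;  // ceil(N / nThreads)
    for (int i = tid * chunk; i < N && i < (tid + 1) * chunk; ++i)
        idx.push_back(i);
    return idx;
}

// Approach 2: thread `tid` handles every nThreads-th index, starting at tid.
std::vector<int> interleavedElements(int tid, int nThreads, int N) {
    std::vector<int> idx;
    for (int i = tid; i < N; i += nThreads)
        idx.push_back(i);
    return idx;
}
```

With N = 12 and 4 threads, thread 1 handles elements 3, 4, 5 in the contiguous approach and elements 1, 5, 9 in the interleaved approach. At any single loop step, the interleaved threads of a warp access adjacent addresses, which the hardware can coalesce into fewer memory transactions, while the contiguous threads access addresses a full chunk apart.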

Here are a Makefile and an incomplete CUDA program to get started with.