Discussion 2: Finding peak integer performance (ceiling) on the capital machines


    The dp_fma_reg_4.c file in peak_code.tar measures peak double-precision floating-point performance. Please refer to the README file for instructions on how to build and run the program and how to interpret its results.
    In the file, the 'test_dp_avx_4_internal' function executes multiply-add operations using Intel AVX intrinsics, with all operands held in vector registers. The 'test_dp_avx_4' function computes the number of operations that is printed in the output. The formula it uses IS INCORRECT; you need to analyze and fix it. Look for the comment starting with '/*@'.

    Your second task is to modify the code to measure peak performance for integer add-max operations. Refer to https://software.intel.com/sites/landingpage/IntrinsicsGuide/ for usage of Intel intrinsics. Because the original AVX instruction set has no 256-bit integer arithmetic (that was added in AVX2), you will have to use the 128-bit SSE2 integer intrinsics for this. You may also want to inspect the assembly code to make sure the compiler has not optimized the integer operations away; the gcc '-S' flag in the Makefile, which generates the assembly code, can help you do this.

    Questions:
      1. What is the correct number of operations, and what is the peak GFLOPS for double-precision multiply-add operations?
      2. How did you modify the code for integer add-max operations?
      3. What is the number of operations, and what is the peak rate for integer add-max operations (in GOPS, giga-operations per second, since these are not floating-point operations)?