CS 553 Programming Assignment #1 — LLVM introduction including CFGs, call graphs, and peephole optimization

(Due Monday September 8th at 2:30pm)
(9/7/14 update): Use Makefile.countLoads to link your countLoads.c file in with lulesh.cc. See the posting in RamCT for more details.

Introduction

Compiler infrastructures enable the analysis and transformation of programs. The goal of this assignment is to learn about compiler infrastructures in general and LLVM in particular. We recommend you read the full assignment through before starting to work through it.

The amount of programming needed for this assignment is on the order of 100 lines of code. The challenge will be learning how to write passes in the LLVM compiler infrastructure. If you get stuck, send me an email or stop by my office to ask what might be going on. I have already implemented the passes for PA1.

Installing LLVM

When writing passes for LLVM, it is easiest to install your own version of it. Here we have instructions for how to do the installation on the CS department linux machines. You are welcome to install llvm on your own machine, but we recommend you start from the same tar ball that the rest of the class will be using. Also, if you have platform specific installation issues, we won't necessarily be able to help.

You will be assigned a unix group such as cs553b, cs553c, etc. Log into a department workstation and then do the following:

  cd ~cs553/cs553x    // where X is your unix group letter
  cp ../public/llvm-rev215820.tgz .
  tar xvzf llvm-rev215820.tgz
  cd llvm
  ./configure --disable-optimized
  make -j 8               // this build will take about 30 minutes

Once the build is done, there should be a Debug+Asserts/ subdirectory. It contains bin/ and lib/ directories. Set your path so that you can use the executables in the bin/ subdirectory. Make sure to change cs553x to your cs553 group.

  // in tcsh or csh
  setenv PATH /s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin:$PATH
  
  // in bash or sh
  export PATH=/s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin:$PATH

Becoming Familiar with LLVM

This section was written by Susan Horowitz.

LLVM is composed of many separate pieces. To use LLVM to turn a C source file into an x86 (or x86_64) executable, we need to:

transform the C source into LLVM bitcode
optionally optimize the bitcode
transform the bitcode into assembly
assemble the program

Just as C files use the extension .c, LLVM bitcode uses the extension .bc, assembly uses the extension .s, and LLVM human-readable assembly uses the extension .ll.

Suppose we have a C program in the file foo.c. Below are the steps needed to create an executable (with no optimization):


    clang -emit-llvm -O0 -c foo.c -o foo.bc    // create bitcode .bc
    llc foo.bc                                 // create assembly .s
    gcc foo.s -o foo                           // create executable "foo"

To run your program:

    ./foo

To turn LLVM bitcode into human-readable LLVM assembly (foo.ll):

    llvm-dis -f foo.bc

LLVM Instructions

Let's get a little more familiar with LLVM's instructions. Consider the following C program, sum.c:

#include <stdio.h>

int main() {
  int n;
  int sum;
  sum = 0;
  for (n = 0; n < 10; n++)
    sum = sum + n*n;
  printf("sum: %d\n", sum);
}

Running clang and llvm-dis produces the following LLVM assembly code (on one of the "capital" machines):


; ModuleID = 'sum.bc'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [9 x i8] c"sum: %d\0A\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %n = alloca i32, align 4
  %sum = alloca i32, align 4
  store i32 0, i32* %retval
  store i32 0, i32* %sum, align 4
  store i32 0, i32* %n, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %0 = load i32* %n, align 4
  %cmp = icmp slt i32 %0, 10
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %1 = load i32* %sum, align 4
  %2 = load i32* %n, align 4
  %3 = load i32* %n, align 4
  %mul = mul nsw i32 %2, %3
  %add = add nsw i32 %1, %mul
  store i32 %add, i32* %sum, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %4 = load i32* %n, align 4
  %inc = add nsw i32 %4, 1
  store i32 %inc, i32* %n, align 4
  br label %for.cond

for.end:                                          ; preds = %for.cond
  %5 = load i32* %sum, align 4
  %call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([9 x i8]* @.str, i32 0, i32 0), i32
 %5)
  %6 = load i32* %retval
  ret i32 %6
}

declare i32 @printf(i8*, ...) #1

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-fra
me-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-
size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-n
on-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe
-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.6.0 (trunk)"}

Some remarks about this assembly code:

Anything on a line after ; is a comment.
There are 5 basic blocks named entry, for.cond, for.body, for.inc, and for.end,
Note that the entry and for.cond blocks, as well as the for.body and for.inc blocks are each really a single basic block (the first block ends with an unconditional branch to the second block).
Names that start with a percent sign, (like %0, %1, %n, and %for.cond) are either virtual register names (more about this later) or block labels.

You'll want to become comfortable with LLVM assembly (the human readable form of LLVM bitcode) because the first three projects that you will write for this class will accept LLVM bitcode as input and will emit LLVM bitcode as output.

Read About LLVM

Here are links to some LLVM documents that you may find useful during the semester:

Programmer's Manual
LLVM assembly language reference
Bugpoint, an LLVM debugging tool.
Writing an LLVM Pass. You will solve the remaining projects in this class by writing various LLVM passes. Ignore what this documentation says about Setting up the build environment.
LLVM API Documentation
LLVM files Most of LLVM's function-level documentation is in comments in header files (.h files). This link provides links to each of those files.

Getting Started with LULESH

LULESH (Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics) is the benchmark code you will be using in PA1. Here are the steps for building and executing LULESH.

Download the tar ball at https://codesign.llnl.gov/lulesh/lulesh2.0.3.tgz.

The tar ball untars into separate files instead of a directory. We recommend the following sequence:

  cd ~cs553/cs553x    // change into your cs553 directory
  mkdir lulesh2.0.3
  cd lulesh2.0.3
  tar xzvf lulesh2.0.3.tgz

Edit the Makefile to do the OpenMP compile, but not the MPI compile.

  MPICXX = mpig++ -DUSE_MPI=1
  #CXX = $(MPICXX)
  CXX = $(SERCXX)

Build and execute the program.

  make
  ./lulesh2.0

// You should see the following.
Running problem size 30^3 per domain until completion
Num processors: 1
Num threads: 16
Total number of elements: 27000

To run other sizes, use -s <integer>.
To run a fixed number of iterations, use -i <integer>.
To run a more or less balanced region set, use -b <integer>.
To change the relative costs of regions, use -c <integer>.
To print out progress, use -p
To write an output file for VisIt, use -v
See help (-h) for more options

Run completed:  
   Problem size        =  30 
   MPI tasks           =  1 
   Iteration count     =  932 
   Final Origin Energy = 2.025075e+05 
   Testing Plane 0 of Energy Array on rank 0:
        MaxAbsDiff   = 6.548362e-11
        TotalAbsDiff = 8.615093e-10
        MaxRelDiff   = 1.461140e-12


Elapsed time         =       7.28 (s)
Grind time (us/z/c)  = 0.28938714 (per dom)  (0.28938714 overall)
FOM                  =  3455.5786 (z/s)

The Assignment

You will be submitting a programming assignment writeup using latex. See the assignments page for the latex template the shows how to use latex. Each subsection below indicates what you should put in your report about your exploration of llvm.

Visualizing the Code: Control Flow Graphs and Call Graphs

For this part of the assignment, you will learn about control-flow graphs and call graphs in programs and how to use the llvm executable opt to create them.

The command opt --help shows the analyses and program transformations available. Use the following commands to create a pdf figure of the call graph of a program:

    // NOTE that opt needs an llvm bitcode file as input.
    opt -dot-callgraph foo.bc -o foo.bc
    
    more callgraph.dot   // see what the dot format looks like
    dot -Tpdf -o callgraph.pdf callgraph.dot 
    gv callgraph.pdf

There is also a -dot-cfg option to create control flow graphs.

The main LULESH input file is lulesh.cc.

    clang -emit-llvm -O0 -c lulesh.cc -o lulesh.bc -DUSE_MPI=0

In your report include a figure with a control flow graph for a function that contains a loop. In the description of the figure in your report indicate which function is being displayed and indicate which line in the lulesh.cc file corresponds to one of the llvm assembly statements.

Experiment with Existing Passes

The command opt --help shows the analyses and program transformations available. Select two optimizations from the Optimizations available list, look them up in the llvm documentation, describe them in your own words in two sentences in your report, use them on LULESH, and report the time differences between LULESH with no optimization, opt -O3, and the 2 separate optimizations you selected.

Makefile.llvm creates lulesh2.0opt and lulesh2.0noopt executables when the command make -f Makefile.llvm is used. Edit the LLVMOPT definition in Makefile.llvm to try different optimization passes.

For more information about opt, see its documentation online.

Write a Peephole Optimization Pass: Unnecessary Loads

This section was written by Susan Horowitz and has been slightly edited.

For the second part of this project, you will implement a pass that finds and removes all unnecessary load instructions in each function. An instruction that loads a value from memory into a virtual register %k is unnecessary if the previous instruction in the same basic block stored a value v to the same memory location. You should find all such loads and replace all uses of %k with uses of v. Then you should remove the unnecessary load instruction.

For example, if the original code from llv-dis looks like this:


  store i32 12, i32* %x, align 4    // store the value 12 into the memory location pointed to by %x
  %0 = load i32* %x, align 4        // load the value in the memory location pointed to by %x into %0
  %add = add nsw i32 %0, 22         // set %add to be the value in %0 + 22
  store i32 %add, i32* %y, align 4  // store the value in %add into the memory location pointed to by %y
  %1 = load i32* %y, align 4        // load the value in the memory location pointed to by %y into %1
  %add1 = add nsw i32 %1, 33        // set %add1 to be the value in %1 + 33
  store i32 %add1, i32* %z, align 4 // store the value in %add1 into the memory location pointed to by %z

You would change it to the following:


  store i32 12, i32* %x, align 4     // store the value 12 into the memory location pointed to by %x
                                     // 1st unnecessary load was removed
  %add =  add nsw i32 12, 22         // set %add to be the value 12 + 22
  store i32 %add, i32* %y, align 4   // store the value in %add into the memory location pointed to by %y
                                     // 2nd unnecessary load was removed
  %add1 = add nsw i32 %add, 33       // set %add1 to be the value in %add + 33
  store i32 %add1, i32* %z, align 4  // store the value in %add1 into the memory location pointed to by %z

Note that you can get the above code from the following source code:

int main() {
  int x, y, z;

  x = 12;
  y = x + 22;  /* load value of x that was just stored */
  z = y + 33;  /* load value of y that was just stored */
}

Implement the rmLoads pass as a FunctionPass in the file PA1.cpp, run from opt using the -rmLoads flag. For example:

    clang -emit-llvm -O0 -c foo.c -o foo.bc
    opt -load LLVMPA1.so -rmLoads foo.bc -o foo.rmLoads
    mv foo.rmLoads foo.bc

NOTE that your PATH needs to be set properly to ensure you are using the opt executable that is loading the passes you are writing. There is an opt executable in /bin, but you want to see the following:

    > which opt
    /s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin/opt

with cs553x being your cs553 unix group and directory.

Here is what you should do to implement the rmLoads pass in llvm:

There is some LLVM documentation on how to write an LLVM pass. It will be helpful to read through that information.

Then we recommend copying and editing the two Hello passes that come with the llvm installation.

    cd llvm/lib/Transforms
    cp -r Hello PA1Passes // Copy an existing pass to create a PA1 pass directory.
    cd PA1Passes
    rm -rf Debug+Asserts/
    mv Hello.cpp PA1.cpp
    mv Hello.exports PA1.exports

Change everything specific to hello in PA1.cc to refer instead to rmLoads.
Edit two lines in the Makefile:
```
  LIBRARYNAME = LLVMPA1

  EXPORTED_SYMBOL_FILE = $(PROJ_SRC_DIR)/PA1.exports
```
Type make in the Transforms/PA1/ directory. This creates ../../../Debug+Asserts/lib/LLVMPA1.so, which at this point should include the rmLoads pass (a copy of hello) as well as the hello2 pass. To run the rmLoads pass use:
```
    opt -load LLVMPA1.so -rmLoads foo.bc -o foo.rmLoads
```
Now write the new runOnFunction code:
- Create a map (you can use a DenseMap, or a C++ STL map, or you can define your own Map class) that maps each instruction in the function to a unique integer starting with 1. Do this by iterating over all instructions in the function and mapping each to the next integer value. (See the LLVM documentation about inspection for how to do various kinds of iterations over the parts of a program.) Note:
  - Every instruction in the program should be mapped to a unique value, not just every instruction within a function.
  - The LLVM documentation says that a Function Pass should not maintain state across functions. That would make it very difficult to implement the map from instructions to unique integers. However, it doesn't seem to matter if you violate this; e.g., it works fine to have a class variable that keeps track of the current instruction number. Initialize the instruction counter to 1 for grading purposes.
- Iterate over all basic blocks in the function, and all instructions in each basic block. Look for an instruction that stores a value v to the memory location pointed to by virtual register %m, immediately followed by an instruction in the same basic block that loads from the location pointed to by %m into register %k. The second instruction (the load) is unnecessary. Note: The opcode for a load instruction is Instruction::Load, and the opcode for a store instruction is Instruction::Store. (Instruction.h has methods for getting the opcode and the opcode name, and the User class has methods for getting the number of operands and the operands themselves. Use the hasName and getName methods of the Value class to see whether an operand has a name and if so to get that name.)
- Print (to stderr)
```
      Instruction n is a useless load
```
  where n is the number of the instruction that is the useless load, retrieved from your instruction map (which probably will not be the same as the target virtual register you'll see for that instruction if you look at the output of llvm-dis). This output is for grading.
- Replace ALL uses of %k with a use of v. The Value class includes a replaceAllUsesWith method that you can use.
- Save the (address of the) unnecessary load instruction so that you can remove it when you're done iterating over all basic blocks in the current function (you will mess up the iteration if you remove it now). The Instruction class has an eraseFromParent method that you can use. The documentation is here.
- Change the function to return true or false depending on whether you actually made a change.
Finally, modify the getAnalysisUsage method by giving it an empty body (since your rmLoads pass does modify the program), and change the call to RegisterPass to the following: RegisterPass X("rmLoads", "removing unnecessary loads", false, false);

Once you have implemented and tested the rmLoads pass, apply it to the LULESH executable and put the results in your PA1 report.

Write an Instrumentation Pass: Dynamic Counts of Loads

For this part of PA1 you will implement a -countLoads pass and then put a section in your report that discusses how the dynamic count of loads in LULESH changed after your useless loads pass was performed. Also compare with the dynamic count of loads after -O3 optimization is performed.

There are two main ways of doing this discussed on the internet: (1) write a load counting function in C, link the LLVM for that C code with your original program, and then run a pass that places calls to your load counting function each time you see a load (See Arnamoy's Blog about this), and (2) determine how to add a function into the LLVM representation directly as shown in LLVM Tutorial 1: A First Function and then place calls to your load counting function you just added each time you see a load.

Here we will discuss approach (1). You can use either approach as long as at the end of the execution of the main function in the input code, the output looks as follows:

# Load Instructions

where # is the number of load instructions dynamically executed in the input program.

First read the Blog post and try out the example provided as a CountLoads pass in your PA1Passes/PA1.cpp file.

Then I recommend you create a countLoads function similar to the print function in the blog example, but it will have a static local variable to count the number of loads that have occurred and will take a flag parameter indicating whether the number of loads should be printed. See a Fibonacci example for how to create an ConstantInt and how to pass a parameter into a CallInst::Create call. You will have to learn more about getOrInsertFunction to figure out how to find a function that has a different type signature than the example in the blog posting.

You will also want to modify your pass so that it does not put recursive calls to countLoads in the function itself.


    if (! f->getName().equals(StringRef("countLoads")) ) {
        CountLoads::runOnBasicBlock(BB);

You will want to pass a parameter like the value 1 for example into the counting function only in exit basic blocks in the main function. Checking the function name and use the getTerminator() and getNumSuccessors() methods will help determine when the total count should be printed.

Test your results with the following sequence of commands:

    clang -emit-llvm -O0 -c countLoads.c -o countLoads.bc
    llvm-link countLoads.bc infile.bc -S -o=countLoadsinfile.bc
    opt -load LLVMPA1.so -countLoads countLoadsinfile.bc > countLoadsinfileinst.bc
    lli countLoadsinfileinst.bc

If the interpretation of the new bitcode doesn't work, then do an llvm-dis on the file and figure out issues.

Submitting the Assignment

Create a pdf file of your report and submit it via RamCT. You will also need to submit your PA1.cpp file.

The late policy in the syllabus applies. However, if you show a revision control log and versions of PA1.cpp and your report satisfying the following constraints, then you can have 2 extra days to complete your assignment with no late penalty.

For the PA1Passes/ subdirectory with a working rmLoads pass by September 1st and daily progress on the loadCount instrumentation pass.
For the report, which you might want to save in PA1Passes/ just to simplify revision control, it needs to be in latex and be generating a pdf that reports on all aspects of the project except for the dynamic load counts. You would have to be willing to submit all of the report except for dynamic load counts on September 1st.

You can use any revision control tool with which you are familiar, but if you use something like github make sure your code is not publicly available. If you would like to use subversion, you can create a repository, checkout a working directory of that repository, and use the working directory as follows:

    cd ~cs553/cs553x
    mkdir SVNRepositories    // only need to do this once for the semester
    cd SVNRepositories
    svnadmin create PA1Passes  // creates repository for PA1
    cd ~cs553/cs553x/llvm/lib/Transforms
    mv PA1Passes PA1Passes.bak   // save what you have already done
    
    // This command could be used to check out the code anywhere.
    // 9/1/14: this was wrong and now is fixed
    svn co svn+ssh://austin.cs.colostate.edu//s/bach/b/class/cs553/cs553x/SVNRepositories/PA1Passes

Next copy files from PA1Passes.bak/ into PA1Passes/, add those files to the repository, and then commit them.


    cd PA1Passes
    cp ../PA1Passes.bak/PA1.cc .
    svn add PA1.cc
    svn commit          // When you do commits provide yourself a useful message.
    ...

mstrout@cs.colostate.edu, 9/3/14