CS 553 Programming Assignment #1 — LLVM introduction including CFGs, call graphs, and peephole optimization

(Due Monday September 8th at 2:30pm)
(9/7/14 update): Use Makefile.countLoads to link your countLoads.c file in with lulesh.cc. See the posting in RamCT for more details.

Introduction

Compiler infrastructures enable the analysis and transformation of programs. The goal of this assignment is to learn about compiler infrastructures in general and LLVM in particular. We recommend you read the full assignment through before starting to work through it.

The amount of programming needed for this assignment is on the order of 100 lines of code. The challenge will be learning how to write passes in the LLVM compiler infrastructure. If you get stuck, send me an email or stop by my office to ask what might be going on. I have already implemented the passes for PA1.

Installing LLVM

When writing passes for LLVM, it is easiest to install your own version of it. Here we have instructions for how to do the installation on the CS department linux machines. You are welcome to install llvm on your own machine, but we recommend you start from the same tar ball that the rest of the class will be using. Also, if you have platform specific installation issues, we won't necessarily be able to help.

You will be assigned a unix group such as cs553b, cs553c, etc. Log into a department workstation and then do the following:

  cd ~cs553/cs553x    // where X is your unix group letter
  cp ../public/llvm-rev215820.tgz .
  tar xvzf llvm-rev215820.tgz
  cd llvm
  ./configure --disable-optimized
  make -j 8               // this build will take about 30 minutes

Once the build is done, there should be a Debug+Asserts/ subdirectory. It contains bin/ and lib/ directories. Set your path so that you can use the executables in the bin/ subdirectory. Make sure to change cs553x to your cs553 group.

  // in tcsh or csh
  setenv PATH /s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin:$PATH
  
  // in bash or sh
  export PATH=/s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin:$PATH

Becoming Familiar with LLVM

This section was written by Susan Horowitz.

LLVM is composed of many separate pieces. To use LLVM to turn a C source file into an x86 (or x86_64) executable, we need to:

Just as C files use the extension .c, LLVM bitcode uses the extension .bc, assembly uses the extension .s, and LLVM human-readable assembly uses the extension .ll.

Suppose we have a C program in the file foo.c. Below are the steps needed to create an executable (with no optimization):


    clang -emit-llvm -O0 -c foo.c -o foo.bc    // create bitcode .bc
    llc foo.bc                                 // create assembly .s
    gcc foo.s -o foo                           // create executable "foo"

To run your program:

    ./foo

To turn LLVM bitcode into human-readable LLVM assembly (foo.ll):

    llvm-dis -f foo.bc

LLVM Instructions

Let's get a little more familiar with LLVM's instructions. Consider the following C program, sum.c:

#include <stdio.h>

int main() {
  int n;
  int sum;
  sum = 0;
  for (n = 0; n < 10; n++)
    sum = sum + n*n;
  printf("sum: %d\n", sum);
}

Running clang and llvm-dis produces the following LLVM assembly code (on one of the "capital" machines):


; ModuleID = 'sum.bc'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [9 x i8] c"sum: %d\0A\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %n = alloca i32, align 4
  %sum = alloca i32, align 4
  store i32 0, i32* %retval
  store i32 0, i32* %sum, align 4
  store i32 0, i32* %n, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %0 = load i32* %n, align 4
  %cmp = icmp slt i32 %0, 10
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %1 = load i32* %sum, align 4
  %2 = load i32* %n, align 4
  %3 = load i32* %n, align 4
  %mul = mul nsw i32 %2, %3
  %add = add nsw i32 %1, %mul
  store i32 %add, i32* %sum, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %4 = load i32* %n, align 4
  %inc = add nsw i32 %4, 1
  store i32 %inc, i32* %n, align 4
  br label %for.cond

for.end:                                          ; preds = %for.cond
  %5 = load i32* %sum, align 4
  %call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([9 x i8]* @.str, i32 0, i32 0), i32
 %5)
  %6 = load i32* %retval
  ret i32 %6
}

declare i32 @printf(i8*, ...) #1

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-fra
me-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-
size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-n
on-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe
-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.6.0 (trunk)"}

Some remarks about this assembly code:

You'll want to become comfortable with LLVM assembly (the human readable form of LLVM bitcode) because the first three projects that you will write for this class will accept LLVM bitcode as input and will emit LLVM bitcode as output.

Read About LLVM

Here are links to some LLVM documents that you may find useful during the semester:

Getting Started with LULESH

LULESH (Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics) is the benchmark code you will be using in PA1. Here are the steps for building and executing LULESH.

The Assignment

You will be submitting a programming assignment writeup using latex. See the assignments page for the latex template the shows how to use latex. Each subsection below indicates what you should put in your report about your exploration of llvm.

Visualizing the Code: Control Flow Graphs and Call Graphs

For this part of the assignment, you will learn about control-flow graphs and call graphs in programs and how to use the llvm executable opt to create them.

The command opt --help shows the analyses and program transformations available. Use the following commands to create a pdf figure of the call graph of a program:

    // NOTE that opt needs an llvm bitcode file as input.
    opt -dot-callgraph foo.bc -o foo.bc
    
    more callgraph.dot   // see what the dot format looks like
    dot -Tpdf -o callgraph.pdf callgraph.dot 
    gv callgraph.pdf
There is also a -dot-cfg option to create control flow graphs.

The main LULESH input file is lulesh.cc.

    clang -emit-llvm -O0 -c lulesh.cc -o lulesh.bc -DUSE_MPI=0
In your report include a figure with a control flow graph for a function that contains a loop. In the description of the figure in your report indicate which function is being displayed and indicate which line in the lulesh.cc file corresponds to one of the llvm assembly statements.

Experiment with Existing Passes

The command opt --help shows the analyses and program transformations available. Select two optimizations from the Optimizations available list, look them up in the llvm documentation, describe them in your own words in two sentences in your report, use them on LULESH, and report the time differences between LULESH with no optimization, opt -O3, and the 2 separate optimizations you selected.

Makefile.llvm creates lulesh2.0opt and lulesh2.0noopt executables when the command make -f Makefile.llvm is used. Edit the LLVMOPT definition in Makefile.llvm to try different optimization passes.

For more information about opt, see its documentation online.

Write a Peephole Optimization Pass: Unnecessary Loads

This section was written by Susan Horowitz and has been slightly edited.

For the second part of this project, you will implement a pass that finds and removes all unnecessary load instructions in each function. An instruction that loads a value from memory into a virtual register %k is unnecessary if the previous instruction in the same basic block stored a value v to the same memory location. You should find all such loads and replace all uses of %k with uses of v. Then you should remove the unnecessary load instruction.

For example, if the original code from llv-dis looks like this:


  store i32 12, i32* %x, align 4    // store the value 12 into the memory location pointed to by %x
  %0 = load i32* %x, align 4        // load the value in the memory location pointed to by %x into %0
  %add = add nsw i32 %0, 22         // set %add to be the value in %0 + 22
  store i32 %add, i32* %y, align 4  // store the value in %add into the memory location pointed to by %y
  %1 = load i32* %y, align 4        // load the value in the memory location pointed to by %y into %1
  %add1 = add nsw i32 %1, 33        // set %add1 to be the value in %1 + 33
  store i32 %add1, i32* %z, align 4 // store the value in %add1 into the memory location pointed to by %z

You would change it to the following:


  store i32 12, i32* %x, align 4     // store the value 12 into the memory location pointed to by %x
                                     // 1st unnecessary load was removed
  %add =  add nsw i32 12, 22         // set %add to be the value 12 + 22
  store i32 %add, i32* %y, align 4   // store the value in %add into the memory location pointed to by %y
                                     // 2nd unnecessary load was removed
  %add1 = add nsw i32 %add, 33       // set %add1 to be the value in %add + 33
  store i32 %add1, i32* %z, align 4  // store the value in %add1 into the memory location pointed to by %z

Note that you can get the above code from the following source code:

int main() {
  int x, y, z;

  x = 12;
  y = x + 22;  /* load value of x that was just stored */
  z = y + 33;  /* load value of y that was just stored */
}

Implement the rmLoads pass as a FunctionPass in the file PA1.cpp, run from opt using the -rmLoads flag. For example:

    clang -emit-llvm -O0 -c foo.c -o foo.bc
    opt -load LLVMPA1.so -rmLoads foo.bc -o foo.rmLoads
    mv foo.rmLoads foo.bc
  
NOTE that your PATH needs to be set properly to ensure you are using the opt executable that is loading the passes you are writing. There is an opt executable in /bin, but you want to see the following:
    > which opt
    /s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin/opt
  
with cs553x being your cs553 unix group and directory.

Here is what you should do to implement the rmLoads pass in llvm:
  1. There is some LLVM documentation on how to write an LLVM pass. It will be helpful to read through that information.

  2. Then we recommend copying and editing the two Hello passes that come with the llvm installation.

        cd llvm/lib/Transforms
        cp -r Hello PA1Passes // Copy an existing pass to create a PA1 pass directory.
        cd PA1Passes
        rm -rf Debug+Asserts/
        mv Hello.cpp PA1.cpp
        mv Hello.exports PA1.exports
    

  3. Change everything specific to hello in PA1.cc to refer instead to rmLoads.

  4. Edit two lines in the Makefile:

      LIBRARYNAME = LLVMPA1
    
      EXPORTED_SYMBOL_FILE = $(PROJ_SRC_DIR)/PA1.exports
    
    Type make in the Transforms/PA1/ directory. This creates ../../../Debug+Asserts/lib/LLVMPA1.so, which at this point should include the rmLoads pass (a copy of hello) as well as the hello2 pass. To run the rmLoads pass use:

        opt -load LLVMPA1.so -rmLoads foo.bc -o foo.rmLoads

  5. Now write the new runOnFunction code:

  6. Finally, modify the getAnalysisUsage method by giving it an empty body (since your rmLoads pass does modify the program), and change the call to RegisterPass to the following: RegisterPass X("rmLoads", "removing unnecessary loads", false, false);

Once you have implemented and tested the rmLoads pass, apply it to the LULESH executable and put the results in your PA1 report.

Write an Instrumentation Pass: Dynamic Counts of Loads

For this part of PA1 you will implement a -countLoads pass and then put a section in your report that discusses how the dynamic count of loads in LULESH changed after your useless loads pass was performed. Also compare with the dynamic count of loads after -O3 optimization is performed.

There are two main ways of doing this discussed on the internet: (1) write a load counting function in C, link the LLVM for that C code with your original program, and then run a pass that places calls to your load counting function each time you see a load (See Arnamoy's Blog about this), and (2) determine how to add a function into the LLVM representation directly as shown in LLVM Tutorial 1: A First Function and then place calls to your load counting function you just added each time you see a load.

Here we will discuss approach (1). You can use either approach as long as at the end of the execution of the main function in the input code, the output looks as follows:

# Load Instructions
where # is the number of load instructions dynamically executed in the input program.

First read the Blog post and try out the example provided as a CountLoads pass in your PA1Passes/PA1.cpp file.

Then I recommend you create a countLoads function similar to the print function in the blog example, but it will have a static local variable to count the number of loads that have occurred and will take a flag parameter indicating whether the number of loads should be printed. See a Fibonacci example for how to create an ConstantInt and how to pass a parameter into a CallInst::Create call. You will have to learn more about getOrInsertFunction to figure out how to find a function that has a different type signature than the example in the blog posting.

You will also want to modify your pass so that it does not put recursive calls to countLoads in the function itself.


    if (! f->getName().equals(StringRef("countLoads")) ) {
        CountLoads::runOnBasicBlock(BB);

You will want to pass a parameter like the value 1 for example into the counting function only in exit basic blocks in the main function. Checking the function name and use the getTerminator() and getNumSuccessors() methods will help determine when the total count should be printed.

Test your results with the following sequence of commands:

    clang -emit-llvm -O0 -c countLoads.c -o countLoads.bc
    llvm-link countLoads.bc infile.bc -S -o=countLoadsinfile.bc
    opt -load LLVMPA1.so -countLoads countLoadsinfile.bc > countLoadsinfileinst.bc
    lli countLoadsinfileinst.bc
If the interpretation of the new bitcode doesn't work, then do an llvm-dis on the file and figure out issues.

Submitting the Assignment

Create a pdf file of your report and submit it via RamCT. You will also need to submit your PA1.cpp file.

The late policy in the syllabus applies. However, if you show a revision control log and versions of PA1.cpp and your report satisfying the following constraints, then you can have 2 extra days to complete your assignment with no late penalty.

You can use any revision control tool with which you are familiar, but if you use something like github make sure your code is not publicly available. If you would like to use subversion, you can create a repository, checkout a working directory of that repository, and use the working directory as follows:

    cd ~cs553/cs553x
    mkdir SVNRepositories    // only need to do this once for the semester
    cd SVNRepositories
    svnadmin create PA1Passes  // creates repository for PA1
    cd ~cs553/cs553x/llvm/lib/Transforms
    mv PA1Passes PA1Passes.bak   // save what you have already done
    
    // This command could be used to check out the code anywhere.
    // 9/1/14: this was wrong and now is fixed
    svn co svn+ssh://austin.cs.colostate.edu//s/bach/b/class/cs553/cs553x/SVNRepositories/PA1Passes 
Next copy files from PA1Passes.bak/ into PA1Passes/, add those files to the repository, and then commit them.
    cd PA1Passes
    cp ../PA1Passes.bak/PA1.cc .
    svn add PA1.cc
    svn commit          // When you do commits provide yourself a useful message.
    ...


mstrout@cs.colostate.edu, 9/3/14