countLoads.c
file in with lulesh.cc.
See the posting in RamCT for more details.
Compiler infrastructures enable the analysis and transformation of programs. The goal of this assignment is to learn about compiler infrastructures in general and LLVM in particular. We recommend you read the full assignment through before starting to work through it.
The amount of programming needed for this assignment is on the order of 100 lines of code. The challenge will be learning how to write passes in the LLVM compiler infrastructure. If you get stuck, send me an email or stop by my office to ask what might be going on. I have already implemented the passes for PA1.
You will be assigned a unix group such as cs553b, cs553c, etc. Log into a department workstation and then do the following:
cd ~cs553/cs553x // where X is your unix group letter cp ../public/llvm-rev215820.tgz . tar xvzf llvm-rev215820.tgz cd llvm ./configure --disable-optimized make -j 8 // this build will take about 30 minutes
Once the build is done, there should be a Debug+Asserts/ subdirectory. It contains bin/ and lib/ directories. Set your path so that you can use the executables in the bin/ subdirectory. Make sure to change cs553x to your cs553 group.
// in tcsh or csh setenv PATH /s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin:$PATH // in bash or sh export PATH=/s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin:$PATH
LLVM is composed of many separate pieces. To use LLVM to turn a C source file into an x86 (or x86_64) executable, we need to:
Just as C files use the extension .c
, LLVM bitcode
uses the extension .bc
, assembly uses the
extension .s
, and LLVM human-readable assembly uses the
extension .ll
.
Suppose we have a C program in the file foo.c
. Below
are the steps needed to create an executable (with no optimization):
clang -emit-llvm -O0 -c foo.c -o foo.bc // create bitcode .bc
llc foo.bc // create assembly .s
gcc foo.s -o foo // create executable "foo"
To run your program:
./foo
To turn LLVM bitcode into human-readable LLVM assembly
(foo.ll
):
llvm-dis -f foo.bc
Let's get a little more familiar with LLVM's instructions. Consider the
following C program, sum.c
:
#include <stdio.h>
int main() {
int n;
int sum;
sum = 0;
for (n = 0; n < 10; n++)
sum = sum + n*n;
printf("sum: %d\n", sum);
}
Running clang
and llvm-dis
produces
the following LLVM assembly code (on one of the "capital" machines):
; ModuleID = 'sum.bc'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@.str = private unnamed_addr constant [9 x i8] c"sum: %d\0A\00", align 1
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%n = alloca i32, align 4
%sum = alloca i32, align 4
store i32 0, i32* %retval
store i32 0, i32* %sum, align 4
store i32 0, i32* %n, align 4
br label %for.cond
for.cond: ; preds = %for.inc, %entry
%0 = load i32* %n, align 4
%cmp = icmp slt i32 %0, 10
br i1 %cmp, label %for.body, label %for.end
for.body: ; preds = %for.cond
%1 = load i32* %sum, align 4
%2 = load i32* %n, align 4
%3 = load i32* %n, align 4
%mul = mul nsw i32 %2, %3
%add = add nsw i32 %1, %mul
store i32 %add, i32* %sum, align 4
br label %for.inc
for.inc: ; preds = %for.body
%4 = load i32* %n, align 4
%inc = add nsw i32 %4, 1
store i32 %inc, i32* %n, align 4
br label %for.cond
for.end: ; preds = %for.cond
%5 = load i32* %sum, align 4
%call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([9 x i8]* @.str, i32 0, i32 0), i32
%5)
%6 = load i32* %retval
ret i32 %6
}
declare i32 @printf(i8*, ...) #1
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-fra
me-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-
size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-n
on-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe
-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.6.0 (trunk)"}
Some remarks about this assembly code:
;
is a comment.entry
,
for.cond
,
for.body
,
for.inc
, and
for.end
,
entry
and for.cond
blocks,
as well as the for.body
and for.inc
blocks are each really a single basic
block (the first block ends with an unconditional branch to the
second block).%0
, %1
, %n
,
and %for.cond
)
are either virtual register names (more about this later)
or block labels.
You'll want to become comfortable with LLVM assembly (the human readable form of LLVM bitcode) because the first three projects that you will write for this class will accept LLVM bitcode as input and will emit LLVM bitcode as output.
Here are links to some LLVM documents that you may find useful during the semester:
cd ~cs553/cs553x // change into your cs553 directory mkdir lulesh2.0.3 cd lulesh2.0.3 tar xzvf lulesh2.0.3.tgz
MPICXX = mpig++ -DUSE_MPI=1 #CXX = $(MPICXX) CXX = $(SERCXX)
make ./lulesh2.0 // You should see the following. Running problem size 30^3 per domain until completion Num processors: 1 Num threads: 16 Total number of elements: 27000 To run other sizes, use -s <integer>. To run a fixed number of iterations, use -i <integer>. To run a more or less balanced region set, use -b <integer>. To change the relative costs of regions, use -c <integer>. To print out progress, use -p To write an output file for VisIt, use -v See help (-h) for more options Run completed: Problem size = 30 MPI tasks = 1 Iteration count = 932 Final Origin Energy = 2.025075e+05 Testing Plane 0 of Energy Array on rank 0: MaxAbsDiff = 6.548362e-11 TotalAbsDiff = 8.615093e-10 MaxRelDiff = 1.461140e-12 Elapsed time = 7.28 (s) Grind time (us/z/c) = 0.28938714 (per dom) (0.28938714 overall) FOM = 3455.5786 (z/s)
For this part of the assignment, you will learn about control-flow
graphs and call graphs in programs and how to use
the llvm executable opt
to create them.
The command opt --help
shows the analyses and program
transformations available.
Use the following commands to create a pdf figure of
the call graph of a program:
// NOTE that opt needs an llvm bitcode file as input. opt -dot-callgraph foo.bc -o foo.bc more callgraph.dot // see what the dot format looks like dot -Tpdf -o callgraph.pdf callgraph.dot gv callgraph.pdfThere is also a
-dot-cfg
option to create control flow graphs.
The main LULESH input file is lulesh.cc.
clang -emit-llvm -O0 -c lulesh.cc -o lulesh.bc -DUSE_MPI=0In your report include a figure with a control flow graph for a function that contains a loop. In the description of the figure in your report indicate which function is being displayed and indicate which line in the lulesh.cc file corresponds to one of the llvm assembly statements.
The command opt --help
shows the analyses and program
transformations available.
Select two optimizations from the Optimizations available
list, look them up in the llvm documentation, describe them
in your own words in two sentences in your report,
use them on LULESH, and report the time differences between
LULESH with no optimization, opt -O3
, and the 2
separate optimizations you selected.
Makefile.llvm creates lulesh2.0opt
and
lulesh2.0noopt
executables when the command
make -f Makefile.llvm
is used. Edit the LLVMOPT
definition in Makefile.llvm to try different optimization passes.
For more information about opt
, see its
documentation online.
For the second part of this project, you will implement a pass that
finds and removes all unnecessary load instructions in each function.
An instruction that loads a value from memory into a
virtual register %k
is unnecessary if the
previous instruction in the same basic block stored a value v
to the same memory location.
You should find all such loads and replace all uses
of %k
with uses of v
.
Then you should remove the unnecessary load instruction.
For example, if the original code from llv-dis looks like this:
store i32 12, i32* %x, align 4 // store the value 12 into the memory location pointed to by %x
%0 = load i32* %x, align 4 // load the value in the memory location pointed to by %x into %0
%add = add nsw i32 %0, 22 // set %add to be the value in %0 + 22
store i32 %add, i32* %y, align 4 // store the value in %add into the memory location pointed to by %y
%1 = load i32* %y, align 4 // load the value in the memory location pointed to by %y into %1
%add1 = add nsw i32 %1, 33 // set %add1 to be the value in %1 + 33
store i32 %add1, i32* %z, align 4 // store the value in %add1 into the memory location pointed to by %z
You would change it to the following:
store i32 12, i32* %x, align 4 // store the value 12 into the memory location pointed to by %x
// 1st unnecessary load was removed
%add = add nsw i32 12, 22 // set %add to be the value 12 + 22
store i32 %add, i32* %y, align 4 // store the value in %add into the memory location pointed to by %y
// 2nd unnecessary load was removed
%add1 = add nsw i32 %add, 33 // set %add1 to be the value in %add + 33
store i32 %add1, i32* %z, align 4 // store the value in %add1 into the memory location pointed to by %z
Note that you can get the above code from the following source code:
int main() { int x, y, z; x = 12; y = x + 22; /* load value of x that was just stored */ z = y + 33; /* load value of y that was just stored */ }
Implement the rmLoads
pass as a FunctionPass
in the file PA1.cpp
, run from opt
using the -rmLoads
flag.
For example:
clang -emit-llvm -O0 -c foo.c -o foo.bc opt -load LLVMPA1.so -rmLoads foo.bc -o foo.rmLoads mv foo.rmLoads foo.bcNOTE that your PATH needs to be set properly to ensure you are using the
opt
executable
that is loading the passes you are writing.
There is an opt
executable in /bin
, but you want to see the following:
> which opt /s/bach/b/class/cs553/cs553x/llvm/Debug+Asserts/bin/optwith
cs553x
being your cs553 unix group and directory.
Here is what you should do to implement the rmLoads
pass
in llvm:
There is some LLVM documentation on how to write an LLVM pass. It will be helpful to read through that information.
Then we recommend copying and editing the two Hello passes that come with the llvm installation.
cd llvm/lib/Transforms cp -r Hello PA1Passes // Copy an existing pass to create a PA1 pass directory. cd PA1Passes rm -rf Debug+Asserts/ mv Hello.cpp PA1.cpp mv Hello.exports PA1.exports
Change everything specific to hello
in PA1.cc
to
refer instead to rmLoads
.
Edit two lines in the Makefile:
LIBRARYNAME = LLVMPA1 EXPORTED_SYMBOL_FILE = $(PROJ_SRC_DIR)/PA1.exportsType
make
in the Transforms/PA1/ directory.
This creates ../../../Debug+Asserts/lib/LLVMPA1.so
,
which at this point should include
the rmLoads
pass (a copy of hello)
as well as the hello2
pass.
To run the rmLoads
pass use:
opt -load LLVMPA1.so -rmLoads foo.bc -o foo.rmLoads
Now write the new runOnFunction
code:
Create a map (you can use a
DenseMap
, or a
C++ STL map,
or you can define your own Map
class)
that maps each instruction in the function to a unique integer
starting with 1.
Do this by iterating over all instructions in the
function and mapping each to the next integer value.
(See the LLVM documentation about inspection
for how to do various kinds of iterations over the parts of a program.)
Note:
Iterate over all basic blocks in the function, and all
instructions in each basic block.
Look for an instruction that stores a value
v
to the memory location pointed to by virtual
register %m
,
immediately followed by an instruction in the same basic block
that loads from the
location pointed to by %m
into
register %k
.
The second instruction (the load) is unnecessary.
Note: The opcode for a load instruction is
Instruction::Load
, and the opcode for
a store instruction is Instruction::Store
.
(Instruction.h
has methods for getting the opcode and the opcode name, and
the User
class has methods for getting the number of operands and the
operands themselves.
Use the hasName
and getName
methods of the
Value
class to see whether an operand has a name and if so to get
that name.)
Print (to stderr
)
Instruction n is a useless loadwhere
n
is the number of the instruction that is
the useless load, retrieved from
your instruction map (which probably will not be the same as the
target virtual register you'll see for that instruction if you
look at the output of llvm-dis
). This output is for
grading.
Replace ALL uses of %k
with a use of
v
.
The
Value class includes a
replaceAllUsesWith
method that you can use.
Save the (address of the) unnecessary load instruction so
that you can remove it when you're done iterating over all
basic blocks in the current function (you will mess up the
iteration if you remove it now).
The
Instruction class has an eraseFromParent
method
that you can use.
The documentation is
here.
Change the function to return true
or
false
depending on whether you actually made
a change.
Finally, modify the getAnalysisUsage
method by
giving it an empty body (since your rmLoads
pass
does modify the program),
and change the call to RegisterPass
to the following:
RegisterPass
Once you have implemented and tested the rmLoads
pass,
apply it to the LULESH executable and put the results in your PA1 report.
For this part of PA1 you will implement a -countLoads
pass and then put a section in your report that discusses how
the dynamic count of loads in LULESH changed after your
useless loads pass was performed. Also
compare with the dynamic count of loads after -O3
optimization is performed.
There are two main ways of doing this discussed on the internet: (1) write a load counting function in C, link the LLVM for that C code with your original program, and then run a pass that places calls to your load counting function each time you see a load (See Arnamoy's Blog about this), and (2) determine how to add a function into the LLVM representation directly as shown in LLVM Tutorial 1: A First Function and then place calls to your load counting function you just added each time you see a load.
Here we will discuss approach (1). You can use either approach
as long as at the end of the execution of the main
function
in the input code, the output looks as follows:
# Load Instructionswhere # is the number of load instructions dynamically executed in the input program.
First read the Blog post and try out the example
provided as a CountLoads
pass in your PA1Passes/PA1.cpp
file.
Then I recommend you create a countLoads
function
similar to the print
function in the blog example, but
it will have a static local variable to count the number of loads
that have occurred and will take a flag parameter indicating whether the number
of loads should be printed.
See a Fibonacci example for how to create an ConstantInt
and how to pass a parameter into a
CallInst::Create
call.
You will have to learn more about getOrInsertFunction
to figure out how to find a function that has a different type signature than the example in the blog posting.
You will also want to modify your pass so that it does not put recursive
calls to countLoads
in the function itself.
if (! f->getName().equals(StringRef("countLoads")) ) {
CountLoads::runOnBasicBlock(BB);
You will want to pass a parameter like the value 1 for example into
the counting function only in exit basic blocks in the main function.
Checking the function name and use the getTerminator()
and getNumSuccessors()
methods will help determine
when the total count should be printed.
Test your results with the following sequence of commands:
clang -emit-llvm -O0 -c countLoads.c -o countLoads.bc llvm-link countLoads.bc infile.bc -S -o=countLoadsinfile.bc opt -load LLVMPA1.so -countLoads countLoadsinfile.bc > countLoadsinfileinst.bc lli countLoadsinfileinst.bcIf the interpretation of the new bitcode doesn't work, then do an
llvm-dis
on the file and figure out issues.
Create a pdf file of your report and submit it via RamCT.
You will also need to submit your PA1.cpp
file.
The late policy in the syllabus applies. However, if you show a revision control log and versions of PA1.cpp and your report satisfying the following constraints, then you can have 2 extra days to complete your assignment with no late penalty.
PA1Passes/
subdirectory
with a working rmLoads
pass by September 1st and
daily progress on the loadCount
instrumentation pass.
PA1Passes/
just to simplify revision control, it needs to be in latex and be generating
a pdf that reports on all aspects of the project except for the dynamic load
counts. You would have to be willing to submit all of the report
except for dynamic load counts on September 1st.
You can use any revision control tool with which you are familiar, but
if you use something like github make sure your code is not publicly available.
If you would like to use subversion, you can create a repository,
checkout a working directory of that repository, and use the
working directory as follows:
Next copy files from
cd ~cs553/cs553x
mkdir SVNRepositories // only need to do this once for the semester
cd SVNRepositories
svnadmin create PA1Passes // creates repository for PA1
cd ~cs553/cs553x/llvm/lib/Transforms
mv PA1Passes PA1Passes.bak // save what you have already done
// This command could be used to check out the code anywhere.
// 9/1/14: this was wrong and now is fixed
svn co svn+ssh://austin.cs.colostate.edu//s/bach/b/class/cs553/cs553x/SVNRepositories/PA1Passes
PA1Passes.bak/
into PA1Passes/
,
add those files to the repository, and then commit them.
cd PA1Passes
cp ../PA1Passes.bak/PA1.cc .
svn add PA1.cc
svn commit // When you do commits provide yourself a useful message.
...