Understanding the LC3 Runtime Stack

Stack Frames

A stack frame (aka activation record) contains all of the data required by a function (aka method, subroutine) during its execution. At a minimum, it includes:

space for the parameters
space to hold the return address/value and other book keeping value(s)
space for the local variables

The stack frame is created before function execution begins, and deleted after the function execution is complete. As such, it satisfies the requirement of only allocating space during the execution of the routine. It also handles recursion nicely, as a new stack frame is created for every call. All "data" is physically grouped together which aids caching.

Conceptually, the stack frame is a C struct. A struct is a complex data type declaration the defines a physically grouped list of variables placed together in a contiguous block of memory. (See Wikipedia for additional information.) Given a pointer to a struct, the individual elements are referenced with the notation ptr→field.

Conceptual Runtime Stack

The stack frames are manged as a stack of stack frames. The stack frame of the currently executing function is on the top of the stack. Its caller is in the next stack frame. Its caller is in the following stack frame and so on. A new stack frame is pushed on the stack when the function begins, and popped off when the function returns. This corresponds to what happens when a function is called. The currently executing function is suspended and the called function begins execution. When it terminates, execution resumes in the calling function.

One way of implementing a stack is via a singly linked list. The head points to the top of the stack and each element of the stack contains a next pointer to successive elements.

So the entry to a function looks like:

  StackFrame* sf = malloc(sizeof(StackFrame));
  sf->next       = head;  // link it into the list
  head           = sf;    // reset the head (push)
                          // initialize parameters

And exit from the function looks like:

                             // grab return value (if present)
  StackFrame* sf = head;
  head           = sf->next; // reset the head (pop)
  free(sf);                  // clean up memory

In diagram form, the runtime stack looks like the following:

This represents three nested function calls, with the currently executing function (function3) at the top of the stack (and pointed to by head). Notice that you can access all of the fields of the stack frame (e.g. head→param1 or head→local1. And by following the next field you can access the fields of the function2 stack frame (e.g. head→next→local1), and so on.

While this diagram and the sample code is a good conceptual view of what is happening, there are several problems:

Stack frames differ in size, so how can the space be allocated?
The size of the stack frame for a function may differ from call to call. Think about printf(), which has a variable number of parameters.
Only the caller of a function knows the values passed as parameters. Hence the idea of call-by-value.
Only the function knows how many locals it requires.

So building the activation record requires cooperation between the caller/callee. And dynamically allocating space (i.e. malloc) is too much overhead for function calls.

The Hardware Stack

Some computers have special hardware devoted to the stack. However, many simply choose to use a portion of memory and use it as a stack. In this case, one need only maintain a pointer to the top of the stack. Allocating/deallocating space on the stack only requires moving the stack pointer (i.e. changing its value). Thus, it is fast and allows one to allocate/deallocate variable amounts of space easily and quickly. It also facilitates dividing the effort to create the activation record between the caller and callee.

In most cases, the stack is placed in high memory and grows downward towards lower memory addresses. Placing the stack in high memory and dynamic memory (i.e. malloc()) in lower memory, the two grow/shrink towards/away from each other and only if they meet are we "out of memory". Pushing values onto the stack involves decrementing the stack pointer, while popping values involves incrementing the stack pointer. A simple add/subtract is all that is required. Thus the allocation/deallocation of space is fast. The stack is referenced so often that a register of the CPU is dedicated to holding the current value. In the case of the LC3 computer, that register is R6 (SP).

Building the LC3 stack runtime stack

The stack frame is built as a cooperative effort by the calling function and the called function. Two registers are dedicated to managing the runtime stack. Register R6 (SP) is used as the stack pointer. Its value contains the address of the word at the top of the stack. Register R5 contains the "head" pointer of the linked list of stack frames and is referred to as the frame pointer (FP).

Since the caller know the values of the parameters, it passes them to the called function by pushing them onto the stack, one at a time. The values could be pushed either left to right, or right to left. The convention is to push the values right to left. As a result, the first parameter ends up at the top of the stack. Thus, the first parameter is at a known location regardless of how many parameters were actually passed. Think of the printf() function. The format parameter (the first) tells how many other parameters there are (count of the % specifiers). By putting it at the top of the stack, the printf() function knows where to find the format parameter and can thus find the others.

A function call begins by having the caller push its parameters right to left onto the stack. The result is that just before the JSR to the function, the stack looks like:

Then the code uses JSR to being execution of the function. Now it is time for the called function to take over. If the function has a return value, it must allocate some space to store it. By placing it next on the stack, upon return, the caller can get the value by simply popping it. It should also store the return address so that this function may call other functions. To reserve space for the return value, the function need only adjust the stack pointer (ADD R6,R6,#-1). The return address may be saved by PUSH R7. After these two steps, the stack looks like:

To complete the creation of the stack frame, two things are required. One is to make the new stack frame the new head of the list. The other is to allocate space for local variables. The first step requires saving head as the "next" pointer of the new stack frame This is accomplished by PUSH R5. Then, the head is reset. The head could point anywhere in the stack frame. Elements of the stack frame are accessed at fixed offsets from the frame pointer. Since LC3 instructions can have both positive and negative offsets, it make sense that the frame pointer points somewhere in the middle of the stack frame to handle the largest possible stack frame. The convention used is to have the head (the frame pointer) point to the first local variable.

Finally, space of the locals is allocated by decrementing the stack pointer. At this point, the stack looks like:

The stack with three nested function calls

Note how the three functions are contiguous on the stack. That is, the last parameter of the called function is adjacent to the last local of the calling function.

Unwinding the runtime stack

On completion of the function, the work of building the runtime stack must be undone. Again, it is a cooperative effort split between the called and calling functions. The work is done in the opposite order to the building of the stack. The called function cleans up its part, and after RET, the calling function cleans up its part. Note that after this work is complete, the memory used for the stack is not cleared. It still contains whatever was stored there. Logically, however, it is no longer part of the stack and will be overwritten on the next function call(s).

A summary of function call/return

There are a lot of steps involved in using the runtime stack. However, the steps are very mechanical and straightforward if you follow the steps precisely. Normally the code is created by a compiler.

caller pushes arguments right to left
caller executes JSR
callee makes space for return value (ADD R6,R6,#-1)
callee pushes return address (PUSH R7)
callee pushes old frame pointer (PUSH R5)
callee resets frame pointer (ADD R5, R6 #-1) This makes the frame pointer point to the first local. When the locals are removed in step 10 the old frame pointer is at the top of the stack and can be popped.
callee allocates locals by decrementing stack pointer(ADD R6, R6, #-numLocals)
callee does its work (includes saving return value STR Rx,R5,#3)
callee removes locals by incrementing stack pointer (ADD R6, R6, #numLocals)
callee restores old frame pointer (POP R5)
callee pops return address (POP R7)
callee executes RET
caller pops return value
caller removes parameters by incrementing stack pointer (ADD R6, R6, #numParams)
caller continues its execution

Notice the symmetry of the steps. The actions taken in the beginning to build the stack frame are done in the reverse order to remove the stack frame. See steps 7/9, 5&6/10, 4/11. Note that steps 2,3 and 13,14 are not symmetric. If they were, one would have the caller allocate space for the return value before the JSR. But, it is easier to put the code in the callee once, rather that before each call to the function each place it is called.

Takeaways

The runtime stack is a linked list of stack frames.
The elements of the stack frame are located at fixed offsets from the frame pointer for that frame, regardless of the actual address of the stack frame.
Allocating and deallocating the stack frame is a cooperative effort with some of the code in the caller and some of the code in the callee.
The stack frames occupy contiguous addresses in memory.