Stack Frames
A stack frame (aka activation record) contains all
of the data required by a function (aka method, subroutine) during
its execution. At a minimum, it includes:
-
space for the parameters
-
space to hold the return address/value and other book keeping value(s)
-
space for the local variables
The stack frame is created before function execution begins, and
deleted after the function execution is complete. As such, it satisfies the
requirement of only allocating space during the execution of the routine. It
also handles recursion nicely, as a new stack frame is created for
every call. All "data" is physically grouped together which aids caching.
Conceptually, the stack frame is a C struct.
A struct is a complex data type declaration the defines a
physically grouped list of variables placed together in a contiguous block of
memory. (See
Wikipedia
for additional information.) Given a pointer to a struct, the individual
elements are referenced with the notation ptr→field.
Conceptual Runtime Stack
The stack frames are manged as a stack of stack frames. The
stack frame of the currently executing
function is on the top of the stack. Its caller is in the next stack frame. Its
caller is in the following stack frame and so on. A new stack frame is
pushed on the stack when the function begins, and popped off when the
function returns. This corresponds to what happens when a function is called.
The currently executing function is suspended and the called function begins
execution. When it terminates, execution resumes in the calling function.
One way of implementing a stack is via a singly linked list.
The head points to the top of the stack and each element of the
stack contains a next pointer to successive elements.
So the entry to a function looks like:
StackFrame* sf = malloc(sizeof(StackFrame));
sf->next = head; // link it into the list
head = sf; // reset the head (push)
// initialize parameters
And exit from the function looks like:
// grab return value (if present) StackFrame* sf = head; head = sf->next; // reset the head (pop) free(sf); // clean up memory
In diagram form, the runtime stack looks like the following:
Consider a program that calls function1() and that function calls
function2(), which in turn calls function3().
This represents three nested function calls, with the currently executing
function (function3) at the top of the stack (and pointed to by head).
Notice that you can access all of the fields of the stack frame (e.g.
head→param1 or head→local1. And by following the next field you can
access the fields of the function2 stack frame (e.g. head→next→local1),
and so on.
While this diagram and the sample code is a good conceptual view of what is happening, there are several problems:
-
Stack frames differ in size, so how can the space be allocated?
-
The size of the stack frame for a function may differ from call to call. Think about
printf(), which has a variable number of parameters. -
Only the caller of a function knows the values passed as parameters. Hence the idea of call-by-value.
-
Only the function knows how many locals it requires.
So building the activation record requires cooperation between the
caller/callee. And dynamically allocating space (i.e. malloc)
is too much overhead for function calls.
The Hardware Stack
Some computers have special hardware devoted to the stack. However, many simply choose to use a portion of memory and use it as a stack. In this case, one need only maintain a pointer to the top of the stack. Allocating/deallocating space on the stack only requires moving the stack pointer (i.e. changing its value). Thus, it is fast and allows one to allocate/deallocate variable amounts of space easily and quickly. It also facilitates dividing the effort to create the activation record between the caller and callee.
In most cases, the stack is placed in high memory and grows downward
towards lower memory addresses. Placing the stack in high memory and dynamic
memory
(i.e. malloc()) in lower memory, the two grow/shrink towards/away from
each other and only if they meet are we "out of memory". Pushing values onto the
stack involves decrementing the stack pointer, while popping values involves
incrementing the stack pointer. A simple add/subtract is all that is required.
Thus the allocation/deallocation of space is fast. The stack is referenced
so often that a register of the CPU is dedicated to holding the current
value. In the case of the LC3 computer, that register is R6 (SP).
Building the LC3 stack runtime stack
The stack frame is built as a cooperative effort by
the calling function and the called function. Two registers are dedicated to
managing the runtime stack. Register R6 (SP) is used as the stack pointer.
Its value contains the address of the word at the top of the stack. Register
R5 contains the "head" pointer of the linked list of stack frames and is
referred to as the frame pointer (FP).
Since the caller know the values of the parameters, it passes them to
the called function by pushing them onto the stack, one at a time. The values
could be pushed either left to right, or right to left. The convention is to
push the values right to left. As a result, the first parameter ends
up at the top of the stack. Thus, the first parameter is at a known
location regardless of how many parameters were actually passed. Think of
the printf() function. The format parameter (the first)
tells how many other parameters there are (count of the % specifiers).
By putting it at the top of the stack, the printf() function knows
where to find the format parameter and can thus find the others.
A function call begins by having the caller push its parameters right
to left onto the stack. The result is that just before the JSR to the
function, the stack looks like:
Then the code uses JSR to being execution of the function. Now it is
time for the called function to take over. If the function has a return value,
it must allocate some space to store it. By placing it next on the stack, upon
return, the caller can get the value by simply popping it. It should also store
the return address so that this function may call other functions. To
reserve space for the return value, the function need only adjust the stack
pointer (ADD R6,R6,#-1). The return address may be saved by
PUSH R7. After these two steps, the stack looks like:
To complete the creation of the stack frame, two things are
required. One is to make the new stack frame the new
head of the list. The other is to allocate space for local variables. The first
step requires saving head as the "next" pointer of the new stack frame
This is accomplished
by PUSH R5. Then, the head is reset. The head could point anywhere
in the stack frame. Elements of the stack frame are accessed
at fixed offsets from the frame pointer. Since LC3 instructions can have both
positive and negative offsets, it make sense that the frame pointer points
somewhere in the middle of the stack frame to handle the largest possible
stack frame. The convention used is to have the head (the frame pointer) point
to the first local variable.
Finally, space of the locals is allocated by decrementing the stack pointer. At this point, the stack looks like:
The stack with three nested function calls
Note how the three functions are contiguous on the stack. That is, the last parameter of the called function is adjacent to the last local of the calling function.
Unwinding the runtime stack
On completion of the function, the work of building the runtime stack must be
undone. Again, it is a cooperative effort split between the called and calling
functions. The work is done in the opposite order to the building of the
stack. The called function cleans up its part, and after RET, the
calling function cleans up its part. Note that after this work is complete, the
memory used for the stack is not cleared. It still contains whatever was
stored there. Logically, however, it is no longer part of the stack and will be
overwritten on the next function call(s).
A summary of function call/return
There are a lot of steps involved in using the runtime stack. However, the steps are very mechanical and straightforward if you follow the steps precisely. Normally the code is created by a compiler.
-
caller pushes arguments right to left
-
caller executes
JSR -
callee makes space for return value (
ADD R6,R6,#-1) -
callee pushes return address (
PUSH R7) -
callee pushes old frame pointer (
PUSH R5) -
callee resets frame pointer (
ADD R5, R6 #-1) This makes the frame pointer point to the first local. When the locals are removed in step 10 the old frame pointer is at the top of the stack and can be popped. -
callee allocates locals by decrementing stack pointer(
ADD R6, R6, #-numLocals) -
callee does its work (includes saving return value
STR Rx,R5,#3) -
callee removes locals by incrementing stack pointer (
ADD R6, R6, #numLocals) -
callee restores old frame pointer (
POP R5) -
callee pops return address (
POP R7) -
callee executes
RET -
caller pops return value
-
caller removes parameters by incrementing stack pointer (
ADD R6, R6, #numParams) -
caller continues its execution
Notice the symmetry of the steps. The actions taken in the beginning to build
the stack frame are done in the reverse order to remove the stack frame.
See steps 7/9, 5&6/10, 4/11. Note that steps 2,3 and 13,14 are not symmetric.
If they were, one would have the caller allocate space for the return value
before the JSR. But, it is easier to put the code in the callee once, rather
that before each call to the function each place it is called.
Other protocalls
The previous section detailed one possible call/return protocol. There are others and we will consider some of them now. Which protocol is not important as long as the the caller and callee agree.
Using a register for 1st parameter
A typical entry to a subroutine looks like the following:
LD someReg, addr # get the 1st/only parameter PUSH someReg # save it on the stack JSR mySubroutine # execute the subroutine
In this prototocol, the caller and collee agree that the first/only
parameter will be passed in an agreed upon register (say R0). All other
parameters will be passed as before. Now, the entry code looks like:
LD R0, value # get the 1st/only parameter JSR mySubroutine # execute the subroutine
One then modifies the callee entry code to push R0 as its very first
action. We have moved the push from the caller to the callee. As a result, the
callers code is shortened by a single instruction. A small, but easy change.
Using a register for the return value
Consider the code typically following a subroutine call
JSR mySubroutine POP someRegister ST someRegister, addr # Save the return value ADD R6,R6,#params # clean up the stack
Again, by returning the value via a register, we eliminate a POP after
every JSR. One woud most likely choose to use the same reqister used for
the first parameter. After all, we have completed the subroutine call and
no longer have any need of the parameter. Another small, but easy change.
JSR mySubroutine ST R0, addr # Save the return value ADD R6,R6,#params # clean up the stack
Because the return value is in a register, it no longer takes space on the stack.
Subroutines with a fixed number of parameters
When the number of parameters is fixed, one can actually move the cleanup of
the parameters from the caller to the calee. Furthermore, in the callee,
the cleanup of the entire stack frame (locals/book-keeping/paramters) can
be collapsed into a single ajustment of the stack pointer. This assumes that
the return value is in a register. In some architectures, the RET statement
encodes a stack-adjustment value directly in the RET instruction itself.
Then executing the RET changes both the PC to the return address, and
updates the stack pointer.
Incorporating these ideas, the subroutine call/return code looks like:
# in the *caller*
LD R0,addr
JSR mySuboutine
# now in the *callee*
PUSH R0 # save the parameter
... # the rest of the subroutine
# restore the return address/frame pointer
ADD R6,R6,size # clean up everything
RET
# back in the *caller*
ST R0,addr # save the return value
Takeaways
-
The runtime stack is a linked list of stack frames.
-
The elements of the stack frame are located at fixed offsets from the frame pointer for that frame, regardless of the actual address of the stack frame.
-
Allocating and deallocating the stack frame is a cooperative effort with some of the code in the caller and some of the code in the callee.
-
The stack frames occupy contiguous addresses in memory.
(c) Fritz Sieker - 2016-2026