CS 270
cs270 Programming Assignment - Symbol Table

Essentials

There are two submissions for this assignment: P4A and P4B.

The most challenging part of these assignments is becoming comfortable with the way the data structures are organized. The following resources from previous semesters will help you understand the big picture. Some of the details may be different.

  • A video about symbol tables and hash tables.
  • A video about data structure organization.
  • A visualization of the data structure organization. It also shows you the declarations of the C structures used in this assignment. It is very important to understand how the declarations relate to the drawing. You will find it useful to have this page open while you code.

See the class Progress page for due dates of these submissions.


P4A

P4A uses some basic commands that allow you to add and list entries in the symbol table. The following commands are tested in this part of the assignment.
  • init capacity
  • add name address
  • list
  • count
  • exit/quit
These commands require the implementation of any functions used to perform these operations, such as:
  • symbol_init()
  • symbol_add()
  • symbol_iterate()
You might find it useful to also implement symbol_search() for use in the symbol_add() function.

P4B

P4B uses all of the remaining commands. You can run the program and type help to view a list of all commands which will be tested. This will require you to implement any functions in symbol.c that were not implemented in P4A.

Goals of Assignment

In this assignment you will build a symbol table. This assignment serves several purposes:
  • Learn about C structures (struct) and how to embed one structure within another.
  • Learn how to use C pointers and the address-of operator &
  • Learn how to allocate and free memory using malloc()/calloc()/free()
  • Learn a little bit about hash tables
  • Learn about function pointers
  • Learn to use a simple Debugging framework
  • Build a module that will be resused in future assignment(s).

Overview

A symbol table is a data structure that relates an identifier to information about that identifier. For this assignment, the identifiers are the labels on lines of LC3 assembly language statements. The information associated with the label is its address. For more information on symbol tables see this wikipedia entry.

In this assignment, you will use a hash table with chaining. You will learn more about hash tables in your Data Structures course. The data structures you use are already defined for you in symbol.h/symbol.c.

The hash function you will use computes an integer value from a string. More than one string may hash to the same value. However, a good hash function will minimize this. You can think of the hash value as an "alias" for the string. Thus, when searching for a string, you first search for the hash value (using integer comparison, which is fast) and only when you find an equal hash value, use strcasecmp() (much slower than integer comparison) to verify you have found the string of interest.

The symbol table contains a hash_table which is an array. The index into this array is calculated as index = hash-value % size-of-hash-table. A given index in the hash table contains a pointer to the head of a singly linked list of nodes containing symbols whose hash values share the same index.

There is a scond table in the symbol table structure. It is a dynmacilly allocated array of char* (C's version of strings), that allow a fast lookup of a symbols name given its address.

Here is a graphical representation of the data structure. Make sure you understand how the declarations relate to the drawing.


Getting Started

  1. Create a directory for this assignment and cd to it
  2. Copy the following files to the directory you created. It is easiest to right click on the link, and do a Save Target As.. for each of the files.
  3. Build the excutable by typing make
  4. You now have a functional program, but it will not produce the corrrect results.
  5. Type ./testSymbol to start the program. Then type help to print the usage message. This program is like a shell in that you type commands, and the program performs the selected action (sometimes silently). Study the commands to understand how to test individual functions. NOTE The init must be the first command used if you want to test your code. Type exit to end the program.
  6. You are now ready to begin implementation. You have a number of functions to implement. In the reference implmentation, approximately 110 line of code were added (including blank lines). Implement functions one at a time and test as you go.

Implementing symbol_init()

In this function, you must allocate space for your symbol table and initialize it approrpiately. Compare malloc()/calloc(). You will also allocate and initialize addr_table[].

Here is a graphical representation of the data structure. Make sure you understand how the declarations relate to the drawing.


Implementing symbol_search()

This function will do a lot of the work needed for the functions symbol_add() and symbol_find_by_name(). The basic algorithm to search for a symbol by name is:
  1. compute the hash value for the name using symbol_hash()
  2. compute the index based on (hash-value % table-capacity)
  3. search the linked list to find the symbol. To speed thing up search first for the hash value. Only when the hash values match do you need to do a case independent string comparison.
  4. return the information if the name is found, or NULL otherwise
You can test implementation by using the command search in the test driver.

Implementing symbol_add()

This function adds a symbol to the symbol table if it does not already exist. If it does not exist, you must allocate memory for the appropriate structure, initialize the fields of the structure appropriately and store in at the correct index in the symbol table. The entry at a given index of the symbol table is a pointer to the head of a linked list of nodes. The symbol just created should become the new head of the list. You may test your implementation using the add command with the test driver.

A hash table is designed to spread out the entries, often sacrificing space for speed. To thoroughtly test your function, you will need to force multiple names to occupy the same index in the hash table. Think about the pigeon hole princple from CS165, study the code and you should understand what to modify to test your code. Your code will be tested with a different values.


Implementing the remaining functions

Implement the remaining functions one by one and test them as you go.
  1. The function symbol_iterate() is only a few (10) lines of code. It consists of nested loops with the outer loop varying over the symbol table and the inner loop varying over the linked list at the current index. Using a function pointer is really straight forward. The actual line of code you will use is:
    
        (*fnc)(symbol, data); 
    This is almost identical to calling a normal function named fnc() except that you would write:
    
        fnc(symbol, data);
    You can test your function using the commands count/list in the test driver
  2. The function symbol_find_by_name() is only a few lines of code. Most of the work is done by symbol_search(). Test by using the command get in the test driver.
  3. The function symbol_reset() consist of two nested loops as you wrote for symbol_iterate(). For each node you will need to free any allocated memory. Test it with the command reset in the test driver.
  4. The function symbol_term() may use symbol_reset to do most of the work. Be sure and deallocate the main symbol table as well. This will be tested when the test driver terminates.
  5. For symbol_order, here is an example of using qsort.
  6. Implement the remaining functions defined in the TODO list.

Thoroughly test your code using the test driver. You may find it convenient to build a file containing a series of commands to execute, then use this file to test your program without having to type in commands over and over. Execute

   ./testSymbol [-debug] < testFile

Checking in Your Code

You will submit the single file symbol.c using the checkin program. Use the key SYMBOL. At the terminal type:

    ~cs270/bin/checkin P4A symbol.c