My Project
CS270 Programming Assignment P4 - Symbol Table
Programming Assignment P4: CS270 Computer Organization

Part A due Tuesday, Feb. 14 at 10:00pm, late submission until 11:59pm.

Part B due Sunday, Feb. 19 at 10:00pm, late submission until 11:59pm.


Goals of Assignment

In this assignment you will build a symbol table. This assignment serves several purposes:

Errata

If we find errors in the assignment, we will post them here. Make sure you check this often.


Overview

A symbol table is a data structure that relates an identifier to information about that identifier. For this assignment, the identifiers are the labels on lines of LC3 assembly language statements. The information associated with the label is its address. For more information on symbol tables see this wikipedia entry.

In this assignment, you will use a hash table with chaining. You will learn more about hash tables in your Data Structures course. The data structures you use are already defined for you in symbol.h/symbol.c.

The hash function you will use computes an integer value from a string. More than one string may hash to the same value. However, a good hash function will minimize this. You can think of the hash value as an "alias" for the string. Thus, when searching for a string, you first search for the hash value (using integer comparison, which is fast) and only when you find an equal hash value, use strcasecmp() (much slower than integer comparison) to verify you have found the string of interest.

The symbol table contains a hash_table which is an array. The index into this array is calculated as index = hash-value % capacity-of-hash-table. A given index in the hash table contains a pointer to the head of a singly linked list of nodes containing symbols whose hash values share the same index.

There is a second table in the symbol table structure called the address table. It is a dynamically allocated array of char* (C's version of strings) that allows for a fast lookup of a symbol's name given its address.


Getting Started

  1. Create a directory for this assignment and cd to it
  2. Copy the following files to the directory you created. It is easiest to right click on the link, and do a Save Target As.. for each of the files.
  3. Build the excutable by typing make
  4. You now have a functional program, but it will not produce the correct results
  5. Enter ./testSymbol. You will see a usage message which tells you how to use the program. You may also use help within the program to reprint the message. The program takes only one argument: the size of the hash table. All functions are invoked through commands. For example (user input is shown in blue):
        ./testSymbol 3
        add label1 12
        OK
        add label2 35
        OK
        quit
  6. You are now ready to begin implementation. You have nine functions to implement (four in part A and five in part B).

Part A

In part A, you will only implement the following functions (you should do so in the following order):

The approximate line counts shown above are based on our solution and include empty lines and comments.

Please refer to the Files tab for details about each function (you are mostly interested in the documentation for symbol.h).

The most challenging part of this assignment is becoming comfortable with the way the data structures are organized. The following resources will help you understand the big picture:


Part B

In part B, you will implement the remaining functionality (you should do so in the following order):

The approximate line counts shown above are based on our solution and include empty lines and comments.


Debugging

We encourage you to use Fritz Sieker's debugging framework. He has provided a function called debug(). You don't have to include any extra header files. You can use this function like a printf. The difference is that the debug() function will print stuff only if you run your program with the -debug option as the first argument. This way, if you don't want to see the debugging output, you don't have to remove the calls to debug(). You can simply remove the -debug option from the command line. We have provided you with an example of how to call the debug() function in the symbol_init function. To see the output, run the program as follows:

./testSymbol -debug 20
DEBUG symbol.c[59] symbol_init() symbol_init was called with table_size = 20

You don't have to remove the debug() calls before submitting your program because we will run it without the -debug option. However, if you choose not to use this framework, make sure to remove all statements that print something before submitting. In other words, if I were to run your program without the -debug option, I should not see any extra output. Otherwise, the autograder won't be able to match your output to the expected output.


Using Valgrind

Valgrind is a popular program that allows you to track memory errors (such as segmentation faults) and memory leaks. To run your program through Valgrind and check for memory leaks, use the following command:

valgrind --leak-check=yes ./testSymbol size

Here is an example. I ran my program and got the following:

./testSymbol 20
add askbjs 22
Segmentation fault (core dumped)

A segmentation fault usually indicates that you tried to access memory you weren't supposed to access. I ran my program using Valgrind and got the following (only part of the output is shown):

valgrind --leak-check=yes ./testSymbol 20
==28069== Memcheck, a memory error detector
==28069== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28069== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28069== Command: ./testSymbol 20
==28069== 
add askbjs 22
==28069== Invalid read of size 8
==28069==    at 0x400C93: symbol_search (symbol.c:106)
==28069==    by 0x400D0D: symbol_add (symbol.c:118)
==28069==    by 0x401165: main (testSymbol.c:154)
==28069==  Address 0x51fa0b8 is 16 bytes after a block of size 8 alloc'd
==28069==    at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==28069==    by 0x400A81: symbol_init (symbol.c:60)
==28069==    by 0x4010C7: main (testSymbol.c:138)
==28069== 
==28069== Invalid read of size 8
==28069==    at 0x400B31: symbol_add_unique (symbol.c:75)
==28069==    by 0x400D28: symbol_add (symbol.c:119)
==28069==    by 0x401165: main (testSymbol.c:154)
==28069==  Address 0x51fa0b8 is 16 bytes after a block of size 8 alloc'd
==28069==    at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==28069==    by 0x400A81: symbol_init (symbol.c:60)
==28069==    by 0x4010C7: main (testSymbol.c:138)
==28069== 
==28069== Invalid write of size 8
==28069==    at 0x400B54: symbol_add_unique (symbol.c:78)
==28069==    by 0x400D28: symbol_add (symbol.c:119)
==28069==    by 0x401165: main (testSymbol.c:154)
==28069==  Address 0x51fa0b8 is 16 bytes after a block of size 8 alloc'd
==28069==    at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==28069==    by 0x400A81: symbol_init (symbol.c:60)
==28069==    by 0x4010C7: main (testSymbol.c:138)
==28069== 
OK

(You can quit from Valgrind prematurely using Ctrl-C)

It gave me a bunch of invalid reads and writes. I started by looking into the first invalid read. It shows me the stack trace at the point where the invalid read occurred: main called symbol_add and symbol_add called symbol_search. So, the invalid read occurred on line 106 of my symbol.c file. It also shows me the cause for the invalid read. It says: "Address 0x51fa0b8 is 16 bytes after a block of size 8 alloc'd." This is basically saying that I previously allocated a block of 8 bytes and I'm trying to access something out of bounds. It even tells me where in my code I allocated that block: line 60 of symbol.c. I went to that line and, sure enough, I realized that I hadn't allocated enough space for my hash table. I corrected the error and re-ran with Valgrind:

valgrind --leak-check=yes ./testSymbol 20
==28136== Memcheck, a memory error detector
==28136== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28136== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28136== Command: ./testSymbol 20
==28136== 
add askbjs 22
OK
quit
==28136== 
==28136== HEAP SUMMARY:
==28136==     in use at exit: 0 bytes in 0 blocks
==28136==   total heap usage: 6 allocs, 6 frees, 525,535 bytes allocated
==28136== 
==28136== All heap blocks were freed -- no leaks are possible
==28136== 
==28136== For counts of detected and suppressed errors, rerun with: -v
==28136== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I no longer get memory errors. Also, note that it says "All heap blocks were freed -- no leaks are possible." This means that my symbol_term function freed all dynamically allocated memory correctly. It's important to note that you have to try your program with different test cases. One single run of Valgrind does not guarantee that your program is correct. Also, note that the program must terminate before Valgrind shows you information about memory leaks.

Valgrind will come in handy when you get a segmentation fault or when you want to test your symbol_reset and symbol_term functions.

Here's what a run with memory leaks would look like:

valgrind --leak-check=yes ./testSymbol 20
==28299== Memcheck, a memory error detector
==28299== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28299== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28299== Command: ./testSymbol 20
==28299== 
add test 1
OK
add test2 2
OK
quit
==28299== 
==28299== HEAP SUMMARY:
==28299==     in use at exit: 11 bytes in 2 blocks
==28299==   total heap usage: 8 allocs, 6 frees, 525,571 bytes allocated
==28299== 
==28299== 11 bytes in 2 blocks are definitely lost in loss record 1 of 1
==28299==    at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==28299==    by 0x4EC0079: strdup (in /usr/lib64/libc-2.23.so)
==28299==    by 0x400AFF: symbol_add_unique (symbol.c:72)
==28299==    by 0x400D2B: symbol_add (symbol.c:119)
==28299==    by 0x401158: main (testSymbol.c:154)
==28299== 
==28299== LEAK SUMMARY:
==28299==    definitely lost: 11 bytes in 2 blocks
==28299==    indirectly lost: 0 bytes in 0 blocks
==28299==      possibly lost: 0 bytes in 0 blocks
==28299==    still reachable: 0 bytes in 0 blocks
==28299==         suppressed: 0 bytes in 0 blocks
==28299== 
==28299== For counts of detected and suppressed errors, rerun with: -v
==28299== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

This is telling me that I didn't free something that I allocated in line 72 of my symbol.c.


Testing

Thoroughly test your code using the test driver and Valgrind. You may find it convenient to build a file containing a series of commands to execute, then use this file to test your program without having to type in commands over and over. Execute

   ./testSymbol size < testFile
The < operator in this context means that the contents of testFile are going to be redirected to the stdin stream of the program. You are effectively automating user input.


Grading Criteria

Part A

Preliminary testing (15 points):

Final testing (additional 35 points):

Part B

In part B, we will re-test part A. Hence, the total number of points for P4 is 150: 50 for part A which was due on 2/14, 50 for the re-test of part A, and 50 for the part B functions.

Preliminary testing (27 points):

Final testing (additional 73 points): Extra credit (additional 10 points):

Final testing includes all preliminary testing.


Checking in Your Code

You will submit the single file symbol.c in the Checkin tab of the course web page.