cs270 Programming Assignment PAx - floating point math

## Essentials

Due: xx/xx/xxxx @ 11:59PM
Key: Use the key PAx for checkin

This assignment is designed to teach you how to do several floating point operations in C without using either the `float` or `double` types. You will learn:
• how to use the C language operators for binary and (&), binary or (|), and binary not (~).
• how to use the C language bit shift operators (<< and >>).
• how to perform simple pointer operations using C's address-of operator (&) and dereference operator (*).
• how to create and use bit masks.
• how to use symbolic constants in your code.
• the relationship between the C language left shift operator (<<) and 2^n.

When you have completed this assignment you will understand how floating point values are stored in the computer, and how to perform several operations in the case where the underlying hardware/software does not provide floating point support. For example, the `LC3` computer you will use later in this course has no floating point support.

First read the Getting Started section below and then study the documentation for iFloat.h in the Files tab to understand the details of the assignment.

## Basic Bit manipulation

Binary and (&) will be used to extract the value of a bit and to set a bit to 0. This relies on the fact that the binary and (&) of a bit and 1 results in the original bit, while the binary and (&) of a bit and 0 results in 0. Binary or (|) is used to set a bit to 1. This relies on the fact that the binary or (|) of 1 and anything results in 1.
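As a minimal sketch of these rules, the helpers below read, clear, and set a single bit of a 32-bit value. The function names are hypothetical and are not part of `iFloat.h`; the expression `1u << pos` simply builds a value with a single 1 in position `pos`.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of iFloat.h. */

/* Return 1 if bit `pos` of `value` is 1, otherwise 0 (and with 1 keeps the bit). */
uint32_t get_bit(uint32_t value, unsigned pos) {
    return (value >> pos) & 1u;
}

/* Return `value` with bit `pos` forced to 0 (and with 0 clears the bit). */
uint32_t clear_bit(uint32_t value, unsigned pos) {
    return value & ~(1u << pos);
}

/* Return `value` with bit `pos` forced to 1 (or with 1 sets the bit). */
uint32_t set_bit(uint32_t value, unsigned pos) {
    return value | (1u << pos);
}
```

For example, `clear_bit(0xF, 2)` yields `0xB`, and `set_bit(0, 31)` yields `0x80000000`.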

You will create masks. A mask is a bit pattern that contains 0's and 1's in appropriate places so when the binary and/or operation is performed, the result has extracted/modified the bits of interest. In the following examples B stands for bits of interest while x stands for a bit that is not of interest. Note that, in general, the bits of interest need not be consecutive. In this code, we will be dealing with consecutive sets of bits.

``````
value:   xxxBBBBxxxxx   value:   xxxBBBBxxxxx   value:   xxxBBBBxxxxx
mask:    000111100000   mask:    111000011111   mask:    000111100000
         ------------            ------------            ------------
and(&):  000BBBB00000   and(&):  xxx0000xxxxx   or(|):   xxx1111xxxxx
result:  isolate field           clear field             set field
``````
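The three operations in the diagram can be sketched in C with symbolic constants. The layout here (a 4-bit field starting at bit 5) is chosen only to match the 12-bit picture above, and all names are hypothetical rather than taken from `iFloat.h`.

```c
#include <stdint.h>

/* Hypothetical field layout matching the diagram: 4 bits starting at bit 5. */
#define FIELD_WIDTH 4u
#define FIELD_SHIFT 5u
/* (1 << width) - 1 gives `width` ones; shifting moves them over the field. */
#define FIELD_MASK  (((1u << FIELD_WIDTH) - 1u) << FIELD_SHIFT)   /* 0x1E0 */

/* Keep only the field bits; every other bit becomes 0. */
uint32_t isolate_field(uint32_t value) { return value & FIELD_MASK; }

/* Force the field bits to 0; every other bit is unchanged. */
uint32_t clear_field(uint32_t value)   { return value & ~FIELD_MASK; }

/* Force the field bits to 1; every other bit is unchanged. */
uint32_t set_field(uint32_t value)     { return value | FIELD_MASK; }
```

Defining the width and shift once and deriving the mask from them means a change to the layout only has to be made in one place.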

Bit positions are numbered from 31 to 0 with 0 being the least significant bit. The bit position corresponds to the power of 2 in the binary representation.
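Because bit position n has the weight 2^n, shifting the value 1 left by n places produces 2^n, and shifting any value left or right by n multiplies or divides it by 2^n. The following sketch uses hypothetical function names and assumes 32-bit unsigned values with shift counts from 0 to 31.

```c
#include <stdint.h>

/* 2 raised to the power n, valid for 0 <= n <= 31. */
uint32_t pow2(unsigned n) { return 1u << n; }

/* Multiply x by 2^n; bits shifted past position 31 are lost. */
uint32_t times_pow2(uint32_t x, unsigned n) { return x << n; }

/* Divide x by 2^n, discarding the remainder. */
uint32_t div_pow2(uint32_t x, unsigned n) { return x >> n; }
```

For example, `pow2(10)` is 1024 and `times_pow2(3, 4)` is 48.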

You will need to create masks to extract the sign/exponent/mantissa fields and use the shift operators to convert them to values you can use. When you have computed the answer, you will use shift operations to reassemble the parts into the correct format.
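A sketch of this extract-and-reassemble idea is shown below. It assumes the 32-bit IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits); the constants and function names are hypothetical, and the types and names required by the assignment are the ones documented in `iFloat.h`.

```c
#include <stdint.h>

/* Assumed layout: 32-bit IEEE 754 single precision. */
#define EXP_BITS   8u
#define MANT_BITS  23u
#define SIGN_SHIFT 31u
#define EXP_MASK   ((1u << EXP_BITS) - 1u)    /* 0xFF, applied after shifting */
#define MANT_MASK  ((1u << MANT_BITS) - 1u)   /* 0x7FFFFF */

/* Each getter shifts the field down to bit 0 and masks off everything else. */
uint32_t get_sign(uint32_t bits)     { return bits >> SIGN_SHIFT; }
uint32_t get_exponent(uint32_t bits) { return (bits >> MANT_BITS) & EXP_MASK; }
uint32_t get_mantissa(uint32_t bits) { return bits & MANT_MASK; }

/* Shift each part back to its position and combine the parts with or. */
uint32_t assemble(uint32_t sign, uint32_t exponent, uint32_t mantissa) {
    return (sign << SIGN_SHIFT)
         | ((exponent & EXP_MASK) << MANT_BITS)
         | (mantissa & MANT_MASK);
}
```

With the pattern `0xC0680000`, these getters return a sign of 1, a stored exponent of `0x80`, and a mantissa of `0x680000`; passing those three values to `assemble()` gives back `0xC0680000`.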

## Getting Started

Perform the following steps:
1. Create a directory for this assignment. A general scheme might be to have a directory for each CS class you are taking and beneath that, a directory for each assignment. The name of the directory is arbitrary, but you may find it useful to name it for the assignment (e.g. PAx).
2. Copy the appropriate files into this directory. It is easiest to right click on the link, and do a `Save Target As..` for each of the files.

Unpack the tar file using `tar -xvf FLOAT.C.tar`. This will unpack the files into a subdirectory `src`.

3. Open a terminal and make sure you are in the directory you created in step 1. The `cd` command can be used for this.
4. In the terminal type the following command to build the executable.
``````
make
``````
You should see the following output:
``````
gcc -g -std=c11 -Wall -c -DDEBUG -DHALF Debug.c
gcc -g -std=c11 -Wall -c -DDEBUG -DHALF iFloat.c
gcc -g -std=c11 -Wall -c -DDEBUG -DHALF testFloat.c
gcc -o testFloat -g -std=c11 -Wall Debug.o iFloat.o testFloat.o convert.a
``````
5. In the terminal type `testFloat` and read how to run the program.
6. In the terminal type `testFloat bin -3.625` and you should see the output:
``````
dec: -1066926080  hex: 0xC0680000  bin: 1100-0000-0110-1000-0000-0000-0000-0000
``````
What you are seeing is the internal bit pattern of the floating point value `-3.625` expressed as an integer, as hex, and as binary.
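To see where such a pattern comes from, the sketch below copies the bytes of a `float` into an unsigned integer using the address-of operator and formats the bits in groups of four. It assumes `float` is a 32-bit IEEE 754 value on your machine, and the function names are hypothetical. It uses `float` only to inspect the pattern; the code you write in `iFloat.c` must not.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 32 bits of a float into an unsigned integer, unchanged.
   Assumes float is a 32-bit IEEE 754 value. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* &bits and &f are the two addresses */
    return bits;
}

/* Write the pattern as "1100-0000-..." into out.
   32 digits + 7 dashes + terminator = 40 bytes. */
void format_binary(uint32_t bits, char out[40]) {
    int k = 0;
    for (int pos = 31; pos >= 0; pos--) {
        out[k++] = ((bits >> pos) & 1u) ? '1' : '0';
        if (pos % 4 == 0 && pos != 0) {
            out[k++] = '-';           /* separator after every 4 bits */
        }
    }
    out[k] = '\0';
}
```

Under that assumption, `float_bits(-3.625f)` returns `0xC0680000`, the same pattern that `testFloat bin -3.625` reports.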

You now have a functioning program. All the commands run, but only `bin` will produce correct results at this point.

## Completing the Code

Before attempting to write any of the functions of `iFloat.c`, study the documentation found in the Files tab. Plan what you need to do before writing code.

The best way to complete the code is to follow a write/compile/test sequence. Do not attempt to write everything at once. Rather choose one function and do the following steps.

1. Write some/all of one function in `iFloat.c` using your favorite editor.
2. Save your changes and recompile `iFloat.c` using `make`. You will find it convenient to work with both a terminal and editor window at the same time.
3. Repeat steps 1 and 2 until there are no errors or warnings.
4. Test the function you have been working on. Do not attempt to move on until you complete and thoroughly test a function.
5. Repeat steps 1 through 4 for the remaining functions.

You may work on the functions in any order, but most are very simple and are support functions for the meat of the code. A sample solution prepared by the author contained the following:

• `floatGetSign()` - 1 line of code
• `floatGetExp()` - 1 line of code
• `floatGetVal()` - 1 line of code
• `floatAbs()` - 1 line of code
• `floatNegate()` - 1 line of code
• `floatSub()` - 1 line of code
• `floatGetAll()` - 3 lines of code
• `floatLeftMost1()` - 10 lines of code
• `floatAdd()` - 60 lines of code
Your code may be a little longer, but in every case, these methods are quite simple. If you find any of your solutions is much longer than stated, you will want to think about how you are approaching the problem.
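To give a sense of scale for the support functions, here is one possible way to locate the most significant 1 bit of a 32-bit value. The name `leftmost_one` and its return convention (-1 for zero) are assumptions made for this illustration; the required behavior of `floatLeftMost1()` is defined by the documentation for `iFloat.h` and may differ.

```c
#include <stdint.h>

/* Hypothetical helper: position (31..0) of the most significant 1 bit,
   or -1 if value is 0. */
int leftmost_one(uint32_t value) {
    for (int pos = 31; pos >= 0; pos--) {
        if ((value >> pos) & 1u) {
            return pos;               /* first 1 found scanning from the left */
        }
    }
    return -1;                        /* no bit is set */
}
```

For example, `leftmost_one(0x00800000)` is 23. Knowing this position tells you how far a value must be shifted to normalize it.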

The single function `floatAdd()` is the only complex function in this assignment. Many of the things you need to do can be done by calling the support methods you have already written and thoroughly tested.

The general algorithm for floating point addition is as follows:

1. Extract the sign, exponent, and value for each of the numbers
2. Align the binary points (the base 2 equivalent of decimal points) using shift operations. When this is complete both numbers have the same exponent, which is the initial value for the exponent of the sum.
3. Convert the sign/magnitude representation to two's complement
4. Add the two's complement values.
5. Convert the two's complement back to sign/magnitude
6. Normalize the result using shift operations and increment/decrement the exponent appropriately.
7. Reassemble the sign, exponent and value into a 16/32 bit value.
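The steps above can be sketched for one restricted case. This is a simplified illustration and not the author's sample solution. It assumes the 32-bit IEEE 754 single-precision layout and inputs that are either zero or normalized, it truncates instead of rounding, and it ignores denormalized numbers, infinities, NaN, and exponent overflow or underflow. The name `float_add_sketch` and the constants are hypothetical, and your code must follow the interface in `iFloat.h`.

```c
#include <stdint.h>

#define MANT_BITS 23u
#define EXP_MASK  0xFFu
#define MANT_MASK 0x7FFFFFu
#define HIDDEN    (1u << MANT_BITS)   /* implicit leading 1 of a normalized value */

/* Simplified sketch: zero or normalized inputs only, no rounding. */
uint32_t float_add_sketch(uint32_t a, uint32_t b) {
    if ((a << 1) == 0) return b;      /* a is +0 or -0 */
    if ((b << 1) == 0) return a;      /* b is +0 or -0 */

    /* 1. Extract exponent and value (mantissa with the hidden 1 restored). */
    int32_t expA = (int32_t)((a >> MANT_BITS) & EXP_MASK);
    int32_t expB = (int32_t)((b >> MANT_BITS) & EXP_MASK);
    int32_t valA = (int32_t)((a & MANT_MASK) | HIDDEN);
    int32_t valB = (int32_t)((b & MANT_MASK) | HIDDEN);

    /* 2. Align: shift the value with the smaller exponent to the right. */
    int32_t exp = expA;
    if (expA > expB) {
        int32_t d = expA - expB;
        valB = (d > 31) ? 0 : (valB >> d);
    } else if (expB > expA) {
        int32_t d = expB - expA;
        valA = (d > 31) ? 0 : (valA >> d);
        exp = expB;
    }

    /* 3. Sign/magnitude to two's complement. */
    if (a >> 31) valA = -valA;
    if (b >> 31) valB = -valB;

    /* 4. Add (each magnitude is below 2^24, so this cannot overflow). */
    int32_t sum = valA + valB;
    if (sum == 0) return 0;

    /* 5. Two's complement back to sign/magnitude. */
    uint32_t sign = (sum < 0) ? 1u : 0u;
    uint32_t mag  = (sum < 0) ? (uint32_t)(-sum) : (uint32_t)sum;

    /* 6. Normalize so the leading 1 sits at bit 23. */
    while (mag >= (HIDDEN << 1)) { mag >>= 1; exp++; }
    while (mag < HIDDEN)         { mag <<= 1; exp--; }

    /* 7. Reassemble the three fields. */
    return (sign << 31) | ((uint32_t)exp << MANT_BITS) | (mag & MANT_MASK);
}
```

As a check, 9.5 is `0x41180000` and 0.75 is `0x3F400000` in this format, and `float_add_sketch(0x41180000, 0x3F400000)` returns `0x41240000`, which is 10.25.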

## Simple Example in decimal

In the following examples we will use the numbers 0.75, 9.5 and 10.25. Note that the work begins with the scientific notation of these numbers because this is how the numbers are stored.
``````
  9.5 + 0.75            9.5 - 10.25          sample addition/subtraction

  (9.5  * 10^0)         (9.5   * 10^0)       original values
+ (7.5  * 10^-1)      - (1.025 * 10^1)

  (9.5  * 10^0)         (0.95  * 10^1)       exponents equal
+ (0.75 * 10^0)       - (1.025 * 10^1)

  ( 9.5                 ( 0.95
  + 0.75) * 10^0        - 1.025) * 10^1      factor out exponent

  10.25 * 10^0          -0.075 * 10^1        after addition

  1.025 * 10^1          -7.5   * 10^-1       renormalize
``````

You will submit the single file `iFloat.c` using the `checkin` program. Use the name PAx. At the terminal type:
``````
~cs270/bin/checkin PAx iFloat.c
``````