cs270 Programming Assignment PAx - floating point math

Essentials

Due: xx/xx/xxxx @ 11:59PM
Key: Use the key PAx for checkin

About The Assignment

This assignment is designed to teach you how to do several floating point operations in C without using either the float or double types. You will learn:

When you have completed this assignment you will understand how floating point values are stored in the computer, and how to perform several operations in the case where the underlying hardware/software does not provide floating point support. For example, the LC3 computer you will use later in this course has no floating point support.

First read the Getting Started section below and then study the documentation for iFloat.h in the Files tab to understand the details of the assignment.


Basic Bit manipulation

Binary and (&) will be used to extract the value of a bit and to set a bit to 0. This relies on the fact that the binary and (&) of a bit and 1 results in the original bit, while the binary and (&) of a bit and 0 results in 0. Binary or (|) is used to set a bit to 1. This relies on the fact that the binary or (|) of 1 and any bit results in 1.

You will create masks. A mask is a bit pattern that contains 0's and 1's in appropriate places so that, when the binary and/or operation is performed, the result has the bits of interest extracted or modified. In the following examples B stands for a bit of interest, while x stands for a bit that is not of interest. Note that, in general, the bits of interest need not be consecutive. In this assignment, however, the bits of interest are always consecutive.


  value:   xxxBBBBxxxxx    value:   xxxBBBBxxxxx    value:   xxxBBBBxxxxx
  mask:    000111100000    mask:    111000011111    mask:    000111100000
           ------------             ------------             ------------
  and(&):  000BBBB00000    and(&):  xxx0000xxxxx    or(|):   xxx1111xxxxx
  result:  isolate field   result:  clear field     result:  set field
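The three cases in the diagram can be sketched in C as shown below. The helper names fieldMask, isolateField, clearField and setField are hypothetical (they are not taken from iFloat.h), and the sketch assumes 32-bit unsigned values and fields narrower than 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: a mask with `width` ones starting at bit `lsb`.
   Assumes 0 < width < 32 and lsb + width <= 32 (1u << 32 is undefined). */
uint32_t fieldMask(int width, int lsb) {
    return ((1u << width) - 1u) << lsb;
}

/* and(&) with the mask: keep the field, every other bit becomes 0 */
uint32_t isolateField(uint32_t value, int width, int lsb) {
    return value & fieldMask(width, lsb);
}

/* and(&) with the inverted mask: field becomes 0, other bits are kept */
uint32_t clearField(uint32_t value, int width, int lsb) {
    return value & ~fieldMask(width, lsb);
}

/* or(|) with the mask: every bit of the field becomes 1 */
uint32_t setField(uint32_t value, int width, int lsb) {
    return value | fieldMask(width, lsb);
}
```

For example, fieldMask(4, 5) is 0x1E0, the 000111100000 mask in the diagram. Shifting an isolated field right by lsb (isolateField(v, 4, 5) >> 5) turns it into an ordinary small integer.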

Bit positions are numbered from 31 to 0 with 0 being the least significant bit. The bit position corresponds to the power of 2 in the binary representation.
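Because bit position n has place value 2^n, a single-bit mask is simply 1u << n. A minimal sketch of single-bit operations follows; getBit, setBit and clearBit are hypothetical names used only for illustration, and pos is assumed to be in the range 0..31.

```c
#include <stdint.h>

/* Hypothetical helpers; pos must be 0..31. The mask for bit pos is 1u << pos. */
int getBit(uint32_t value, int pos) {
    return (value >> pos) & 1u;   /* move the bit down to position 0, isolate it */
}

uint32_t setBit(uint32_t value, int pos) {
    return value | (1u << pos);   /* or with 1 forces the bit to 1 */
}

uint32_t clearBit(uint32_t value, int pos) {
    return value & ~(1u << pos);  /* and with 0 forces the bit to 0 */
}
```

For instance, getBit(0xC0680000, 31) is 1, the sign bit of -3.625.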

You will need to create masks to extract the sign/exponent/mantissa fields and use the shift operators to convert them to values you can use. When you have computed the answer, you will use shift operations to reassemble the parts into the correct format.
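The sketch below shows extraction and reassembly, assuming the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits). The macro and function names are hypothetical; the -DHALF build presumably uses a 16-bit layout instead (IEEE half precision has 1/5/10 bits), which this sketch does not cover.

```c
#include <stdint.h>

/* Assumed layout: IEEE 754 single precision, 1/8/23 bits. */
#define EXP_BITS  8
#define MANT_BITS 23
#define EXP_MASK  ((1u << EXP_BITS) - 1u)   /* 0xFF     */
#define MANT_MASK ((1u << MANT_BITS) - 1u)  /* 0x7FFFFF */

/* Hypothetical helper names, for illustration only. */
uint32_t extractSign(uint32_t bits) { return bits >> 31; }
uint32_t extractExp(uint32_t bits)  { return (bits >> MANT_BITS) & EXP_MASK; }
uint32_t extractMant(uint32_t bits) { return bits & MANT_MASK; }

/* Shift each part back into place and or the parts together. */
uint32_t assemble(uint32_t sign, uint32_t exp, uint32_t mant) {
    return (sign << 31) | ((exp & EXP_MASK) << MANT_BITS) | (mant & MANT_MASK);
}
```

Applied to 0xC0680000 (-3.625), this gives sign 1, stored exponent 128 (true exponent 128 - 127 = 1) and mantissa 0x680000, matching -1.8125 * 2^1. Calling assemble on those three parts returns 0xC0680000 again.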


Getting Started

Perform the following steps
  1. Create a directory for this assignment. A general scheme might be to have a directory for each CS class you are taking and beneath that, a directory for each assignment. The name of the directory is arbitrary, but you may find it useful to name it for the assignment (e.g. PAx).
  2. Copy the appropriate files into this directory. It is easiest to right-click on each link and choose Save Target As...

    Unpack the tar file using tar -xvf FLOAT.C.tar. This will unpack the files into a subdirectory named src.

  3. Open a terminal and make sure you are in the directory you created in step 1. The cd command can be used for this.
  4. In the terminal type the following command to build the executable.
    
        make
        
    You should see the following output:
    
        gcc -g -std=c11 -Wall -c -DDEBUG -DHALF Debug.c
        gcc -g -std=c11 -Wall -c -DDEBUG -DHALF iFloat.c
        gcc -g -std=c11 -Wall -c -DDEBUG -DHALF testFloat.c
        gcc -o testFloat -g -std=c11 -Wall Debug.o iFloat.o testFloat.o convert.a
        
  5. In the terminal type testFloat and read how to run the program.
  6. In the terminal type testFloat bin -3.625 and you should see the output:
    
        dec: -1066926080  hex: 0xC0680000  bin: 1100-0000-0110-1000-0000-0000-0000-0000
        
    What you are seeing is the internal bit pattern of the floating point value -3.625 expressed as an integer, as hex, and as binary.
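One way a tool like testFloat could produce that display is sketched below; this is an assumption, not the provided implementation. It copies the float's bytes into a uint32_t with memcpy (assuming a 32-bit IEEE 754 float) and prints bits 31 to 0 in groups of four. Using float is acceptable in a test tool like this, but iFloat.c itself must not use float or double. The names floatBits and formatBin are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw bytes of a float into an integer (assumes a 32-bit IEEE float). */
uint32_t floatBits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Write bits 31..0 as groups of four separated by '-'.
   out must hold at least 40 chars: 32 digits + 7 dashes + '\0'. */
void formatBin(uint32_t bits, char *out) {
    int k = 0;
    for (int pos = 31; pos >= 0; pos--) {
        out[k++] = ((bits >> pos) & 1u) ? '1' : '0';
        if (pos % 4 == 0 && pos != 0)   /* dash after each nibble except the last */
            out[k++] = '-';
    }
    out[k] = '\0';
}
```

With these, floatBits(-3.625f) is 0xC0680000, printing it as a signed 32-bit integer gives -1066926080, and formatBin produces 1100-0000-0110-1000-0000-0000-0000-0000.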

You now have a functioning program. All the commands run; however, only bin will produce correct results at this point.


Completing the Code

Before attempting to write any of the functions of iFloat.c, study the documentation found in the Files tab. Plan what you need to do before writing code.

The best way to complete the code is to follow a write/compile/test sequence. Do not attempt to write everything at once. Rather, choose one function and follow these steps:

  1. Write some/all of one function in iFloat.c using your favorite editor.
  2. Save your changes and recompile iFloat.c using make. You will find it convenient to work with both a terminal and editor window at the same time.
  3. Repeat steps 1 and 2 until there are no errors or warnings.
  4. Test the function you have been working on. Do not attempt to move on until you complete and thoroughly test a function.
  5. Repeat steps 1 through 4 for the remaining functions.

You may work on the functions in any order, but most are very simple and are support functions for the meat of the code. A sample solution prepared by the author contained the following:

Your code may be a little longer, but in every case these functions are quite simple. If you find that any part of your solution is much longer than stated, you will want to rethink how you are approaching the problem.

Floating Point Addition

The single function floatAdd() is the only complex function in this assignment. Many of the things you need to do can be done by calling the support functions you have already written and thoroughly tested.

The general algorithm for floating point addition is as follows:

  1. Extract the sign, exponent, and value for each of the numbers
  2. Align the binary points by shifting the value with the smaller exponent to the right. When this is complete, both numbers have the same exponent, which is the initial value for the exponent of the sum.
  3. Convert the sign/magnitude representation to two's complement
  4. Do an integer addition
  5. Convert the two's complement back to sign/magnitude
  6. Normalize the result using shift operations and increment/decrement the exponent appropriately.
  7. Reassemble the sign, exponent and value into a 16/32 bit value.
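The seven steps can be sketched in C on raw 32-bit IEEE 754 single-precision bit patterns. This is a simplified illustration under stated assumptions, not the required floatAdd(): the names addBits, signOf, expOf and fracOf are hypothetical, the 16-bit (-DHALF) layout is not handled, NaN, infinity, denormals and exponent overflow/underflow are ignored, and bits shifted out during alignment are simply dropped (truncation rather than IEEE rounding).

```c
#include <stdint.h>

/* Hypothetical field helpers for IEEE 754 single precision (1/8/23 bits). */
uint32_t signOf(uint32_t x) { return x >> 31; }
int      expOf(uint32_t x)  { return (int)((x >> 23) & 0xFF); }
uint32_t fracOf(uint32_t x) { return x & 0x7FFFFF; }

/* Sketch of floating point addition on bit patterns.
   Assumes normalized inputs or zero; truncates instead of rounding. */
uint32_t addBits(uint32_t a, uint32_t b) {
    if ((a & 0x7FFFFFFF) == 0) return b;    /* a is +0 or -0 */
    if ((b & 0x7FFFFFFF) == 0) return a;    /* b is +0 or -0 */

    /* 1. extract sign, exponent and value, restoring the hidden leading 1 */
    int expA = expOf(a), expB = expOf(b);
    int32_t valA = (int32_t)(fracOf(a) | 0x800000);
    int32_t valB = (int32_t)(fracOf(b) | 0x800000);

    /* 2. align: shift the value with the smaller exponent right */
    int exp = expA;
    if (expA > expB) {
        int d = expA - expB;
        valB = (d > 31) ? 0 : valB >> d;    /* avoid shifting by >= 32 bits */
    } else {
        int d = expB - expA;
        valA = (d > 31) ? 0 : valA >> d;
        exp = expB;
    }

    /* 3. sign/magnitude -> two's complement */
    if (signOf(a)) valA = -valA;
    if (signOf(b)) valB = -valB;

    /* 4. integer addition */
    int32_t sum = valA + valB;
    if (sum == 0) return 0;                 /* exact cancellation gives +0 */

    /* 5. two's complement -> sign/magnitude */
    uint32_t sign = 0;
    if (sum < 0) { sign = 1; sum = -sum; }

    /* 6. normalize so the leading 1 sits in bit 23 */
    while (sum >= 0x1000000) { sum >>= 1; exp++; }  /* carry into bit 24 */
    while (sum < 0x800000)   { sum <<= 1; exp--; }  /* cancellation */

    /* 7. reassemble, dropping the hidden 1 */
    return (sign << 31) | ((uint32_t)exp << 23) | ((uint32_t)sum & 0x7FFFFF);
}
```

For example, addBits(0x41180000, 0x3F400000), that is 9.5 + 0.75, returns 0x41240000 (10.25), and addBits(0x41180000, 0xC1240000), that is 9.5 - 10.25, returns 0xBF400000 (-0.75).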

Simple Example in decimal

In the following examples we will use the numbers 0.75, 9.5 and 10.25. Note that the work begins with the scientific notation of these numbers because this is how the numbers are stored.

      9.5 + 0.75              9.5 - 10.25              sample addition/subtraction

      (9.5   * 10^0)          (9.5   * 10^0)           original values
    + (7.5   * 10^-1)       - (1.025 * 10^1)

      (9.5   * 10^0)          (0.95  * 10^1)           exponents equal
    + (0.75  * 10^0)        - (1.025 * 10^1)

      (9.5 + 0.75) * 10^0     (0.95 - 1.025) * 10^1    factor out exponent

      10.25 * 10^0            -0.075 * 10^1            after addition

      1.025 * 10^1            -7.5 * 10^-1             renormalize

Checking in Your Code

You will submit the single file iFloat.c using the checkin program. Use the name PAx. At the terminal type:

    ~cs270/bin/checkin PAx iFloat.c
  

Relax, you are done with your assignment!