Colorado State University Computer Science Department

CS-200, Fall 1998 
Assignment #9


Hey Mom, It's Learning!

In the last assignment, you wrote a simulation of a robot finding its way through a maze. Many of you commented that the use of the stack and queue to determine the next state to visit is not realistic; the robot often "jumps" from the current state to another state that is not close to its current state. Therefore, if we were using this approach to control a real robot, we would have to do our search off-line, using a model of the maze world. Once the entire path is determined by our software, the robot can start moving.

In this assignment, we make the simulation much more realistic and more general. It is more realistic because the robot only moves to neighboring states. It is more general because a model of the maze is not required. Of course, we are doing everything in simulation, so we must have the maze model, but the robot's decisions about which state to go to next do not use the model.

You will be implementing a form of reinforcement learning. This is a class of machine learning algorithms that work without the knowledge of a model. They learn a lot like animals learn, through rewards and punishments. For this assignment, you will be punishing your robot for every step that it takes that does not result in the robot being at the goal position. Technically, the algorithm finds the sequence of actions that minimizes the sum of these punishments. Each punishment has a value of 1, so minimizing the sum of these punishments means finding the shortest path. To learn more about reinforcement learning, check out this site, where you can find some online tutorials.

Reinforcement learning works by maintaining a memory in the robot that predicts how many steps it will take to get to the goal from each state. The predictions are used to decide which action to take from any state.  This is done by trying each of the four actions and checking the memory to see what the prediction is for each resulting state.  The action that takes the robot to the state with the smallest prediction of remaining steps is the action that is taken.
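
The action choice described above can be sketched as follows. This is only an illustration under stated assumptions: the names State, Memory, predictionFor, and bestNeighbor are hypothetical, and a std::map stands in for the robot's memory. It assumes the caller has already collected the states reachable by the legal actions.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; the assignment's Maze class
// and hash table would take their place.
typedef std::pair<int, int> State;        // (row, column)
typedef std::map<State, double> Memory;   // state -> predicted steps to goal

// Prediction stored for a state, or 0 if nothing has been stored for it.
double predictionFor(const Memory& memory, const State& s) {
    Memory::const_iterator it = memory.find(s);
    return it == memory.end() ? 0.0 : it->second;
}

// Index of the neighbor with the smallest prediction of remaining steps.
// The first neighbor wins ties. Assumes neighbors is not empty.
int bestNeighbor(const Memory& memory, const std::vector<State>& neighbors) {
    int best = 0;
    for (int i = 1; i < (int)neighbors.size(); ++i) {
        if (predictionFor(memory, neighbors[i]) <
            predictionFor(memory, neighbors[best]))
            best = i;
    }
    return best;
}
```

Note that a state with no stored prediction counts as 0 here, so unexplored neighbors look attractive until the robot has visited them.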

These predictions are learned by experimentation: taking various actions from a state and updating the prediction for that state by considering the prediction currently stored for the state you arrive in. Let Pred(s) be the prediction of remaining steps to get to the goal from state s. Say that the robot is in state s and takes an "up" action and arrives in state t. The prediction for state s is changed to

Pred(s) = 0.1 (1 + Pred(t)) + 0.9 Pred(s)

So, the new prediction for state s is 0.9 of its old value, plus 0.1 of what our current estimate is of what it should be. It should be equal to the prediction of remaining steps from state t plus 1, because it takes 1 step to get to t from s. It's kind of counter-intuitive that this works, because you are updating Pred(s) based only on a current guess at what Pred(t) is, and all predictions are being updated as the robot moves around the maze.
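
To get a feel for the update, here is a small sketch of the arithmetic. The function names are hypothetical, and repeatUpdate holds Pred(t) fixed, which is an artificial assumption made only to isolate the effect of the rule; in the real algorithm Pred(t) changes too.

```cpp
// One application of the update rule from the text:
//   Pred(s) = 0.1 (1 + Pred(t)) + 0.9 Pred(s)
double updatedPrediction(double predS, double predT) {
    return 0.1 * (1.0 + predT) + 0.9 * predS;
}

// Applies the update n times while Pred(t) is held fixed (an artificial
// assumption for illustration), and returns the resulting Pred(s).
double repeatUpdate(double predS, double predT, int n) {
    for (int i = 0; i < n; ++i)
        predS = updatedPrediction(predS, predT);
    return predS;
}
```

For example, with Pred(s) = 0 and Pred(t) = 4, one update gives 0.1 (1 + 4) + 0.9 (0) = 0.5. Repeating the update with Pred(t) held at 4 moves Pred(s) a tenth of the remaining distance toward 5 each time, so it approaches 1 + Pred(t).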

We could use an array for the memory to store the predictions, but to allocate such an array the robot must know how many rows and columns will be needed.  If the robot doesn't know this beforehand, then an array can't be used.  Instead, you will use a hash table to implement the memory.  Use Table.h to implement the hash table.  It is a modified version of the hash table code described in the book; we downloaded the original from the author's web site.  You will have to carefully read the comments and code in Table.h to learn how to use it. Each item to be stored is simply the state (row and column) and the floating-point prediction.  Form the key for the hash table by combining the row and column in this fashion:

f(row, column) = k * row + column + 1

where k is some integer.  For this assignment, you will evaluate how well the robot learns for different values of k.
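
As a starting point for that evaluation, the sketch below counts how many distinct keys the formula produces for a grid of a given size. The names makeKey and countDistinctKeys and the rows and columns parameters are hypothetical. What actually happens when two states share a key depends on how Table.h stores and compares keys, which this sketch does not model.

```cpp
#include <cassert>
#include <set>

// Key formula from the text: f(row, column) = k * row + column + 1.
int makeKey(int k, int row, int column) {
    return k * row + column + 1;
}

// Number of distinct keys produced for a rows-by-columns grid of states.
int countDistinctKeys(int k, int rows, int columns) {
    assert(rows > 0 && columns > 0);
    std::set<int> keys;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < columns; ++c)
            keys.insert(makeKey(k, r, c));
    return (int)keys.size();
}
```

For a 5-by-5 grid, k = 5 gives 25 distinct keys, one per state, while k = 3 gives only 17, because states such as (0, 3) and (1, 0) both map to key 4.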
 

The Search and Learning Algorithm

A summary of the algorithm that you must implement is given below.

For k = 5, 6, ..., 20
    Allocate a new hash table of capacity 300.
    Repeat 1000 times (call this 1000 trials)
        Set current state to be maze's start state.
        While robot is not at goal state AND number of steps taken this trial is less than 10,000, do
             For each possible action (that does not run into a wall)
                 Try the action from the current state.
                 Get next state.
                 Retrieve any memory of next state's prediction from the hash table
                  using the hash table find method.
                 If no memory is found, say the prediction for next state is 0.
                 If this prediction is lower than current lowest prediction, remember it.
             Action is the one with lowest prediction for next state.
             Update the prediction for the current state using the equation
                 Pred(current state) = 0.1 (1 + Pred(next state)) + 0.9 Pred(current state)
             and calling the hash table insert method for the current state.
             Set the current state to the next state resulting from the chosen action.
    After repeating 1000 times, print the value of k, the number of steps taken on the last trial,
    and the number of storage places used in the hash table.
    Also print a table of predicted values for every state.
    Delete the hash table.
end of loop trying different values of k.
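
The inner loops of the summary above can be sketched as follows. This is not the required solution: a small open grid with no interior walls stands in for the Maze class, a std::map stands in for the Table.h hash table, and all names (ROWS, COLS, runTrial, stepsOnLastTrial, and so on) are hypothetical.

```cpp
#include <map>

// Hypothetical stand-ins: an open 3-by-4 grid replaces the Maze class and
// std::map replaces the Table.h hash table.
const int ROWS = 3, COLS = 4;
const int GOAL_ROW = 0, GOAL_COL = 3;
const int DROW[4] = {-1, 1, 0, 0};   // up, down, left, right
const int DCOL[4] = {0, 0, -1, 1};

typedef std::map<int, double> Memory;   // key -> predicted steps to goal

int makeKey(int k, int row, int col) { return k * row + col + 1; }

// Stored prediction for a key, or 0 if no memory is found.
double lookup(const Memory& memory, int key) {
    Memory::const_iterator it = memory.find(key);
    return it == memory.end() ? 0.0 : it->second;
}

// Runs one trial from the given start state and returns the steps taken.
int runTrial(Memory& memory, int k, int startRow, int startCol, int maxSteps) {
    int row = startRow, col = startCol, steps = 0;
    while (!(row == GOAL_ROW && col == GOAL_COL) && steps < maxSteps) {
        int bestRow = row, bestCol = col;
        double bestPred = 0.0;
        bool found = false;
        for (int a = 0; a < 4; ++a) {
            int r = row + DROW[a], c = col + DCOL[a];
            if (r < 0 || r >= ROWS || c < 0 || c >= COLS)
                continue;                       // action runs into a wall
            double p = lookup(memory, makeKey(k, r, c));
            if (!found || p < bestPred) {       // remember lowest prediction
                bestPred = p;
                bestRow = r;
                bestCol = c;
                found = true;
            }
        }
        // Pred(current) = 0.1 (1 + Pred(next)) + 0.9 Pred(current)
        int key = makeKey(k, row, col);
        double updated = 0.1 * (1.0 + bestPred) + 0.9 * lookup(memory, key);
        memory[key] = updated;
        row = bestRow;
        col = bestCol;
        ++steps;
    }
    return steps;
}

// Repeats the trial from the bottom-left corner with one shared memory and
// returns the number of steps taken on the last trial.
int stepsOnLastTrial(int k, int trials) {
    Memory memory;
    int steps = 0;
    for (int t = 0; t < trials; ++t)
        steps = runTrial(memory, k, ROWS - 1, 0, 10000);
    return steps;
}
```

In this sketch the goal state is never updated, so its prediction stays at 0, which is what anchors the other predictions. Printing the results and looping over k are left out to keep the sketch short.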

Changes to Code from Assignment 8

You should still use the Maze class, but you no longer need the method for marking visited states.  In this assignment, the robot is free to revisit states.

You are not using stacks and queues in this assignment.  Remember, the robot is simply taking steps, one at a time, from where it currently is.

Required Variations

To help you debug your program, and to understand how the algorithm works, add these two variations on the output provided by your program.
  1. Print the maze with an * marking the current state of the robot.
  2. Print a graph showing the number of steps to reach the goal for each trial
Use a command line argument to select one of these variations.  Thus, you will have three ways of running your executable.  Let's say your executable is named a.out.  Then you can run
a.out < maze.data
a.out 1 < maze.data
a.out 2 < maze.data
To use command line arguments, you must declare your main function as
int main(int argc, char *argv[])
You can convert the first command line argument to an integer by doing
int option = 0;   // 0 means no command line argument was given

if (argc > 1)
   option = atoi(argv[1]);

Printing the maze with an * at current state is much more fun to watch if you clear the screen before you draw it each time.  You can do this by calling the function
system("clear")
The function system allows you to run any unix command, like clear.  To use system, you must
#include <stdlib.h>
To graph the steps on each trial, use C++ iomanipulators to space over a number of spaces that is proportional to the number of steps, and print an *.  To calculate this, you should know the maximum number of steps that will result and the width of your screen, so you can place the farthest right * within the width of the screen.  You probably will not know the maximum number of steps, so just guess and adjust it after you have seen some of the output.  If your screen is 80 characters wide, and you suspect the maximum number of steps will be 100, then you should scale the number of steps with the expression  steps * 80 / 100.  Use the iomanipulator setw to do this.  Once the goal is reached, print a line using code like the following, which assumes a guessed maximum of 20 steps:
cout << setw(steps * 80 / 20 ) << "*" << endl;
To use iomanipulators, remember to
#include <iomanip.h>
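
If your guess at the maximum is too small, the asterisk runs off the right edge of the screen. One way to guard against that is sketched below. The function graphLine and its parameters maxSteps and screenWidth are hypothetical names, and the sketch uses the standard header <iomanip> with std:: names rather than <iomanip.h>.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Builds one graph line: an asterisk right-aligned in a field whose width
// is proportional to steps, kept between 1 and screenWidth characters.
std::string graphLine(int steps, int maxSteps, int screenWidth) {
    int width = steps * screenWidth / maxSteps;
    if (width < 1) width = 1;                      // always show the *
    if (width > screenWidth) width = screenWidth;  // stay on the screen
    std::ostringstream out;
    out << std::setw(width) << "*";
    return out.str();
}
```

With a guessed maximum of 20 steps and an 80-character screen, a trial of 8 steps produces a line 32 characters long, and any trial longer than 20 steps is drawn at column 80.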

The Input Data File

Use the same input data file as you used for the last assignment.

Example Program Output

The following are just the first things printed out after running the executable with no arguments, an argument of 1, and an argument of 2. To print the table of values as shown, I used
  cout.setf(ios::fixed, ios::floatfield);
  cout.precision(1);
before printing, and then printed each value using
      cout << setw(7) << val ;
a.out < maze.data

k = 5 steps last trial = 8 Table spaces used = 24
    2.0    2.3    2.1    1.0    0.0
    5.0    4.0    3.0    2.0    1.0
    6.0    3.1    2.8    2.4    1.8
    7.0    2.8    2.8    2.8    2.7
    8.0    2.7    2.7    2.7    2.7
 ... then again for k = 6, k = 7, etc...
a.out 1 < maze.data
+---------+
|     |  G|
|---- |   |
|         |
| |       |
| |       |
| | ------|
| |       |
| |       |
|*|       |
+---------+
+---------+
|     |  G|
|---- |   |
|         |
| |       |
| |       |
| | ------|
|*|       |
| |       |
| |       |
+---------+
+---------+
|     |  G|
|---- |   |
|         |
| |       |
|*|       |
| | ------|
| |       |
| |       |
| |       |
+---------+
+---------+
|     |  G|
|---- |   |
|*        |
| |       |
| |       |
| | ------|
| |       |
| |       |
| |       |
+---------+
+---------+
|     |  G|
|---- |   |
|  *      |
| |       |
| |       |
| | ------|
| |       |
| |       |
| |       |
+---------+
   ... and on and on until G is reached ...
a.out 2 < maze.data

  0                                                                                                                                                                                                       *
  1                               *
  2                                       *
  3                                                               *
  4                                                                                                                                                               *
  5                               *
  6                                       *
  7                                                               *
  8                                       *
  9                                       *
 10                               *
 11                               *
 12                                                                                                                                                                               *
 13                                                               *
 14                                       *
 15                               *
 16                                       *
 17                               *
 18                                       *
 19                                                                               *
 20                                                                                                                                                       *
 21                               *
 22                                       *
 23                               *
 24                               *
 25                                       *
 26                               *
 27                                       *
 28                                                                               *
 29                                                                                                                                                                                               *
 30                               *
 31                               *
 32                                       *
 33                               *
 34                                       *
 35                               *
 36                               *
 37                                       *
 38                               *
 39                                                                                                                                                                               *
 40                               *
 41                               *
 42                                                                                       *
 43                                       *
 44                               *
 45                                       *
 46                               *
 47                               *
 48                                       *
 49                               *
 50                               *
 51                                       *
 52                               *
 53                               *
 54                                       *
 55                               *
 56                               *
 57                               *
 58                                                                                                                                                                                                               *
 59                               *
 60                               *
 61                               *
 62                               *
 63                               *
 64                               *
 65                                       *
 66                               *
 67                               *
 68                               *
 69                               *
 70                                       *
 71                               *
 72                               *
 73                               *
 74                               *
 75                               *
 76                                       *
 77                               *
 78                               *
 79                               *
 80                               *
 81                               *
 82                               *
 83                               *
 84                               *
 85                               *
 86                                       *
 87                               *
 88                               *
 89                               *
 90                               *
 91                               *
 92                               *
 93                               *
 94                               *
 95                               *
 96                               *
 97                               *
 98                               *
 99                               *
100                               *
101                               *
102                               *
103                               *
104                               *
105                               *
106                               *
107                               *
108                               *
109                               *
110                               *
111                                       *
112                               *
113                               *
114                               *
115                               *
116                               *
117                               *
118                               *
119                               *
120                               *
121                               *
122                               *
123                               *
124                               *
125                               *
126                               *
127                               *
128                               *
129                               *
130                               *
131                               *
132                               *
133                               *
134                               *
135                               *
136                               *
137                               *
138                               *
139                               *
140                               *
141                               *
142                               *
143                               *
144                               *
145                               *
146                               *
147                               *
148                               *
149                               *
150                               *
    and on and on until 1,000 trials are reached.  The position that the asterisk is in above corresponds to 8 steps.

Written Discussion

Discuss your observations about how the final number of steps to the goal varies or doesn't vary with the value of k in the hash function for each maze.  Try to explain the causes of what you observe.

Describe what you see in the printed tables of prediction values.  Discuss how and why the number of places used in the hash table varies with k.

Explain how the algorithm for choosing next actions results in picking short paths, given the final learned predictions.

Explain how Table.h implements a hash table. Does it statically or dynamically allocate the table? How does it handle collisions?

WHAT TO TURN IN



Copyright © 1998: Colorado State University for CS200. All rights reserved.