
Reinforcement Learning and Control


Contents

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Back to Chuck Anderson's Home Page


Current project members (faculty and CS students)

Past members:

Back to Contents


Funding

This work has been funded by NSF Grants

Back to Contents


State prediction to develop useful state-action representations

At IJCNN 2015 we presented the following paper, which won the Best Paper Award.

Back to Contents


Wind Energy

We are currently investigating applications of reinforcement learning to the control of wind turbines. On August 13th, we presented a poster titled "On-Line Optimization of Wind Turbine Control using Reinforcement Learning" at the 2nd Annual CREW Symposium at Colorado School of Mines. CREW stands for the Center for Research and Education in Wind.

In 2010, we received a grant from the Colorado State University Clean Energy Supercluster titled "Predictive Modeling of Wind Farm Power and On-Line Optimization of Wind Turbine Control". This grant is described in the CES Supercluster 2009-2010 Annual Report.

Resources we have found useful:

Back to Contents


Recurrent Networks

We are investigating the use of recurrent neural networks to approximate value functions when state cannot be completely observed. Part of our work is based on the Echo State Network formulation.
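
As a rough illustration of the Echo State Network idea, here is a minimal Python sketch: a fixed random recurrent reservoir accumulates the observation history, and only a linear readout is trained (here with a TD(0) update) to estimate values. The sizes, scaling constant, and learning rates are illustrative placeholders, not the parameters used in our work.

    import numpy as np

    rng = np.random.default_rng(0)

    # Echo State Network: a fixed random recurrent "reservoir" whose state
    # summarizes the observation history; only the linear readout is trained.
    n_obs, n_res = 3, 100                       # observation and reservoir sizes (illustrative)
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_obs))
    W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))   # spectral radius < 1: the "echo state" property
    w_out = np.zeros(n_res)                     # trained readout: linear value estimate

    def reservoir_step(x, obs):
        """Advance the reservoir state given the current (partial) observation."""
        return np.tanh(W_in @ obs + W_res @ x)

    gamma, alpha = 0.95, 0.01                   # illustrative discount factor and step size

    def td0_update(x, reward, x_next):
        """TD(0) update of the readout weights only; the reservoir stays fixed."""
        delta = reward + gamma * (w_out @ x_next) - (w_out @ x)
        w_out[:] += alpha * delta * x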

Back to Contents


Function Approximation

During an extended visit to Colorado State University, Andre Barreto developed a modified gradient-descent algorithm for training networks of radial basis functions. His modification is a more robust approach for learning value functions for reinforcement learning problems. The following publication describes this work.

Jilin Tu completed his MS thesis in 2001. The following is an excerpt from his abstract.

This thesis studies how to integrate state-space models of control systems with reinforcement learning and analyzes why one common reinforcement learning architecture does not work for control systems with Proportional-Integral (PI) controllers. As many control problems are best solved with continuous state and control signals, a continuous reinforcement learning algorithm is then developed and applied to a simulated control problem involving the refinement of a PI controller for the control of a simple plant. The results show that a learning architecture based on a state-space model of the control system outperforms the previous reinforcement learning architecture, and that the continuous reinforcement learning algorithm outperforms discrete reinforcement learning algorithms.

In 1999, Baxter and Bartlett developed their direct-gradient class of algorithms for learning policies directly without also learning value functions. This intrigues me from the viewpoint of function approximation, in that there may be many problems for which the policy is easier to represent than the value function. It is well known that a value function need not exactly reflect the true value of state-action pairs; it need only rank the optimal action in each state above the rest. A function approximator that strives for minimum error may therefore waste valuable approximation resources. We devised a simple Markov chain task and a very limited neural network that demonstrate this. When applied to this task, Q-learning tends to oscillate between optimal and suboptimal solutions. However, using the same restricted neural network, Baxter and Bartlett's direct-gradient algorithm converges to the optimal policy. This work is described in:
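
To make the contrast concrete, below is a minimal Python sketch of a direct policy-gradient (REINFORCE-style) learner on a small chain task. The chain, the logistic per-state policy parameterization, and the learning rate are illustrative choices for exposition; they are not the exact task or network from the paper.

    import numpy as np

    rng = np.random.default_rng(1)

    # A tiny 4-state chain: action 1 moves right, action 0 moves left;
    # reward 1 only for reaching the rightmost state. Illustrative task only.
    n_states = 4

    def step(s, a):
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0
        return s2, r, s2 == n_states - 1

    # Direct policy gradient (REINFORCE): one logistic parameter per state;
    # no value function is learned.
    theta = np.zeros(n_states)
    alpha = 0.1

    for episode in range(500):
        s, traj = 0, []
        for t in range(20):
            p_right = 1.0 / (1.0 + np.exp(-theta[s]))
            a = 1 if rng.random() < p_right else 0
            s2, r, done = step(s, a)
            traj.append((s, a, p_right, r))
            s = s2
            if done:
                break
        G = sum(r for *_, r in traj)            # undiscounted return of the episode
        for s, a, p_right, _ in traj:
            grad_log = a - p_right              # d/dtheta log pi(a|s) for a logistic policy
            theta[s] += alpha * G * grad_log

    print("P(right) per state:", 1.0 / (1.0 + np.exp(-theta)))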

We have experimented with ways of approximating the value and policy functions in reinforcement learning using radial basis functions. Gradient descent does not work well for adjusting the basis functions unless they are close to the correct positions and widths a priori. One way of dealing with this is to "restart" the training of a basis function that has become useless. It is restarted by setting its center and width to values for which the basis function will enable the network as a whole to better fit the target function. This is described in:
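
The following Python sketch illustrates one plausible form of the restart idea: a basis function that barely responds to any training input is moved to the input with the largest current error. The activation threshold and reset width here are illustrative; the precise restart criterion is given in the paper.

    import numpy as np

    rng = np.random.default_rng(2)

    # Radial basis function network: y(x) = sum_i w_i * exp(-(x - c_i)^2 / (2 s_i^2))
    n_basis = 5
    centers = rng.uniform(-1, 1, n_basis)
    widths = np.full(n_basis, 0.3)
    weights = np.zeros(n_basis)

    def phi(x):
        return np.exp(-(x - centers) ** 2 / (2 * widths ** 2))

    def predict(x):
        return phi(x) @ weights

    def restart_useless_basis(xs, targets, activation_tol=1e-3):
        """Illustrative restart rule: if a basis function is essentially inactive
        over the training inputs, move its center to the input with the largest
        current error and reset its width and weight."""
        acts = np.array([phi(x) for x in xs])                       # (n_samples, n_basis)
        errors = np.array([t - predict(x) for x, t in zip(xs, targets)])
        for i in range(n_basis):
            if acts[:, i].max() < activation_tol:                   # never responds to the data
                j = np.argmax(np.abs(errors))
                centers[i] = xs[j]
                widths[i] = 0.3
                weights[i] = 0.0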

Matt Kretchmar and I also experimented with different basis functions, as described in

We are also adapting methods for matching data probability distributions, such as Kohonen's self-organizing map approach, to the temporal-difference paradigm of reinforcement learning. My interest in efficient ways of learning good representations for reinforcement learning systems started during my graduate school days with my advisor, Andy Barto, at the University of Massachusetts:

Back to Contents


Combining Reinforcement Learning with Feedback Controllers

One domain in which we are developing applications of reinforcement learning is the heating and cooling of buildings. In initial work we investigated reinforcement learning, along with other neural-network approaches to learning control, on an accurate simulation of a heating coil:
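
The structure we have in mind can be sketched in Python as follows: the control signal is the output of a conventional feedback controller plus a correction chosen by the learning agent. The PI gains, the toy first-order plant, and the zero placeholder for the learned correction are illustrative only, not the heating-coil simulation itself.

    import numpy as np

    rng = np.random.default_rng(3)

    # A fixed PI controller provides the baseline control; the reinforcement
    # learning agent contributes an additive correction (placeholder here).
    kp, ki = 2.0, 0.5                 # illustrative PI gains
    integral = 0.0

    def pi_controller(error, dt=1.0):
        global integral
        integral += error * dt
        return kp * error + ki * integral

    def plant(temp, u, dt=1.0):
        """Toy first-order plant standing in for the heating-coil simulation."""
        return temp + dt * 0.1 * (u - temp) + 0.01 * rng.standard_normal()

    setpoint, temp = 20.0, 15.0
    for t in range(100):
        error = setpoint - temp
        u_pi = pi_controller(error)
        u_rl = 0.0                    # placeholder for the learned correction
        temp = plant(temp, u_pi + u_rl)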

Back to Contents


Robust Reinforcement Learning

Robust control theory can be used to prove the stability of a control system for which unknown, noisy, or nonlinear parts are "covered" with particular uncertainties. We have shown that a reinforcement learning agent can be added to such a system if its nonlinear and time-varying parts are covered by additional uncertainties. The resulting theory and techniques guarantee stability of a system undergoing reinforcement learning control, even while learning!

Here is a link to a web site for our NSF-funded project on Robust Reinforcement Learning for HVAC Control.

Back to Contents


Mixture of Experts

"Mixture of experts" networks have been shown to automatically decompose difficult mappings into a combination of simple mappings. We extended these techniques for reinforcement learning and tested them with the pole-balancing problem, as reported in

Back to Contents


Multigrid Methods

In complex, delayed-reward problems, a considerable amount of experience is required to propagate reward information back through the sequence of states that might affect that reward. We are exploring one way to speed up this propagation of information by adapting the multigrid approach, developed for solving large distributed systems of PDEs, to the reinforcement learning paradigm. Robert Heckendorn and I have tested this using a multigrid version of value iteration, and Stew Crawford-Hines and I have worked with a multigrid form of Q-learning:
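
The following Python sketch conveys the flavor of the multigrid idea on a toy chain problem: value iteration is run cheaply on a coarse version of the problem, and the coarse solution is then prolongated (copied up) to initialize value iteration on the fine problem, so reward information does not have to propagate state by state from scratch. The chain task, grid sizes, and coarsening rule are illustrative, not those used in our experiments.

    import numpy as np

    gamma = 0.95

    def value_iteration(n, V0=None, sweeps=50):
        """Deterministic chain of n states; reward 1 for entering the last state,
        which is treated as a non-terminal goal for simplicity."""
        V = np.zeros(n) if V0 is None else V0.copy()
        for _ in range(sweeps):
            for s in range(n):
                best = -np.inf
                for move in (-1, 1):
                    s2 = min(max(s + move, 0), n - 1)
                    r = 1.0 if s2 == n - 1 else 0.0
                    best = max(best, r + gamma * V[s2])
                V[s] = best
        return V

    V_coarse = value_iteration(8)                  # cheap solve on the coarse grid
    V_fine_init = np.repeat(V_coarse, 4)           # prolongate: each coarse value covers 4 fine states
    V_fine = value_iteration(32, V0=V_fine_init, sweeps=10)   # only a few fine-grid sweeps are needed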

Back to Contents


Traffic Light Control

Another domain in which we have applied reinforcement learning is the control of traffic lights. This work applies SARSA to a simulation of traffic flow through intersections:
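
For reference, here is the tabular SARSA update in Python with epsilon-greedy action selection. The state and action counts (for example, "keep phase" versus "switch phase") and the learning parameters are illustrative placeholders, not those used in the traffic simulation.

    import numpy as np

    rng = np.random.default_rng(5)

    n_states, n_actions = 100, 2            # e.g., keep the current phase or switch (illustrative)
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, epsilon = 0.1, 0.95, 0.1  # illustrative learning parameters

    def choose_action(s):
        """Epsilon-greedy action selection from the current Q estimates."""
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    def sarsa_update(s, a, r, s2, a2):
        """On-policy TD update: bootstraps on the action actually taken next."""
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])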

Back to Contents


Comparison of Reinforcement Learning and Genetic Algorithms

With Darrell Whitley, we have compared reinforcement learning algorithms with genetic algorithms for learning to solve the inverted pendulum problem. In our experiments, we found that the genetic algorithm resulted in more robust solutions:

Back to Contents


Smart Sensing in Automotive Engines

In other control work unrelated to reinforcement learning, we have shown that expensive sensors for air-fuel ratio can be replaced by inexpensive cylinder pressure sensors by using neural networks to learn a mapping from the pressure trace to the actual air-fuel ratio. This work is in collaboration with Bryan Willson.
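
A minimal Python sketch of this kind of mapping is shown below: a small feedforward network trained by stochastic gradient descent to regress the air-fuel ratio from samples of a cylinder-pressure trace. The input length, hidden-layer size, and learning rate are illustrative placeholders and not the configuration used in the project.

    import numpy as np

    rng = np.random.default_rng(6)

    n_pressure_samples = 64       # pressure samples per engine cycle (placeholder)
    n_hidden = 20

    W1 = 0.1 * rng.standard_normal((n_hidden, n_pressure_samples))
    b1 = np.zeros(n_hidden)
    w2 = 0.1 * rng.standard_normal(n_hidden)
    b2 = 0.0

    def predict_afr(pressure_trace):
        """Map a cylinder-pressure trace to an estimated air-fuel ratio."""
        h = np.tanh(W1 @ pressure_trace + b1)
        return w2 @ h + b2

    def train_step(pressure_trace, afr_target, lr=1e-3):
        """One stochastic-gradient step on squared error."""
        global b2
        h = np.tanh(W1 @ pressure_trace + b1)
        err = (w2 @ h + b2) - afr_target
        # Backpropagation through the two layers.
        grad_h = err * w2 * (1 - h ** 2)
        W1[...] -= lr * np.outer(grad_h, pressure_trace)
        b1[...] -= lr * grad_h
        w2[...] -= lr * err * h
        b2 -= lr * err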

Back to Contents


Other Applications

Several applications for which reinforcement learning has been suggested as a good solution are described here:

Back to Contents



Reinforcement Learning Research in CS at CSU, Charles W. Anderson / anderson@cs.colostate.edu