Automated test generation for some software systems can be formulated as an AI planning problem. In our case, the system under test is a command interface to a robot-controlled mass storage (tape) library. The basic idea is that a test case consists of a series of operations intended to change the software system state from some initial conditions to a desired final state. The "plan" takes the place of the human operator who is trying to move and read tapes and otherwise maintain the system. Because most human operators have goals in mind as they control the system, a goal-oriented focus on testing seems only natural.
Our application, Storage Technology's Robot Tape Library interface, involves moving tapes around, into and out of a large silo, reading them in a tape drive and monitoring the status of the system. The basic domain theory contains 11 actions and 25 predicates. The tape library can be configured as connected silos with a tape drive in each and positions for tapes designated by panels, rows and columns. The configuration is described in the initial conditions of problems along with identifiers and initial positions for the tapes. We created three variants on the basic domain theory to recode some problematic aspects of the original (i.e., conditional effects and disjunctive preconditions) and six core problems whose goals required use of different actions on two tapes.
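To make the style of encoding concrete, a STRIPS-style action in the spirit of this domain might look like the following sketch. The action and predicate names here (move-tape, at, empty) are illustrative guesses, not the actual stek encoding.

```pddl
;; Hypothetical move action for the tape library; names are
;; illustrative only, not taken from the distributed domain files.
(:action move-tape
  :parameters (?tape ?from ?to)
  :precondition (and (at ?tape ?from) (empty ?to))
  :effect (and (at ?tape ?to) (empty ?from)
               (not (at ?tape ?from)) (not (empty ?to))))
```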
In building a prototype test generator with planning at its core, we discovered that it was remarkably difficult for planners to solve the problems that we were constructing for this application. Consequently, we embarked on a series of experiments to test existing planners on a suite of problems, including some from this application. The idea was two-fold: 1) to help determine which planner to embed in our system and 2) to further understanding of performance differences between existing planners.
To date, no planner has demonstrated clearly superior performance. Although researchers have hypothesized that this would be the case, no one had performed a large study to test the hypothesis. In this research, we tested the performance of a set of planners to determine which is best on what types of problems. The study included six planners and over 200 problems. We found that performance, as measured by the number of problems solved and computation time, varied widely: no one planner solved all the problems or was consistently fastest. Nor was any planner consistently poor; each planner posted the best time on some problem in the test set.
Analysis of the data also showed that most planners either fail or succeed quickly, and that performance depends at least in part on some easily observable problem/domain features. Based on these results, we implemented a meta-planner that interleaves the execution of six planners on a problem until one solves it. The control strategy for ordering the planners and allocating time is derived from the performance study data. We found that our meta-planner is able to solve more problems than any single planner, but at the expense of computation time.
The BUS Meta-Planner interleaves the execution of several component planners in an effort to take advantage of their different strengths. The scheduling of the component planners is determined by a linear regression model of the effects of a set of static domain/problem features. Although this model is very crude, it produced an impressive improvement in the average time for finding a solution. The approach also exploits the observation that a planner will often either succeed or fail very quickly: BUS either solves the problem quickly or concentrates its effort on another planner that may succeed.
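The interleaving idea can be sketched in a few lines. The sketch below models each planner attempt as an iterator that yields nothing per slice of work, yields a plan on success, and stops on failure; these names and the generator protocol are our illustration, not BUS's actual interface, and it omits BUS's regression-based ordering of slices.

```python
def interleave(planner_runs, budget_slices=100):
    """Round-robin time slices over several planner attempts until one
    produces a plan, all fail, or the slice budget is exhausted.

    Each element of planner_runs is an iterator that yields None for
    each unproductive slice, yields a plan (any non-None value) on
    success, and raises StopIteration on failure.
    """
    active = list(planner_runs)
    for _ in range(budget_slices):
        if not active:
            break  # every planner has given up
        for run in list(active):
            try:
                result = next(run)
            except StopIteration:
                active.remove(run)  # this planner failed; drop it
                continue
            if result is not None:
                return result  # first plan found wins
    return None  # no planner succeeded within the budget
```

Because many planners fail fast, a failing component planner consumes only a slice or two before being dropped, letting the remaining planners absorb the budget.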
A more thorough treatment of the comparative analysis and the BUS meta-planner can be found in (Howe 99).
As a service to the community, we are making available some of the problems from our generator, namely those used in our studies. The domain and problems are described in PDDL and are restricted to the STRIPS subset of the representation.
The domain theory contains the operators from a subset of the command language. The problems are typical testing examples in which the operator might want to transform the robot library state from the initial to the goal state. There are three versions of the domain theory which use slightly different encodings. The first encoding is the one used for our comparative work as the others could only be processed by a couple of planners.
|stek domain, problems, results||stek.tar|
|Domain 1||This domain theory uses only the simplest features of PDDL. This domain can be processed by all of the planners in the study.|
|Domain 2||This domain modifies the previous domain by making use of advanced PDDL features, such as conditional effects, in its operators. However, these features could only be processed by a couple of the planners in the study.|
|Domain 3||This domain is similar to Domain 2 except that it also includes a fully conditional move operator.|
|Problem Description||Instances (increasing size)|
|Dismount a tape from the drive||prob01a.pddl||prob01b.pddl||prob01c.pddl||prob01d.pddl|
|Transfer a tape into the silo||prob02a.pddl||prob02b.pddl||prob02c.pddl||prob02d.pddl|
|Move a tape within the silo||prob03a.pddl||prob03b.pddl||prob03c.pddl||prob03d.pddl|
|Move and transfer tapes||prob04a.pddl||prob04b.pddl||prob04c.pddl||prob04d.pddl|
|Different move and transfer||prob05a.pddl||prob05b.pddl||prob05c.pddl||prob05d.pddl|
|Move a tape and dismount a tape||prob06a.pddl||prob06b.pddl||prob06c.pddl||prob06d.pddl|
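For orientation, a minimal problem instance in the style of this suite might look like the sketch below. Only the domain name (stek-all_action) is taken from this page; the object and predicate names are illustrative guesses, not the contents of the actual problem files.

```pddl
;; Hypothetical instance resembling the "dismount a tape" task.
(define (problem dismount-example)
  (:domain stek-all_action)
  (:objects tape1 drive1 slot1)
  (:init (at tape1 drive1) (empty slot1))
  (:goal (and (at tape1 slot1) (empty drive1))))
```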
Usage Note: All of the domains and problems are keyed to the same domain name (stek-all_action) to allow them to be mixed in any combination. Care must be taken with planners like UCPOP to load only one domain at a time.
This research was made possible by National Science Foundation grant CCR-9619787. The PIs for the grant were Anneliese von Mayrhauser and Adele Howe. The graduate students who worked on these problems were Michael Scheetz and Eric Dahlman.