Benchmark Planning Problems from Storage Technology Software Test Generation Domain

Automated test generation for some software systems can be formulated like the AI Planning process. In our case, the system being tested is a command interface to a robot controlled mass storage (tape) library. The basic idea is that a test case consists of a series of operations intended to change the software system state from some initial conditions to a desired final state. The "plan" takes the place of the human operator who is trying to move and read tapes and otherwise maintain the system. As most human operators have goals in mind as they control the system, a goal oriented focus on testing seems only natural.

Application: Robot Tape Library Command Interface

Our application, Storage Technology's Robot Tape Library interface, involves moving tapes around, into and out of a large silo, reading them in a tape drive and monitoring the status of the system. The basic domain theory contains 11 actions and 25 predicates. The tape library can be configured as connected silos with a tape drive in each and positions for tapes designated by panels, rows and columns. The configuration is described in the initial conditions of problems along with identifiers and initial positions for the tapes. We created three variants on the basic domain theory to recode some problematic aspects of the original (i.e., conditional effects and disjunctive preconditions) and six core problems whose goals required use of different actions on two tapes.

Study of Planners' Performance

In building a prototype test generator with planning at its core, we discovered that it was remarkably difficult for planners to solve the problems that we were constructing for this application. Consequently, we embarked on a series of experiments to test existing planners on a suite of problems, including some from this application. The idea was two-fold: 1) to help determine which planner to embed in our system and 2) to forward understanding of performance differences between existing planners.

Performance Results

To date, no planner has demonstrated clearly superior performance. Although researchers have hypothesized that this should be the case, no one has performed a large study to test its limits. In this research, we tested performance of a set of planners to determine which is best on what types of problems. The study included six planners and over 200 problems. We found that performance, as measured by number of problems solved and computation time, varied with no one planner solving all the problems or being consistently fastest. Furthermore, no planner was consistently poor either with each planner posting the best time on some problem in the test set.

The BUS Meta-Planner

Analysis of the data also showed that most planners either fail or succeed quickly and the the performance depends at least in part on some easily observable problem/domain features. Based on these results, we implemented a meta-planner that interleaves execution of six planners on a problem until one solves it. The control strategy for ordering the planners and allocating time is derived from the performance study data. We found that our meta-planner is able to solve more problems than any single planner, but at the expense of computation time.

The BUS Meta-Planner interleaves the execution of several component planners in an effort to take advantage of the different strengths of these component planners. The scheduling of the component planners is determined by a linear regressions model of the effects of a set of static domain/problem features. This model is very crude but it was able to make an impressive improvement in the average time for finding a solution. Another feature of the planners used here which is exploited by this approach is the fact that many times a planner will either succeed or fail very quickly. This allows BUS to either solve the problem quickly or to concentrate its efforts on another planner which may succeed.

A more thorough treatment of the comparative analysis and the BUS meta-planner can be found in (Howe 99).

Domain and Problem Descriptions

As a service to the community, we are making available some of the problems from our generator, i.e., those problems used in our studies. The domain and problems are described in PDDL and are restricted to STRIPS representation.

The domain theory contains the operators from a subset of the command language. The problems are typical testing examples in which the operator might want to transform the robot library state from the initial to the goal state. There are three versions of the domain theory which use slightly different encodings. The first encoding is the one used for our comparative work as the others could only be processed by a couple of planners.

Tarred Archive File
stek domain, problems, results stek.tar

Domain File Description
Domain 1 This domain theory uses only the simplest features of PDDL. This domain can be processed by all of the planners in the study.
Domain 2 This domain modifies the previous domain by making use of advanced features like conditional effects in the operators of the domain. However, the move operator is not changed from the original version.
Domain 3 This domain is similar to Domain 2 except that it also includes a fully conditional move operator.

Associated problems:

Problem Description Instance Size
Small Medium Large Largest
Dismount a tape from the drive prob01a.pddl prob01b.pddl prob01c.pddl prob01d.pddl
Transfer a tape into the silo prob02a.pddl prob02b.pddl prob02c.pddl prob02d.pddl
Move a tape within the silo prob03a.pddl prob03b.pddl prob03c.pddl prob03d.pddl
Move and transfer tapes prob04a.pddl prob04b.pddl prob04c.pddl prob04d.pddl
Different move and transfer prob05a.pddl prob05b.pddl prob05c.pddl prob05d.pddl
Move a tape and dismount a tapes prob06a.pddl prob06b.pddl prob06c.pddl prob06d.pddl

Usage Note: All of the domains and problems are keyed to the same domain name (stek-all_action) to allow them to be mixed in any combination. Care must be taken with planners like UCPOP to only load one domain at a time.


This research was made possible by National Science Foundation grant CCR-9619787. The PIs for the grant were Anneliese vonMayrhauser and Adele Howe. Graduate students who worked on these problems were: Michael Scheetz and Eric Dahlman.

  1. Adele Howe, Eric Dahlman, Christopher Hansen, Michael Scheetz, and Anneliese von Mayrhauser. Exploiting Competitive Planner Performance. In Proceedings of The Fifth European Conference on Planning, September 1999. (Postscript Version)

Eric Dahlman
Last modified: Thu Aug 24 08:51:08 MDT 2000