IPC 2014 Learning Track Rules


Domains and Modeling language
See the domains page.

Evaluation Schema

In each track, awards will be granted only if the track has three or more competitors. If five or more competitors take part in a track, runner-up and third-place awards will also be given. All participants will be expected to release their code and provide an abstract for the conference presentation.

The Quality Subtrack

Evaluation Score: Planners will be evaluated based on the quality of the plans they produce. The scoring metric will be identical to that used in the recent deterministic tracks.
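
As a rough illustration of the quality metric, the sketch below assumes the convention of the IPC 2011 deterministic satisficing track: each problem contributes the ratio of the best known plan cost to the cost of the planner's plan, and an unsolved problem contributes zero. The function name and the dictionary layout are illustrative assumptions, not part of the competition software.

```python
from typing import Dict, Optional


def quality_score(plan_costs: Dict[str, Optional[float]],
                  best_costs: Dict[str, float]) -> float:
    """Sum best_cost / plan_cost over all problems of a domain.

    plan_costs maps a problem name to the cost of the planner's plan,
    or to None when the planner did not solve the problem.
    best_costs maps a problem name to the best known (reference) cost.
    """
    total = 0.0
    for problem, best in best_costs.items():
        cost = plan_costs.get(problem)
        if cost is None:
            continue  # unsolved problems contribute 0
        # A zero-cost plan cannot be improved on, so it scores 1.
        total += 1.0 if cost == 0 else best / cost
    return total


example = quality_score(
    {"p1": 10.0, "p2": None, "p3": 20.0},
    {"p1": 10.0, "p2": 5.0, "p3": 10.0},
)
print(example)
```

Under this assumed metric, a planner gains at most one point per problem, so the maximum score in a domain equals the number of test problems.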

Competition Stages: The competition will be run in two stages that are identical to the IPC 2011 Learning Track (in fact, this is exactly the same wording from that competition):

The learning stage. Participants will be given two weeks, after which they must send the organizers the resulting Domain-specific Control Knowledge (DCK) folders together with the sets of training files they used.
1. The learning stage will begin after the participants deliver the final version of their code to the organizers. At this point the participants must freeze their code.
2. After the code freeze, the organizers will distribute for each competition domain:
  1. The domain definition file
  2. The problem generator to produce the training set.
  3. The problem test set. A set of problems from the target distribution. The ultimate goal of the competition is to learn DCK that allows a planner to perform well on problems drawn from this distribution. Naturally, the set of target problems used in the actual evaluation will not be made available to the learners during the learning stage.
3. Once domains are distributed, each participant will generate their training set and run their learner program to produce a DCK folder for each domain. At this step, participants must run the same learner program that was submitted during the code freeze. To ensure that the frozen learner produces the same DCK with the same training set as submitted by participants, the organizers will randomly select domains in which to run the learner programs locally.
The testing stage. The planner programs will be evaluated in each domain of the competition with and without the learned DCK on the same problem set. The no-knowledge evaluation will help provide insight into the impact that learning had for each participant.
1. The organizers will conduct the evaluation stage on their local machines. The time and memory allocated to each planner run are not yet finalized; they depend partly on how many planners enter the competition and on the computing resources available at competition time. The current working figures are 15 minutes and 4 GB of RAM per run.

The Quality Subtrack Awards

The overall award can include any approach that fits within the above competition framework (CPU, memory, disk space, etc.), so there will be few, if any, restrictions for this award.

The best learner award will be given to the planner that makes the most improvement. However, there will be an elimination round to ensure that planners cannot inflate their learning delta simply by failing when run without learned knowledge. All approaches (overall or basic) will be eligible for this award.
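
The rules do not spell out how improvement is measured. One plausible reading, assumed in the sketch below, is that the learning delta is the quality score obtained with the learned DCK minus the score obtained without it. The function names and the example scores are hypothetical, and the elimination round is not modeled because its criteria are not specified here.

```python
from typing import Dict, List, Tuple


def learning_delta(score_with_dck: float, score_without_dck: float) -> float:
    """Assumed improvement measure: gain in quality score due to learning."""
    return score_with_dck - score_without_dck


def rank_by_improvement(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order planners by decreasing learning delta.

    results maps a planner name to (score with DCK, score without DCK).
    """
    return sorted(results,
                  key=lambda name: learning_delta(*results[name]),
                  reverse=True)


# Hypothetical scores, for illustration only.
scores = {"planner_a": (20.0, 18.0), "planner_b": (15.0, 5.0)}
print(rank_by_improvement(scores))
```

In this made-up example, planner_b ranks first on improvement even though planner_a has the higher absolute score, which shows why a weak no-knowledge baseline would be rewarded without some form of elimination round.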

The basic solver award will be given to the planner that obtains the highest quality score while competing under the following restriction.

A basic solver is any single (meta) algorithm that does not leverage more than one general-purpose solver at its core. It can be a meta-algorithmic approach (e.g., Iterated Local Search, Iterated WA*, managing calls to one SAT solver, randomized restarting A*, etc.), but it can use only one solver at its core. Parametrized variants of the same algorithm still count as one solver. The core general-purpose solver cannot itself be an ensemble of other solvers.

To make this concrete, here are some examples. A fallback strategy that uses a different algorithm counts as more than one general-purpose solver, so FF would be excluded unless only the EHC phase or only the BFS phase were used. Using different heuristics (possibly with different open lists) is no different in spirit from making iterated calls to WA* with different weights, so Fast Downward and LAMA are included as long as they do not switch algorithms. A randomized restart algorithm that adjusts its restart strategy is included; iterated solvers that select (or adjust) parameters for a single base algorithm are similar in spirit to other iterated meta-algorithmic approaches. A planner that re-encodes the task for a single core solver (while possibly learning to select among distinct encodings) is considered a basic solver, so SAT planning approaches with a single SAT solver at their core fall into this category.

Competitors certify their own systems as basic solvers and are strongly encouraged to discuss with the organizer(s) and other participants whether a system should count as a basic solver. If the discussion does not lead to a consensus, the organizer(s) will make and justify a final decision. Please keep in mind that any definition can be perceived as somewhat arbitrary. The intent is to make a succinct definition available before the competition, give participants a communal voice in whether their system fits that definition, and, in the end, take one small step toward encouraging fundamental research in basic solvers without limiting the "overall" approaches.

The Integrated Execution Subtrack

This track was cancelled due to too few competitors.

In many applications, a planner generates plans as part of a much larger system. Assessing planners in this setting is important and extends the reach of classical planning systems. In such applications, learning can help produce better solutions faster and possibly reduce execution anomalies. This inaugural subtrack will focus on learning and planning within the context of a simple execution loop. The competition will focus on fully observable, discrete, non-adversarial, deterministic, single-agent domains. Example domains might include Snake (http://en.wikipedia.org/wiki/Snake_%28video_game%29), a restricted variant of Civilization based on the Settlers domain, or the Satellite domain. Demo problems and a simulator/controller for several domains will be available by December 15, if not sooner.

Evaluation Score. Each domain will have a single metric by which the planner will be measured, though this metric may be a weighted sum of multiple objectives. For the example domains, the metric could be the number of mice eaten in Snake; a combination of maximizing the total number of buildings constructed and minimizing pollution in Civilization; and some combination of maximizing image quality and minimizing camera slew in Satellite. The final metrics will be clearly stated prior to the beginning of the learning stage.
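
A weighted-sum metric of this kind could look like the sketch below. The objective names and weights are invented for illustration (the final metrics were to be announced before the learning stage); a negative weight stands for an objective to be minimized.

```python
from typing import Dict


def weighted_metric(objectives: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Combine several objective values into a single domain score."""
    return sum(weights[name] * value for name, value in objectives.items())


# Hypothetical Civilization-style example: reward buildings, penalize pollution.
score = weighted_metric(
    {"buildings": 20.0, "pollution": 30.0},
    {"buildings": 1.0, "pollution": -0.5},
)
print(score)
```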

For each domain, a simulator will report state changes to a controller. The controller will calculate the current score, identify new goals, identify when an existing plan needs adjustment, and determine how much time is allowed before a new plan must be returned. The controller will produce typed PDDL for the planner, which can then decide either to repair the plan or to replan from that point. Competitors are encouraged to apply learning in a variety of places in the execution loop (e.g., in reformulating the domain, in choosing between replanning and plan repair, in adjusting parameters to improve the anytime planning of the system, etc.).
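
The planner's side of this loop can be pictured with the minimal sketch below. Everything in it is hypothetical: the ControllerUpdate fields mirror the information the controller is described as reporting, and the repair-versus-replan rule is a simple hand-written placeholder for a decision that a competitor might instead learn.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControllerUpdate:
    """Hypothetical summary of what the controller reports to the planner."""
    score: float                      # current score computed by the controller
    new_goals: List[str] = field(default_factory=list)
    plan_needs_adjustment: bool = False
    time_limit_s: float = 0.0         # time allowed before a plan is due


def choose_action(update: ControllerUpdate, repair_estimate_s: float) -> str:
    """Decide whether to keep, repair, or rebuild the current plan.

    repair_estimate_s is an assumed estimate of how long a repair would take.
    """
    if not update.new_goals and not update.plan_needs_adjustment:
        return "continue"  # nothing changed, keep executing the current plan
    if not update.new_goals and repair_estimate_s <= update.time_limit_s:
        return "repair"    # only a local fix is needed and it fits the budget
    return "replan"        # new goals, or a repair would miss the deadline


updates = [
    ControllerUpdate(score=3.0, time_limit_s=5.0),
    ControllerUpdate(score=3.0, plan_needs_adjustment=True, time_limit_s=5.0),
    ControllerUpdate(score=4.0, new_goals=["(eaten mouse2)"], time_limit_s=5.0),
]
for update in updates:
    print(choose_action(update, repair_estimate_s=2.0))
```

A learned policy could replace the fixed thresholds in choose_action, for example by predicting from past episodes whether repair or replanning yields the better score within the contracted time.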

The Execution Subtrack Awards

A best overall learner award will be given to the planner that achieves the best cumulative score in all domains.

A most adaptable learner award will be given to the planner achieving the highest cumulative score in the face of increasingly frequent changes or limitations in the execution environment.

A best anytime learner award will be given to the planner that best adapts itself to the contractual time limit given by the controller.