Evaluation Schema
For both tracks, an award will be granted only if the track has three or more competitors. If five or more competitors compete in a track, runner-up and third-place awards will also be given. All participants will be expected to release their code and provide an abstract for the conference presentation.
The Quality Subtrack
Evaluation Score: Planners will be evaluated based on the quality of the
plans they produce. The scoring metric will be identical to that used in
the recent deterministic tracks.
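As a rough illustration, the quality metric of the recent deterministic tracks is commonly described as follows: for each problem, a planner earns the ratio of the best known plan cost to the cost of its own plan, or 0 if it does not solve the problem, and these ratios are summed over the problem set. The sketch below assumes that definition and strictly positive plan costs. The function name quality_score and the dictionary layout are hypothetical and are not part of any competition software.

```python
from typing import Dict, Optional


def quality_score(plan_costs: Dict[str, Optional[float]],
                  best_costs: Dict[str, float]) -> float:
    """Sum best_cost / plan_cost over all problems (assumed metric).

    plan_costs maps a problem name to the cost of the planner's plan,
    or None if the planner did not solve that problem.
    best_costs maps a problem name to the best known (reference) cost.
    """
    total = 0.0
    for problem, best in best_costs.items():
        cost = plan_costs.get(problem)
        if cost is None:
            continue  # unsolved problems contribute 0
        if cost <= 0:
            raise ValueError(f"non-positive plan cost for {problem}")
        total += best / cost
    return total


if __name__ == "__main__":
    mine = {"p1": 10.0, "p2": None, "p3": 20.0}
    best = {"p1": 10.0, "p2": 5.0, "p3": 15.0}
    print(quality_score(mine, best))
```

Under this assumed definition a planner scores at most 1 per problem, so the total is bounded by the number of problems in the test set.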
Competition Stages: The competition will be run in two stages that are identical to those of the IPC 2011 Learning Track (in fact, this is exactly the same wording as in that competition):
The learning stage. Participants will be given two weeks, after which they must send to the organizers the resulting Domain-specific Control Knowledge (DCK) folders together with the sets of training files used. For each domain, participants will receive the following:
The domain definition file
The problem generator to produce the training set.
The problem test set. A set of problems from the target distribution. The ultimate goal of the competition is to learn DCK that allows a planner to perform well on problems drawn from this distribution. Naturally, the set of target problems used in the actual evaluation will not be made available to the learners during the learning stage.
Once domains are distributed, each participant will generate their training set and run their learner program to produce a DCK folder for each domain. At this step, participants must run the same learner program that was submitted during the code freeze. To ensure that the frozen learner produces the same DCK with the same training set as submitted by participants, the organizers will randomly select domains in which to run the learner programs locally.
The testing stage. The planner programs will be evaluated in each domain of the competition with and without the learned DCK on the same problem set. The no-knowledge evaluation will help provide insight into the impact that learning had for each participant.
The organizers will conduct the evaluation stage on their local machines. The time and memory allocated to each planner run are not yet finalized; they depend partly on how many planners enter the competition and on the computing resources available to us at competition time. At the moment, the limits we have in mind are 15 minutes and 4 GB of RAM per run.
The Quality Subtrack Awards
The overall award can
include any approach that fits within the above competition framework
(CPU, memory, disk space, etc.), so there will be few, if any,
restrictions for this award.
The best learner award
will be given to the planner that makes the most improvement.
However, there will be an elimination round to ensure that
planners cannot inflate their learning delta simply by failing
when run without learned knowledge. All approaches (overall or basic) will be
eligible for this award.
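One plausible reading of the learning delta is the difference between a planner's score with the learned DCK and its score without it on the same problem set. The sketch below ranks planners under that assumption. The exact formula and the criteria of the elimination round are not specified here, so the set of eligible planners is simply passed in, and the name rank_by_delta is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_by_delta(results: Dict[str, Tuple[float, float]],
                  eligible: Set[str]) -> List[Tuple[str, float]]:
    """Rank eligible planners by (score with DCK) - (score without DCK).

    results maps a planner name to (score_with_dck, score_without_dck).
    eligible holds the planners that survived the elimination round;
    how that round is decided is left open in this sketch.
    """
    deltas = {
        name: with_dck - without_dck
        for name, (with_dck, without_dck) in results.items()
        if name in eligible
    }
    # Largest improvement first.
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    results = {"A": (20.0, 5.0), "B": (18.0, 15.0), "C": (10.0, 0.0)}
    print(rank_by_delta(results, eligible={"A", "B"}))
```

In this toy data, planner C has a large delta only because it scores nothing without knowledge, which is the kind of case the elimination round is meant to exclude.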
The basic solver award
will be given to
the planner that obtains the highest quality score while competing with
the following restriction.
A basic solver is any single (meta)
algorithm that does not leverage more than one general purpose solver
at its core. It can be a meta-algorithmic approach (e.g.,
Iterated Local Search, Iterated WA*, managing calls to one SAT solver,
randomized restarting A*, etc.) but it can only use one solver at its
core. Parametrized variants of the same algorithm are still one
solver. The core general purpose solver cannot itself be an
ensemble of other solvers.
To make this concrete we will use some
examples: A fallback strategy that uses a different algorithm is
considered more than one general-purpose solver, so FF would be
excluded unless only the EHC phase or the BFS phase were used, but not
both. Using different heuristics (with possibly different open
lists) is no different in spirit than making iterated calls of WA* with
different weights, so Fast Downward and LAMA are included so long as
they do not switch algorithms. A randomized restart algorithm
that adjusts its restart strategy is included; iterated solvers that
select (or adjust) parameters for a single base algorithm are similar in
spirit to other iterated meta-algorithmic approaches. A planner
that re-encodes the task for a single core solver (while possibly learning
to select among distinct encodings) is considered a basic solver; so
SAT planning approaches with a single SAT solver at their core fall in
this category.
Competitors certify their systems as basic solvers and are strongly encouraged to consult the organizer(s) and other participants about whether a system will be counted as a basic solver. If a discussion does not lead to a consensus, the organizer(s) will make and justify a final decision. Please keep in mind that any definition can be perceived as somewhat arbitrary. The intent is to make a succinct definition available before the competition, give participants a communal voice in whether their system fits that definition, and, in the end, take one small step toward encouraging fundamental research in basic solvers without limiting the "overall" approaches.
The Integrated Execution Subtrack
This track was cancelled due to too few competitors.
In many applications a planner generates plans as part of a much
larger system and this is an important area to assess, thus extending
the reach of classical planning systems. In such applications, learning
can aid in producing better solutions faster and possibly reducing
execution anomalies. This inaugural subtrack will focus on learning
and planning within the context of a simple execution loop. The
competition will focus on fully observable, discrete, non-adversarial,
deterministic, single-agent domains. Example domains might
include Snake (http://en.wikipedia.org/wiki/
Evaluation Score. Each
domain will have a single metric by which the planner will be measured,
though this metric may be a weighted sum of multiple objectives.
For the example domains, the metrics could be the number of mice eaten
in the Snake domain; for Civilization, the total number of buildings
constructed combined with minimizing pollution; and, for
Satellite, some combination of maximizing image quality while
minimizing camera slew. The final metrics will be clearly stated
prior to the beginning of the learning stage.
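To make the idea of a single metric built as a weighted sum of objectives concrete, here is a minimal sketch. The objective names and weights are invented for illustration; the actual metrics were to be announced before the learning stage. An objective to be minimized is given a negative weight.

```python
from typing import Dict


def weighted_metric(objectives: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Combine several objective values into one score by a weighted sum."""
    if objectives.keys() != weights.keys():
        raise KeyError("objectives and weights must name the same quantities")
    return sum(weights[name] * value for name, value in objectives.items())


if __name__ == "__main__":
    # Hypothetical Civilization-style scoring: reward buildings, penalize pollution.
    observed = {"buildings": 12.0, "pollution": 4.0}
    weights = {"buildings": 1.0, "pollution": -0.5}
    print(weighted_metric(observed, weights))
```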
For each domain, a simulator will report state changes to a
controller that will calculate the current score, identify new goals,
identify when an existing plan needs adjustment, and specify how much
time is allowed before a new plan must be returned. The
controller will produce typed PDDL for the planner, which can then
decide to either repair the plan or replan from these points.
Competitors are encouraged to apply learning to a variety
of places in the execution loop (e.g., in reformulating the domain, in
selecting replanning over plan repair, in adjusting parameters to
improve the anytime planning of the system, etc.).
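The execution loop described above can be sketched roughly as follows. This is a toy under stated assumptions, not the competition's controller: the real controller was to emit typed PDDL, whereas here states and goals are plain Python sets of fact strings, and the names control_loop, plan_still_valid, and planner are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Facts = FrozenSet[str]
Plan = List[str]


def control_loop(
    state_changes: Iterable[Tuple[Facts, Facts]],
    plan_still_valid: Callable[[Plan, Facts, Facts], bool],
    planner: Callable[[Facts, Facts, Plan, float], Plan],
    time_limit_s: float,
) -> Tuple[Plan, int]:
    """Feed reported (state, goals) pairs to the planner when needed.

    For each state change, the controller checks whether the current plan
    still fits. If not, it hands the state, the goals, the old plan, and
    the time allowance to the planner, which may repair or replan.
    Returns the final plan and the number of planner calls.
    """
    plan: Plan = []
    planner_calls = 0
    for state, goals in state_changes:
        if not plan_still_valid(plan, state, goals):
            plan = planner(state, goals, plan, time_limit_s)
            planner_calls += 1
    return plan, planner_calls


if __name__ == "__main__":
    # Toy domain: a "plan" is the sorted list of goal facts not yet true.
    def valid(plan: Plan, state: Facts, goals: Facts) -> bool:
        return plan == sorted(goals - state)

    def toy_planner(state: Facts, goals: Facts, old: Plan, limit: float) -> Plan:
        return sorted(goals - state)

    changes = [
        (frozenset(), frozenset({"a"})),
        (frozenset(), frozenset({"a"})),
        (frozenset({"a"}), frozenset({"a", "b"})),
    ]
    print(control_loop(changes, valid, toy_planner, time_limit_s=1.0))
```

A learner could plug into such a loop at the planner callback, for example by deciding from the old plan whether to repair it or replan from scratch.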
The Execution Subtrack Awards
A best overall learner award will be given to the planner that achieves the best cumulative score across all domains.
A most adaptable learner award will be given to the planner achieving the highest cumulative score in the face of increasingly frequent changes or limitations in the execution environment.
A best anytime learner award highlights the planner that best adapts itself to the contractual time limit given by the controller.