Dependency detection applies contingency table analysis to build a set of simple models of the causes of a particular event's occurrence. It was designed as the first step in a partially automated debugging process originally developed for the Phoenix planning system. Consequently, the algorithm and the examples refer to failure recovery actions of the Phoenix planner and the failures that occurred during the planner's execution. It has also been used to detect sources of search inefficiency in another planning system.
Dependency detection searches Phoenix execution traces for statistically significant patterns: co-occurrences of particular precursors (events or combinations of events) leading to a target event (i.e., a failure). The execution traces are sequences of events: alternating failures and the recovery actions that repaired them, i.e., a failure, the recovery action applied to it, the next failure, and so on.
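The alternating trace structure can be sketched in a few lines of Python; the event names here (F1, R2, etc.) are hypothetical placeholders, not actual Phoenix failure or recovery-action types:

```python
# A minimal sketch of a Phoenix-style execution trace: an alternating
# sequence of failures (F*) and the recovery actions (R*) that repaired
# them. All event names are hypothetical.
trace = ["F1", "R2", "F3", "R1", "F1", "R2", "F3", "R4", "F2", "R1"]

# Failures sit at even indices, recovery actions at odd indices.
failures = trace[0::2]
recoveries = trace[1::2]
print(failures)    # failures in order of occurrence
print(recoveries)  # the recovery action chosen for each failure
```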
The idea is that repetitive sequences (which will be called patterns, to distinguish potential models from the data) may serve as early warnings of severe failures or indicate partial causes of those failures.
Dependency detection exhaustively tests the significance of a well-defined set of patterns using contingency table analysis. The set of patterns is crucial because the patterns form the hypothesized models of causality for the data; a model that is not in the set will never be tested. The pattern set also defines the size of the search space. To test each pattern, the algorithm needs a method for matching the pattern to the data and, to reduce redundancy and potentially conflicting models, a method for reconciling similar models. Dependency detection exploits characteristics of the Phoenix data: the set of patterns includes all combinations of a precursor followed immediately by a failure, where the precursor may be a single failure, a single recovery action, or a two-event sequence of a failure and a recovery action. (These precursor-failure patterns are written [F F], [R F], and [FR F], respectively.) While [R F] and [FR F] match the event streams exactly, the [F F] pattern matches by ignoring the value of the intervening recovery action. Similar patterns are reconciled by exploiting a property of the underlying statistic.
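A minimal sketch of this matching scheme, with hypothetical event names, might look like the following; `count_pattern` is an illustrative helper, not the Phoenix implementation:

```python
# Sketch of matching the three precursor-failure pattern shapes against an
# alternating F, R, F, R, ... trace. All event names are hypothetical.

def count_pattern(trace, precursor, target):
    """Count occurrences of precursor immediately followed by failure target.

    precursor is a tuple: ("F", f) matches failure f with any intervening
    recovery action ([F F]); ("R", r) matches recovery action r ([R F]);
    ("FR", f, r) matches the failure-recovery pair exactly ([FR F]).
    """
    n = 0
    for i in range(0, len(trace) - 2, 2):       # i indexes a failure
        f, r, nxt = trace[i], trace[i + 1], trace[i + 2]
        if nxt != target:
            continue
        kind = precursor[0]
        if kind == "F" and f == precursor[1]:            # [F F]: ignore r
            n += 1
        elif kind == "R" and r == precursor[1]:          # [R F]
            n += 1
        elif kind == "FR" and (f, r) == precursor[1:]:   # [FR F]
            n += 1
    return n

trace = ["F1", "R2", "F3", "R1", "F1", "R2", "F3", "R4", "F2", "R1"]
print(count_pattern(trace, ("F", "F1"), "F3"))         # [F1 F3], any recovery
print(count_pattern(trace, ("R", "R2"), "F3"))         # [R2 F3]
print(count_pattern(trace, ("FR", "F1", "R2"), "F3"))  # [F1 R2 F3]
```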
The dependency detection algorithm has three steps: it matches each pattern against the execution traces to collect co-occurrence counts, tests each pattern's significance with a contingency table analysis, and prunes overlapping significant patterns.
Table 1.1: Sample contingency table from Dependency Detection for sequence [ ]
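A contingency table of this kind can be assembled in a single sweep of a trace. The sketch below handles the [R F] case with hypothetical events; it is an illustration, not the original implementation:

```python
def contingency_table(trace, precursor_recovery, target_failure):
    """2x2 table for an [R F] pattern: rows = precursor recovery action
    present / absent, columns = target failure occurs next / does not."""
    table = [[0, 0], [0, 0]]
    for i in range(0, len(trace) - 2, 2):   # each failure-recovery-failure step
        r, nxt = trace[i + 1], trace[i + 2]
        row = 0 if r == precursor_recovery else 1
        col = 0 if nxt == target_failure else 1
        table[row][col] += 1
    return table

# Hypothetical trace of alternating failures and recovery actions.
trace = ["F1", "R2", "F3", "R1", "F1", "R2", "F3", "R4", "F2", "R1"]
print(contingency_table(trace, "R2", "F3"))
```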
For each significant pattern, the third step prunes out similar (overlapping) patterns using a homogeneity G-test. Two patterns overlap when one contains a single failure or a single recovery action as precursor (i.e., is an [F F] or [R F] pattern) and the other contains both (i.e., an [FR F] pattern in which one of the precursor events matches the singleton). The process exploits the additivity property of the G-test by treating the data for the [FR F] pattern as a partition of the data for the [F F] or [R F] pattern (whichever is being compared); the result is a list of all significant patterns found in the execution traces. (Details of the pruning process can be found in .)
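The G statistic and the additivity it affords can be sketched as follows; the counts are hypothetical, and this illustrates the general technique rather than the exact pruning procedure:

```python
import math

def g_statistic(table):
    """Log-likelihood ratio (G) statistic for a 2x2 contingency table."""
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs:                      # zero cells contribute nothing
                exp = row[i] * col[j] / n
                g += obs * math.log(obs / exp)
    return 2 * g

# Additivity sketch: the data for a coarse pattern can be partitioned into
# subtables (e.g., an [R F] table split by the [FR F] refinements). The
# heterogeneity G is the sum of the parts' G values minus the pooled G,
# and it is compared against a chi-square threshold to decide whether the
# finer patterns differ from the pooled one. Counts are hypothetical.
part1 = [[30, 10], [20, 40]]   # one [FR F] refinement
part2 = [[5, 15], [25, 5]]     # the remaining [R F] data
pooled = [[35, 25], [45, 45]]  # elementwise sum of the partitions
g_het = g_statistic(part1) + g_statistic(part2) - g_statistic(pooled)
print(round(g_het, 2))
```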
Computationally, dependency detection requires a complete sweep of the execution traces to test each pattern. The number of patterns equals the number of types of failures squared (for [F F]), plus the number of types of failures squared times the number of types of recovery actions (for [FR F]), plus the number of types of recovery actions times the number of types of failures (for [R F]). In total, this is roughly O(MN^3), where M is the length of the event traces and N is the number of types of events in the traces (i.e., the total number of types of recovery actions and failures). N is cubed to subsume the [FR F] case, in which the precursor includes both a failure and a recovery action; if we wished to consider longer patterns of events, the exponent 3 would be replaced by the length of the desired pattern.
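Under hypothetical event-type counts, the pattern arithmetic works out as follows:

```python
# Pattern-count arithmetic from the text, with hypothetical numbers of
# event types: n_f failure types and n_r recovery-action types.
n_f, n_r = 10, 8
n_patterns = n_f**2 + n_f**2 * n_r + n_r * n_f   # [F F] + [FR F] + [R F]
M = 5000                                          # hypothetical trace length
work = M * n_patterns                             # one sweep per pattern
print(n_patterns, work)
```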