The problem of inferring causality from empirical observations has been well studied. Several approaches are notable for efficiently constructing complex causal models of the inter-relationships between variables (e.g., [8, 3, 2]). These approaches tend to rely on correlations and co-variances among the variables as the basis for inferring causality. However, for some applications, the available data are categorical observations over time (e.g., event streams or execution traces of programs); for example, patterns in execution traces form the basis of several methods of debugging software (e.g., [1, 4]). These applications are less amenable to solution by methods based on correlation and co-variance.
An alternative approach, called Dependency Detection, searches the event streams for often recurring sequences . The set of recurring sequences (called dependencies) indicates events that commonly co-occur and forms a weak model of causality.
Although promising, dependency detection is limited in several ways. First, the underlying search is exhaustive, looking for all possible dependencies in the data. As a consequence, the computational complexity increases exponentially with the length of the sequences. Second, the sequences are rigid, which means that only exact matches count and that any noise (i.e., insertion of some other unrelated event into the stream) will not count as an example of the sequence. Third, the technique considers only a single stream rather than multiple streams of parallel events; Oates et al.\ have developed a technique for multi-stream dependency detection . This paper describes how local search with flexible matching has been used to overcome the first two limitations.