G-Test Sensitivity to Execution Trace Size.

We selected the G-test over the more common Chi-square test because the G-test is additive. Additivity means that G values for subsets of the sample can be added together to get a G value for the superset. If the ratios remain the same but the total number of counts in the contingency table doubles, then the G value for the contingency table doubles as well. For example, the G value for the contingency table in Table 1 is 42.86; the G value for the contingency table with one tenth the counts (i.e., a contingency table with 5, 3, 24 and 64 in its cells) is 4.319, or roughly one tenth of 42.86 (as close as one gets when rounding the counts to the nearest integer). Additivity therefore means that the value of G increases linearly with the amount of data (in this case, the number of patterns in the execution traces).
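
The scaling behavior described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function name g_statistic and the cell counts are hypothetical and chosen only for illustration (they are not the counts from Table 1). It computes G as twice the sum of observed * ln(observed / expected) over the cells and compares a table with a copy whose counts are all doubled.

```python
import math


def g_statistic(table):
    """Return the G (log-likelihood ratio) statistic for a table of counts.

    `table` is a list of rows; expected counts come from the row and
    column totals under the assumption of independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                total += observed * math.log(observed / expected)
    return 2.0 * total


# Hypothetical counts, for illustration only.
base = [[10, 6], [48, 128]]
doubled = [[2 * count for count in row] for row in base]

g_base = g_statistic(base)
g_doubled = g_statistic(doubled)

# Same ratios, twice the counts: the ratio printed last should be 2.
print(g_base, g_doubled, g_doubled / g_base)
```

Because the expected counts scale by the same factor as the observed counts, every logarithm is unchanged and only the multiplying counts grow, which is why G scales exactly with the total when the ratios are held fixed.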

A linear relationship between the number of patterns in the execution traces and the value of G is convenient for two reasons. First, the second step in dependency detection, pruning overlapping dependencies, exploits the additivity property. We can divide the patterns into their subparts (e.g., a precursor with both a failure and a recovery method in it) and add the resulting G values to get the same value as if we had calculated a single G for all the subsets together. Second, a linear relationship is predictable: the more patterns in the execution traces, the more likely we are to detect dependencies. We are therefore unlikely to be surprised by new dependencies suddenly showing up after we gather a few more execution traces; a dependency that emerges after a small addition was already close to being detected. The bottom line is that given execution traces with few patterns, the G-test can find strong dependencies, but given more patterns, it will also find rare dependencies. If a user of FRA is interested in detecting any dependencies at all, then a few execution traces will be adequate; if the user wishes to find rare or obscure dependencies, then it will be necessary to gather more execution traces. The level of effort expended in gathering execution traces thus depends on what kinds of dependencies one wishes to find.
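
To make the trade-off between trace volume and detectable dependencies concrete, the sketch below estimates how many times a fixed set of proportions must be replicated before G crosses a significance threshold. Several things here are assumptions rather than details from the text: the 0.05 significance level (critical value 3.841 for a 2x2 table with one degree of freedom), the helper name scale_needed, and the cell counts, which are invented to represent a weak dependency.

```python
import math

# Assumed threshold: chi-square critical value for 1 degree of freedom
# at the 0.05 level. The text does not state which level FRA uses.
CRITICAL_G = 3.841


def g_statistic(table):
    """Return the G statistic for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                total += observed * math.log(observed / expected)
    return 2.0 * total


def scale_needed(table, critical=CRITICAL_G):
    """Smallest integer k such that k * G(table) exceeds `critical`.

    Relies on G growing linearly with the counts when ratios are fixed.
    """
    g = g_statistic(table)
    if g <= 0.0:
        raise ValueError("table shows no dependency; G never crosses the threshold")
    return math.floor(critical / g) + 1


# Hypothetical weak dependency, for illustration only.
weak = [[6, 4], [40, 50]]

g_weak = g_statistic(weak)
k = scale_needed(weak)
print(g_weak, k, k * g_weak)
```

Under these assumptions, a dependency whose G falls below the threshold in a small sample is not absent, only undetected: collecting roughly k times as many patterns with the same proportions would bring it over the threshold.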

adele howe
Wed Oct 23 13:21:04 MDT 1996