Demonstration implementations of position-heap algorithms

Please report any problems to Ross McConnell, rmm@cs.colostate.edu

More code is forthcoming in the near future

naiveBuildNaiveSearch.tar Creates a directory called naiveBuildNaiveSearch that contains C++ code that demonstrates the implementation of the naive algorithm for constructing the position heap from a text, and the naive searching algorithm for using the position heap to search for a substring in the text. A makefile is provided; the algorithms of interest are in heap.cpp. The space requirement is two words per character of text.

fastBuildNaiveSearch.tar Creates a directory called fastBuildNaiveSearch that contains C++ code that demonstrates the implementation of the linear-time algorithm for constructing the position heap from a text. The space requirement is three words per position of text. We think that this version or the naive one will be the most useful version for most practical applications, depending on how scarce space is. The worst-case O(m^2+k) bound for searching, where m is the length of a pattern and k is the number of occurrences, is pessimistic. It takes O(m+k) expected time when either the text or the pattern string is randomly generated, for example. Getting this down to O(m+k) in the worst case requires five words per text.

naiveBuildFastSearch.tar Creates a directory called naiveBuildFastSearch. This gives a naive way of implementing the position heap so that it supports a search in O(m+k) worst-case time, but does not build it in O(n) time. The space requirement is five words per character of text, since the fast search requires the addition of a "maximal-reach pointer" and a DFS discovery- and finishing-time label to each node.

fastBuildFastSearch.tar Creates a directory called fastBuildFastSearch. This gives an O(n) construction algorithm and an O(m+k) search algorithm, where n is the length of the text, m is the length of the pattern string, and k is the number of occurrences to be reported.

largeDNA.txtLarge sample input file containing a DNA sequence

largeEnglish.txtLarge sample input file containing English text