Probing


Introduction

Probing algorithms, as described below, represent one common type of Automatic Target Recognition (ATR) algorithm. The specific LADAR probing algorithm presented here was originally developed by Alliant Techsystems in Minneapolis, MN. It came to us at Colorado State as part of a DARPA funded project to develop a multisensor ground-based ATR capability. The LADAR probing algorithm itself should no longer be considered state-of-the-art. However, more advanced algorithms have been developed which utilize a very similar pattern of computation. In the context of the Cameron Project, the LADAR Probing algorithm is an excellent stand in for these other algorithms and allows us to explore how these algorithms map to Adaptive Computing Hardware. For more on probing and its applications, click here.

The basic idea behind probing couldn't be simpler: given a template of a target, arrange "probes" (pairs of pixels) in a window around the silhouette of the target. Then, to search for the target in a new image, slide the window across the image, as shown in the figure below. The likelihood of the target being at given window position is a function of the number of probes that have an edge between them (i.e. where the difference between the pixel values exceeds a threshold).

Probes overlaid on 12-bit LADAR data. In the image on the left, the probe window is in the correct to recognize the target, and all 13 probes "fire". In the image on the right, the probe window is too far to the left, and only 6 of 13 probes "fire".

Of course, this approach only works if the target is at a fixed orientation. To overcome this limitation, templates are generated by sampling the viewing sphere around the target. For each viewing position, a different probe set is generated. In our data set, we have four targets, with eighty-one views of each target, for a total of 324 probe sets. There are a total of 19,585 probes in these probe sets, embedded in image windows that do not exceed 15x45 pixels.

The result is a brute force algorithm. At every possible location for a 15x45 window in a source image, the system computes an average of 60 pair-wise pixel comparisons for each of 324 probe sets, and counts the number of times the difference exceeds a threshold for each. The output at that image position is the probe set that produced the most responses, or nothing if the maximum response count was below a threshold.


SA-C Implementation

The SA-C source code for probing is available below. The general structure is:

This is "inside out" from the way probing is usually described. Many systems implement probing as:

Although probing can be written either way in SA-C, the first method is more efficient. This is because the second method requires passing over the image many times. In the first method, on the other hand, the inner loops can be unrolled, because we know at compile time how many probe sets there are. As a result, the system can calculate all the probe sets in a single pass over the image (up to the size capacity of the FPGA).

click here for source code....


Optimizations

The SA-C compiler applies a number of optimizations to the probing code in response to pragmas. (Click here for more about SA-C optimizations.) The first and most obvious optimization is spatial (i.e. traditional) common subexpression elimination. It is common for a single probe -- i.e. a single pair of pixels -- to be part of more than one probe set. For example, consider the two probe sets shown below. Three of the probes overlap. As a result, there is redundant computation that the compiler can avoid through common subexpression elimination. (Note that this includes both the code/circuitry to test the duplicated probes, and to sum their results.)

One probe set. The probes in yellow spatially overlap probes in the image to the right; CSE eliminates the redundancy
The other probe set.

Even more probes can be eliminated through a new technique called temporal common subexpression elimination. Look again at the figures above. Three of the probes are left-shifted versions of the three other probes. (See the figure below to see the redundant probes color coded.) This implies that the pixels under the left-shifted probes were already compared to each other at an earlier time step, when the window itself was still more to the left. (Remember that the image window slides left to right.) Instead of recomparing these pixels, the results of the earlier comparison can be saved in shift registers and reused. This greatly reduces the number of separate probes to be compared.

Six probes from a probe set. The color coded pairs represent probes that are temporarlly redundant, because the window slides left to right. By insertint shift registers to "hold" values from one time step to the next, we can avoid recomputing the left-hand probe from each pair on the left.

Finally, the number of registers required to implement probing can be reduced by window narrowing. In order to implement sliding windows, the SA-C compiler uses shift registers. As the image window slides to the right, new pixel values are put at the head (right side) of the shift registers, and all of the old values are shifted one to the left. Pixels that fall off the end are no longer in the image window. Unfortunately, the image windows in this application are fairly large (15x45), and the individual pixels are 12 bit values. Unoptimized, the FPGA would be shifting 8,100 bits of information per cycle in order to implement the sliding window.

This is well within the capacity of a modern FPGA, but it does waste space. It is much cheaper to compare probe pixels as soon as they are available in the image window and then shift the (one bit) result, rather than the raw (twelve bit) pixels. This is what the window narrowing optimization does, and it reduces the size of the sliding window from 15x45 to 15x4.


Performance Results

The following performance results for SA-C are based on the Sept. 2001 version of the SA-C compiler, executated on an AMS WildStar with three Xilinx Virtex2000E FPGAs. The comparison numbers were for hand-optimized C code executed on a 800 MHz Pentium-III. There are two sets of performance numbers for the Pentium: one compiled using VC++ version 6.0 (optimized for speed) and running under Windows2000, the other for g++ (-O6) and running under Linux (RedHat 7.2). The image size was 512x1024, and each pixel was 12 bits.

SA-C on WildStare
C on 800MHz P3
MHz LUT
Flip-Flops
% Slices
Time (sec)
41.16
0.08
VC++ time (sec) G++ time (sec)
65
119