Wavelets are commonly used for multi-scale analysis in computer vision, as well as for image compression. Honeywell has defined a set of benchmarks for reconfigurable computing systems, including a wavelet-based image compression algorithm. This algorithm is based on the Cohen-Daubechies-Feauveau (CDF) wavelet (1992). We have translated the CDF wavelet program into SA-C and generalized it to operate on images of any size.
In general, wavelets separate the high-frequency components of an image from its low-frequency components. The low-frequency components can then be resampled into a smaller image, while the high-frequency components resemble derivatives in dx, dy and d(xy). In the SA-C program below, the input is an 8-bit grey-scale image. There are four outputs: a reduced image (low frequencies) and three images of high-frequency components. The range of the output images exceeds 8 bits, however: output pixels can be positive or negative and require ten bits of magnitude information. SA-C takes advantage of the FPGA's flexibility with regard to bit precision to produce 11-bit signed output images (ten magnitude bits plus a sign bit). Below, we show a source image and the four corresponding output images. Note that the output images were rescaled to 0-255 for viewing; the actual ranges were -2 to 272 (output image 1), -87 to 93 (output image 2), -160 to 110 (output image 3) and -113 to 96 (output image 4).
Figure: Aerial image of Fort Hood, TX (used as input for the CDF wavelet).

Figure: The four output images from the CDF wavelet.
```
export main;

// One lifting step on a 5-pixel column of 8-bit pixels. Returns the
// smoothed (low-pass) value at col[2] and the detail (high-pass) value d0,
// the {-1,2,-1} second difference centred on col[1].
int10, int10 Valsuint8(uint8 col[5])
{
    int10 mask[3] = {-1,2,-1};
    int11 d0 = for p in col[0:2] dot m in mask return(sum((int11)p*m));
    int11 d1 = for p in col[2:4] dot m in mask return(sum((int11)p*m));
    int11 d01 = d0 + d1;
    // What I need is (d0+d1)/8, rounded toward zero. The next five lines
    // compute it: take the magnitude, shift it right 3 bits, restore the sign.
    int11 ud01 = if (d01 < 0) return(-d01) else return(d01);
    bits11 b01 = ud01;
    bits11 bdiv8 = b01 >> 3;
    int11 udiv8 = bdiv8;
    int11 adj = if (d01 < 0) return(-udiv8) else return(udiv8);
} return(col[2]+adj, d0);

// The same step for the signed 10-bit intermediate values,
// computed with one extra bit of precision.
int11, int11 Valsint10(int10 col[5])
{
    int11 mask[3] = {-1,2,-1};
    int12 d0 = for p in col[0:2] dot m in mask return(sum((int12)p*m));
    int12 d1 = for p in col[2:4] dot m in mask return(sum((int12)p*m));
    int12 d01 = d0 + d1;
    // What I need is (d0+d1)/8, rounded toward zero. The next five lines
    // compute it: take the magnitude, shift it right 3 bits, restore the sign.
    int12 ud01 = if (d01 < 0) return(-d01) else return(d01);
    bits12 b01 = ud01;
    bits12 bdiv8 = b01 >> 3;
    int12 udiv8 = bdiv8;
    int12 adj = if (d01 < 0) return(-udiv8) else return(udiv8);
} return(col[2]+adj, d0);

// Absolute value saturated to the 0-255 range (not called by main).
uint8 clip8(int11 src)
{
    uint10 upix = if (src < 0) return(-src) else return(src);
    uint8 pix = if (upix > 255) return(255) else return(upix);
} return(pix);

// Slide a 5x5 window over the image in steps of 2. Each window column is
// filtered vertically; the resulting smoothed (sy) and detail (dy) rows
// are then filtered horizontally to give the four output bands.
int11[:,:], int11[:,:], int11[:,:], int11[:,:] main(uint8 src[:,:])
{
    int11 s[:,:], int11 dx2[:,:], int11 dy2[:,:], int11 dxy[:,:] =
        // PRAGMA (nextify_cse,part_unroll(8,1))
        for window w[5,5] in src step(2,2)
        {
            int10 sy[5], int10 dy[5] =
                for uint3 colnum in [0~4]
                {
                    int10 sval, int10 dval = Valsuint8(w[:,colnum]);
                } return(array(sval), array(dval));
            int11 s, int11 dx2 = Valsint10(sy);
            int11 dy2, int11 dxy = Valsint10(dy);
        } return(array(s), array(dx2), array(dy2), array(dxy));
} return(s, dx2, dy2, dxy);
```
Performance was measured by compiling the SA-C CDF wavelet routine with the May 13, 2001 version of the SA-C compiler and executing it on an Annapolis Microsystems StarFire board with a Xilinx XV-1000 FPGA. The test image was an 8-bit 512x512 image (of Fort Hood, TX, shown above). Compiler optimizations included temporal common subexpression elimination (CSE), loop unrolling, pipelining (6 stages), stripmining (16x), and the use of multiple output memories to avoid contention.
| Seconds (800 MHz P3) | Seconds (WildStar) | LUTs | Flip-Flops | Slices | Frequency |
|---|---|---|---|---|---|
| 0.075 | 0.002 | 54% | 69% | 99% | 35.1 MHz |
For comparison purposes, we ran Honeywell's original wavelet code (written in C) on an 800 MHz Pentium III running Windows 2000, compiled with Visual C++ version 6 and optimized for speed.
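A rough reading of these numbers follows, under two assumptions of ours: that SA-C places the 5x5 window only where it fits entirely inside the 512x512 image, and that the FPGA time was spent running at the listed clock frequency. Because the FPGA time is quoted to a single significant figure, the speedup is approximate: any true time between 0.0015 s and 0.0025 s would give a ratio between roughly 30x and 50x.

$$
N_{\text{win}} = \left\lfloor \frac{512 - 5}{2} \right\rfloor + 1 = 254,
\qquad 254 \times 254 = 64{,}516 \text{ windows per output band}
$$

$$
\text{speedup} \approx \frac{0.075\ \text{s}}{0.002\ \text{s}} = 37.5,
\qquad
0.002\ \text{s} \times 35.1\ \text{MHz} \approx 70{,}200 \text{ clock cycles}
$$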