Solving Systems of Tridiagonal Matrices


Introduction

A tridiagonal matrix system is an equation of the form Ax=b, where x and b are vectors, and A is a tridiagonal matrix. In other words, A is necessarily square, and has non-zero entries only along its diagonal and immediately adjacent to its diagonal. The goal is to find x, given A and b.

This is not the type of application SA-C was originally designed for, but two of our colleagues from the Colorado State University Department of Atmospheric Science (Graeme Stephens and Philip Gabriel) suggested that the SA-C compiler could be applied to natural resource problems, and suggested that we look at solving systems of tridiagonal matrices as a start. We thank them for their interest and advice.

We have implemented a simple iterative algorithm (suggested by Philip Gabriel) for solving diagonally dominant tridiagonal matrices. The source code below contains initial code for reading the input matrix and vector, and for computing various terms for debugging purposes. The piece that solves the linear system and runs on the FPGA is clearly commented in the middle of the code. The performance results below are with regard to this section of the code.


Source Code

#define SIZE 8
#define ITERATIONS 10

fix16.14[SIZE]  main (float A[SIZE,SIZE], float X[SIZE]) {
 bool debug = true;
 uint8 rA, uint8 cA = extents (A);
 uint8 rX           = extents (X);
 assert ((rA==cA), "A not square");
 assert ((rA==rX), "Sizes A and X not compliant");

 print(debug,"A: ", A);
 print(debug,"X: ", X);

 float B[:] =
   for VA (~,:) in A {
   float ip = for a in VA dot x in X
               return (sum (a * x));
   } return ( vector(ip));
 print(debug,"B = A.X: ", B);

 float Dinv[:]  =
   for uint8 i in [rA] return(array((float)(1.0)/A[i,i]));

 float L[:], float U[:] =
   for uint8 i in [rA] {
    float Li = i==0    ? (float)(0.0) : A[i,i-1];
    float Ui = i==rA-1 ? (float)(0.0) : A[i,i+1];
   } return(array(Li),array(Ui) );

 // DiB = Dinv*B  DiL,DiU =  Dinv*(L+U)
 float DiB[:], float DiL[:], float DiU[:] =
   for  d in Dinv dot b in B dot l in L dot u in U
   return( array(d*b), array(d*l), array(d*u) );
 print(debug,"DiB: ", DiB);
 print(debug,"DiL: ", DiL);
 print(debug,"DiU: ", DiU);
 fix16.14 DiBtc[:] = DiB;
 fix16.14 DiUtc[:] = DiU;
 fix16.14 DiLtc[:] = DiL;
 print(debug,"DiBtc: ", DiBtc);
 print(debug,"DiLtc: ", DiLtc);
 print(debug,"DiUtc: ", DiUtc);
 fix16.14 Y[:] = DiB;

// This goes to the FPGA
 fix16.14 res[:] =
   // PRAGMA(no_unroll)
   for uint8 iter in [ITERATIONS] {
    fix16.14 Yperim[:] =
     for uint8 i in [rA+2]
     return(array(i==0 || i==(rA+1) ? (fix16.14)(0.0) : Y[i-1]));
    // Compute  next Y = DiB - (DiL+DiU) * Y
    next Y =
     for b in DiBtc dot l in DiLtc dot u in DiUtc dot window WY[3] in Yperim
     return(array(b - (l*WY[0] + u*WY[2])));
   }return(final(Y));

// End of FPGA code.
 print(debug,"Iterative solution of A.X = B: ");
}return(res);

Performance Results

Performance was tested on an 8x8 diagonally dominant tridiagonal matrix. The reconfigurable processor was an Annapolis Microsystems (AMS) Starfire, containing a single Xilinx Virtex 1000 series FPGA. For comparison purposes, we ran an equivalent C program on an 800MHz Pentium III.

seconds (800 MHz P3)
microseconds (AMS)
Frequency
Logic Blocks
2.3x10-6
0.5x10-6
25.2 MHz
35%