Thanks to ISL, we AlphaZ now have two schedulers. Multi-dimensional farkas scheduler by Paul Feautrier, and PLuTo scheduling by Uday Bondhugula. For details of the schedulers, refer to the corresponding articles.
@article{feautrier1992some,
title={Some efficient solutions to the affine scheduling problem. part II. multidimensional time},
author={Feautrier, P.},
journal={International Journal of Parallel Programming},
volume={21},
number={6},
pages={389--420},
year={1992},
publisher={Springer}
}
@article{bondhugula2008practical,
title={A practical automatic polyhedral parallelizer and locality optimizer},
author={Bondhugula, U. and Hartono, A. and Ramanujam, J. and Sadayappan, P.},
journal={ACM SIGPLAN Notices},
volume={43},
number={6},
pages={101--113},
year={2008},
publisher={ACM}
}
Schedulers take PRDG (Polyhedral Reduced Dependence Graph) as inputs and returns affine functions for each node in the PRDG. Following commands are related to scheduling.
Lets take a simple program, matrix multiply, as an example. Matrix multiplication has a 3D iteration space, and the only dependence is along the k-axis for accumulating.
affine matrix_product {P, Q, R|P>1 && Q>1 && R>1}
given float A {i,k| 0<=i<P && 0<=k<Q};
float B {k,j| 0<=k<Q && 0<=j<R};
returns float C {i,j,k| 0<=i<P && 0<=j<R && k==Q+1};
using // Using an accumulator locally
float temp_C {i,j,k|0<=i<P && 0<=j<R && 0<=k<=Q};
through
temp_C[i,j,k] = case
// For the computation of the value of temp_C at (i,j,k),
// we need the value of temp_C at (i,j,k-1), A at (i,k) and B at (k,j)
{|k>0} : temp_C[i,j,k-1] + A[i,k-1]*B[k-1,j];
{|k==0} : 0; // Initialization of the accumulator
esac;
C[i,j,k] = temp_C[i,j,k-1];
.
The following scripts uses FarkasMD scheduling to find schedules for the above program.
# Load Program and store as 'prog'
prog = ReadAlphabets("../../testcases/matrix_product/matrix_product.ab");
# Define a variable 'system' to store the system name
system = "matrix_product";
prdg = BuildPRDG(prog, system, 1);
schedules = FarkasMDScheduler(prdg);
setSchedule(prog, system, schedules);
listSpaceTimeMaps(prog, system);
The output should look like the following:
C (i,j,k->k,j,i) temp_C (i,j,k->k,j,i) ===SpaceTime Maps=== SpaceTimeMap: C (i,j,k->k,j,i) [SEQUENTIAL, SEQUENTIAL, SEQUENTIAL] SpaceTimeMap: temp_C (i,j,k->k,j,i) [SEQUENTIAL, SEQUENTIAL, SEQUENTIAL]
Since Farkas scheduler tries to satisfy as much dependencies as possible in the outer dimensions, the k-axis has became the outer most dimension in the resulting schedule.
The following is a similar script for matrix multiply, but uses PLuTo scheduling.
# Load Program and store as 'prog'
prog = ReadAlphabets("../../testcases/matrix_product/matrix_product.ab");
# Define a variable 'system' to store the system name
system = "matrix_product";
prdg = BuildPRDG(prog, system, 1);
schedules = PlutoScheduler(prdg);
setSchedule(prog, system, schedules);
listSpaceTimeMaps(prog, system);
The output should look like the following:
C (i,j,k->i,j,k) temp_C (i,j,k->i,j,k) ===SpaceTime Maps=== SpaceTimeMap: C (i,j,k->i,j,k) [SEQUENTIAL, SEQUENTIAL, SEQUENTIAL] SpaceTimeMap: temp_C (i,j,k->i,j,k) [SEQUENTIAL, SEQUENTIAL, SEQUENTIAL]
Note that now the k-axis is the innermost dimension of the resulting schedules. Because PLuTo scheduler have a different cost function, the resulting schedule can be different.
Here is another example to illustrate the difference between Farkas and PLuTo scheduling.
The alphabets program below is a Jacobi-style stencil computation for 1D data.
Because of the dependence (t,i->t,i+1), this program is not tilable with the identity schedule.
affine jacobi_1d {TSTEPS,N | TSTEPS > 2 && N > 5}
given
double A {i|0<=i<N};
returns
double B {t,i|0<=i<N && t==TSTEPS};
using
double temp_B {t,i|0<=i<N && 0<=t<TSTEPS};
through
temp_B[t,i] = case
{|t == 0} : A[i];
{|t > 0 && 1<=i<N-1} : (temp_B[t-1,i-1] + temp_B[t-1,i] + temp_B[t-1,i+1])*0.33333;
{|t > 0 && 0==i } || {|t > 0 && i==N-1 } : temp_B[t-1,i];
esac;
B[t,i] = temp_B[t-1,i];
.
PLuTo Scheduling, which tries to find tilable shcedules, gives the following. Note that the second dimension of the schedules are shifted by t. This is an instance of “skewing”, and this schedule is tilable.
temp_B (t,i->t,i+t) B (t,i->t,i+t) ===SpaceTime Maps=== SpaceTimeMap: temp_B (t,i->t,i+t) [SEQUENTIAL, SEQUENTIAL] SpaceTimeMap: B (t,i->t,i+t) [SEQUENTIAL, SEQUENTIAL]
Farkas Scheduling does not try to produce tilable schedule, and thus gives the following schedule.
temp_B (t,i->t,i) B (t,i->t,i) ===SpaceTime Maps=== SpaceTimeMap: temp_B (t,i->t,i) [SEQUENTIAL, SEQUENTIAL] SpaceTimeMap: B (t,i->t,i) [SEQUENTIAL, SEQUENTIAL]