Message passing via MPI is widely used in single-program, multiple-data (SPMD) parallel programs. Existing data-flow frameworks do not model the semantics of message-passing SPMD programs, which can result in less precise and even incorrect analysis results. We present a data-flow analysis framework for performing interprocedural analysis of message-passing SPMD programs. The framework is based on the MPI-ICFG representation, which is an interprocedural control-flow graph (ICFG) augmented with communication edges between possible send and receive pairs and partial context sensitivity. We show how to formulate nonseparable data-flow analyses within our framework using reaching constants as a canonical example. We also formulate and provide experimental results for the nonseparable analysis, activity analysis. Activity analysis is a domain-specific analysis used to reduce the computation and storage requirements for automatically differentiated MPI programs. Automatic differentiation is important for application domains such as climate modeling, electronic device simulation, oil reservoir simulation, medical treatment planning and computational economics to name a few. Our experimental results show that using the MPI-ICFG data-flow analysis framework improves the precision of activity analysis and as a result significantly reduces memory requirements for the automatically differentiated versions of a set of parallel benchmarks, including some of the NAS Parallel Benchmarks.
Linearity Analysis for Automatic DifferentiationLinearity analysis determines which variables depend on which other variables and whether the dependence is linear or nonlinear. One of the many applications of this analysis is determining whether a loop involves only linear loop-carried dependences and therefore the adjoint of the loop may be reversed and fused with the computation of the original function. This paper specifies the data-flow equations that compute linearity analysis. In addition, the paper describes using linearity analysis with array dependence analysis to determine whether a loop-carried dependence is linear or nonlinear.
Hybrid Static/Dynamic Activity AnalysisIn forward mode Automatic Differentiation, the derivative program computes a function f and its derivatives, f'. Activity analysis is important for AD. Our results show that when all variables are active, the runtime checks required for dynamic activity analysis incur a significant overhead. However, when as few as half of the input variables are inactive, dynamic activity analysis enables an average speedup of 28% on a set of benchmark problems. We investigate static activity analysis combined with dynamic activity analysis as a technique for reducing the overhead of dynamic activity analysis.
Representation-Independent Program AnalysisProgram analysis has many applications in software engineering and high-performance computation, such as program understanding, debugging, testing, reverse engineering, and optimization. A ubiquitous compiler infrastructure does not exist; therefore, program analysis is essentially reimplemented for each compiler infrastructure. The goal of the OpenAnalysis toolkit is to separate analysis from the intermediate representation (IR) in a way that allows the orthogonal development of compiler infrastructures and program analysis. Separation of analysis from specific IRs will allow faster development of compiler infrastructures, the ability to share and compare analysis implementations, and in general quicker breakthroughs and evolution in the area of program analysis. This paper presents how we are separating analysis implementations from IRs with analysis-specific, IR-independent interfaces. Analysis-specific IR interfaces for alias/pointer analysis algorithms and reaching constants illustrate that an IR interface designed for language dependence is capable of providing enough information to support the implementation of a broad range of analysis algorithms and also represent constructs within many imperative programming languages.
Metrics and Models for Reordering TransformationsIrregular applications frequently exhibit poor performance on contemporary computer architectures, in large part because of their inefficient use of the memory hierarchy. Run-time data- and iteration-reordering transformations have been shown to improve the locality and therefore the performance of irregular benchmarks. This paper describes models for determining which combination of run-time data- and iteration-reordering heuristics will result in the best pe rformance for a given dataset. We propose that the data- and iteration-reordering transformations be viewed as approximating minimal linear arrangements on two separate hypergraphs: a spat ial locality hypergraph and a temporal locality hypergraph. Our results measure the efficacy of locality metrics based on these hypergraphs in guiding the selection of data- and iteration-reordering heuristics. We also introduce new iteration- and data-reordering heuristics based on the hypergraph models that result in better performance than do previous heuristics.
Combining Performance Aspects of Irregular Gauss-Seidel via Sparse TilingFinite Element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration, and inter-iteration data locality) are combined.
Rescheduling for Locality in Sparse Matrix ComputationsIn modern computer architecture the use of memory hierarchies causes a program's data locality to directly affect performance. Data locality occurs when a piece of data is still in a cache upon reuse. For dense matrix computations, loop transformations can be used to improve data locality. However, sparse matrix computations have non-affine loop bounds and indirect memory references which prohibit the use of compile time loop transformations. This paper describes an algorithm to tile at runtime called serial sparse tiling. We test a runtime tiled version of sparse Gauss-Seidel on 4 different architectures where it exhibits speedups of up to 2.7. The paper also gives a static model for determining tile size and outlines how overhead affects the overall speedup.
Schedule-Independent Storage Mapping in LoopsThis paper studies the relationship between storage requirements and performance. Storage-related dependences inhibit optimizations for locality and parallelism. Techniques such as renaming and array expansion can eliminate all storage-related dependences, but do so at the expense of increased storage. This paper introduces the universal occupancy vector (UOV) for loops with a regular stencil of dependences. The UOV provides a schedule-independent storage reuse pattern that introduces no further dependences (other than those implied by true flow dependences). OV-mapped code requires less storage than full array expansion and only slightly more storage than schedule-dependent minimal storage. We show that determining if a vector is a UOV is NP-complete. However, an easily constructed but possibly non-minimal UOV can be used. We also present a branch and bound algorithm which finds the minimal UOV, while still maintaining a legal UOV at all times. Our experimental results show that the use of OV-mapped storage, coupled with tiling for locality, achieves better performance than tiling after array expansion, and accommodates larger problem sizes than untilable, storage-optimized code. Furthermore, storage mapping based on the UOV introduces negligible runtime overhead.
Sparse Tiling for Stationary Iterative MethodsIn modern computers, a program's data locality can affect performance significantly. This paper details full sparse tiling, a run-time reordering transformation that improves the data locality for stationary iterative methods such as Gauss-Seidel operating on sparse matrices. In scientific applications such as Finite Element Analysis, these iterative methods dominate the execution time. Full sparse tiling chooses a permutation of the rows and columns of the sparse matrix, and then an order of execution that achieves better data locality. We prove that full sparse tiled Gauss-Seidel generates a solution that is bitwise identical to traditional Gauss-Seidel on the permuted matrix. We also present measurements of the performance improvements and the overheads of full sparse tiling and of cache blocking for irregular grids, a related technique developed by Douglas et al.
Compile-time Composition of Run-time Data and Iteration ReorderingsMany important applications, such as those using sparse data structures, have memory reference patterns that are unknown at compile-time. Prior work has developed run-time reorderings of data and computation that enhance locality in such applications.
This paper presents a compile-time framework that allows the explicit composition of run-time data and iteration-reordering transformations. Our framework builds on the iteration-reordering framework of Kelly and Pugh to represent the effects of a given composition. To motivate our extension, we show that new compositions of run-time reordering transformations can result in better performance on three benchmarks.
We show how to express a number of run-time data and iteration-reordering transformations that focus on improving data locality. We also describe the space of possible run-time reordering transformations and how existing transformations fit within it. Since sparse tiling techniques are included in our framework, they become more generally applicable, both to a larger class of applications, and in their composition with other reordering transformations. Finally, within the presented framework data need be remapped {\em only once} at runtime for a given composition thus exhibiting one example of overhead reductions the framework can express.
@inproceedings{IISWC07,
Month = {September},
Year = {2007},
Title = {FacePerf: Benchmarks for Face Recognition Algorithms.},
Booktitle = {Proceedings of The IEEE International Symposium on Workload Characterization (IISWC)},
Author = {David Bolme and Michelle Mills Strout and Ross Beveridge}}
@inproceedings{SC07,
Month = {November},
Year = {2007},
Title = {Multi-level Tiling: m for the Price of One},
Booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)},
Author = {Daegon Kim and Lakshminarayanan Renganarayana and Dave Rostron and Sanjay Rajopadhye and Michelle Mills Strout}}
@inproceedings{PLDI07,
Month = {June},
Year = {2007},
Title = {Parameterized Tiled Loops for Free},
Booktitle = {Proceedings of the {ACM} {SIGPLAN} Conference on
Programming Language Design and Implementation (PLDI)},
Author = {Lakshminarayanan Renganarayanan, DaeGon Kim, Sanjay Rajopadhye, and Michelle Mills Strout}}
@InProceedings{Scidac2007,
author = "Erik G Boman, Doruk Bozdag, Umit V Catalyurek, Karen D Devine, Assefaw H Gebremedhin, Paul D Hovland, Alex Pothen, and Michelle Mills Strout",
title = "Enabling high performance computational science through combinatorial algorithms",
booktitle = "Journal of Physics: Conference Series ({SciDAC})",
volume = "78",
month = "June" ,
year = "2007",
}
@InProceedings{StroutICPP2006,
author = "Michelle Mills Strout and Barbara Kreaseck and Paul D. Hovland",
title = "Data-Flow Analysis for MPI Programs",
booktitle = "Proceedings of the International Conference on Parallel
Processing (ICPP)",
month = "August" ,
year = "2006",
}
@InProceedings{StroutADTA2006,
author = "Michelle Mills Strout and Paul D. Hovland",
title = "Linearity Analysis for Automatic Differentiation",
booktitle = "Proceedings of the 3rd International Workshop on
Automatic Differentiation Tools and Applications
(ADTA'06), Reading, England",
month = "May" ,
year = "2006",
}
@InProceedings{KreaseckADTA06,
author = "Barbara Kreaseck and Luis Ramos and Scott
Easterday and Michelle Mills Strout and
Paul Hovland",
title = "Hybrid Static/Dynamic Activity Analysis",
booktitle = "Proceedings of the 3rd International Workshop on
Automatic Differentiation Tools and Applications
(ADTA'06), Reading, England",
month = "May" ,
year = "2006",
}
@inproceedings{StroutPASTE05,
Year = {2005},
Title = {Representation-Independent Program Analysis},
Month = {September 5-6},
Booktitle = {Proceedings of the The sixth {ACM SIGPLAN-SIGSOFT}
Workshop on Program Analysis for Software Tools and
Engineering (PASTE)},
Author = {Michelle Mills Strout and John Mellor-Crummey
and Paul Hovland},
}
@inproceedings{StroutMSP04,
Year = {2004},
Title = {Metrics and Models for Reordering Transformations},
Month = {June 8},
Pages = {23-34},
Booktitle = {Proceedings of the The Second {ACM SIGPLAN} Workshop on
Memory System Performance (MSP)},
Author = {Michelle Mills Strout and Paul D. Hovland},
}
@article{StroutIJHPCA,
Journal = {International Journal of High Performance Computing
Applications},
Year = {2004},
Title = {Sparse Tiling for Stationary Iterative Methods},
Month = {February},
Publisher = {Sage Publications},
Pages = {95-114},
Volume = {18},
Number = {1},
Author = {Michelle Mills Strout and Larry Carter and Jeanne
Ferrante and Barbara Kreaseck}}
@inproceedings{StroutPLDI03,
Month = {June},
Year = {2003},
Title = {Compile-time Composition of Run-time Data and Iteration
Reorderings},
Booktitle = {Proceedings of the 2003 {ACM} {SIGPLAN} Conference on
Programming Language Design and Implementation (PLDI)},
Author = {Michelle Mills Strout and Larry Carter and Jeanne
Ferrante}}
@InProceedings{StroutLCPC2002,
author = "Michelle Mills Strout and Larry Carter and Jeanne
Ferrante and Jonathan Freeman and Barbara Kreaseck",
title = "Combining Performance Aspects of Irregular
Gauss-Seidel via Sparse Tiling",
booktitle = "15th Workshop on Languages and Compilers for
Parallel Computing (LCPC)",
address = "College Park, Maryland",
month = "July 25-27,",
year = "2002",
}
@InProceedings{StroutICCS01,
author = "Michelle Mills Strout and Larry Carter and Jeanne
Ferrante",
title = "Rescheduling for Locality in Sparse Matrix Computations",
booktitle = "Proceedings of the 2001 International Conference on
Computational Science",
address = "San Francisco, CA, USA",
publisher = "Springer-Verlag",
editor = "V.N.Alexandrov and J.J. Dongarra and C.J.K.Tan",
series = "Lecture Notes in Computer Science",
month = "May 28-30,",
year = "2001",
}
@InProceedings{StroutEtAl98,
author = "Michelle Mills Strout and Larry Carter and Jeanne
Ferrante and Beth Simon",
title = "Schedule-Independent Storage Mapping for Loops",
booktitle = "Proceedings of the Eighth International Conference on
Architectural Support for Programming Languages and
Operating Systems",
address = "San Jose, California",
organization = "ACM SIGARCH, SIGOPS, SIGPLAN, and the IEEE Computer
Society",
month = "October 3--7,",
year = "1998",
pages = "24--33",
}
@article{sara99,
author = {Alan Su and Francine Berman and Richard Wolski
and Michelle Mills Strout},
title = {Using {AppLeS} to Schedule a Distributed Visualization
Tool on the Computational Grid},
journal = {International Journal of High Performance Computing
Applications},
year = 1999,
volume = 13
}
@techreport{techProof,
Year = {2003},
Title = {Proof of Correctness for Sparse Tiling of
{Gauss-Seidel}},
Number = {CS2003-0741},
Institution = {University of California, San Diego},
Author = {M. M. Strout and L. Carter and J. Ferrante}}