CS-680
PROJECT: Experiment with an Empirical Program Optimizer
Michel SIKA
04-03-22
Abstract:
Conventional compilers use code optimization techniques and architecture
models to maximize program performance. Another approach is to generate
multiple versions of a program and identify the best-performing one by
running each version on the target architecture; this is the work of
Empirical Program Optimizers. One such tool is ATLAS. ATLAS
(Automatically Tuned Linear Algebra Software) is a software package
designed to automatically generate highly optimized Linear Algebra
kernels for arbitrary cache-based architectures. ATLAS provides an
interface to the BLAS (Basic Linear Algebra Subprograms), a group of
high-quality elementary routines for performing basic vector and matrix
operations.
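The empirical approach described above can be illustrated with a small
sketch. It is not how ATLAS itself is implemented; it only shows the idea
of running several variants of the same computation and keeping the
fastest. The function names (matmul_blocked, search_block_size), the test
data, and the use of clock() for timing are assumptions made for this
illustration, and candidate block sizes are assumed to be positive.

```c
#include <stdlib.h>
#include <time.h>

/* C += A * B for n x n row-major matrices, tiled into nb x nb blocks.
   Hypothetical helper for illustration, not an ATLAS routine.
   nb is assumed to be positive. */
void matmul_blocked(int n, int nb, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += nb)
        for (int kk = 0; kk < n; kk += nb)
            for (int jj = 0; jj < n; jj += nb) {
                /* Clip each tile at the matrix edge. */
                int imax = (ii + nb < n) ? ii + nb : n;
                int kmax = (kk + nb < n) ? kk + nb : n;
                int jmax = (jj + nb < n) ? jj + nb : n;
                for (int i = ii; i < imax; i++)
                    for (int k = kk; k < kmax; k++) {
                        double a = A[i * n + k];
                        for (int j = jj; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* Time one multiplication per candidate block size and return the
   fastest candidate, or -1 if there are no candidates or memory
   cannot be allocated. */
int search_block_size(int n, const int *cands, int ncands)
{
    size_t len = (size_t)n * (size_t)n;
    double *A = malloc(len * sizeof *A);
    double *B = malloc(len * sizeof *B);
    double *C = malloc(len * sizeof *C);
    int best = -1;
    double best_time = 0.0;

    if (A && B && C) {
        /* Arbitrary input data; the values do not affect the timing. */
        for (size_t i = 0; i < len; i++) {
            A[i] = (double)(i % 7);
            B[i] = (double)(i % 5);
        }
        for (int c = 0; c < ncands; c++) {
            for (size_t i = 0; i < len; i++)
                C[i] = 0.0;
            clock_t t0 = clock();
            matmul_blocked(n, cands[c], A, B, C);
            double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
            if (best < 0 || t < best_time) {
                best = cands[c];
                best_time = t;
            }
        }
    }
    free(A);
    free(B);
    free(C);
    return best;
}
```

Calling search_block_size(512, cands, 3) with cands = {16, 32, 64}
returns whichever candidate ran fastest on the host. A real experiment
would repeat each measurement and use a finer-grained timer, since a
single clock() reading is noisy for small matrices.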
In this project we will attempt to find the optimal block size for simple
Matrix Multiplication on two architectures (SPARC/SunOS and Intel-based
Linux). We will compare the performance obtained with analytically
predicted parameters against that obtained with the empirically
determined parameters produced by ATLAS. In the process we will attempt
to establish a correlation between the empirically determined parameters
and the features of the architecture. We will compare our findings
against published results. Time permitting, we will attempt to quantify
any performance enhancements contributed by empirical program
optimization relative to simple compiler code optimization.
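As a rough illustration of what an analytically predicted block size can
look like, the sketch below uses a common textbook heuristic: choose the
largest block size NB such that three NB x NB tiles (one each from A, B
and C) fit in the data cache, i.e. 3 * NB^2 * element size <= cache size.
This heuristic, the function name analytic_block_size, and its parameters
are assumptions made for this example; they are not the model this
project will necessarily adopt, and they ignore cache line size and
associativity.

```c
#include <stddef.h>

/* Largest nb such that 3 * nb * nb * elem_bytes <= cache_bytes.
   Hypothetical helper: assumes three square tiles must be resident
   in the cache at the same time. Returns 0 if no tile fits. */
int analytic_block_size(size_t cache_bytes, size_t elem_bytes)
{
    int nb = 0;

    if (elem_bytes == 0)
        return 0;

    /* Grow nb while the next larger tile size still fits. */
    while (3 * (size_t)(nb + 1) * (size_t)(nb + 1) * elem_bytes <= cache_bytes)
        nb++;
    return nb;
}
```

For example, with 8-byte doubles a 16 KB data cache gives
analytic_block_size(16384, 8) == 26, and an 8 KB cache gives 18. Values
like these can then be compared with the block size an empirical search
selects on the same machine.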
Projected Experimentation:
I - Building an Analytical Model
    1) Matrix Multiplication using elementary Level 3 BLAS
    2) Prototyping ATLAS Optimization Parameters for the BLAS Matrix
       Multiplication
II - SPARC/SunOS (QUEEN)
    1) Analytical Model
    2) Empirical Model
        2.1) ATLAS Setup and Installation
        2.2) ATLAS Optimization Parameters
        2.3) Performance of gemm using ATLAS Optimization Parameters
    3) Compiler Optimized BLAS (Plain BLAS)
        - Instruction Count
        - Cache Miss Rate
    4) Analysis
        4.1) Empirical versus simple compiler optimized
        4.2) Analytical model versus Empirical model
III - Intel Xeon/Linux
IV - Conclusions
V - Appendices
    Appendix A: Intel System Properties
    Appendix B: Sun System Properties
    Appendix C: BLAS & ATLAS setup/install on the Intel System
    Appendix D: BLAS & ATLAS setup/install on the Sun System
    Appendix E: Sun Source Code
    Appendix F: Intel Source Code
References:
[1] Kamen Yotov, Xiaoming Li, Gang Ren, Michael Cibulskis, Gerald DeJong,
    Maria Garzaran, David Padua, Keshav Pingali, Paul Stodghill, and Peng
    Wu. A Comparison of Empirical and Model-driven Optimization.
    Programming Language Design and Implementation, 6/9/2003.
[2] B.B. Fraguela, R. Doallo, E.L. Zapata. Automatic Analytical Modeling
    for the Estimation of Cache Misses. Intl. Conf. on Parallel
    Architectures and Compilation Techniques, PACT'99, pp. 221-231.
    Newport Beach, October 1999.
[3] Hennessy, Patterson. Computer Architecture: A Quantitative Approach,
    2nd Edition, Chapter 5. Morgan Kaufmann, 1996.
[4] R. Whaley, A. Petitet, and J. Dongarra. Automated Empirical
    Optimization of Software and the ATLAS Project. UT-CS-00-448,
    September 2000.
[5] IA-32 Intel(R) Architecture Optimization Reference Manual (Order
    Number 248966), Intel(R) Xeon(TM).
[6] UltraSPARC IIi(R) User's Manual (805-0087).
ATLAS:
    Documentation:
        http://math-atlas.sourceforge.net (faq, errata)
    Source Code:
        http://www.netlib.org
        atlas3.6.0
BLAS:
    Documentation:
        faq, blas-forum
    API reference:
        http://math-atlas.sourceforge.net/psdoc/cblasqref.ps
    BLAS implementation used:
        A Reference Implementation for Extended and Mixed Precision BLAS
        http://crd.lbl.gov/~xiaoye/XBLAS/
GCC:
    gcc-3.3.3 (Feb 24th 2004)
    http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/index.html
    http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Option-Index.htm