CS-680 PROJECT:

Experiment with an Empirical Program Optimizer

Michel SIKA
04-03-22

Abstract:

Conventional compilers use code optimization techniques and models of the target architecture to maximize program performance. An alternative approach is to generate multiple versions of a program and find the best-performing one by actually running each version on the target architecture; this is the approach taken by Empirical Program Optimizers. One such tool is ATLAS (Automatically Tuned Linear Algebra Software), a software package designed to automatically generate highly optimized linear algebra kernels for arbitrary cache-based architectures. ATLAS provides an implementation of the BLAS (Basic Linear Algebra Subprograms), a standard interface to high-quality elementary routines for basic vector and matrix operations.
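The generate-and-measure idea can be illustrated with a minimal C sketch, assuming a simple cache-blocked matrix multiply whose block size NB is the only thing varied between "versions". The names `matmul_blocked` and `search_block_size` are hypothetical, and the search is far simpler than the one ATLAS performs (ATLAS also tunes parameters such as loop unrolling and register blocking).

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Cache-blocked C += A * B for n x n row-major matrices, block size nb.
   Illustrative kernel only; not the code ATLAS generates. */
static void matmul_blocked(int n, int nb, const double *A, const double *B, double *C)
{
    assert(nb > 0);
    for (int ii = 0; ii < n; ii += nb)
        for (int kk = 0; kk < n; kk += nb)
            for (int jj = 0; jj < n; jj += nb) {
                int imax = ii + nb < n ? ii + nb : n;
                int kmax = kk + nb < n ? kk + nb : n;
                int jmax = jj + nb < n ? jj + nb : n;
                for (int i = ii; i < imax; i++)
                    for (int k = kk; k < kmax; k++) {
                        double a = A[i * n + k];
                        for (int j = jj; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* Empirical search: run the kernel once per candidate block size and return
   the fastest candidate, or -1 if the work arrays cannot be allocated.
   A real harness would repeat each run and use a finer-grained timer. */
int search_block_size(int n, const int *candidates, int ncand)
{
    assert(n > 0 && ncand > 0);
    size_t elems = (size_t)n * n;
    double *A = malloc(elems * sizeof *A);
    double *B = malloc(elems * sizeof *B);
    double *C = malloc(elems * sizeof *C);
    int best = -1;
    if (A && B && C) {
        for (size_t i = 0; i < elems; i++) {
            A[i] = (double)(i % 7);
            B[i] = (double)(i % 5);
        }
        double best_secs = 0.0;
        for (int c = 0; c < ncand; c++) {
            for (size_t i = 0; i < elems; i++)
                C[i] = 0.0;                      /* fresh output for each version */
            clock_t start = clock();
            matmul_blocked(n, candidates[c], A, B, C);
            double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
            if (best < 0 || secs < best_secs) {  /* keep the fastest so far */
                best = candidates[c];
                best_secs = secs;
            }
        }
    }
    free(A);
    free(B);
    free(C);
    return best;
}
```

For example, `int cands[] = {16, 32, 64}; int nb = search_block_size(512, cands, 3);` returns whichever candidate ran fastest on the machine at hand: the answer is measured rather than predicted.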

In this project we will attempt to find the optimal block size for simple matrix multiplication on two architectures (SPARC/SunOS and Intel-based Linux). We will compare the performance obtained with analytically predicted parameters against the performance obtained with the empirically determined parameters found by ATLAS. In the process, we will attempt to establish a correlation between the empirically determined parameters and the features of each architecture, and we will compare our findings against published results. Time permitting, we will also attempt to quantify the performance improvement contributed by empirical program optimization relative to plain compiler optimization.
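One possible analytical starting point for the block size is a simple capacity model: choose the largest NB such that three NB x NB tiles (one each from A, B and C) fit in the L1 data cache, i.e. 3 · NB² · (element size) ≤ (cache size). This is an assumption made for illustration, not the exact model used by ATLAS or by Yotov et al. [1], and the function name `analytical_block_size` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simple capacity model (illustrative assumption): largest NB such that
   3 * NB^2 * elem_size <= cache_bytes. Associativity and conflict misses
   are ignored, so the prediction is optimistic. */
int analytical_block_size(size_t cache_bytes, size_t elem_size)
{
    assert(elem_size > 0);
    int nb = 0;
    while ((size_t)3 * (size_t)(nb + 1) * (size_t)(nb + 1) * elem_size <= cache_bytes)
        nb++;
    return nb;
}
```

For a hypothetical 16 KB L1 data cache and 8-byte doubles, 3 · 26² · 8 = 16,224 ≤ 16,384 while 3 · 27² · 8 = 17,496 does not fit, so the model gives NB = 26. Because the model ignores conflict misses (which matter on a direct-mapped cache), the empirically chosen NB may well differ; that difference is what the proposed comparison measures.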

Projected Experimentation:

I - Building an Analytical Model
    1) Matrix Multiplication using elementary Level 3 BLAS
    2) Prototyping ATLAS Optimization Parameters for the BLAS Matrix Multiplication
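The Level 3 BLAS operation at the center of the project is GEMM, which computes C ← αAB + βC. The plain C function below is a sketch of those semantics (row-major storage, no transposition) of the kind that can serve as an unoptimized baseline; the name `ref_dgemm` is hypothetical. With a CBLAS library such as ATLAS, the equivalent call would be `cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc)`.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (unoptimized) GEMM: C = alpha * A * B + beta * C, row-major.
   A is M x K (leading dimension lda >= K),
   B is K x N (leading dimension ldb >= N),
   C is M x N (leading dimension ldc >= N). */
void ref_dgemm(int M, int N, int K, double alpha,
               const double *A, int lda, const double *B, int ldb,
               double beta, double *C, int ldc)
{
    assert(M >= 0 && N >= 0 && K >= 0);
    assert(lda >= K && ldb >= N && ldc >= N);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int p = 0; p < K; p++)
                sum += A[(size_t)i * lda + p] * B[(size_t)p * ldb + j];
            double *c = &C[(size_t)i * ldc + j];
            /* As in the reference BLAS, beta == 0 overwrites C rather than
               scaling it, so uninitialized entries of C never reach the result. */
            *c = (beta == 0.0) ? alpha * sum : alpha * sum + beta * *c;
        }
}
```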

II - SPARC/SunOS (QUEEN)
    1) Analytical Model

   
    2) Empirical Model
        2.1) ATLAS Setup and Installation
        2.2) ATLAS Optimization Parameters
        2.3) Performance of gemm using ATLAS Optimization Parameters
   
    3) Compiler Optimized BLAS (Plain BLAS)
        - Instruction Count
        - Cache Miss Rate
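The two metrics listed above can be tied to run time through the standard CPU-time equations of Hennessy and Patterson [3, Chapter 5], where IC is the instruction count. This sketch assumes a single cache level and ignores any overlap of misses with execution:

```latex
\begin{align*}
\text{CPU time} &= \left(\text{CPU execution cycles} + \text{Memory stall cycles}\right) \times \text{Clock cycle time} \\
\text{CPU execution cycles} &= \text{IC} \times \text{CPI}_{\text{execution}} \\
\text{Memory stall cycles} &= \text{IC} \times \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss rate} \times \text{Miss penalty}
\end{align*}
```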

    4) Analysis
        4.1) Empirical versus simple compiler-optimized BLAS
        4.2) Analytical model versus empirical model
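To put the empirical, analytical and compiler-optimized runs on a common scale, GEMM timings are commonly converted to MFLOPS using the operation count 2MNK (one multiply and one add per inner-loop step), and compared as speedup ratios. A minimal sketch; the helper names are hypothetical:

```c
#include <assert.h>

/* MFLOPS for an M x N x K GEMM that took `seconds`, counting 2*M*N*K
   floating-point operations (the O(M*N) alpha/beta scaling is ignored). */
double gemm_mflops(int M, int N, int K, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * M * N * K / (seconds * 1e6);  /* double arithmetic avoids int overflow */
}

/* Speedup of a tuned run over a baseline run (e.g. ATLAS vs. plain BLAS). */
double speedup(double baseline_seconds, double tuned_seconds)
{
    assert(tuned_seconds > 0.0);
    return baseline_seconds / tuned_seconds;
}
```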


III - Intel Xeon/Linux  

IV - Conclusions

V - Appendices
  
    Appendix A: Intel System Properties
    Appendix B: Sun System Properties
    Appendix C: BLAS & ATLAS setup/install on the Intel System
    Appendix D: BLAS & ATLAS setup/install on the Sun System
    Appendix E: Sun Source Code
    Appendix F: Intel Source Code

References:

[1] Kamen Yotov, Xiaoming Li, Gang Ren, Michael Cibulskis, Gerald DeJong, Maria Garzaran, David Padua, Keshav Pingali, Paul Stodghill, and Peng Wu. A Comparison of Empirical and Model-driven Optimization. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2003.

[2] B. B. Fraguela, R. Doallo, and E. L. Zapata. Automatic Analytical Modeling for the Estimation of Cache Misses. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'99), pp. 221-231, Newport Beach, October 1999.

[3] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, 2nd Edition, Chapter 5. Morgan Kaufmann, 1996.

[4] R. Whaley, A. Petitet, and J. Dongarra. Automated Empirical Optimization of Software and the ATLAS Project. Technical Report UT-CS-00-448, University of Tennessee, September 2000.

[5] Intel Corporation. IA-32 Intel® Architecture Optimization Reference Manual, Order Number 248966 (Intel® Xeon™).

[6] Sun Microsystems. UltraSPARC-IIi User's Manual, Part Number 805-0087.

ATLAS:
    Documentation:
        http://math-atlas.sourceforge.net
        faq
        errata
    Source Code:
        http://www.netlib.org
        atlas3.6.0

BLAS:
    Documentation:
        faq
        blas-forum
    API reference:
        http://math-atlas.sourceforge.net/psdoc/cblasqref.ps

BLAS implementation used:
    A Reference Implementation for Extended and Mixed Precision BLAS
    http://crd.lbl.gov/~xiaoye/XBLAS/

GCC:
    gcc-3.3.3 (Feb 24th 2004)
    http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/index.html
    http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Option-Index.htm