To: opensource@cs.utexas.edu
From: field@cs.utexas.edu
Subject: Co-op Open Source Software Competition submission

---------------------------------------------------------------------

Category:

Faculty/Staff

---------------------------------------------------------------------

Team members:

Ernie Chan
Graduate Research Assistant (student, PhD program)
Department of Computer Sciences
echan@cs.utexas.edu
(Enrolled in PhD program since Fall 2005.)

Robert A. van de Geijn
Professor
Department of Computer Sciences
rvdg@cs.utexas.edu
(Original founder and Principle Investigator of FLAME project.)

Field G. Van Zee
Research Engineering/Scientist Associate II
Department of Computer Sciences
field@cs.utexas.edu
(Employed under current title since June 2006 and previously as GRA since 2004.)

---------------------------------------------------------------------

Name of submitted software:

libflame

---------------------------------------------------------------------

Brief overview:

The objective of the FLAME project is to transform the development of
dense linear algebra libraries from an art reserved for experts to a
science that can be understood by novice and expert alike. Rather than
being only a library, the project encompasses a new notation for
expressing algorithms, a methodology for systematic derivation of
algorithms, Application Program Interfaces (APIs) for representing the
algorithms in code, and tools for mechanical derivation, implementation
and analysis of algorithms. Interfaces to high-performance sequential
and multithreaded implementations are also provided.

---------------------------------------------------------------------

Justification:

Real-world instances of linear algebra are ubiquitous, with non-trivial 
applications found in countless disciplines, including (but certainly not 
limited to) most of the classic natural sciences such as chemistry and 
physics, virtually every flavor of engineering, acoustics and signal 
processing, applied statistics, and even social sciences and economics. 
Our reliance on computers to help us perform linear algebra computations
is undeniable.
But our productivity hinges upon our ability to effectively implement these
operations in software.

Linear algebra software has been available under various free and 
public domain licenses since the 1970's.
The University of Tennessee at Knoxville led the effort to standardize
interfaces to linear algebra software routines, resulting in so-called
"reference" implementations to the Basic Linear Algebra Subprograms (BLAS),
the Linear Algebra Package (LAPACK), and the Scalable Linear Algebra
Package (ScaLAPACK), with each serving as a building block to the next.
Over time, these packages became widely accepted both inside and outside
the scientific computing community.
However, in the decades since their inception, the face of computing has
changed considerably.
Fortran-77 is no longer the dominant language for programming scientific
applications (let alone applications that employ linear algebra).
Vector computers have been replaced by architectures with multiple levels
of cache.
CPU speeds have increased much faster than memory, making it more difficult
to amortize the cost of memory transactions over subsequent computation.
And recently, chips with multiple processing cores have become the norm
due to the growing infeasibility of speeding up a single microprocessor.
This paradigm shift has significant programmability implications, as it
will force developers to parallelize their code if they wish to realize
performance gains with newer processors.

In recent years, the FLAME research group has identified several critical 
shortcomings in the products of the BLAS, LAPACK, and ScaLAPACK projects, 
which we have documented extensively in our various publications.
The FLAME project addresses these shortcomings by providing a modern
alternative library package, libflame, released as free software under the
Lesser General Public License (LGPL).
What follows is an enumeration of the features of libflame, with
comparisons to the traditional BLAS/LAPACK libraries when appropriate.

* A solution based on fundamental computer science
The FLAME project advocates a new approach to developing linear algebra
libraries. 
Algorithms are obtained systematically according to rigorous principles
of formal derivation. 
These methods are based on fundamental theorems of computer science to
guarantee that the resulting algorithm is also correct. 
In addition, the FLAME methodology uses a new, more stylized notation 
for expressing loop-based linear algebra algorithms. 
This notation closely resembles how algorithms are naturally illustrated 
with pictures. 
(See Figures 1 and 2(left) at http://www.cs.utexas.edu/users/flame/co-op/) 

* Object-based abstractions and API
The BLAS, LAPACK, and ScaLAPACK projects place backward compatibility 
as a high priority, which hinders progress towards adopting modern
software engineering principles such as object abstraction. 
libflame is built around opaque structures that hide implementation details
of matrices, such as leading dimensions, and exports object-based
programming interfaces to operate upon these structures. 
Likewise, FLAME algorithms are expressed (and coded) in terms of smaller
operations on sub-partitions of the matrix operands.
This abstraction facilitates programming without array or loop indices, 
which allows the user to avoid painful index-related programming errors 
altogether. 
Figure 2 at http://www.cs.utexas.edu/users/flame/co-op/ compares the
coding styles of libflame and LAPACK, highlighting the inherent elegance
of FLAME code and its striking resemblance to the corresponding FLAME
algorithm shown in Figure 1. 
This similarity is quite intentional, as it preserves the clarity of the
original algorithm as it would be illustrated on a white-board or in a
publication.

* Educational value
Aside from the potential to introduce students to formal algorithm 
derivation, FLAME serves as an excellent vehicle for teaching linear 
algebra algorithms in a classroom setting. 
The clean abstractions afforded by the API also make FLAME ideally suited 
for instruction of high-performance linear algebra courses at the 
undergraduate and graduate level. 
Dr. Robert van de Geijn routinely uses FLAME in his linear algebra and
numerical analysis courses. 
Some colleagues of the FLAME project are even beginning to use the 
notation to teach classes elsewhere around the country, including 
Timothy Mattson of Intel Corporation. 
Historically, the BLAS/LAPACK style of coding has been used in these
settings. 
However, coding in this manner tends to obscure the algorithms;
students often get bogged down debugging the frustrating errors that 
often result from indexing directly into arrays that represent the 
matrices. (See Figure 2 cited above.)

* A complete dense linear algebra framework
Like LAPACK, libflame provides ready-made implementations of common linear 
algebra operations. 
The implementations found in libflame mirror many of those found in the 
BLAS and LAPACK packages. 
However, unlike LAPACK, libflame provides a framework for building 
complete custom linear algebra codes.
We believe such an environment is more useful as it allows the user to
quickly prototype a linear algebra solution to fit the needs of his
application.  

* High performance
In our publications and performance graphs, we do our best to dispel the
myth that user- and programmer-friendly linear algebra codes cannot yield
high performance.
Our FLAME implementations of operations such as Cholesky
factorization and Triangular Inversion often outperform the corresponding
implementations available in the LAPACK library. 
Figure 3 (http://www.cs.utexas.edu/users/flame/co-op/) shows an example
of the performance increase possible by using libflame compared to LAPACK. 
Many instances of the libflame performance advantage result from the 
fact that LAPACK provides only one variant (algorithm) of every operation, 
while libflame provides all known variants. 
This allows the user and/or library developer to choose which algorithmic
variant is most appropriate for a given situation.
libflame relies only on the presence of a core set of highly optimized
unblocked routines to perform the small sub-problems found in FLAME
algorithm codes. 

* Dependency-aware multithreaded parallelism
Until recently, the authors of the BLAS and LAPACK advocated getting 
shared-memory parallelism from LAPACK routines by simply linking to 
multithreaded BLAS. 
This low-level solution requires no changes to LAPACK code but also
suffers from sharp limitations in terms of efficiency and scalability
for small- and medium-sized matrix problems.
The fundamental bottleneck to introducing parallelism directly within 
many algorithms is the web of data dependencies that inevitably exists
between sub-problems. 
The libflame project has developed a runtime system, SuperMatrix, to
detect and analyze dependencies found within FLAME algorithms-by-blocks
(algorithms whose sub-problems operate only on block operands). 
Once dependencies are known, the system schedules sub-operations to 
independent threads of execution.
This system is completely abstracted from the algorithm that is being
parallelized and requires virtually no change to the algorithm code,
but at the same time exposes abundant high-level parallelism.
The most recent version of LAPACK does not offer any similar mechanism.

* Support for hierarchical storage-by-blocks
Storing matrices by blocks, a concept advocated years ago by Fred 
Gustavson of IBM, often yields performance gains through improved spatial 
locality. 
Instead of representing matrices as a single linear array of data with a 
prescribed leading dimension as legacy libraries require (for column- or
row-major order), the storage scheme is encoded into the matrix object. 
Here, internal elements refer recursively to child objects that represent 
sub-matrices. 
Currently, libflame provides a subset of the conventional API that supports 
hierarchical matrices, allowing users to create and manage such matrix 
objects as well as convert between storage-by-blocks and conventional 
"flat" storage schemes. 

* Advanced build system
From its early revisions, libflame distributions have been bundled with a
robust build system, featuring automatic makefile creation and a
configuration script conforming to GNU standards (allowing the user to run
the "./configure; make; make install" sequence common to many open source
software projects). 
Without any user input, the configure script searches for and chooses
compilers based on a pre-defined preference order for each architecture. 
The user may request specific compilers via the configure interface, or 
enable other non-default features of libflame such as custom memory 
alignment, multithreading (via POSIX threads or OpenMP), compiler options 
(debugging symbols, warnings, optimizations), and memory leak detection. 
The reference BLAS and LAPACK libraries provide no configuration
support and require the user to manually modify a makefile with
appropriate references to compilers and compiler options depending on the
host architecture.

---------------------------------------------------------------------

Files included with this email:

 - README.txt

 - INSTALL.txt

 - FLAME_API.pdf

The source code may be found at the following URL as a gzipped tarball:

 - http://www.cs.utexas.edu/users/flame/co-op/libflame.tar.gz

README.txt provides brief overview of the software while INSTALL.txt explains
how to configure, build, and install the library as well as run an example
program.

---------------------------------------------------------------------

Additional documentation:

The attached PDF file FLAME_API.pdf provides a thorough discussion of
the FLAME Application Program Interfaces. This document was published
in ACM Transactions on Mathematical Software in March 2005. While the
material is somewhat dated, the Appendix should give the reader a
somewhat broader taste of the functions found in libflame. 

We are in the process of writing a book to fully document all of the
programming interfaces available in libflame. Unfortunately, this book
is in its early stages and would not be of use to the judges of this
competition. However, we do have a URL to automatically-generated
doxygen documentation: 

http://www.cs.utexas.edu/users/flame/libflame/html/

This website provides a browsable list of all the files and functions
found in libflame along with a directory map of the source tree.  

In terms of keystrokes, the FLAME code in Figure 2(left) cited earlier
may not appear to be easier to code than the equivalent Fortran-77
implementation, even if the FLAME code looks more pleasing. Remarkably,
most FLAME code is never written by hand. We employ a web tool called
Spark, which generates code skeletons for FLAME algorithms given a few
simple inputs. This web interface helps developers quickly prototype
FLAME algorithms. Spark can be accessed by visiting:

http://www.cs.utexas.edu/users/flame/Spark/

Note that Spark was designed as a tool to aid FLAME developers rather than
a portal for summoning fully-functional linear algebra implementations.
Still, many users may be unaccustomed to programming without indices. We
have been developing a Wiki page to explain many fundamental linear algebra
concepts using FLAME notation. The website serves as an online tutorial,
designed to acquaint novices with the advantages of coding using libflame:

http://www.linearalgebrawiki.org/

The FLAME project encompasses research involving many aspects of numerical
linear algebra. libflame started as a repository for the working prototypes
that resulted from these research projects and, over time, matured into a
complete set of linear algebra implementations. Our group frequently
publishes its findings at conferences and in scientific journals. Copies of
all of these publications may be found at: 

http://www.cs.utexas.edu/users/flame/Publications/index.html

In closing, we thank you for your consideration in the Co-op Open Source
Software Competition!

The core FLAME developers,
Ernie Chan
Robert A. van de Geijn
Field G. Van Zee

