 |
 |
Research |
M.S. (The PARADIGM Compiler Project)
The PARADIGM Compiler Project (PARAllelizing compiler for
DIstributed-memory General-purpose Multicomputers) addressed the
problem of HPF (High Performance FORTRAN) compilation for distributed
memory machines. I developed and implemented a linear algebra framework
that translated regular HPF programs having arbitrary alignment and
distribution directives into efficient SPMD (single program multiple
data) codes. The framework was integrated into a new version of the
PARADIGM compiler.
Ph.D. (The MATCH Project)
In recent years, there has been a resurgence of interest in implicitly
typed array-based languages following the commercial success of a
number of problem-solving and visualization environments such as MATLAB
and IDL. Through the amalgamation of features like implicit typing,
multidimensional data structures, complex arithmetic, a rich
polymorphic function set, and an interactive interpretive execution
model, these languages have proven to be ideal tools for a style of
program development called exploratory programming in which
the processes of data examination, manipulation, visualization and
incremental code development are tightly knit, allowing for an
intuitive feedback-driven approach to writing code. This in turn
permits the rapid specification, testing and implementation of new
designs and algorithms. Success stories that have benefited from these
systems abound, spanning diverse application domains such as the earth
sciences, medicine, physics and the stock market.
Prior Art
However, the very same rich feature set that these languages provide
have limited their execution scopes to interpretation, causing
implementations to usually suffer from deployment inefficiencies.
Because of this, the last few years have witnessed much activity in
ways to speed up the execution of these languages with particular focus
being on their compilation. In this area, one emphasis has been on type
determination since the quality of the generated code is crucially
dependent on the kind of type information available to the compiler.
Solutions have ranged from "hand-holding" the compiler by providing
type declarations in the form of comments, to static inference
mechanisms that infer and propagate type attributes when possible, and
that resort to a dynamic interpreter-like strategy otherwise. Static
inference mechanisms are attractive due to their automatable nature and
because of their potential to avoid run-time footprints, but their
advantage is currently confined to situations in which the type
attributes are completely determinable at compile time. In situations
where type attributes are unknown, even partially, conservative
run-time resolution is relied upon.
Contributions
A new symbolic shape inference framework that enables the automatic
determination of an array's shape in a prototypical array-based
language called MATLAB has been developed. The approach improves on the
current state of the art in static array shape inference systems in one
significant way: By manipulating shapes algebraically, it deduces
valuable shape information even when array extents are compile-time
unknowns. An ancillary improvement is that unlike previous approaches
that limited themselves to matrices, it forms a complete framework by
encompassing all of MATLAB's multidimensional array constructs,
including empty arrays, and by accommodating the detection of
nonconforming shapes.
Approach
The approach uses the shape semantics of the language's operators
(called built-in functions) in an abstract interpretation of the input
program. Each of these shape semantics can be fully characterized by
formal algebraic systems that expose a number of important properties
borne by them. (Examples of properties would be commutativity,
associativity, idempotency and mixed-operator compositional identities,
to name a few.) Through the combination of symbolic and fixed-point
techniques, these properties can be exploited for code optimizations
even when the actual shapes aren't statically known. The key empowering
observation is that only a few algebras suffice to cover the shape
semantics of a large number of MATLAB's built-in functions. By relying
on a systematic algebraic view of shape, the approach is naturally
scalable to arrays of arbitrary dimensions and thus goes beyond the
previously used tabular view of shape.
A secondary contribution is a complete MATLAB-to-C translator, with an
integrated type inference engine called MAGICA
(MAthematica system for General-purpose Inferring and Compile-time
Analyses), which produces an optimized C translation of a MATLAB
source. The goals were threefold: (1) to show that our approach was
realizable as an efficient implementation, (2) to demonstrate that the
inferred type information goes beyond what has been previously achieved
in MATLAB and APL, and (3) to show that the inferred types permit the
generation of a highly optimized C translation. The inference engine,
which deduces a MATLAB expression's value range, intrinsic type and
array shape, was designed as an external module with two objectives:
(1) to facilitate its use in other MATLAB compiler infrastructures, and
(2) to enable its independent use in the development of new inference
techniques and algorithms.
Applications
A number of applications of the derived shape information to code
optimizations have also been shown. Examples include detecting
redundant shape conformability checks, memory preallocation, code
selection and the inlining of "scalarized" code. Some of the
optimizations that have been considered, like array storage coalescing,
are vital to good performance when dealing with large arrays and
haven't been addressed before in the area of MATLAB compilation.
Implementation
Many features of our translator prototype, as far as we are aware,
haven't been implemented before - for instance, value ranges can be
complex-valued, and array shapes can be multidimensional as well as
symbolic. The implementation is extensive and covers close to 70
built-in functions in MATLAB, and includes support for MATLAB's
user-defined functions, conditional and looping constructs. The
front-end has over 20 passes that perform various transformations on a
static single-assignment (SSA) form, such as dead code elimination,
global common subexpression elimination, constant propagation and array
storage coalescence. We plan to also release the
MAT2C
translator front-end in the near future.
Licensing
A part of the implementation, specifically, a reverse engineered
grammar and lexical specification for MATLAB, has already been licensed
and presently forms the language-backbone of a suite of commercial EDA
(electronic design automation) tools from
AccelChip, Inc.
|
|
|
|
|
 |