Research

M.S. (The PARADIGM Compiler Project)

The PARADIGM Compiler Project (PARAllelizing compiler for DIstributed-memory General-purpose Multicomputers) addressed the problem of HPF (High Performance FORTRAN) compilation for distributed memory machines. I developed and implemented a linear algebra framework that translated regular HPF programs having arbitrary alignment and distribution directives into efficient SPMD (single program multiple data) codes. The framework was integrated into a new version of the PARADIGM compiler.

Ph.D. (The MATCH Project)

In recent years, there has been a resurgence of interest in implicitly typed array-based languages following the commercial success of a number of problem-solving and visualization environments such as MATLAB and IDL. Through the amalgamation of features like implicit typing, multidimensional data structures, complex arithmetic, a rich polymorphic function set, and an interactive interpretive execution model, these languages have proven to be ideal tools for a style of program development called exploratory programming in which the processes of data examination, manipulation, visualization and incremental code development are tightly knit, allowing for an intuitive feedback-driven approach to writing code. This in turn permits the rapid specification, testing and implementation of new designs and algorithms. Success stories that have benefited from these systems abound, spanning diverse application domains such as the earth sciences, medicine, physics and the stock market.

Prior Art However, the very same rich feature set that these languages provide have limited their execution scopes to interpretation, causing implementations to usually suffer from deployment inefficiencies. Because of this, the last few years have witnessed much activity in ways to speed up the execution of these languages with particular focus being on their compilation. In this area, one emphasis has been on type determination since the quality of the generated code is crucially dependent on the kind of type information available to the compiler. Solutions have ranged from "hand-holding" the compiler by providing type declarations in the form of comments, to static inference mechanisms that infer and propagate type attributes when possible, and that resort to a dynamic interpreter-like strategy otherwise. Static inference mechanisms are attractive due to their automatable nature and because of their potential to avoid run-time footprints, but their advantage is currently confined to situations in which the type attributes are completely determinable at compile time. In situations where type attributes are unknown, even partially, conservative run-time resolution is relied upon.

Contributions A new symbolic shape inference framework that enables the automatic determination of an array's shape in a prototypical array-based language called MATLAB has been developed. The approach improves on the current state of the art in static array shape inference systems in one significant way: By manipulating shapes algebraically, it deduces valuable shape information even when array extents are compile-time unknowns. An ancillary improvement is that unlike previous approaches that limited themselves to matrices, it forms a complete framework by encompassing all of MATLAB's multidimensional array constructs, including empty arrays, and by accommodating the detection of nonconforming shapes.

Approach The approach uses the shape semantics of the language's operators (called built-in functions) in an abstract interpretation of the input program. Each of these shape semantics can be fully characterized by formal algebraic systems that expose a number of important properties borne by them. (Examples of properties would be commutativity, associativity, idempotency and mixed-operator compositional identities, to name a few.) Through the combination of symbolic and fixed-point techniques, these properties can be exploited for code optimizations even when the actual shapes aren't statically known. The key empowering observation is that only a few algebras suffice to cover the shape semantics of a large number of MATLAB's built-in functions. By relying on a systematic algebraic view of shape, the approach is naturally scalable to arrays of arbitrary dimensions and thus goes beyond the previously used tabular view of shape.

A secondary contribution is a complete MATLAB-to-C translator, with an integrated type inference engine called MAGICA (MAthematica system for General-purpose Inferring and Compile-time Analyses), which produces an optimized C translation of a MATLAB source. The goals were threefold: (1) to show that our approach was realizable as an efficient implementation, (2) to demonstrate that the inferred type information goes beyond what has been previously achieved in MATLAB and APL, and (3) to show that the inferred types permit the generation of a highly optimized C translation. The inference engine, which deduces a MATLAB expression's value range, intrinsic type and array shape, was designed as an external module with two objectives: (1) to facilitate its use in other MATLAB compiler infrastructures, and (2) to enable its independent use in the development of new inference techniques and algorithms.

Applications A number of applications of the derived shape information to code optimizations have also been shown. Examples include detecting redundant shape conformability checks, memory preallocation, code selection and the inlining of "scalarized" code. Some of the optimizations that have been considered, like array storage coalescing, are vital to good performance when dealing with large arrays and haven't been addressed before in the area of MATLAB compilation.

Implementation Many features of our translator prototype, as far as we are aware, haven't been implemented before - for instance, value ranges can be complex-valued, and array shapes can be multidimensional as well as symbolic. The implementation is extensive and covers close to 70 built-in functions in MATLAB, and includes support for MATLAB's user-defined functions, conditional and looping constructs. The front-end has over 20 passes that perform various transformations on a static single-assignment (SSA) form, such as dead code elimination, global common subexpression elimination, constant propagation and array storage coalescence. We plan to also release the MAT2C translator front-end in the near future.

Licensing A part of the implementation, specifically, a reverse engineered grammar and lexical specification for MATLAB, has already been licensed and presently forms the language-backbone of a suite of commercial EDA (electronic design automation) tools from AccelChip, Inc.