MAT2C is an optimizing source-to-source translator system that converts MATLAB programs to stand-alone C code. The system consists of four main parts: (1) a custom frontend that parses MATLAB M-files into an intermediate representation (IR); (2) a type inference engine called MAGICA that takes an encoding of the IR, infers the intrinsic types, array shapes and value ranges of the various represented MATLAB expressions, and decorates the IR with the inferred information; (3) optimization phases like dead-code elimination, common-subexpression elimination (CSE) and array copy propagation that don't require type information, and optimization phases such as code selection and array coalescence that depend on it; and finally (4) a code generator that takes an IR (possibly optimized) annotated with type information and outputs an equivalent C translation. Whole system operation is programmatically controlled through a pass file, command-line options and environment variables.
Download
The complete MAT2C distribution is available for download; all requests for the software should be directed to the associate staff at Northwestern's TTP office.
Some Results: Average Memory Footprints, Execution Times and Compilation Times
This page presents some summarized results pertaining to the quality of the C code generated by MAT2C and the operation of MAT2C itself. The measurements reported here were made on a 440MHz UltraSPARC-IIi Solaris 7 workstation with 128MB of main memory, and were based on the suite of benchmarks listed in Table I below.
Benchmark Synopsis Origin M-Files
adpt Adaptive Quadrature by Simpson's Rule FALCON drv_adapt.m, adapt.m
capr Transmission Line Capacitance Chalmers University of Technology drv_capacitor.m, capacitor.m, linspace.m, gauss.m, seidel.m
clos Transitive Closure OTTER drv_closure.m, closure.m
crni Crank-Nicholson Heat Equation Solver FALCON drv_crnich.m, crnich.m, tridiagonal.m
diff Young's Two-Slit Diffraction Experiment The MathWorks' Central File Exchange drv_diffraction.m, diffraction.m
dich Dirichlet Solution to Laplace's Equation FALCON drv_dirich.m, dirich.m
edit Edit Distance The MathWorks' Central File Exchange drv_editdist.m, editdist.m
fdtd Finite Difference Time Domain (FDTD) Technique Chalmers University of Technology drv_fdtd.m, fdtd.m
fiff Finite-Difference Solution to the Wave Equation FALCON drv_finediff.m, finediff.m
nb1d One-Dimensional N-Body Simulation OTTER drv_nbody1d.m, nbody1d.m
nb3d Three-Dimensional N-Body Simulation Modified nb1d drv_nbody3d.m, nbody3d.m
Table I
Version 6.1 (Release 12) of the MATLAB interpreter and version 2.2 of mcc (The MathWorks' MATLAB-to-C compiler) were used. [ Version 7 (Release 14) and version 4.0.1 are respectively the latest shipments of the two products, as of September 2004. ] Both mcc and mat2c (the MAT2C program) used the Sun Workshop C compiler (version 5.1) for back-end compilation. The type inference engine used was version 1.0 of MAGICA, which was executed on version 4.1 of the Mathematica kernel.

The back-end C compiler in both cases was passed the -xO4 and -xlibmil options that turn on a host of global and local optimizations, and inline certain library routines for faster execution. All optimizations offered by mcc were turned on. To minimize memory usage, the MATLAB interpreter was always run with the -nojvm option. This suppresses the loading of a Java virtual machine that allows a MATLAB session to draw on Java's capabilities.

Average Stack Trends
When a C program begins execution, there is already one stack frame on its run-time stack that contains the initial process environment such as the argc and argv parameters and the array of strings representing the process's environment variables. Thus, the stack segment, which grows in units of pages, will initially be at least one page in size; this is 8KB on the Solaris 7 UltraSPARC-IIi. Further procedure invocations from main can cause the stack segment to grow depending on how local variables are allocated and referenced in the invoked functions. Because the mcc C codes bank on the heap for all array allocations and use function interfaces only for the passing and allocation of handles to these arrays, the high watermark of their run-time stacks shouldn't be expected to be large. Indeed, the mcc C codes for all programs in the benchmark suite were found to have a stack segment size that grows to 16KB and stays at that. This is why the average stack segment size of the mcc C codes, shown in Figure I, is 16KB.
Fig. 1
Figure I: Average Stack, and Stack+Heap Levels
Figure I also shows four prominent peaks for the average stack segment size of the mat2c C codes for the clos, crni, fdtd, and fiff benchmarks. This is because mat2c allocates all arrays in these benchmarks on the stack. In the other benchmarks, significant percentages of the array shapes were symbolic; this led to their storage being primarily allocated on the heap. The exception was dich in which though 100% of the array shapes were statically inferred, the overall average stack segment size was only about 17.5KB because most of the arrays in this benchmark were small.
Average Heap Trends
Figure I also shows the average stack and heap space sums across all benchmarks. All average memory sizes, be it stack size, virtual memory level or resident set level, were calculated using a weighted time-averaged formula [ 1 ].

In general, the average dynamic program data sizes of mat2c C codes were smaller than that of mcc C codes; the relative reductions were over 20% in 7 out of 11 cases, being over 100% in over half of the cases. In the case of capr, though the mcc C code fared better by a small margin, it still has a far higher kcore-min value as discussed below. (For capr, the average dynamic program data sizes for the mcc and mat2c C codes were 2428.02KB and 2506.75KB.)

KCore-Min Reductions
An important point that the average figures don't uncover is the duration of consumption of a memory resource. If processes P1 and P2 both consume x kilobytes of memory, P1 for t seconds and P2 for 2t seconds, then both will have the same time-averaged memory consumption level but clearly P2 will be the bigger memory hog. To take into consideration the effect of time, UNIX systems use a metric called the kcore-min value of a process that is computed as

kcore-min = M * T

where M is the mean memory size in kilobytes and T is the duration of memory usage in minutes.

Thus, even though the average dynamic program data size of the mat2c C code was close to that of the mcc C code in capr and nbody3d, the mat2c versions have lower kcore-min values because of their shorter execution times. The relative kcore-min reductions were 102.3% and 71.5% in these two cases and are shown using arrow markers in Figure I. Though not shown in Figure I, the mat2c C codes for all other benchmarks also exhibit kcore-min reductions because besides having lower size averages, they also usually have significantly lower execution times as shown in Figure III.

To obtain the complete picture, the average virtual memory consumption level of the mat2c and mcc C codes, which includes all swapped-out pages, mapped files and devices, is shown in Figure II.

Fig. 2
Figure II: Average Virtual Memory Levels
Execution Times
Figure III compares execution times of the mat2c and mcc C codes on a log scale. The figure also shows the times taken by the MATLAB interpreter to execute the benchmarks. Indicated above the bars are the performance speedups of the mat2c C codes over the mcc C codes. In only one case was the speedup marginal - 10% for the adpt benchmark. The adpt benchmark has a parameter called tol that specifies the error tolerance in the quadrature values produced. Decreasing this value decreases the fractional overhead due to type checking, making numerical computation the dominant work in the benchmark. Measurements for adpt were taken with tol set at 10-12, which is the setting in the FALCON benchmark suite. Because increasing tol increases the relative overhead due to type checking, the mat2c C code performs much better than the mcc C code at higher values of tol. In all other benchmarks, the speedups ranged from 30% and 74% in two cases, to over 100% in the remaining. In fact, in 4 out of 11 benchmarks, the speedups were dramatic, being over an order of magnitude.
Fig. 3
Figure III: Comparative Execution Times
Fig. 4
Figure IV: Compilation Times' Breakup
Compilation Times
Finally, Figure IV displays the times taken by mat2c to compile the MATLAB sources to C code, and the time expended by the back-end compiler (denoted as bcc) to take the emitted C code down to binary. The distribution of times across the various passes in the MATLAB-to-C conversion phase is shown in the figure. "MathLink" refers to the portion of time required to encode and transfer the IR to MAGICA for inferring types; it excludes the type inference time, which is shown separately in the bar chart. "GCTD" indicates the Graph Coloring with Type-based Decomposition pass in mat2c; details about this pass are available in [ 2 ].
References
[ 1 ] Pramod G. Joisha, Prithviraj Banerjee. Static Array Storage Optimization in MATLAB. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), San Diego, USA. June 2003.

[ 2 ] Pramod G. Joisha. A Type Inference System for MATLAB with Applications to Code Optimization. Doctoral Dissertation. Electrical and Computer Engineering Department, Northwestern University. December 2003.

Last updated on 09/22/2004.