 |
 |
 |
 |
 |
| MAT2C is an optimizing source-to-source
translator system that converts MATLAB programs to
stand-alone C code. The system consists of four main parts:
(1) a custom frontend that parses MATLAB M-files into an
intermediate representation (IR);
(2) a type inference engine called
MAGICA that takes an encoding of the IR, infers the
intrinsic types, array shapes and value ranges of the various
represented MATLAB expressions, and decorates the IR with the
inferred information;
(3) optimization phases like dead-code elimination,
common-subexpression elimination (CSE) and array copy
propagation that don't require type information, and
optimization phases such as code selection and array
coalescence that depend on it; and finally
(4) a code generator that takes an IR (possibly optimized)
annotated with type information and outputs an equivalent C
translation. Whole system operation is programmatically
controlled through a pass file, command-line options and
environment variables. |
 |
| Download |
 |
| The complete MAT2C distribution is available
for download; all requests for the software should be directed
to the associate
staff at Northwestern's
TTP office. |
 |
| Some Results: Average Memory Footprints,
Execution Times and Compilation Times |
 |
| This page presents some summarized results
pertaining to the quality of the C code generated by MAT2C
and the operation of MAT2C itself.
The measurements reported here were made on a 440MHz
UltraSPARC-IIi Solaris 7 workstation with 128MB of main memory,
and were based on the suite of benchmarks listed in Table I
below. |
 |
 |
Benchmark
|
Synopsis
|
Origin
|
M-Files
|
 |
| adpt |
Adaptive Quadrature by Simpson's Rule
|
FALCON
|
drv_adapt.m,
adapt.m
|
| capr |
Transmission Line Capacitance
|
Chalmers
University of Technology
|
drv_capacitor.m,
capacitor.m,
linspace.m,
gauss.m,
seidel.m
|
| clos |
Transitive Closure
|
OTTER
|
drv_closure.m,
closure.m
|
| crni |
Crank-Nicholson Heat Equation Solver
|
FALCON
|
drv_crnich.m,
crnich.m,
tridiagonal.m
|
| diff |
Young's Two-Slit Diffraction Experiment
|
The MathWorks'
Central File Exchange
|
drv_diffraction.m,
diffraction.m
|
| dich |
Dirichlet Solution to Laplace's Equation
|
FALCON
|
drv_dirich.m,
dirich.m
|
| edit |
Edit Distance
|
The MathWorks'
Central File Exchange
|
drv_editdist.m,
editdist.m
|
| fdtd |
Finite Difference Time Domain (FDTD) Technique
|
Chalmers
University of Technology
|
drv_fdtd.m,
fdtd.m
|
| fiff |
Finite-Difference Solution to the Wave Equation
|
FALCON
|
drv_finediff.m,
finediff.m
|
| nb1d |
One-Dimensional N-Body Simulation
|
OTTER
|
drv_nbody1d.m,
nbody1d.m
|
| nb3d |
Three-Dimensional N-Body Simulation
|
Modified nb1d
|
drv_nbody3d.m,
nbody3d.m
|
|
 |
| Table I |
|
 |
|
 |
|
Version 6.1 (Release 12) of the MATLAB interpreter and version
2.2 of mcc
(The MathWorks' MATLAB-to-C compiler) were used.
[ Version 7 (Release 14) and version 4.0.1 are respectively the
latest shipments of the two products, as of September 2004. ]
Both mcc and mat2c (the MAT2C program) used the
Sun
Workshop C compiler (version 5.1) for back-end compilation.
The type inference engine used was version 1.0 of MAGICA,
which was executed on version 4.1 of the
Mathematica kernel.
The back-end C compiler in both cases was passed the
-xO4 and -xlibmil options that turn on a host
of global and local optimizations, and inline certain library
routines for faster execution. All optimizations offered by
mcc were turned on. To minimize memory
usage, the MATLAB interpreter was always run with the
-nojvm option.
This suppresses the loading of a Java virtual machine that
allows a MATLAB session to draw on Java's capabilities.
|
 |
| Average Stack Trends |
 |
| When a C program begins execution, there is
already one stack frame on its run-time stack that contains the
initial process environment such as the argc and
argv parameters and the array of strings representing
the process's environment variables. Thus, the stack segment,
which grows in units of pages, will initially be at least one
page in size; this is 8KB on the Solaris 7 UltraSPARC-IIi.
Further procedure invocations from main can cause the
stack segment to grow depending on how local variables are
allocated and referenced in the invoked functions. Because the
mcc C codes bank on the heap for all
array allocations and use
function interfaces only for the passing and allocation of
handles to these arrays, the high watermark of their
run-time stacks shouldn't be expected to be large. Indeed, the
mcc C codes for all programs in the
benchmark suite were found to have a stack segment size that
grows to 16KB and stays at that. This is why the average stack
segment size of the mcc C codes, shown
in Figure I, is 16KB. |
 |
 |
 |
| Figure I: Average Stack, and
Stack+Heap Levels |
|
 |
| Figure I also shows four prominent peaks for
the average stack segment size of the mat2c C codes for the clos, crni,
fdtd, and fiff
benchmarks. This is because mat2c
allocates all arrays in these benchmarks on the stack. In the
other benchmarks, significant percentages of the array shapes
were symbolic; this led to their storage being primarily
allocated on the heap. The exception was dich in which though 100% of the array
shapes were statically inferred, the overall average stack
segment size was only about 17.5KB because most of the arrays
in this benchmark were small.
|
 |
| Average Heap Trends |
 |
| Figure I also shows the average stack and heap
space sums across all benchmarks. All average memory sizes, be
it stack size, virtual memory level or resident set level, were
calculated using a weighted time-averaged formula
[ 1 ].
In general, the average dynamic program data sizes of
mat2c C codes were smaller than that of
mcc C codes; the relative reductions
were over 20% in 7 out of 11 cases, being over 100% in over
half of the cases. In the case of capr,
though the mcc C code fared better by a
small margin, it still has a far higher kcore-min value as
discussed below. (For capr, the average
dynamic program data sizes for the mcc
and mat2c C codes were 2428.02KB and
2506.75KB.)
|
 |
| KCore-Min Reductions |
 |
An important point that the average figures
don't uncover is the duration of consumption of a memory
resource. If processes P1 and
P2 both consume x kilobytes of
memory, P1 for t seconds and
P2 for 2t seconds, then both will
have the same time-averaged memory consumption level but
clearly P2 will be the bigger memory hog.
To take into consideration the effect of time, UNIX systems use
a metric called the kcore-min value of a process that
is computed as
kcore-min = M * T
where M is the mean memory size in kilobytes and
T is the duration of memory usage in minutes.
Thus, even though the average dynamic program data size of the
mat2c C code was close to that of the
mcc C code in capr
and nbody3d, the
mat2c versions have lower kcore-min
values because of their shorter execution times. The relative
kcore-min reductions were 102.3% and 71.5% in these two cases
and are shown using arrow markers in Figure I. Though not shown
in Figure I, the mat2c C codes for all
other benchmarks also exhibit kcore-min reductions because
besides having lower size averages, they also usually have
significantly lower execution times as shown in Figure III.
To obtain the complete picture, the average virtual memory
consumption level of the mat2c and
mcc C codes, which includes all
swapped-out pages, mapped files and devices, is shown in
Figure II.
|
 |
 |
 |
| Figure II: Average Virtual Memory
Levels |
|
 |
| Execution Times |
 |
| Figure III compares execution times of the
mat2c and mcc C
codes on a log scale. The figure also shows the times taken by
the MATLAB interpreter to execute the benchmarks. Indicated
above the bars are the performance speedups of the mat2c C codes over the
mcc C codes. In only one case was the
speedup marginal - 10% for the adpt
benchmark. The adpt benchmark has a
parameter called tol that specifies the error
tolerance in the quadrature values produced. Decreasing this
value decreases the fractional overhead due to type checking,
making numerical computation the dominant work in the
benchmark. Measurements for adpt were
taken with tol set at 10-12, which
is the setting in the FALCON benchmark suite. Because
increasing tol increases the relative overhead due to
type checking, the mat2c C code performs
much better than the mcc C code at
higher values of tol. In all other benchmarks, the
speedups ranged from 30% and 74% in two cases, to over 100% in
the remaining. In fact, in 4 out of 11 benchmarks, the speedups
were dramatic, being over an order of magnitude. |
 |
 |
 |
| Figure III: Comparative Execution
Times |
 |
 |
 |
| Figure IV: Compilation
Times' Breakup |
|
 |
| Compilation Times |
 |
| Finally, Figure IV displays the
times taken by mat2c to compile the
MATLAB sources to C code, and the time expended by the back-end
compiler (denoted as bcc) to take the
emitted C code down to binary. The distribution of times across
the various passes in the MATLAB-to-C conversion phase is shown
in the figure. "MathLink" refers to the portion of time
required to encode and transfer the IR to MAGICA for inferring
types; it excludes the type inference time, which is shown
separately in the bar chart.
"GCTD" indicates the Graph
Coloring with Type-based Decomposition pass in
mat2c; details about this pass are
available in [ 2 ].
|
 |
| References |
 |
|
[ 1 ] Pramod G. Joisha, Prithviraj Banerjee.
Static Array Storage Optimization in MATLAB.
Proceedings of the ACM SIGPLAN Conference on Programming
Language Design and Implementation (PLDI), San Diego, USA.
June 2003.
[ 2 ] Pramod G. Joisha.
A Type Inference System for MATLAB with Applications to
Code Optimization.
Doctoral Dissertation. Electrical and Computer Engineering
Department, Northwestern University. December 2003.
|
 |
 |
 |
| Last updated on 09/22/2004. |
 |
|
 |