
(http://www.ece.nwu.edu/cpdc)
Sep. 1, 1998- Aug. 31, 1999
Prof.
Prith Banerjee
Director, Center for Parallel and
Distributed Computing
Walter P. Murphy Professor and
Chairman
Department of Electrical and
Computer Engineering
Northwestern University
2145 Sheridan Road
Evanston, IL-60208
TEL: (847) 491-3641
FAX: (847) 491-4455
EMAIL: banerjee@ece.nwu.edu
URL:http://www.ece.nwu.edu/~banerjee
The
McCormick School of Engineering and Applied Science at Northwestern University
established a Center for Parallel and Distributed Computing on September 1,
1996; it is headed by Prof. Prith Banerjee. The goals of the Center are to
conduct leading edge, fundamental science and engineering research that will
provide a sound technological basis to the technology of parallel and
distributed computing. The specific research areas that will be undertaken at
the Center will be all aspects of parallel and distributed computing, namely,
parallel architectures, parallel compilers and software, parallel algorithms
and applications, parallel and distributed database systems, and parallel I/O
systems.
This report
outlines the activities in the center from September 1, 1998 through August 31,
1999.
During
1998-99, the Center for Parallel and Distributed Computing supported one
Director, one assistant, 12 faculty members from Northwestern University, two
scientists from Argonne National Lab, and 45 graduate students.
PRITH
BANERJEE: Director of the Center for Parallel and Distributed
Computing, and Walter P. Murphy Professor and Chairman of Electrical and
Computer Engineering, Ph.D. Univ. of Illinois at Urbana Champaign.
(http://www.ece.nwu.edu/~banerjee)
Research Interests: Parallel algorithms for VLSI
computer-aided design applications, Parallelizing compilers for distributed
memory multicomputers, Compilers for adaptive computing.
NANCY SINGER, Administrative Assistant, Center for
Parallel and Distributed Computing and Electrical and Computer Engineering
Department
ALVIN
BAYLISS:
Professor and Chairman of Engineering
Science and Applied Mathematics, and Professor, Electrical and Computer
Engineering, Ph.D. New York University
(http://www.ece.nwu.edu/ecefaculty/alvin.html)
Research
Interests: Numerical analysis, large-scale scientific computing,
combustion, computational fluid dynamics, solid mechanics, acoustics.
ALOK
CHOUDHARY: Associate Professor of Electrical and Computer
Engineering, Ph.D. Univ. of Illinois at Urbana-Champaign.
(http://www.ece.nwu.edu/~choudhar)
Research Interests: High-performance computing and
communications, input-output, compiler and runtime systems for HPCC, multimedia
systems and databases.
SCOTT
HAUCK:
Assistant Professor of Electrical and Computer Engineering, Ph.D. Univ.
Washington. (http://www.ece.nwu.edu/~hauck)
Research Interests: Reconfigurable and Multi-FPGA systems,
FPGA architectures and CAD tools, rapid-prototyping, asynchronous circuit and
VLSI design, parallel processing, parallel programming languages.
ANDREAS
MOSHOVOS: Assistant
Professor of Electrical and Computer Engineering, Ph.D. University of
Wisconsin, Madison. (http://www.ece.nwu.edu/ecefaculty/moshovosl.html)
Research Interests: Computer architecture and compilers,
hardware/software techniques to expose/exploit parallelism,
speculation/prediction, intelligent memory systems, special purpose
architectures.
JORGE
NOCEDAL:
Deputy Director of the Optimization Technology Center, and Professor of
Electrical and Computer Engineering, Ph.D. Rice University.
(http://www.ece.nwu.edu/ecefaculty/nocedal.html)
Research Interests: Nonlinear Optimization, applied
linear algebra, numerical analysis, software development for numerical
computations.
DER-TSAI
LEE:
Professor of Electrical and Computer
Engineering, Ph.D. University of Illinois at Urbana-Champaign.
(http://www.ece.nwu.edu/ecefaculty/dtlee.html)
Research
Interests: Design and analysis of algorithms, data structures, VLSI
systems, computational geometry, parallel algorithms, computational complexity,
algorithm visualization.
PETER
SCHEUERMANN: Professor, Electrical and Computer Engineering, Ph.D.
State University of New York, Stony Brook.
(http://www.ece.nwu.edu/ecefaculty/peters.html)
Research Interests: Physical database design, pictorial
databases, parallel I/O systems, parallel algorithms for data-intensive
applications, distributed database systems.
MAJID
SARRAFZADEH: Professor of
Electrical and Computer Engineering, Ph.D. University of Illinois at
Urbana-Champaign. (http://www.ece.nwu.edu/ecefaculty/majid.html)
Research
Interests: VLSI Design, computer-aided design, high-performance
architectural design, design and analysis of algorithms, computational
complexity.
NAGARAJ
SHENOY: Research Associate Professor of Electrical
and Computer Engineering, Ph.D. Indian Institute of Science, Bangalore. (http://www.ece.nwu.edu/~nagaraj)
Research Interests: Compilers for parallel computing.
ALLEN
TAFLOVE, Professor of Electrical and Computer
Engineering, Ph.D. Northwestern Univ.
(http://www.ece.nwu.edu/ecefaculty/taflove.html)
Research Interests: Applied electromagnetic field theory
and applications, computational electromagnetics, scattering and diffraction,
supercomupting, Maxwell’s equations-based computational nonlinear optics,
electromagnetic waves in nonlinear dispersive media, femtosecond, optical
switches.
VALERIE
TAYLOR,
Associate Professor of Electrical and
Computer Engineering, Ph.D. Univ. California, Berkeley.
(http://www.ece.nwu.edu/ecefaculty/taylor.html)
Research Interests: Performance of parallel scientific
applications, computer architecture, visual supercomputing environments.
TED
BELYTSCHKO, Walter P. Murphy Professor and Chairman of Mechanical
Engineering, Ph.D. Illinois Institute of Technology.
(http://www.ece.nwu.edu/mefaculty/belytschko.html)
Research Interests: Computational solid mechanics,
computational fluid dynamics, explicit finite-element programs, parallel
versions of above applications.
IAN
FOSTER,
Senior Computer Scientist, Mathematics and Computer Science Division, Argonne
National Lab, Ph.D. Imperial College, England.
(http://www.mcs.anl.gov/people/foster)
Research Interests: Various aspects of parallel and
distributed computing: algorithms, languages, tools; techniques to integrate
high-performance computing into large-scale internetworked environments;
computational science efforts in developing parallel climate models.
WILLIAM
GROPP,
Senior Computer Scientist, Mathematics and Computer Science Division, Argonne
National Lab. (http://www.mcs.anl.gov/people/gropp)
Research Interests: Scientific computing, particularly in
algorithms and software for adaptive and parallel methods for partial
differential equations, development of PETSc library of numerical routines,
development of MPI message passing interface standard and implementation.
·
Dhruva Ranjan
Chakrabarti, Ph.D., Compiler Support for
Parallel Irregular Applications, June 2000. (Advisor: P. Banerjee)
·
Yanhong Yuan, Ph.D.,
Parallel Algorithms for 3D Resistance
Capacitance Extraction, June 2000. (Advisor: P. Banerjee)
·
Jiwoong Victor Kim,
Ph.D., Parallel Algorithms for Logic
Synthesis, June 2000. (Advisor: P. Banerjee)
·
Chris Bachmann,
M.S., A Hardware/Software Testbed for
Adaptive Computing, Dec. 1998. (Advisor: P. Banerjee)
·
Pramod Joisha,
Ph.D., Compilation of MATLAB Programs on
Distributed Memory Multiprocessors, June 2001. (Advisor: P. Banerjee)
·
Suresh Periyacheri,
M.S., Design and Implementation of Signal
Processing Libraries in Reconfigurable Computing, June 1999. (Advisor: P.
Banerjee)
·
Alex Zhi Ye, M.S./Ph.D.,
Compilation of High-Level C Programs for
Adaptive Computing, M.S., June 1999. (Advisor: P. Banerjee)
·
Alex Jones,
M.S./Ph.D., Design and Implementation of
Image Processing Libraries on DSPs and FPGAs, M.S., Dec. 1999. (Advisor: P.
Banerjee)
·
Anshuman Nayak,
M.S./Ph.D., Design and Implementation of
Signal Processing Libraries on DSP Processors, M.S., June 2000. (Advisor:
P. Banerjee)
·
Abhay Kanhere,
Ph.D., Compilation for Adaptive Computing
Systems, 2001. (Advisor: A. Choudhary)
·
Malay Haldar, M.S., Dynamic Scheduling in Heterogeneous Adaptive
Systems, 2001. (Advisor: A.
Choudhary)
·
Xioui Shen, Ph.D., Large Scale Data Management for Scientific
Computing, 2001. (Advisor: A.
Choudhary)
·
Harsha Nagesh,
Ph.D., Topic to be decided, 2001. (Advisor: A. Choudhary)
·
Gokhan Memik, Topic
to be decided, 2001. (Advisor: A. Choudhary)
·
Sriram Raghavendran,
M.S., A Prototype for Visualization of
Large Datasets, Dec. 1998. (Advisor: A. Choudhary)
·
Mahmut Kandemir,
Ph.D., Syracuse University, Advanced
Compiler Optimization for High-Performance Computers, 1999. (Advisor: A.
Choudhary)
·
Sachin More, Ph.D., Parallel I/O for Large Object-Relational
Databases, expected 1999. (Advisor: A. Choudhary)
·
Sanjay Goil, Ph.D., High-Performance On-Line Analytical
Processing and Data Mining on Parallel Computers, 1999. (Advisor: A.
Choudhary)
·
Chutimet Srinilta,
Ph.D., Design and Evaluation of
High-Performance Multimedia Servers, Dec. 1998. (Advisor: A. Choudhary)
·
Katherine Nelson,
M.S./Ph.D, Reconfigurable Computing Architectures. (Advisor: S. Hauck)
·
Mark Chang, M.S./Ph.D. , Hyperspectral Application on
Reconfigurable Computing. (Advisor: S. Hauck)
·
G. Gu, M.S., Configuration Caching. (Advisor:
S. Hauck)
·
Z. Li, M.S./Ph.D., Adaptive Computing
Architectures. (Advisor: S. Hauck)
·
Shawn Phillips, M.S./Ph.D., CAD Tools for Configurable
Computing. (Advisor: S. Hauck)
·
Ankur Srivastava, M.S./Ph.D., FPGA Placement and
Routing. (Advisor: S. Hauck)
·
C. Mulpuri, M.S./Ph.D., FPGA Partitioning. (Advisor:
S. Hauck)
·
V. Karnam, M.S./Ph.D., Designing Image Processing
Algorithms on FPGA Boards. (Advisor: S. Hauck)
·
R. Guo, M.S., Web Based Programming Environment. (Advisor: D. T. Lee)
·
Y. Chao, M.S., On a Web-based Telemanipulation and
Monitoring System, Sept. 1998. (Advisor: D. T. Lee)
·
K. Aoki, Ph.D., A Web-based Distributed Programming
Environment, Dec. 1999. (Advisor: D. T. Lee)
·
L. Lin, Ph.D., DAViD: A Distributed Algorithm Visualization and Debugging System for
Geometric Computing in 3D.
(Advisor: D. T. Lee)
·
Philippe Canal, M.S., Finding Paths Among Obstacles
in Two-Layer Interconnection Model, Sept. 1998. (Advisor: D. T. Lee)
·
D. Y. Seo, Ph.D., On the Complexity of Bicriteria
Spanning Tree Problems for a Set of Points in the Plane, Dec. 1999. (Advisor:
D. T. Lee)
·
Amirali Baniasadi, M.S./Ph.D., Topic to be decided.
(Advisor: Andreas Moshovos)
·
Amir Farahi, M.S./Ph.D., Topic to be decided. (Advisor:
Andreas Moshovos)
·
Jean Liu, M.S.,
June 1999. (Advisor: M. Sarrafzadeh)
·
Shunliang Wang, M.S., June 1999. (Advisor: M. Sarrafzadeh)
·
Eli Bozorgzadeh,, MS/PhD., Physical Design in FPGAs, (Advisor: M. Sarrafzadeh)
·
Vivian Guzman,
M.S./Ph.D., Interaction of Physical Design and Synthesis. (Advisor: M. Sarrafzadeh)
·
Seda Ogrenci, MS/Ph.D., Multimedia Application on
FPGAs. (Advisor: M. Sarrafzadeh)
· Kia Bazargan, Ph.D., Reconfigurable Computing. (Advisor: M. Sarrafzadeh)
·
Abhshek Ranjan, M.S./Ph.D., Interaction of Physical
Design and Synthesis. (Advisor: M.
Sarrafzadeh)
·
Maogang Wang , Ph.D., Physical Design. (Advisor: M. Sarrafzadeh)
·
Todd Haverkos, M.S., Chip Design. (Advisor:
M. Sarrafzadeh)
·
Xioajian Yang, M.S./Ph.D., Physical Design. (Advisor:
M. Sarrafzadeh)
· B. Chen, Ph.D, Title to be decided, 2000. (Advisor: P. Scheuermann)
· L. Singh, Ph.D., Generating Constraint Association Rules from Semi-Structured Data, 1999. (Advisor: P. Scheuermann)
· M. Sayal, Ph.D., Title to be decided, 2000. (Advisor: P. Scheuermann)
· J. Shim, Ph.D., Title to be decided, June 1999. (Advisor: P. Scheuermann)
· J. Chen, Ph.D., Performance Evaluation of Parallel Applications. (Advisor: V. Taylor)
·
Jonathan Geisler, M.S., Ph.D., Performance Evaluation
of Parallel Applications. (Advisor: V.
Taylor)
·
Zhiling Lan, Ph.D., Performance Evaluation of Parallel
Applications. (Advisor: V. Taylor)
·
Xin Li, M.S./Ph.D., Title to be decided. (Advisor: V. Taylor)
· Warren Smith, M.S./Ph.D., Scheduling Parallel Applications. (Advisor: V. Taylor)
The Center
has the following computing resources:
·
A 16
processor IBM SP-2 distributed memory message passing multicomputer with each
node having 128 MB memory, 2 GB disk, and a IBM Power2 processor.
·
One IBM
model 42T RS/6000 workstation used as a control workstation for the IBM SP-2
machine.
·
An 8
processor IBM J-40 shared memory multiprocessor, with 8 PowerPC 601 processors,
1 GB of memory, 14 GB disk.
·
An 8
processor SGI Origin 2000 distributed shared memory multiprocessor with eight
MIPS R10000 processors, 2 GB memory
·
Ten C-110 HP workstations with a 120 MHz PA 7200
CPU, 128 MB memory, 4 GB disk, 20 inch color monitors.
·
Fifty HP
C-180 workstations with 180 MHz PA 8000 CPU, 128 MB memory, 20 inch monitors
·
Four Dell
personal computers with 300 Intel Pentium II processors, 128 MB memory, 2 GB
disk, 17-inch color monitors, CDROM drive.
·
The above
equipment are all connected via a local area Ethernet network which will be
upgraded shortly into an OC-3 ATM based network operating at 155 Mbps.
This section gives an
overview of some of the research projects at the Center for Parallel and
Distributed Computing.
We have research funding from a variety of sources including the National Science Foundation (NSF), Defense Advanced Research Projects Agency (DARPA), and various industries including IBM, Intel, and Motorola.
In the WADE project, we have developed a
novel framework of a web-based automated parallel CAD environment. The design
files of a user working in a remote machine are transparently shipped to the
WADE compute center, the relevant computations are performed in a parallel
environment, and the results are returned back to the user. A job submission
and scheduling tool ensures proper load balance and maximal usage of the
various parallel machines. At present three parallel CAD applications are
supported: (1) a parallel placement tool for standard cell circuits (2) a
parallel fault simulator for sequential circuits (3) a parallel VHDL logic
simulator. We have developed techniques for ensuring the safety and security of
data files in the system. We are
developing techniques for performing extremely high performance mappings to
FPGA-based systems. These have the
potential to significantly improve the utility of logic emulators, adaptive
computing systems, and other multi-FPGA systems.
This research
has been supported by NSF, DARPA, and Quickturn, has supported 2 Ph.D.
students, 1 B.S. student, and generated 3 journal papers and one conference
paper.
In this research,
we have investigated parallel algorithms for several CAD tools with the
objective of reducing the design turnaround times of the CAD tools, and to
enable larger CAD problems to be solved. All the tools are developed using the
Message-Passing Interface (MPI).
In the area
of parallel logic synthesis, we have investigated parallel algorithms for
various key steps of a commercial logic synthesis software called BUILDGATES
from Ambit Design Systems, and parallelized them for efficient execution on a
network of workstations. In some related work, we have done some novel work in
the area of low-power logic synthesis within the framework of the BUILDGATES
tools.
In the area
of logic verification, we have developed two parallel algorithms for
constructing Binary Decision Diagrams (BDDs). The first algorithm is based on
fine grained parallelism, and did not result in any speedups. The second
algorithm was based on partitioning BDDs by fan-in cones and resulted in
speedups of about 3 on 8 processors of an SGI Origin shared memory
multiprocessor. We have also done some novel work in the area of equivalence
checking of multi-phase sequential circuits using re-timing.
In the area
of behavioral synthesis, we have developed some parallel algorithms for
simultaneous scheduling, binding, and floor-planning. In comparison to
traditional approaches which separates high-level synthesis from physical
design, we have devised an algorithm where these stages interact closely. This
results in solutions with lower latency and area, but require increased
computation times. Experimental results on a set of synthesis benchmarks show
speedups of about 7.8 on 8 processors on an IBM SP2 message-passing
multiprocessor.
Finally, in
the area of behavioral simulation, we have developed an efficient sequential
VHDL simulator based on compiled event driven simulation. Two approaches to
parallelizing compiled VHDL simulation have been developed, one based on fine
grain task scheduling, and another based on coarse grain partitioning based on
fan-in cones. While the first algorithm did not result in any speedups, the
second algorithm achieved speedups of about 3 on 8 processors of an SGI Origin
shared memory multiprocessor.
For each of
the above parallel algorithms, we have benchmarked them on a network of
workstations and the SGI Origin multiprocessor obtained as part of the NSF
grant. This project was supported by DARPA and supported 3 Ph.D. students. It
resulted in 3 journal papers and 13 conference publications.
With the
growing popularity of personal computing and communication devices, the
high-performance as well as low power consumption is much desirable in the
synthesis of VLSI circuits and systems. Over the years, much research progress
has been made towards the timing, area and power optimization. However, when
the designers look into the possibility of further improvement over circuit
area/power, the timing constraints turn out to be a strong limiting factor.
This trend can be exacerbated at lower levels (such as gate-level and
physical-level) where relatively accurate timing models are available.
Therefore, the tradeoff between delay and area/power has to be made, in
general, to achieve the result with reasonably good quality. Since the timing
performance of a circuit depends on critical paths, the logic gates on
non-critical paths have typically large timing slacks (i.e. the difference
between required time and arrival time). The problem is thus translated into
how to exploit these excessive slacks effectively during the process of
optimization. Most of prior efforts do not deal directly with the slacks.
In the past year we directly
targeted the analysis of the slack
distribution and developed a Slack- Equalization-Algorithm to maximize the
objective function (power or area) under the timing constraints. This algorithm can find many applications in
low-level synthesis and optimization
ranging from gate-level power/area optimization to layout-level placement and routing design. Experimental
results verify the effectiveness of the proposed technique.
This research
has been supported by NSF and Motorola, and has supported 8 students (5 female,
1 minority), and has generated 3 journal papers, and 10 conference papers.
Reconfigurable
logic has the potential to alter the way most types of computations are
performed. By integrating FPGAs into
host processors, custom instruction sets and functional accelerators can be
generated on an application by application basis. However, the architecture of these devices, applications, and
configuration management are significant concerns in these systems. This research is attempting to overcome
these techniques via new approaches in each of these areas. We have developed new CAD algorithms to
support these devices, including physical design and configuration management
techniques. We have also investigated
chip architectures, compilers, and other aspects of this problem. Specifically, this year we have developed
novel compression algorithms by minimizing the reconfiguration times in FPGAs,
designed high-performance carry chains for faster arithmetic computation,
developed CAD algorithms for template based physical design using parameterized
designs of adders and other components.
Finally, this year we have developed a C compiler based on GCC for mixed
processor-FPGA systems where sequences of normal processor instructions are
replaced by complex reconfigurable logic instructions. The compiler has been evaluated with the use
of a simulator on various benchmarks, and speedups of about 1.5 to 7 have been
measured over a MIPS IV processor.
This research
has been supported by NSF, DARPA, and Motorola, has supported 10 graduate
students and 8 undergraduates, and generated 2 journal and 3 conference papers.
Multidimensional
Analysis and On-Line Analytical Processing (OLAP) operations require summary
information on multi-dimensional data sets. Most common are aggregate
operations along one or more dimensions of numerical data values. Simultaneous
calculation of multi-dimensional aggregates are provided by the Data Cube
operator, used to calculate and store summary information on a number of
dimensions. This is computed only partially if the number of dimensions is
large. Query processing for these applications require different views of data
to gain insight and for effective decision support. These may either be
answered from a materialized cube in the data cube or calculated on the fly.
The
multi-dimensionality of the underlying problem can be represented both in
relational and multi-dimensional databases, the latter being a better fit when
query performance is the criteria for judgment. Relational databases are
scalable in size for OLAP and multidimensional analysis and efforts are on to
make their performance acceptable. On the other hand multi-dimensional
databases have proven to provide good performance for such queries, although
they are not very scalable. In this project we are addressing scalability in
multi-dimensional systems for OLAP and multi-dimensional analysis.
Multidimensional
databases store data in multidimensional structure on which analytical
operations are performed. A challenge for these systems is how to handle large
and sparse data sets with large dimensionality. Parallel computing is necessary
to address the performance issues for these data sets.
Chunking is
used to store data either as a dense block using multi-dimensional arrays or a
sparse set using a Bit encoded sparse structure (BESS). Chunks provide a
multi-dimensional index structure for efficient dimension oriented data accesses
much the same as multi-dimensional arrays do. Operations within chunks and
between chunks are a combination of relational and multi-dimensional operations
depending on whether the chunk is sparse or dense.
In this
research we address issues of data partitioning, parallelism, schedule
construction, data cube building, chunk storage and memory usage on a
distribute memory machine architecture. The following figures summarize the
various available options, especially in terms of storage of cubes, parallelism
at different levels, aggregation calculation orderings and the chunk access for
aggregations.
Management,
storage, efficient access, analysis of 100's of GBs to 100's of TBs of data,
that is likely to be generated and/or used in various phases of large-scale
scientific experiments and simulations. Current data management and analysis
techniques do not measure up to the challenges posed by such large-scale
requirements in performance, scalability, ease of use and interfaces. Terascale
computing requires newer models and approaches to solving the problems in
storing, retrieving, managing, sharing, visualizing, organizing and analyzing
data at such a massive scale. Also,
since problem solving in scientific applications requires collaboration and
usage of distributed resources, the problem of data management and storage is
further exacerbated.
In this
project, we are developing a Scalable High-Performance Data-Management System
(SHPDM) that will provide support for data management, query capability and
high-performance accesses to large datasets. SHPDM will provide the flexibility
of databases for indexing, searches, management of objects, creating and
keeping histories and trails of data accesses, while providing high-performance
access methods and optimizations (pooled striping, prefetching, caching,
collective I/O) for accessing large-scale data-objects found in scientific
computations. This allows decoupling data management from data accesses thereby
allowing us to leverage many high-performance I/O methods that have already
been developed or are under development.
We have developed Web++, a prototype system based on user-transparent geographic replication which aims at improving the response time and the reliability of the HTTP service. Web++ extends the current Web client/server architecture with additional capabilities. The extension is provided via Java servlets at the server side and applets at the client side. The servlets pre-process the requested documents such that each logical URL is replaced by a list of physical URLs corresponding to the list of the resource's replica. An applet is automatically downloaded to the client machines together with the first request of each new user session. For subsequent requests within a user session, the applet chooses the physical URL that corresponds to a resource held by an available server that is expected to deliver the best response time for the server.
In the past
year, we have done extensive research in four areas in compilation techniques
for High Performance Fortran (HPF) Programs for parallel and distributed
machines. First, we have developed the PARADIGM 2.0 compiler which uses a
linear algebra framework to handle arbitrary data distributions and data
alignments in HPF programs and generate message-passing codes for programs
having general affine subscript expressions. We have benchmarked the
performance of our PARADIGM2.0 compiler with the PGHPGF compiler from Portland
Group and the XLHPF compiler from IBM, and demonstrated thath our compiler generates
more efficient code with faster execution times, lower communication costs, and
comparable compilation times as the commercial compilers. We have performed a lot of experimental
measurements of the PARADIGM compiled codes on an IBM SP2 and a network of
workstations. Second, we have developed a compiler and runtime system that
simultaneously supports both regular and irregular accesses using a unique
interval representation called PILAR. Third, we have developed a linear algebra
framework for applying both loop and data transformations in order to improve
temporal and spatial cache locality in distributed shared memory
multiprocessors.
This research
has been supported by NSF, DARPA, Intel, and Air Force Research Labs, and has
supported 3 Ph.D. students, 1 M.S. student, and generated 3 journal papers, and
9 conference papers. One M.S. student has graduated.
The objective
of the MATCH project is to make it easier for users to develop efficient codes
for adaptive computing systems. We will develop a compiler that will take in
applications written in a high-level language (MATLAB) and generate efficient
low level code that will run on a distributed environment of
commercial-off-the-shelf (COTS) FPGAs, embedded processors, and digital signal
processors. The specific goals of the MATCH project are to: (1) Enable the
users to reduce the code development times for adaptive applications from weeks
using manual approaches to hours using compiler tools. (2) Produce efficient
codes that are within a factor of 2-4 of the best manual approach with respect
to optimizing resources under performance constraints, or optimizing performance
under resource constraints.
We have
developed an adaptive computing testbed consisting of a VME chassis, some
Motorola embedded boards, a Transtech DSP board, an Annapolis FPGA board, and a
FORCE 5V host board. We have developed various MATLAB library functions to be
executed on various components of our testbed. We have designed some automated
heuristics using mixed integer linear programming to map a given dataflow graph
of a MATLAB program on heterogeneous resources while optimizing resources under
timing constraints. We have also developed a basic MATLAB compiler which
automatically calls library functions on various parts of the testbed.
This project
is supported by DARPA and supports 4 Ph.D. students, and 5 M.S. students, and
has generated 2 journal papers, and 4 conference papers.
The objective
of the PART project is to develop mesh partitioning techniques that consider
the static and dynamic features of distributed systems and adaptive
applications. These techniques are
integrated into a tool called PART.
Currently,
PART performs static mesh partitioning of applications, exploiting the
heterogeneity in processor and network performance that exists in distributed
systems, as well as the heterogeneity in the application. PART is distinguished from conventional mesh
partitioning tools and techniques, for which only heterogeneous processor
performance can be exploited. Future
work entails developing dynamic techniques applicable to distributed systems.
PART uses
simulated annealing, which is a computationally intense optimizing method. This year the focus was on parallelizing
PART, to reduce the time required to generate the partitions. Experiments on the IBM SP multiprocessors at
the Northwestern University Center on Parallel and Distributed Computing and at
the Argonne National Laboratory demonstrated
excellent scalability results of the parallel PART; in some cases the
results indicated superlinear speedup. Further,
PART was used to partition an explicit and an implicit finite element
applications for execution on two IBM SP systems, one located at San Diego
Supercomputer Center and the second located at Argonne National Laboratory. The results of using PART with these
applications indicated up to 35% improvement in efficiency as compared to the
widely used METIS system, for which only heterogeneous processor performance is
exploited.
This research
supported in part by a continuing NYI grant from National Science Foundation
and AlliedSignal, Inc. The funding has
been used to support 3 PhD students and 3 undergraduate students. This work has generated 4 journal papers
and 9 conference papers.
During the past year, we conducted research
in the area of geometric computing, and robotics control. We concentrated on
the design of web-based interactive visualization tool to facilitate
visualization of geometric algorithms. An interactive visualization tool,
called GeoSheet, which was running under Sun OS Unix workstations, has been
installed on the HP server and is accessible by all students. Students in the
following two courses have made use of the tool to do the machine programs.
This tool allows the user to enter geometric data, such as rectangles,
polygons, etc., directly on the screen, and see computed output on the same or
possibly different machine. The screen I/O can be distributed over the
Internet.
This research
was supported by ONR and NSF and supported 2 M.S. and 2 Ph.D. students (one
female
Researchers in the Center have had numerous interactions with
industry. Details of the interactions
are outlined below.
Prof. Banerjee, Choudhary, Hauck, and Shenoy are interacting with
Integrated Sensors Inc. (Contact: Dr. Milissa Benincasa) regarding the MATCH
compiler project. Specifically, they are using the RTEExpress software from ISI
as a parallel MATLAB library on the MATCH testbed. In return, ISI
will get the prototype of the MATCH compiler
for their research.
Prof. Banerjee is working with the Ambit Group of Cadence Design
Systems (contact: Dr. Devadas
Varma) in the area of parallel algorithms for logic
synthesis. We are developing parallel
algorithms for the Envisia synthesis tools from Cadence that are suitable for
execution on networks of workstations.
Prof. Taylor has been supported in part by a grant from Allied
Signal, Inc. for her work on the PART tool.
The funding provided support for a graduate student and visits to
AlliedSignal, Inc. One of the initial
applications used to develop PART is the finite element analysis of an aircraft
brake system, which is of great interest to AlliedSignal, Inc.
Prof. Alok Choudhary has a continuing relationship with Sandia
National Lab in a project entitled, "Runtime Libraries for Large-Scale
Parallel I/O" where he is extending the PASSION parallel I/O library on
the ASCI parallel supercomputer being developed at Sandia Labs.
Professors Banerjee, Choudhary and Hauck are working jointly with
Los Alamos National Lab on the MATCH
project on adaptive computing systems (Contacts: Dr. Jeff Block,
and Dr. Mark Dunham). We will transfer the technology of the MATCH compiler
that we are developing to them.
Professors Choudhary and Banerjee are working jointly with Air
Force Research Labs on developing a parallel version of a Space Time Adaptive
Application (STAP) (Dr. Rich Linderman). This application is going to be used
as the application driver in the MATCH compiler project supported by DARPA.
Prof. Taylor who has a guest appointment with the MCS Division at
Argonne National Laboratory, has had a long collaborative relationship with
researchers at Argonne. Prof. Taylor is
currently involved with three collaborative projects with Argonne National
Laboratory. First, is the collaborative project related to the analysis of the
aircraft brake system. The particular
finite element code used for the analysis was developed by Dr. Canfield at
Argonne. The second project relates to
the resource management for distributed system project. This project is part of
the GLOBUS project.
Prof. Choudhary has an ongoing project in the area of parallel I/O
software with Argonne National Lab (contact: Dr. Rakesh Bordawarkar).
In addition, two of the scientists from Argonne National Lab, Dr.
Ian Foster and Dr. William Gropp, are members of the Center. They have
continuing collaboration in the areas of the PARADIGM compiler for distributed
memory multicomputers, the PETSI library for parallel runtime software, the
PASSION project on parallel I/O, and the GLOBUS project on resource management
of distributed systems.
Prof. Valerie Taylor has established a relationship with NASA Ames
in the project entitled "Prophesy:
A Hierarchical Tools for Modeling and Analyzing Parallel, Scientific
Applications", where she is developing methods for modeling the coupling
between kernels and the performance impact on the application.
1.
G. Hasteer,
A. Mathur, and P. Banerjee, "Efficient Equivalence Checking of Multi-Phase
Designs Using Phase Abstraction and Retiming.," ACM Transactions on
Design Automation of Electronic Systems (TODAES) Special Issue on High
Level Design, Validation and Testing, Oct. 1998, pp. 600-625.
2.
M.
Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, J. Ramanujam, "A Linear
Algebra Framework for Automatic Determination of Optimal Data Layouts," IEEE
Transactions on Parallel and Distributed Systems, Vol. 10, No. 2, February
1999.
3.
J. Chandy
and P. Banerjee, "A Parallel Circuit Partitioned Algorithm for
Timing-Driven Standard Cell Placement," Journal of Parallel and
Distributed Computing, vol. 57., No. 1, pp. 64-90, April 1999.
4.
P.
Prabhakaran and P. Banerjee, "Parallel Algorithms for Force-Directed
Scheduling of Flattened and Hierarchical Signal Flow Graphs," IEEE
Transactions on Computers, 1999.
5.
M.
Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and N. Shenoy, "A
Global Communication Optimization Technique Based on Data Flow Analysis and
Linear Algebra," ACM Trans. on
Programming Languages and Systems (TOPLAS), Vol. 21, No. 5, Sept.
1999.
6.
M.
Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A Matrix-Based
Approach to Global Locality Optimization," Journal of Parallel and Distributed Computing, Special Issue
on Compilation and Architectural Support for Parallel Applications, Vol. 58,
No. 2, Aug. 1999, pp. 190-235.
7.
V.
Krishnaswamy and P. Banerjee, "Parallel Compiled Event Driven VHDL
Simulation," Proc. Int. Conf. Supercomputing (ICS-98),
Melbourne, AUSTRALIA, July 1998.
8.
D. R.
Chakrabarti, N. Shenoy, A. Choudhary, and P. Banerjee, "An Efficient
Uniform Run-time Scheme for Mixed Regular-Irregular Applications," Proc.
Int. Conf. Supercomputing (ICS-98), Melbourne, AUSTRALIA, July 1998.
9.
M.
Kandemir, A. Choudhary, N. Shenoy, J. Ramanujam, and P. Banerjee, "A
Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests," Proc.
Int. Conf. Supercomputing (ICS-98), Melbourne, AUSTRALIA, July 1998.
10.
M.
Kandemir, J. Ramanujam, A. Choudhary, P. Banerjee, "An iteration space
transformation algorithm based on an explicit data layout representation for
optimizing locality," Proc. Workshop on Languages and Compilers
for Parallel Computing (LCPC-98), Chapel Hill, NC, Aug. 1998.
11.
M.
Kandemir, N. Shenoy, P. Banerjee, J. Ramanujam, and A. Choudhary,
"Minimizing Data and Synchronization Costs in One-Way Communication,"
Proc. Int. Conf. Parallel Processing (ICPP98), Minneapolis, MN, Aug.
1998.
12.
Z. Xing and
P. Banerjee, "A Parallel Algorithm for Timing-Driven Global Routing for
Standard Cells," Proc. Int. Conf. Parallel Processing (ICPP98),
Minneapolis, MN, Aug. 1998.
13.
M.
Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy, and P. Banerjee,
"Enhancing Spatial Locality using Data Layout Optimizations," Proc.
European Conference on Parallel Processing (Euro-Par'98), Southampton,
ENGLAND, Sep. 1998.
14.
A. Mishra
and P. Banerjee, "A Fault Tolerant Multi-Grid Algorithm," Proc.
Parallel and Distributed Computing Systems (PDCS98), Chicago, Sep.
1998.
15.
S. Roy, A.
Harms and P. Banerjee, "A Low Power Logic Optimization Methodology Based
on a Fast Power Driven Mapping," Proc. Int. Conf. Computer Design
(ICCD-98), Austin, TX, Oct. 1998.
16.
M.
Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy, and P. Banerjee, "A
Matrix-Based Approach to the Global Locality Optimization Problem," Proc.
Parallel Architectures and Compilation Tecniques (PACT-98), Paris, FRANCE,
Oct. 1998.
17.
S. Roy and
P. Banerjee, "Power Drive: A fast, canonical POWER estimator for DRIVing
synthEsis," Proc. 1998 International Conference on
Computer-Aided Design (ICCAD-98), San Jose, CA, Nov. 1998.
18.
G. Hasteer,
A. Mathur, and P. Banerjee, "Efficient Equivalence Checking of Multi-Phase
Designs Using Retiming", Proc. 1998 International Conference on
Computer-Aided Design (ICCAD-98), San Jose, CA, Nov. 1998.
19.
D.
Chakrabarti, P. Joisha, J. Chandy, D. Krishnaswamy, V. Krishnaswamy, and P.
Banerjee, "WADE: A Web-Based Automated Parallel CAD Environment," Proc.
International Conference on High Performance Computing (HiPC'98),
Chennai, India, Dec. 1998.
20.
M.
Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality
using loop and data transformations in an integrated framework" Proc.
31st Int. Symp. on Micro-Architecture (MICRO31), Dallas, Texas, Dec. 1998.
21.
P.
Prabhakaran, J. Crenshaw, P. Banerjee, and M. Sarrafzadeh, "Simultaneous
Scheduling, Binding and Floorplanning for Interconnect Power
Optimization," Proc. 1999 VLSI Design Conference, Goa, India, Jan.
1999.
22.
Y. Yuan and
P. Banerjee, "ICE: Incremental 3-Dimensional Capacitance and Resistance
Extraction for an Iterative Design Environment," Proc. 9th Great Lakes
Symposium on VLSI, Ann Arbor, MI, March 1999.
23.
J.
Crenshaw, M. Sarrafzadeh, P. Banerjee, and P. Prabhakaran, “An Incremental
Floor-Planner,” Proc. 9th Great Lakes Symposium on VLSI, Ann
Arbor, MI, March 1999.
24.
Y. Yuan and
P. Banerjee, “Fast Potential Integrals for Signal Integrity Analysis,” Proc.
ACM/IEEE International Workshop on Timing Issues in the Specification and
Synthesis of Digital Systems (TAU’99), Monterey, CA, March 1999.
25.
D.
Chakrabarti and P. Banerjee, “A Novel Compilation Framework for Supporting
Semi-Regular Distributions in Hybrid Applications,” Proc. 1999 International
Parallel Processing Symposium (IPPS’99), San Juan, Puerto Rico, April 1999,
pp. 597-602.
26.
S. Mohan,
P. Mazumder, D. Krishnaswamy, P. Banerjee, and E. M. Rudnick, “Parallel
Implementations,” Chapter 8 in Genetic Algorithms for VLSI Design, Layout
& Test Automation, P. Mazumder and E. M. Rudnick, eds., pp. 252-294,
Prentice Hall PTR, 1999.
27.
M.
Kandemir, J. Ramanujam, A. Choudhary, and P. Banerjee, “An Iteration Space
Transformation Algorithm Based on Explicit Data Layout Representation for
Optimizing Locality,” in Languages and Compilers for Parallel Computers,
S. Chatterjee et al., eds., Lecture Notes in Computer Science, Springer-Verlag,
1999.
28.
M.
Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A Graph Based
Framework to Detect Optimal Memory Layouts for Improving Data Locality," Proc.
1999 International Parallel Processing Symposium (IPPS'99), San
Juan, Puerto Rico, April 1999.
29.
P. Joisha
and P. Banerjee, "PARADIGM (version 2.0): A New HPF Compilation
System," Proc. 1999 International Parallel Processing Symposium
(IPPS'99), San Juan, Puerto Rico, April 1999.
30.
Y. Yuan and
P. Banerjee, "Incremental Capacitance Extraction and Its Application to
Iterative Timing Driven Detailed Routing," 1999 International Symposium
on Physical Design (ISPD-99), Monterey, CA, April 1999.
31.
J. Chen and
P. Banerjee, "Parallel Construction Algorithms for BDDs," 1999
International Symposium on Circuits and Systems (ISCAS'99), Orlando,
FL, June 1999.
32.
S. Roy, A. Harms, and P. Banerjee, “An a-approximate Algorithm for
Delay-Constraint Technology Mapping,” Proc. 1999 Design Automation
Conference (DAC’99), June 1999.
33.
A. Mishra
and P. Banerjee, "An Algorithm Based Error Detection Scheme for the
Multigrid Method," Proc. of the 1999 International Symposium on
Fault Tolerant Computing, Madison, WI, June 15-18, 1999.
34.
M.
Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and E. Ayguade, "An ILP
Approach for Optimizing Cache Locality," 1999 ACM International
Conference on Supercomputing (ICS'99), Rhodes, Greece, June 1999.
35.
M.
Kandaswamy, M. Kandemir, A. Choudhary, and D. Bernholdt, “An Experimental Study
to Analyze and Optimize Hartree-Fock Application’s I/O With PASSION,” International
Journal of High Performance Computing Applications, Vol. 12, No. 4, pp.
411-439, Winter 1998.
36.
M.
Kandemir, J. Ramanujam, and A. Choudhary, “Improving Cache Locality by a
Combination of Loop and Data Transformations,” IEEE Transactions on
Computers, Vol. 48, No. 2, Feb. 1999.
37.
R.
Krishnaiyer, I. Foster, and A. Choudhary, “Performance of a Remote I/O Library
in High-Performance Distributed Computing Environments,” Proc. International
Conference on Parallel and Distributed Computing and Systems (PDCS’98),
Chicago, IL, Sept. 1998.
38.
J.
Carretero, J. No, and A. Choudhary, “Parallel I/O for Irregular Applications on
Distributed Memory Machines,” IX Jornadas de Paralelismo, San Sebastian,
SPAIN, Sept. 1998.
39.
J.
Carretero, W. Zhu, X. Shen, and A. Choudhary, “MiPFS: A Multimedia Integrated Parallel File System,” International
Joint Conference on Information Systems, Raleigh, NC, Oct. 1998.
40.
S. Goil and
A. Choudhary, “High Performance Multidimensional Analysis of Large Datasets,” Proc.
ACM First International Workshop on Data Warehousing and OLAP (DOLAP’98)
(in conjunction with CIKM’98), Washington, DC, Nov. 1998.
41.
S. Goil and
A. Choudhary, “High Performance Multidimensional Analysis and Data Mining,” Proc.
High Performance Networking and Computing Conference (SC’98), Orlando, FL,
Nov. 1998.
42.
S. More and
A. Choudhary, “Extended Collective I/O for Efficient Retrieval of Large
Objects,” Proc. 5th International Conference on High Performance
Computing (HiPC’98), Chennai, INDIA, Dec. 1998.
43.
J. No, J.
Carretero, and A. Choudhary, “Optimizations to Provide High-Performance
Parallel I/O for Irregular Applications,” Proc. Applied Informatics ’99,
Innsbruck, AUSTRIA, Feb. 1999.
44.
J.
Carretero, J. No, and A. Choudhary, “Optimizing I/O for Irregular Applications
on Distributed-Memory Machines,” Proc. ACPC’99, Salzburg, AUSTRIA, Feb. 1999.
45.
M. Kandemir
and A. Choudhary, “System-Level Meta-Data for High Performance Data
Management,” Proc. Third IEEE Meta-Data Conference, Bethesda, MD, April
1999.
46.
S. Goil and A. Choudhary, “Design and Implementation
of a Scalable Parallel System for Multidimensional Analysis,” Proc. 13th
International Parallel Processing Symposium & 10th Symposium on
Parallel and Distributed Processing (IPPS/SPDP’99), San Juan, Puerto Rico,
April 1999.
47.
W. Liao, D.
Weiner, P. Varshney, and A. Choudhary, “Multi-Threaded Design and
Implementation of Parallel Pipelined STAP on Parallel Computers with SMP
Nodes,” Proc. 13th International Parallel Processing Symposium,
10th Symposium on Parallel and Distributed Processing (IPPS/SPDP
1999), San Juan, Puerto Rico, April 1999.
48.
J. No, J.
Carretero, and A. Choudhary, “High Performance Parallel I/O Schemes for
Irregular Applications on Clusters of Workstations,” Proc. 7th
International Conference on High-Performance Computing and Networking (HPCN
Europe 1999), Amsterdam, THE NETHERLANDS, April 1999.
49.
M.
Kandemir, J. Ramanujam, and A. Choudhary, “Restructuring I/O-Intensive
Computations for Locality,” Proc. 7th Intenantional Conference on
High-Performance Computing and Networking (HPCN Europe 1999), Amsterdam,
THE NETHERLANDS, April 1999.
50.
S. Goil and
A. Choudhary, “An Infrastructure for Scalable Parallel Multidimensional
Analysis,” Proc. 11th International Conference on Scientific and
Statistical Databases (SSDBM’99), Cleveland, OH, July 1999.
51.
S. Goil and
A. Choudhary, “A Parallel Scalable Infrastructure for OLAP,” Proc.
International Database Engineering and Applications Symposium (IDEAS'99),
Montreal, CANADA, August 1999.
52.
M.
Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, R. Thakur, and A.
Choudhary, “Data Management for Large-Scale Scientific Computations in High
Performance Distributed Systems,” Proc. 8th IEEE International
Symposium on High Performance Distributed Computing, Redondo Beach, CA,
Aug. 1999.
53.
S. Hauck,
G. Borriello, C. Ebeling, "Mesh Routing Topologies for Multi-FPGA
Systems", IEEE Transactions on VLSI Systems, Vol. 6, No. 3,
September, 1998.
54.
S.
Hauck, Z. Li, E. J. Schwabe, "Configuration Compression for the Xilinx
XC6200 FPGA", to appear in IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems.
55.
M.
Enos, S. Hauck, M. Sarrafzadeh, "Evaluation and Optimization of
Replication Algorithms for Logic Bipartitioning", to appear in IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems.
56.
Z. Li, S.
Hauck, "Don't Care Discovery for FPGA Configuration Compression", Proc.
ACM/SIGDA Symposium on Field-Programmable Gate Arrays, pp 91-98,
1999.
57.
S. Hauck,
W. D. Wilson, "Abstract: Runlength Compression Techniques for FPGA Configurations",
Proc. IEEE Symposium on FPGAs
for Custom Computing Machines, 1999.
59.
J. Shim, P.Scheuermann,
R.Vingralek, "Proxy Cache Design: Algorithms, Implementation and
Performance”, to appear in IEEE
Transactions on Knowledge and Data Engineering, special issue on Web
Technologies, Vol. 11, No. 4, July/August 1999.
62.
L. Singh,
B. Chen, R. Height, and P. Scheuermann, “An Algorithm for Constrained
Association Rule Mining,” Proc. Pacific Asia Knowledge Discovery in
Databases Conference, Beijing, CHINA, April 1999.
63.
L. Singh,
M. Macie, J. Woytek, J. Pan, and P. Scheuermann, “Gaining Knowledge for
Ill-Structured Data,” Proc. Of the AFCEA Federal Data Mining Symposium,
McLean, VA, March 1999, pp. 155-161.
64.
L. Singh,
B. Chen, and P. Scheuermann, “IRIS: Our
Prototype Rule Generation System,” Proceedings SPIE, Orlando, FL, 1999.
65.
J. Shim, R.
Vingralek, and P. Scheuermann, “Dynamic Caching of Query Results for Decision
Support Systems,” Proc. 11th International Conference on
Statistical and Scientific Database Management, Cleveland, OH, July 1999,
pp. 254-263.
66.
P. Zabback,
I. Onyuksel, G. Weikum, and P. Scheuermann, “Database Reorganization in
Parallel Disk Arrays with I/O Service Stealing,” IEEE Transactions on Data
and Knowledge Engineering, Vol. 11, No. 4, September/October 1998, pp.
855-858.
67.
G.
Tellez, and M. Sarrafzadeh, `Minimal Buffer Insertion in Clock Trees with Skew
and Slew Rate Constraints,'' to appear in
IEEE Transactions on Computer-Aided Design.
68.
D. S. Chen,
Gary Yeap, and M. Sarrafzadeh, ``State Encoding of Finite State Machines for
Low Power Design'' to appear in VLSI Design.
69.
Wei-Liang
Lin and M. Sarrafzadeh, `` On the Power of Re-Synthesis,'' To appear in SIAM
Journal on Computing.
70.
Sara
Nicoloso and M. Sarrafzadeh, `Sum Coloring Problem on Interval Graphs'', to
appear in Algorithmica.
71.
Amir
Farrahi, D. T. Lee and M. Sarrafzadeh, `Two-Way and Multi-Way Partitioning of a
Set of Intervals for Clique-Width Maximization'' to appear in Algorithmica.
72.
Jun-Dong
Cho, and S. Raje, and M. Sarrafzadeh, ` Fast Approximation Algorithms on
Maxcut, k-coloring and k-color Ordering for VLSI Applications'', IEEE
Transactions on Computers, Vol. 47, No. 11, November 1998, pp. 1253-1266.
73.
J. D. Cho
and M. Sarrafzadeh, ``Mixed LP and Mincost Flow-based Four-bend Globar Routing
in Two Dimensional Arrays'', IEEE Transactions on CAD, September 1998,
pp. 793-802.
74.
Kia
Bazargan, Sam Kim, and M. Sarrafzadeh, “Nostradamus: Floorplanner of Uncertain
Designs'', IEEE Transactions on CAD, April 1999.
75.
Maogang
Wang, Prith Banerjee, and M. Sarrafzadeh, ``Placement with Incomplete Data'',
to appear in VLSI Design, Year 2000 special issue on Physical Design.
76.
Morgan
Enos, Scott Hauck, and M. Sarrafzadeh, ``Logic Partitioning with Replication'',
IEEE Transactions on CAD, to appear.
77.
Salil Raje
and Majid Sarrafzadeh, ``Scheduling with Multiple Voltages'', Proceedings of
1999 International Symposium on Circuits and Systems (ISCAS), May 30
- June 2, 1999, Miami, Florida.
78.
Majid
Sarrafzadeh and Toshihiko Takahashi, `A Fast Algorithm for Routability
Testing'', Proceedings of 1999 International Symposium on Circuits
and Systems (ISCAS), May 30 - June 2, 1999, Miami, Florida.
79.
Maogang
Wang and Majid Sarrafzadeh, ``Congestion Minimization During Placement,” Proceedings
of the 1999 International Symposium on Physical Design (ISPD 99),
Monterey, CA.
80.
Abhi
Ranjan, Kia Bazargan, and Majid Sarrafzadeh, "Floorplanning 1000 times
Faster,” System Level Interconnect Prediction (SLIP 99).
81.
Majid
Sarrafzadeh and Maogang Wang, ‘`Can Fast Algorithms Be Used as Good
Predictors?'' System Level
Interconnect Prediction (SLIP 99).
82.
Kia
Bazargan, Ryan Kastner, and Majid Sarrafzadeh, “3-D Floorplanning: Simulated
Annealing and Greedy Placement Methods for Reconfigurable Computing Systems'', Proc.
10th IEEE International Workshop on Rapid System Prototyping, June
16-18, 1999, Sheraton Sandkey, Clearwater, Florida, U.S.A .
83.
M.
Sarrafzadeh, “Slack Equalization Algorithm:
Precise Slack Distribution for Low-level Synthesis and Optimization,” Workshop
on Logic Synthesis (IWLS), Lake Tahoe, CA, June 1999.
84.
Kia
Bazargan and Majid Sarrafzadeh, ``Fast Online Placement for Reconfigurable
Computing Systems,'' FCCM: Symposium on Field Programmable Custom Computing
Machines, April 21-23, Napa Valley, CA, 1999.
87.
R. J. O. Figueiredo,
A. B. Fortes, Z. B. Miled, V. Taylor, R. Eigenmann, “Impact of
Computing-in-Memory on the Performance of Processor-and-Memory Hierarchies”, Proceedings
of International Conference on
Parallel and Distributed Computing Systems, September 1998.
88.
W. Smith, I. Foster
and V. Taylor, “Predicting Application Run Times Using Historical Information,”
4th Workshop on Job Scheduling Strategies for Parallel Processing
(in conjunction with IPPS 1998), April 1998.
89. Z. B. Miled, J. A. B. Fortes, R. Eigenmann, V. Taylor, “On the Implementation of Broadcast, Scatter, and Gather in a Heterogeneous Architecture,” Hawaii International Conference on System Sciences, January 1998.
90. J. Chen, V. Taylor, "ParaPART: Parallel Mesh Partitioning Tool for Distributed Systems", in IRREGULAR'99, Sixth International Workshop on Solving Irregularly Structured Problems in Parallel, in Conjunction with IEEE IPPS/SPDP'99 (13th International Parallel Processing Symposium), San Juan, Puerto Rico, April 1999.
91. J.
Chen, V. Taylor, "Mesh Partitioning for Distributed Systems: Exploring
Optimal Number of Partitions with Local and Remote Communication", 1999
SIAM Conference on Parallel Processing.
92. J. Chen, V. Taylor, "Mesh Partitioning for Distributed Systems", in Proceedings of Seventh IEEE International Symposium on High Performance Distributed Computing, Chicago, IL, July 1998.
93. J.
Geisler and V. Taylor, "Performance Coupling: A Methodology for Predicting
Application Performance using Kernel Performance," 1999 SIAM Conference
on Parallel Processing.
94. W. Smith, V. Taylor and I. Foster, "Using Run-Time Predictions to Estimate Queue Wait Times and Improve Scheduler Performance," 5th Workshop on Job Scheduling Strategies for Parallel Processing with IEEE IPPS/SPDP'99 (13th International Parallel Processing Symposium), San Juan, Puerto Rico, April 1999.
95. V. Taylor, J. Fortes and R. Eigenmann, "HPAM Petaflop Point Design: Identifying Critical Research Issues for Petaflop Machines," Proceeding of the Third PetaFlop Workshop (TPF-3), in Conjunction with the Frontier Conference on Massively Parallel Computing, February, 1999.
96. J. Nocedal, L. T. Biegler, C. Schmidt, and D. Ternet, “Numerical Experience with a Reduced Hessian Method for Large-Scale Constrained Optimization,” Computational Optimization and Applications, 1999.
97. J. Nocedal and N. I. M. Gould, “The Modified Absolute-Value Factorization Norm for Trust-Region Minimization,” in High Performance Algorithms and Software in Nonlinear Optimization, pp. 225-241, eds. R. De Leone, A. Murli, P. M. Pardalos, and G. Toraldo, Kluwer, 1998.
98. J. Nocedal and Ya-xiang Yuan, “Combining Trust Region and Line Search Techniques,” in Advances in Nonlinear Programming, ed. Y. Yuan, Kluwer, 1998, pp. 153-175.
99. J. Nocedal and R. Byrd, “Active Set and Interior Point Methods for Nonlinear Optimization,” Documenta Mathematica, Vol. III, Journal de Deutschen Mathematiker-Vereinigung (Proceedings of the International Congress of Mathematicians), 1998.
100. J. Nocedal, “An Interior Point Method for Large-Scale Nonlinear Programming,” SIAM. Journal of Optimization, 9, 4, 1999, pp. 877-900.
101. J. Nocedal and S. Wright, Numerical Optimization, Springer Verlag, 1999, 650 pp.
The Center for Parallel and Distributed Computing had a total research expenditure of about $1.8 million during 1998-99. The details are provided below for the most active members of the Center, P. Banerjee. A. Choudhary, V. Taylor, A. Moshovos, S. Hauck, M. Sarafzadeh, P. Scheurmann, V. Taylor, and D. T. Lee.
“Efficient Compilation Issues in Distributed Memory Multicomputers”, Prith Banerjee, National Science Foundation, 1996-99, 0830-350-F812, $79,278.
“A MATLAB Compilation Environment for Adaptive Computing Systems,” P. Banerjee, A. Choudhary, S. Hauck, N. Shenoy, DARPA, 1998-2001, 0650-350-D400, $603,104.
“A High-Performance Distributed Computing Infrastructure,” P. Banerjee, A. Choudhary, S. Hauck, D.T. Lee, M. Sarrafzadeh, P. Scheurmann, V. Taylor, NSF, 1997-2002, 0830-350-D600, $245,746.
“Architectures, Compilers, and Configuration Management for Mass-Market Adaptive Computing,” S. Hauck, P. Banerjee, M. Sarrafzadeh, DARPA, 1997-2000, 0650-350-F482, $630,501.
“Large High-Performance Data Management, Access and Storage Techniques for Tera-scale Scientific Applications,” A. Choudhary, P. Banerjee, V. Taylor, Department of Energy, ASCI Level 2, 1998-2001, 0980-350-D201, $241,005.
“Compiler and Run-time Optimization Techniques for Parallel Programming,” A. Choudhary, NSF, 1993-2000, 0830-350-F811, $34,484.
“System Software for Input-Output in Parallel Computers,” A. Choudhary, NSF, 1996-99, 0830-350-F813, $46,286.
“Compiler and Runtime Support,” A. Choudhary, Intel Corporation, 0970-350-F702, 1996-99, $13,656.
“Modeling and Evaluation of I/O Architecture in Servers, Intel Corporation, 0970-350-F703, 1996-99, $9,405.
“Design, Development, Benchmarking and Evaluation of Parallel Applications,” A. Choudhary, Air Force Systems Command (Rome Labs), 0980-350-F711, 1996-99, $117,353.
“Scalable I/O Initiative,” A. Choudhary, Advanced Research Projects Agency (Subcontract from California Institute Technology, 0980-350-F713, 1996-99, $53,651.
“Hybrid Technology Multi-threaded Computer Architecture for Petaflops Computing,” A. Choudhary, Department of Energy (Argonne National Laboratory), 0980-350-F716, 1997-99, $36,054.
“Interoperable Data Files for High-Performance Computing,” A. Choudhary, NSF (University of Wyoming), 0980-350-F237, 1997-99, $50,099.
“Prophesy: A Hierarchical Tool for Modeling and Analyzing Parallel Scientific Application,” V. Taylor, NASA/AMES, 0720-350-F403, 1998-2000, $ 67,454.
“Prophesy: A Performance Network for Developing a Computer Algebra,” V. Taylor, NSF, 0830-350-F823, 1999-2000, $0.
“Young Investigator: A Tool for the Improvement of Uniprocessor Performance,” V. Taylor, NSF, 0830-350-F688, 1993-2000, $74, 677.
“Integration of Computer Architecture and Parallel Programming Tools Into Computer Science and Engineering,” V. Taylor, Subcontract from Purdue (NSF grant), 1998-2001, 0980-350-F728, $3,750.
“Alliance Team B: Distributed Computing,” V. Taylor, Subcontract from NCSA (NSF grant), 0980-350-F727, 1998-99, $36,032.
“Dynamically Reconfigurable Hardware for Digital Signal Processing,” S. Hauck, Motorola, University Partnership in Research Program, 0970-350-F718, 1997-2001, $7105.
“Mapping Time Oriented Tools for Logic Emulation, “ S. Hauck, M. Sarrafzadeh, NSF,
0830-350-F817, 1997-2000, $22,868
“Research Experiences for Undergraduates,” S. Hauck and M. Sarrafzadeh, NSF, 0830-350-F817, 1997-2000, $21,000.
“Algorithm Design for VLSI,” M. Sarrafzadeh, NSF, 0830-350-F614, 1995-2000, $43,516.
“Multimedia Processor Synthesis,” M. Sarrafzadeh , Motorola, 0970-350-F715, 1997-1999, $29,305.
“Application Specific Processor Synthesis (ASPROS),” M. Sarrafzadeh, Motorola Center, 0970-350-AF04, 1998-99, $78,638.
“SCALAR: Scalable Architecture for Replication on the Web Workstations,” P. Scheuermann, 0830-350-F822, 1999-2000, $0.
“Scalable and Adaptive File Management in Networks of Workstations,” P. Scheuermann, NASA, 0730-350-F477, 1996-2000, $25,162.
“Putting Log Data to Work: Mass Storage Information Systems,” P. Scheuermann, NASA, 0980-350-F721, 1998-1999, $59,515.
“Geometric Algorithm Design and Visualization,” D.T. Lee, NSF, 0830-350-F616, 1998-2001, 0830-350-F616, $49,588.
“Development of Distributed Visualization Tools,” D.T. Lee, ONR, Mathematical Science Division, 0650-350-F471, 1995-00, $48,437.
“Design of Assembly Operations for Recycling Disassembly,” D. T. Lee, NU-Motorola Center, 0970-350-AF07, 1999, $20,131.
“Nonlinear Programming: Algorithms and Software Environments,” J. Nocedal, DOE, 0680-350-Z400, 1998-2001, $73,800.
“Nonlinear Programming: Algorithms and Software Environments,” J. Nocedal, DOE, 0680-350-F440, 1987-1999, $49,900.
“Challenges in CISE: Metcomputing Environments for Optimization,” J. Nocedal, NSF, 0830-350-Z601, 1997-2000, $509,000.
“Climate Modeling on Parallel Computers,” J. Nocedal, Argonne National Lab, 0980-350-Z201, 1996-1998, $7,700.
Oct. 5, 1998:
Pramod Joisha, ECE Graduate Student, “An Experimental Evaluation of a New HPF
Framework based on a Commercial Symbolic Analysis Package.”
Oct. 12,
1998: Sachin More, ECE Graduate Student, “Workload Characterization and Task
Scheduling for Tape Resident Data.”
Oct. 19, 1998:
Prof. Nagaraj Shenoy, ECE, Northwestern Univ., “Automatic Parallelization for
Distributed Memory Parallel Machines – Issues, Challenges and State of the
Art.”
Oct. 26,
1998: Chutimet Srinilta, ECE Grad Student, “Techniques for Improving
Performance in Continuous Media Server Design.”
Oct. 26,
1998: Harsha Nagesh, ECE Grad Student, “On the Randomness of Random Numbers.”
Oct. 30,
1998: Prof. Jesus Carretero, Universida Politecnica de Madrid, “MiPFS: A
Multimedia Integrated Parallel File System.”
Nov. 6, 1998:
Prof. Hans Zima, University of Vienna, “The ESPRIT Project HPF+.”
Nov. 13,
1998: Dr. Rakesh Krishnaiyer, Argonne National Lab, “Efficient Collective Data
Transfer Mechanisms for High-Performance Parallel and Distributed Computing
Environments.”
Nov. 23,
1998: Dr. Warren Smith, Argonne
National Lab, “Metacomputing, Scheduling and Prediction.”
Dec. 1, 1998:
Prof. Francine Berman, UC San Diego, “Achieving Application Performance on
Distributed Resources.”
Jan. 12,
1999: Jian Chen, ECE Grad Student, “Issues in Mesh Partitioning for Distributed
Systems.”
Jan. 19,
1999: Prof. Andreas Moshovos, ECE, Northwestern University, “Memory Dependence
Prediction and Future Directions.”
Jan. 26,
1999: Dhruva Chakrabarti, ECE Grad Student, “A Compilation Framework for Hybrid
Applications.”
Feb. 2, 1999:
Prof. Jennifer Schopf, CS, Northwestern Univ, “Prediction and Scheduling for
Shared Clusters of Workstations.”
Feb. 9, 1999:
Prof. Babak Falsafi, Purdue Univ, “Intelligent Distributed Memory.”
Feb. 16,
1999: Pramod Joisha, ECE Grad Student, “Efficiently Exploiting Ownership Sets
in HPF.”
Feb. 23,
1999: Mahmut Kandemir, ECE Grad Student, “Improving Locality Using Loop and
Data Transformations in a Unified Framework.”
March 2,
1999: Jonathan Geisler, ECE Grad Student, “Performance Coupling: A Methodology
for Analyzing Application Performance using kernel Performance.”
March 29,
1999: Prof. Andreas Moshovos, Northwestern Univ, “Organizational Meeting.”
Apr. 5, 1999:
Malay Haldar, ECE Grad Student, “Innovative Cache Architectures.”
Apr. 12,
1999: Yelena Yesha, NASA Goddard, “Challenges in Global Electronic Commerce.”
Apr. 19,
1999: Olaf Lubeck, Los Alamos National Lab, “Performance Models and Scalability
Analysis of Wavefront Algorithms.”
Apr. 26,
1999: Xiaohui Shen, ECE Grad Student, “Prediction in Cache Protocols.”
May 10, 1999:
Antonio Gonzalez, UPC, Spain, “Register File Architectures.”
May 17, 1999:
Witod Litwin, Univ. Paris, “High Availability in Scalable Distributed Data
Structures.”
May 24, 1999:
Swee Mok, ECE Grad student, “Part Mating Strategies Using Genetic Assembly.”
·
The Center
for Parallel and Distributed Computing completed its third year on Aug. 31,
1999. It currently has 1 Director, 1
Assistant, 11 faculty members from Northwestern, 2 members from Argonne, 45
graduate students.
·
Researchers
at the Center received a continuation grant for a $1.8 million grant “A MATLAB
Compilation Environment for Adaptive Computing” from DARPA.
·
Researchers
at the Center received a continuation grant for a $1.million grant
“Architectures, Compilers, and Configuration Management for Mass Market
Adaptive Computing” from DARPA.
·
Researchers
at the Center received a continuation grant for a $906,000 grant “PANTHER: A
High-Performance Distributed Computing Infrastructure” from the National
Science Foundation.
· Researchers at the Center (Choudhary, Banerjee, Taylor) received a new grant for $800,000 on “Large High-Performance Data Management, from Department of Defense ASCI Level 2 program.