CENTER FOR

PARALLEL AND DISTRIBUTED COMPUTING

 

(http://www.ece.nwu.edu/cpdc)

 

ANNUAL REPORT

Sep. 1, 1998- Aug. 31, 1999

 

 

 

 

 

 

 

 

 

 

Prof. Prith Banerjee

Director, Center for Parallel and Distributed Computing

Walter P. Murphy Professor and Chairman

Department of Electrical and Computer Engineering

Northwestern University

2145 Sheridan Road

Evanston, IL-60208

TEL: (847) 491-3641

FAX: (847) 491-4455      

EMAIL: banerjee@ece.nwu.edu

URL:http://www.ece.nwu.edu/~banerjee

 


CONTENTS

 

  1. INTRODUCTION.............................................................................................................................. 3
  2. PERSONNEL...................................................................................................................................... 4
  3. COMPUTING RESOURCES............................................................................................................ 8
  4. RESEARCH ACTIVITIES................................................................................................................ 9
  5. INDUSTRIAL COLLABORATION.............................................................................................. 14
  6. PUBLICATIONS............................................................................................................................. 16
  7. SPONSORED RESEARCH PROJECTS........................................................................................ 23
  8. CPDC SEMINAR SERIES.............................................................................................................. 25
  9. HIGHLIGHTS OF THE CENTER ACTIVITIES 1998-99............................................................. 27

 

1. INTRODUCTION

 

The McCormick School of Engineering and Applied Science at Northwestern University established a Center for Parallel and Distributed Computing on September 1, 1996; it is headed by Prof. Prith Banerjee. The goals of the Center are to conduct leading edge, fundamental science and engineering research that will provide a sound technological basis to the technology of parallel and distributed computing. The specific research areas that will be undertaken at the Center will be all aspects of parallel and distributed computing, namely, parallel architectures, parallel compilers and software, parallel algorithms and applications, parallel and distributed database systems, and parallel I/O systems.

 

This report outlines the activities in the center from September 1, 1998 through August 31, 1999.

 


 

2. PERSONNEL

 

During 1998-99, the Center for Parallel and Distributed Computing supported one Director, one assistant, 12 faculty members from Northwestern University, two scientists from Argonne National Lab, and 45 graduate students.

2.1. DIRECTOR

 

PRITH BANERJEE: Director of the Center for Parallel and Distributed Computing, and Walter P. Murphy Professor and Chairman of Electrical and Computer Engineering, Ph.D. Univ. of Illinois at Urbana Champaign. (http://www.ece.nwu.edu/~banerjee)

       Research Interests: Parallel algorithms for VLSI computer-aided design applications, Parallelizing compilers for distributed memory multicomputers, Compilers for adaptive computing.

 

2.2. ADMINISTRATIVE ASSISTANT

 

NANCY SINGER, Administrative Assistant, Center for Parallel and Distributed Computing and Electrical and Computer Engineering Department

 

2.3. FACULTY MEMBERS

 

ALVIN BAYLISS: Professor and Chairman of  Engineering Science and Applied Mathematics, and Professor, Electrical and Computer Engineering, Ph.D. New York University (http://www.ece.nwu.edu/ecefaculty/alvin.html)

       Research Interests: Numerical analysis, large-scale scientific computing, combustion, computational fluid dynamics, solid mechanics, acoustics.

 

ALOK CHOUDHARY: Associate Professor of Electrical and Computer Engineering, Ph.D. Univ. of Illinois at Urbana-Champaign. (http://www.ece.nwu.edu/~choudhar)

        Research Interests: High-performance computing and communications, input-output, compiler and runtime systems for HPCC, multimedia systems and databases.

 

SCOTT HAUCK: Assistant Professor of Electrical and Computer Engineering, Ph.D. Univ. Washington. (http://www.ece.nwu.edu/~hauck)

       Research Interests: Reconfigurable and Multi-FPGA systems, FPGA architectures and CAD tools, rapid-prototyping, asynchronous circuit and VLSI design, parallel processing, parallel programming languages.

 

ANDREAS MOSHOVOS: Assistant  Professor of Electrical and Computer Engineering, Ph.D. University of Wisconsin, Madison. (http://www.ece.nwu.edu/ecefaculty/moshovosl.html)

       Research Interests: Computer architecture and compilers, hardware/software techniques to expose/exploit parallelism, speculation/prediction, intelligent memory systems, special purpose architectures.

 

JORGE NOCEDAL: Deputy Director of the Optimization Technology Center, and Professor of Electrical and Computer Engineering, Ph.D. Rice University. (http://www.ece.nwu.edu/ecefaculty/nocedal.html)

       Research Interests: Nonlinear Optimization, applied linear algebra, numerical analysis, software development for numerical computations.

 

DER-TSAI LEE: Professor of  Electrical and Computer Engineering, Ph.D. University of Illinois at Urbana-Champaign. (http://www.ece.nwu.edu/ecefaculty/dtlee.html)

       Research Interests: Design and analysis of algorithms, data structures, VLSI systems, computational geometry, parallel algorithms, computational complexity, algorithm visualization.

 

PETER SCHEUERMANN: Professor, Electrical and Computer Engineering, Ph.D. State University of New York, Stony Brook. (http://www.ece.nwu.edu/ecefaculty/peters.html)

       Research Interests: Physical database design, pictorial databases, parallel I/O systems, parallel algorithms for data-intensive applications, distributed database systems.

 

MAJID SARRAFZADEH: Professor of  Electrical and Computer Engineering, Ph.D. University of Illinois at Urbana-Champaign. (http://www.ece.nwu.edu/ecefaculty/majid.html)

        Research Interests: VLSI Design, computer-aided design, high-performance architectural design, design and analysis of algorithms, computational complexity.

 

NAGARAJ SHENOY:  Research Associate Professor of Electrical and Computer Engineering, Ph.D. Indian Institute of Science, Bangalore.  (http://www.ece.nwu.edu/~nagaraj)

      Research Interests:  Compilers for parallel computing.

 

ALLEN TAFLOVE,  Professor of Electrical and Computer Engineering, Ph.D. Northwestern Univ. (http://www.ece.nwu.edu/ecefaculty/taflove.html)

       Research Interests: Applied electromagnetic field theory and applications, computational electromagnetics, scattering and diffraction, supercomupting, Maxwell’s equations-based computational nonlinear optics, electromagnetic waves in nonlinear dispersive media, femtosecond, optical switches.

 

VALERIE TAYLOR, Associate  Professor of Electrical and Computer Engineering, Ph.D. Univ. California, Berkeley. (http://www.ece.nwu.edu/ecefaculty/taylor.html)

       Research Interests: Performance of parallel scientific applications, computer architecture, visual supercomputing environments.

 

TED BELYTSCHKO, Walter P. Murphy Professor and Chairman of Mechanical Engineering, Ph.D. Illinois Institute of Technology. (http://www.ece.nwu.edu/mefaculty/belytschko.html)

       Research Interests: Computational solid mechanics, computational fluid dynamics, explicit finite-element programs, parallel versions of above applications.

 

2.4. OTHER RESEARCHERS

 

IAN FOSTER, Senior Computer Scientist, Mathematics and Computer Science Division, Argonne National Lab, Ph.D. Imperial College, England. (http://www.mcs.anl.gov/people/foster)

       Research Interests: Various aspects of parallel and distributed computing: algorithms, languages, tools; techniques to integrate high-performance computing into large-scale internetworked environments; computational science efforts in developing parallel climate models.

 

WILLIAM GROPP, Senior Computer Scientist, Mathematics and Computer Science Division, Argonne National Lab. (http://www.mcs.anl.gov/people/gropp)

       Research Interests: Scientific computing, particularly in algorithms and software for adaptive and parallel methods for partial differential equations, development of PETSc library of numerical routines, development of MPI message passing interface standard and implementation.

2.5. GRADUATE STUDENTS

 

·         Dhruva Ranjan Chakrabarti, Ph.D., Compiler Support for Parallel Irregular Applications, June 2000. (Advisor: P. Banerjee)

·         Yanhong Yuan, Ph.D., Parallel Algorithms for 3D Resistance Capacitance Extraction, June 2000. (Advisor: P. Banerjee)

·         Jiwoong Victor Kim, Ph.D., Parallel Algorithms for Logic Synthesis, June 2000. (Advisor: P. Banerjee)

·         Chris Bachmann, M.S., A Hardware/Software Testbed for Adaptive Computing, Dec. 1998. (Advisor: P. Banerjee)

·         Pramod Joisha, Ph.D., Compilation of MATLAB Programs on Distributed Memory Multiprocessors, June 2001. (Advisor: P. Banerjee)

·         Suresh Periyacheri, M.S., Design and Implementation of Signal Processing Libraries in Reconfigurable Computing, June 1999. (Advisor: P. Banerjee)

·         Alex Zhi Ye, M.S./Ph.D., Compilation of High-Level C Programs for Adaptive Computing, M.S., June 1999. (Advisor: P. Banerjee)

·         Alex Jones, M.S./Ph.D., Design and Implementation of Image Processing Libraries on DSPs and FPGAs, M.S., Dec. 1999. (Advisor: P. Banerjee)

·         Anshuman Nayak, M.S./Ph.D., Design and Implementation of Signal Processing Libraries on DSP Processors, M.S., June 2000. (Advisor: P. Banerjee)

·         Abhay Kanhere, Ph.D., Compilation for Adaptive Computing Systems, 2001. (Advisor: A. Choudhary)

·         Malay Haldar, M.S., Dynamic Scheduling in Heterogeneous Adaptive Systems, 2001.  (Advisor: A. Choudhary)

·         Xioui Shen, Ph.D., Large Scale Data Management for Scientific Computing, 2001.  (Advisor: A. Choudhary)

·         Harsha Nagesh, Ph.D., Topic to be decided, 2001. (Advisor: A. Choudhary)

·         Gokhan Memik, Topic to be decided, 2001. (Advisor: A. Choudhary)

·         Sriram Raghavendran, M.S., A Prototype for Visualization of Large Datasets, Dec. 1998. (Advisor: A. Choudhary)

·         Mahmut Kandemir, Ph.D., Syracuse University, Advanced Compiler Optimization for High-Performance Computers, 1999. (Advisor: A. Choudhary)

·         Sachin More, Ph.D., Parallel I/O for Large Object-Relational Databases, expected 1999. (Advisor: A. Choudhary)

·         Sanjay Goil, Ph.D., High-Performance On-Line Analytical Processing and Data Mining on Parallel Computers, 1999. (Advisor: A. Choudhary)

·         Chutimet Srinilta, Ph.D., Design and Evaluation of High-Performance Multimedia Servers, Dec. 1998. (Advisor: A. Choudhary)

·         Katherine Nelson, M.S./Ph.D, Reconfigurable Computing Architectures.  (Advisor: S. Hauck)

·         Mark Chang, M.S./Ph.D. , Hyperspectral Application on Reconfigurable Computing.  (Advisor: S. Hauck)

·         G. Gu, M.S., Configuration Caching.  (Advisor: S. Hauck)

·         Z. Li, M.S./Ph.D., Adaptive Computing Architectures.  (Advisor: S. Hauck)

·         Shawn Phillips, M.S./Ph.D., CAD Tools for Configurable Computing.  (Advisor: S. Hauck)

·         Ankur Srivastava, M.S./Ph.D., FPGA Placement and Routing.  (Advisor: S. Hauck)

·         C. Mulpuri, M.S./Ph.D., FPGA Partitioning.  (Advisor: S. Hauck)

·         V. Karnam, M.S./Ph.D., Designing Image Processing Algorithms on FPGA Boards.  (Advisor: S. Hauck)

·         R. Guo, M.S., Web Based Programming Environment.  (Advisor: D. T. Lee)

·         Y. Chao, M.S., On a Web-based Telemanipulation and Monitoring System, Sept. 1998. (Advisor: D. T. Lee)

·         K. Aoki, Ph.D., A Web-based Distributed Programming Environment, Dec. 1999. (Advisor: D. T. Lee)

·         L. Lin, Ph.D., DAViD:  A Distributed Algorithm Visualization and Debugging System for Geometric Computing in 3D.  (Advisor: D. T. Lee)

·         Philippe Canal, M.S., Finding Paths Among Obstacles in Two-Layer Interconnection Model, Sept. 1998.  (Advisor: D. T. Lee)

·         D. Y. Seo, Ph.D., On the Complexity of Bicriteria Spanning Tree Problems for a Set of Points in the Plane, Dec. 1999.  (Advisor:  D. T. Lee)

·         Amirali Baniasadi, M.S./Ph.D., Topic to be decided. (Advisor:  Andreas Moshovos)

·         Amir Farahi, M.S./Ph.D., Topic to be decided.  (Advisor:  Andreas Moshovos)

·         Jean Liu, M.S.,  June 1999. (Advisor: M. Sarrafzadeh)

·         Shunliang Wang, M.S., June 1999.  (Advisor: M. Sarrafzadeh)

·         Eli Bozorgzadeh,, MS/PhD., Physical Design in FPGAs,  (Advisor: M. Sarrafzadeh)

·         Vivian Guzman,  M.S./Ph.D., Interaction of Physical Design and Synthesis.  (Advisor: M. Sarrafzadeh)

·         Seda Ogrenci, MS/Ph.D., Multimedia Application on FPGAs.   (Advisor: M. Sarrafzadeh)

·         Kia Bazargan, Ph.D., Reconfigurable Computing.  (Advisor: M. Sarrafzadeh)

·         Abhshek Ranjan, M.S./Ph.D., Interaction of Physical Design and Synthesis.  (Advisor: M. Sarrafzadeh)

·         Maogang Wang , Ph.D., Physical Design.  (Advisor: M. Sarrafzadeh)

·         Todd Haverkos, M.S., Chip Design.  (Advisor:  M. Sarrafzadeh)

·         Xioajian Yang, M.S./Ph.D., Physical Design.  (Advisor:  M. Sarrafzadeh)

·         B. Chen, Ph.D, Title to be decided, 2000.  (Advisor: P. Scheuermann)

·         L. Singh, Ph.D., Generating Constraint Association Rules from Semi-Structured Data, 1999.  (Advisor: P. Scheuermann)

·         M. Sayal, Ph.D., Title to be decided, 2000.  (Advisor: P. Scheuermann)

·         J. Shim, Ph.D., Title to be decided, June 1999. (Advisor: P. Scheuermann)

·         J. Chen, Ph.D., Performance Evaluation of Parallel Applications.  (Advisor: V. Taylor)

·         Jonathan Geisler, M.S., Ph.D., Performance Evaluation of Parallel Applications.  (Advisor: V. Taylor)

·         Zhiling Lan, Ph.D., Performance Evaluation of Parallel Applications.   (Advisor: V. Taylor)

·         Xin Li, M.S./Ph.D., Title to be decided.  (Advisor: V. Taylor)

·         Warren Smith, M.S./Ph.D., Scheduling Parallel Applications.  (Advisor: V. Taylor)


 

3. COMPUTING RESOURCES

 

The Center has the following computing resources:

 

·         A 16 processor IBM SP-2 distributed memory message passing multicomputer with each node having 128 MB memory, 2 GB disk, and a IBM Power2 processor.

·         One IBM model 42T RS/6000 workstation used as a control workstation for the IBM SP-2 machine.

·         An 8 processor IBM J-40 shared memory multiprocessor, with 8 PowerPC 601 processors, 1 GB of memory, 14 GB disk.

·         An 8 processor SGI Origin 2000 distributed shared memory multiprocessor with eight MIPS R10000 processors, 2 GB memory

·         Ten  C-110 HP workstations with a 120 MHz PA 7200 CPU, 128 MB memory, 4 GB disk, 20 inch color monitors.

·         Fifty HP C-180 workstations with 180 MHz PA 8000 CPU, 128 MB memory, 20 inch monitors

·         Four Dell personal computers with 300 Intel Pentium II processors, 128 MB memory, 2 GB disk, 17-inch color monitors, CDROM drive.

·         The above equipment are all connected via a local area Ethernet network which will be upgraded shortly into an OC-3 ATM based network operating at 155 Mbps.


 

4. RESEARCH PROJECTS

 

This section gives an overview of some of the research projects at the Center for Parallel and Distributed Computing.

We have research funding from a variety of sources including the National Science Foundation (NSF), Defense Advanced Research Projects Agency (DARPA), and various industries including IBM, Intel, and Motorola.

Web-based CAD Computing Environment (P. Banerjee and S. Hauck - WADE Project)

 

 In the WADE project, we have developed a novel framework of a web-based automated parallel CAD environment. The design files of a user working in a remote machine are transparently shipped to the WADE compute center, the relevant computations are performed in a parallel environment, and the results are returned back to the user. A job submission and scheduling tool ensures proper load balance and maximal usage of the various parallel machines. At present three parallel CAD applications are supported: (1) a parallel placement tool for standard cell circuits (2) a parallel fault simulator for sequential circuits (3) a parallel VHDL logic simulator. We have developed techniques for ensuring the safety and security of data files in the system.  We are developing techniques for performing extremely high performance mappings to FPGA-based systems.  These have the potential to significantly improve the utility of logic emulators, adaptive computing systems, and other multi-FPGA systems.

 

This research has been supported by NSF, DARPA, and Quickturn, has supported 2 Ph.D. students, 1 B.S. student, and generated 3 journal papers and one conference paper.

 

Parallel and Distributed Algorithms for VLSI CAD (P. Banerjee - PROPERCAD Project)

 

In this research, we have investigated parallel algorithms for several CAD tools with the objective of reducing the design turnaround times of the CAD tools, and to enable larger CAD problems to be solved. All the tools are developed using the Message-Passing Interface (MPI).

 

In the area of parallel logic synthesis, we have investigated parallel algorithms for various key steps of a commercial logic synthesis software called BUILDGATES from Ambit Design Systems, and parallelized them for efficient execution on a network of workstations. In some related work, we have done some novel work in the area of low-power logic synthesis within the framework of the BUILDGATES tools.

 

In the area of logic verification, we have developed two parallel algorithms for constructing Binary Decision Diagrams (BDDs). The first algorithm is based on fine grained parallelism, and did not result in any speedups. The second algorithm was based on partitioning BDDs by fan-in cones and resulted in speedups of about 3 on 8 processors of an SGI Origin shared memory multiprocessor. We have also done some novel work in the area of equivalence checking of multi-phase sequential circuits using re-timing.

 

In the area of behavioral synthesis, we have developed some parallel algorithms for simultaneous scheduling, binding, and floor-planning. In comparison to traditional approaches which separates high-level synthesis from physical design, we have devised an algorithm where these stages interact closely. This results in solutions with lower latency and area, but require increased computation times. Experimental results on a set of synthesis benchmarks show speedups of about 7.8 on 8 processors on an IBM SP2 message-passing multiprocessor.

 

Finally, in the area of behavioral simulation, we have developed an efficient sequential VHDL simulator based on compiled event driven simulation. Two approaches to parallelizing compiled VHDL simulation have been developed, one based on fine grain task scheduling, and another based on coarse grain partitioning based on fan-in cones. While the first algorithm did not result in any speedups, the second algorithm achieved speedups of about 3 on 8 processors of an SGI Origin shared memory multiprocessor.

 

For each of the above parallel algorithms, we have benchmarked them on a network of workstations and the SGI Origin multiprocessor obtained as part of the NSF grant. This project was supported by DARPA and supported 3 Ph.D. students. It resulted in 3 journal papers and 13 conference publications.

 

High-Performance CAD for Low-power Design (M. Sarrafzadeh - NUCAD Project)

 

With the growing popularity of personal computing and communication devices, the high-performance as well as low power consumption is much desirable in the synthesis of VLSI circuits and systems. Over the years, much research progress has been made towards the timing, area and power optimization. However, when the designers look into the possibility of further improvement over circuit area/power, the timing constraints turn out to be a strong limiting factor. This trend can be exacerbated at lower levels (such as gate-level and physical-level) where relatively accurate timing models are available. Therefore, the tradeoff between delay and area/power has to be made, in general, to achieve the result with reasonably good quality. Since the timing performance of a circuit depends on critical paths, the logic gates on non-critical paths have typically large timing slacks (i.e. the difference between required time and arrival time). The problem is thus translated into how to exploit these excessive slacks effectively during the process of optimization. Most of prior efforts do not deal directly with the  slacks.  In the past year  we directly targeted the analysis of the  slack distribution and developed a Slack- Equalization-Algorithm to maximize the objective function (power or area) under the timing constraints. This  algorithm can find many applications in low-level  synthesis and optimization ranging from gate-level power/area optimization to layout-level  placement and routing design. Experimental results verify the effectiveness of the proposed technique.

 

This research has been supported by NSF and Motorola, and has supported 8 students (5 female, 1 minority), and has generated 3 journal papers, and 10 conference papers.

 

CAD for Adaptive Computing Systems (S. Hauck, P. Banerjee, M. Sarrafzadeh - CHIMAERA Project)

 

Reconfigurable logic has the potential to alter the way most types of computations are performed.  By integrating FPGAs into host processors, custom instruction sets and functional accelerators can be generated on an application by application basis.  However, the architecture of these devices, applications, and configuration management are significant concerns in these systems.  This research is attempting to overcome these techniques via new approaches in each of these areas.  We have developed new CAD algorithms to support these devices, including physical design and configuration management techniques.  We have also investigated chip architectures, compilers, and other aspects of this problem.  Specifically, this year we have developed novel compression algorithms by minimizing the reconfiguration times in FPGAs, designed high-performance carry chains for faster arithmetic computation, developed CAD algorithms for template based physical design using parameterized designs of adders and other components.  Finally, this year we have developed a C compiler based on GCC for mixed processor-FPGA systems where sequences of normal processor instructions are replaced by complex reconfigurable logic instructions.  The compiler has been evaluated with the use of a simulator on various benchmarks, and speedups of about 1.5 to 7 have been measured over a MIPS IV processor.

 

This research has been supported by NSF, DARPA, and Motorola, has supported 10 graduate students and 8 undergraduates, and generated 2 journal and 3  conference papers.

 

Parallel and Scalable OLAP and Data Mining (PARSIMONY Project – A. Choudhary)

 

Multidimensional Analysis and On-Line Analytical Processing (OLAP) operations require summary information on multi-dimensional data sets. Most common are aggregate operations along one or more dimensions of numerical data values. Simultaneous calculation of multi-dimensional aggregates are provided by the Data Cube operator, used to calculate and store summary information on a number of dimensions. This is computed only partially if the number of dimensions is large. Query processing for these applications require different views of data to gain insight and for effective decision support. These may either be answered from a materialized cube in the data cube or calculated on the fly.

 

The multi-dimensionality of the underlying problem can be represented both in relational and multi-dimensional databases, the latter being a better fit when query performance is the criteria for judgment. Relational databases are scalable in size for OLAP and multidimensional analysis and efforts are on to make their performance acceptable. On the other hand multi-dimensional databases have proven to provide good performance for such queries, although they are not very scalable. In this project we are addressing scalability in multi-dimensional systems for OLAP and multi-dimensional analysis.

 

Multidimensional databases store data in multidimensional structure on which analytical operations are performed. A challenge for these systems is how to handle large and sparse data sets with large dimensionality. Parallel computing is necessary to address the performance issues for these data sets.

 

Chunking is used to store data either as a dense block using multi-dimensional arrays or a sparse set using a Bit encoded sparse structure (BESS). Chunks provide a multi-dimensional index structure for efficient dimension oriented data accesses much the same as multi-dimensional arrays do. Operations within chunks and between chunks are a combination of relational and multi-dimensional operations depending on whether the chunk is sparse or dense.

 

In this research we address issues of data partitioning, parallelism, schedule construction, data cube building, chunk storage and memory usage on a distribute memory machine architecture. The following figures summarize the various available options, especially in terms of storage of cubes, parallelism at different levels, aggregation calculation orderings and the chunk access for aggregations.

 

High-Performance Data Management and I/O (SHPDM Project – A. Choudhary)

 

Management, storage, efficient access, analysis of 100's of GBs to 100's of TBs of data, that is likely to be generated and/or used in various phases of large-scale scientific experiments and simulations. Current data management and analysis techniques do not measure up to the challenges posed by such large-scale requirements in performance, scalability, ease of use and interfaces. Terascale computing requires newer models and approaches to solving the problems in storing, retrieving, managing, sharing, visualizing, organizing and analyzing data at such a massive scale.  Also, since problem solving in scientific applications requires collaboration and usage of distributed resources, the problem of data management and storage is further exacerbated.

 

In this project, we are developing a Scalable High-Performance Data-Management System (SHPDM) that will provide support for data management, query capability and high-performance accesses to large datasets. SHPDM will provide the flexibility of databases for indexing, searches, management of objects, creating and keeping histories and trails of data accesses, while providing high-performance access methods and optimizations (pooled striping, prefetching, caching, collective I/O) for accessing large-scale data-objects found in scientific computations. This allows decoupling data management from data accesses thereby allowing us to leverage many high-performance I/O methods that have already been developed or are under development.

 

 

Scalable and high-performance file systems (P. Scheuermann—Web++ project)

 

We have developed Web++, a prototype system based on  user-transparent geographic replication which aims at improving the response time and the reliability of the HTTP service. Web++  extends the current Web client/server architecture with additional capabilities. The extension is provided via Java servlets at the server side and applets at the client side. The servlets pre-process the requested documents such that each logical URL is replaced by a list of physical URLs corresponding to the list of the resource's replica. An applet is automatically downloaded to the client machines together with the first request of each new user session. For subsequent requests within a user session, the applet chooses the physical URL that corresponds to a resource held by an available server that is expected to deliver the best response time for the server.

 

 

Compilation Techniques for Distributed Environment (P. Banerjee, A. Choudhary - PARADIGM Project)

 

In the past year, we have done extensive research in four areas in compilation techniques for High Performance Fortran (HPF) Programs for parallel and distributed machines. First, we have developed the PARADIGM 2.0 compiler which uses a linear algebra framework to handle arbitrary data distributions and data alignments in HPF programs and generate message-passing codes for programs having general affine subscript expressions. We have benchmarked the performance of our PARADIGM2.0 compiler with the PGHPGF compiler from Portland Group and the XLHPF compiler from IBM, and demonstrated thath our compiler generates more efficient code with faster execution times, lower communication costs, and comparable compilation times as the commercial compilers.  We have performed a lot of experimental measurements of the PARADIGM compiled codes on an IBM SP2 and a network of workstations. Second, we have developed a compiler and runtime system that simultaneously supports both regular and irregular accesses using a unique interval representation called PILAR. Third, we have developed a linear algebra framework for applying both loop and data transformations in order to improve temporal and spatial cache locality in distributed shared memory multiprocessors.

 

This research has been supported by NSF, DARPA, Intel, and Air Force Research Labs, and has supported 3 Ph.D. students, 1 M.S. student, and generated 3 journal papers, and 9 conference papers. One M.S. student has graduated.

 

 A MATLAB Compilation Environment for Distributed Heterogeneous Adaptive Computing (P. Banerjee, A. Choudhary, S. Hauck - MATCH Project)

 

The objective of the MATCH project is to make it easier for users to develop efficient codes for adaptive computing systems. We will develop a compiler that will take in applications written in a high-level language (MATLAB) and generate efficient low level code that will run on a distributed environment of commercial-off-the-shelf (COTS) FPGAs, embedded processors, and digital signal processors. The specific goals of the MATCH project are to: (1) Enable the users to reduce the code development times for adaptive applications from weeks using manual approaches to hours using compiler tools. (2) Produce efficient codes that are within a factor of 2-4 of the best manual approach with respect to optimizing resources under performance constraints, or optimizing performance under resource constraints.

 

We have developed an adaptive computing testbed consisting of a VME chassis, some Motorola embedded boards, a Transtech DSP board, an Annapolis FPGA board, and a FORCE 5V host board. We have developed various MATLAB library functions to be executed on various components of our testbed. We have designed some automated heuristics using mixed integer linear programming to map a given dataflow graph of a MATLAB program on heterogeneous resources while optimizing resources under timing constraints. We have also developed a basic MATLAB compiler which automatically calls library functions on various parts of the testbed.

 

This project is supported by DARPA and supports 4 Ph.D. students, and 5 M.S. students, and has generated 2 journal papers, and 4 conference papers.

 

Mesh Partitioning Techniques for Distributed Applications (V. Taylor - PART Project)

 

The objective of the PART project is to develop mesh partitioning techniques that consider the static and dynamic features of distributed systems and adaptive applications.   These techniques are integrated into a tool called PART.

Currently, PART performs static mesh partitioning of applications, exploiting the heterogeneity in processor and network performance that exists in distributed systems, as well as the heterogeneity in the application.  PART is distinguished from conventional mesh partitioning tools and techniques, for which only heterogeneous processor performance can be exploited.   Future work entails developing dynamic techniques applicable to distributed systems.

 

PART uses simulated annealing, which is a computationally intense optimizing method.  This year the focus was on parallelizing PART, to reduce the time required to generate the partitions.  Experiments on the IBM SP multiprocessors at the Northwestern University Center on Parallel and Distributed Computing and at the Argonne National Laboratory demonstrated  excellent scalability results of the parallel PART; in some cases the results indicated superlinear speedup.  Further, PART was used to partition an explicit and an implicit finite element applications for execution on two IBM SP systems, one located at San Diego Supercomputer Center and the second located at Argonne National Laboratory.  The results of using PART with these applications indicated up to 35% improvement in efficiency as compared to the widely used METIS system, for which only heterogeneous processor performance is exploited.

 

This research supported in part by a continuing NYI grant from National Science Foundation and AlliedSignal, Inc.  The funding has been used to support 3 PhD students and 3 undergraduate students.    This work has generated 4 journal papers and 9 conference papers.

 

Interactive Visualization of Parallel Geometric Algorithms (D. T. Lee - GEOSHEET Project)

 

 During the past year, we conducted research in the area of geometric computing, and robotics control. We concentrated on the design of web-based interactive visualization tool to facilitate visualization of geometric algorithms. An interactive visualization tool, called GeoSheet, which was running under Sun OS Unix workstations, has been installed on the HP server and is accessible by all students. Students in the following two courses have made use of the tool to do the machine programs. This tool allows the user to enter geometric data, such as rectangles, polygons, etc., directly on the screen, and see computed output on the same or possibly different machine. The screen I/O can be distributed over the Internet.

 

This research was supported by ONR and NSF and supported 2 M.S. and 2 Ph.D. students (one female


 

5. INDUSTRIAL COLLABORATION

 

Researchers in the Center have had numerous interactions with industry.  Details of the interactions are outlined below.

Integrated Sensors Inc.

 

Prof. Banerjee, Choudhary, Hauck, and Shenoy are interacting with Integrated Sensors Inc. (Contact: Dr. Milissa Benincasa) regarding the MATCH compiler project. Specifically, they are using the RTEExpress software from ISI

as a parallel MATLAB library on the MATCH testbed. In return, ISI will get the prototype of the MATCH compiler

for their research.

Cadence Design Systems

 

Prof. Banerjee is working with the Ambit Group of Cadence Design Systems (contact: Dr. Devadas

Varma) in the area of parallel algorithms for logic synthesis.  We are developing parallel algorithms for the Envisia synthesis tools from Cadence that are suitable for execution on networks of workstations.

Allied Signal

 

Prof. Taylor has been supported in part by a grant from Allied Signal, Inc. for her work on the PART tool.  The funding provided support for a graduate student and visits to AlliedSignal, Inc.  One of the initial applications used to develop PART is the finite element analysis of an aircraft brake system, which is of great interest to AlliedSignal, Inc.

Sandia National Lab

 

Prof. Alok Choudhary has a continuing relationship with Sandia National Lab in a project entitled, "Runtime Libraries for Large-Scale Parallel I/O" where he is extending the PASSION parallel I/O library on the ASCI parallel supercomputer being developed at Sandia Labs.

Los Alamos National Lab

 

Professors Banerjee, Choudhary and Hauck are working jointly with Los Alamos National Lab on the MATCH

project on adaptive computing systems (Contacts: Dr. Jeff Block, and Dr. Mark Dunham). We will transfer the technology of the MATCH compiler that we are developing to them.

Air Force Research Labs

 

Professors Choudhary and Banerjee are working jointly with Air Force Research Labs on developing a parallel version of a Space Time Adaptive Application (STAP) (Dr. Rich Linderman). This application is going to be used as the application driver in the MATCH compiler project supported by DARPA.

Argonne National Lab

 

Prof. Taylor who has a guest appointment with the MCS Division at Argonne National Laboratory, has had a long collaborative relationship with researchers at Argonne.  Prof. Taylor is currently involved with three collaborative projects with Argonne National Laboratory. First, is the collaborative project related to the analysis of the aircraft brake system.  The particular finite element code used for the analysis was developed by Dr. Canfield at Argonne.  The second project relates to the resource management for distributed system project. This project is part of the GLOBUS project.

 

Prof. Choudhary has an ongoing project in the area of parallel I/O software with Argonne National Lab (contact: Dr. Rakesh Bordawarkar).

 

In addition, two of the scientists from Argonne National Lab, Dr. Ian Foster and Dr. William Gropp, are members of the Center. They have continuing collaboration in the areas of the PARADIGM compiler for distributed memory multicomputers, the PETSI library for parallel runtime software, the PASSION project on parallel I/O, and the GLOBUS project on resource management of distributed systems.

NASA Ames Laboratory

 

Prof. Valerie Taylor has established a relationship with NASA Ames in the project entitled "Prophesy:  A Hierarchical Tools for Modeling and Analyzing Parallel, Scientific Applications", where she is developing methods for modeling the coupling between kernels and the performance impact on the application.


 

6. PUBLICATIONS

 

1.        G. Hasteer, A. Mathur, and P. Banerjee, "Efficient Equivalence Checking of Multi-Phase Designs Using Phase Abstraction and Retiming.," ACM Transactions on Design Automation of Electronic Systems (TODAES) Special Issue on High Level Design, Validation and Testing, Oct. 1998, pp. 600-625.

 

2.        M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, J. Ramanujam, "A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts," IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 2, February 1999.

 

3.        J. Chandy and P. Banerjee, "A Parallel Circuit Partitioned Algorithm for Timing-Driven Standard Cell Placement," Journal of Parallel and Distributed Computing, vol. 57., No. 1, pp. 64-90, April 1999.

 

4.        P. Prabhakaran and P. Banerjee, "Parallel Algorithms for Force-Directed Scheduling of Flattened and Hierarchical Signal Flow Graphs," IEEE Transactions on Computers, 1999.

 

5.        M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and N. Shenoy, "A Global Communication Optimization Technique Based on Data Flow Analysis and Linear Algebra,"  ACM Trans. on Programming Languages and Systems (TOPLAS), Vol. 21, No. 5, Sept. 1999.

 

6.        M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A Matrix-Based Approach to Global Locality Optimization,"  Journal of Parallel and Distributed Computing, Special Issue on Compilation and Architectural Support for Parallel Applications, Vol. 58, No. 2, Aug. 1999, pp. 190-235.

 

7.        V. Krishnaswamy and P. Banerjee, "Parallel Compiled Event Driven VHDL Simulation," Proc. Int. Conf. Supercomputing (ICS-98), Melbourne, AUSTRALIA, July 1998.

 

8.        D. R. Chakrabarti, N. Shenoy, A. Choudhary, and P. Banerjee, "An Efficient Uniform Run-time Scheme for Mixed Regular-Irregular Applications," Proc. Int. Conf. Supercomputing (ICS-98), Melbourne, AUSTRALIA, July 1998.

 

9.        M. Kandemir, A. Choudhary, N. Shenoy, J. Ramanujam, and P. Banerjee, "A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests," Proc. Int. Conf. Supercomputing (ICS-98), Melbourne, AUSTRALIA, July 1998.

 

10.     M. Kandemir, J. Ramanujam, A. Choudhary, P. Banerjee, "An iteration space transformation algorithm based on an explicit data layout representation for optimizing locality," Proc. Workshop on Languages and Compilers for Parallel Computing (LCPC-98), Chapel Hill, NC, Aug. 1998.

 

11.     M. Kandemir, N. Shenoy, P. Banerjee, J. Ramanujam, and A. Choudhary, "Minimizing Data and Synchronization Costs in One-Way Communication," Proc. Int. Conf. Parallel Processing (ICPP98), Minneapolis, MN, Aug. 1998.

 

12.     Z. Xing and P. Banerjee, "A Parallel Algorithm for Timing-Driven Global Routing for Standard Cells," Proc. Int. Conf. Parallel Processing (ICPP98), Minneapolis, MN, Aug. 1998.

 

13.     M. Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy, and P. Banerjee, "Enhancing Spatial Locality using Data Layout Optimizations," Proc. European Conference on Parallel Processing (Euro-Par'98), Southampton, ENGLAND, Sep. 1998.

 

14.     A. Mishra and P. Banerjee, "A Fault Tolerant Multi-Grid Algorithm," Proc. Parallel and Distributed Computing Systems (PDCS98), Chicago, Sep. 1998.

 

15.     S. Roy, A. Harms and P. Banerjee, "A Low Power Logic Optimization Methodology Based on a Fast Power Driven Mapping," Proc. Int. Conf. Computer Design (ICCD-98), Austin, TX, Oct. 1998.

 

16.     M. Kandemir, A. Choudhary, J. Ramanujam, N. Shenoy, and P. Banerjee, "A Matrix-Based Approach to the Global Locality Optimization Problem," Proc. Parallel Architectures and Compilation Tecniques (PACT-98), Paris, FRANCE, Oct. 1998.

 

17.     S. Roy and P. Banerjee, "Power Drive: A fast, canonical POWER estimator for DRIVing synthEsis," Proc. 1998 International Conference on Computer-Aided Design (ICCAD-98), San Jose, CA, Nov. 1998.

 

18.     G. Hasteer, A. Mathur, and P. Banerjee, "Efficient Equivalence Checking of Multi-Phase Designs Using Retiming", Proc. 1998 International Conference on Computer-Aided Design (ICCAD-98), San Jose, CA, Nov. 1998.

 

19.     D. Chakrabarti, P. Joisha, J. Chandy, D. Krishnaswamy, V. Krishnaswamy, and P. Banerjee, "WADE: A Web-Based Automated Parallel CAD Environment," Proc. International Conference on High Performance Computing (HiPC'98), Chennai, India, Dec. 1998.

 

20.     M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality using loop and data transformations in an integrated framework" Proc. 31st Int. Symp. on Micro-Architecture (MICRO31), Dallas, Texas, Dec. 1998.

 

21.     P. Prabhakaran, J. Crenshaw, P. Banerjee, and M. Sarrafzadeh, "Simultaneous Scheduling, Binding and Floorplanning for Interconnect Power Optimization," Proc. 1999 VLSI Design Conference, Goa, India, Jan. 1999.

 

22.     Y. Yuan and P. Banerjee, "ICE: Incremental 3-Dimensional Capacitance and Resistance Extraction for an Iterative Design Environment," Proc. 9th Great Lakes Symposium on VLSI, Ann Arbor, MI, March 1999.

 

23.     J. Crenshaw, M. Sarrafzadeh, P. Banerjee, and P. Prabhakaran, “An Incremental Floor-Planner,” Proc. 9th Great Lakes Symposium on VLSI, Ann Arbor, MI, March 1999.

 

24.     Y. Yuan and P. Banerjee, “Fast Potential Integrals for Signal Integrity Analysis,” Proc. ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU’99), Monterey, CA, March 1999.

 

25.     D. Chakrabarti and P. Banerjee, “A Novel Compilation Framework for Supporting Semi-Regular Distributions in Hybrid Applications,” Proc. 1999 International Parallel Processing Symposium (IPPS’99), San Juan, Puerto Rico, April 1999, pp. 597-602.

 

26.     S. Mohan, P. Mazumder, D. Krishnaswamy, P. Banerjee, and E. M. Rudnick, “Parallel Implementations,” Chapter 8 in Genetic Algorithms for VLSI Design, Layout & Test Automation, P. Mazumder and E. M. Rudnick, eds., pp. 252-294, Prentice Hall PTR, 1999.

 

27.     M. Kandemir, J. Ramanujam, A. Choudhary, and P. Banerjee, “An Iteration Space Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality,” in Languages and Compilers for Parallel Computers, S. Chatterjee et al., eds., Lecture Notes in Computer Science, Springer-Verlag, 1999.

 

28.     M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality," Proc. 1999 International Parallel Processing Symposium (IPPS'99), San Juan, Puerto Rico, April 1999.

 

29.     P. Joisha and P. Banerjee, "PARADIGM (version 2.0): A New HPF Compilation System," Proc. 1999 International Parallel Processing Symposium (IPPS'99), San Juan, Puerto Rico, April 1999.

 

30.     Y. Yuan and P. Banerjee, "Incremental Capacitance Extraction and Its Application to Iterative Timing Driven Detailed Routing," 1999 International Symposium on Physical Design (ISPD-99), Monterey, CA, April 1999.

 

31.     J. Chen and P. Banerjee, "Parallel Construction Algorithms for BDDs," 1999 International Symposium on Circuits and Systems (ISCAS'99), Orlando, FL, June 1999.

 

32.     S.  Roy, A. Harms, and P. Banerjee, “An a-approximate Algorithm for Delay-Constraint Technology Mapping,” Proc. 1999 Design Automation Conference (DAC’99), June 1999.

 

33.     A. Mishra and P. Banerjee, "An Algorithm Based Error Detection Scheme for the Multigrid Method," Proc. of the 1999 International Symposium on Fault Tolerant Computing, Madison, WI, June 15-18, 1999.

 

34.     M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and E. Ayguade, "An ILP Approach for Optimizing Cache Locality," 1999 ACM International Conference on Supercomputing (ICS'99), Rhodes, Greece, June 1999.

 

35.     M. Kandaswamy, M. Kandemir, A. Choudhary, and D. Bernholdt, “An Experimental Study to Analyze and Optimize Hartree-Fock Application’s I/O With PASSION,” International Journal of High Performance Computing Applications, Vol. 12, No. 4, pp. 411-439, Winter 1998.

 

36.     M. Kandemir, J. Ramanujam, and A. Choudhary, “Improving Cache Locality by a Combination of Loop and Data Transformations,” IEEE Transactions on Computers, Vol. 48, No. 2, Feb. 1999.

 

37.     R. Krishnaiyer, I. Foster, and A. Choudhary, “Performance of a Remote I/O Library in High-Performance Distributed Computing Environments,” Proc. International Conference on Parallel and Distributed Computing and Systems (PDCS’98), Chicago, IL, Sept. 1998.

 

38.     J. Carretero, J. No, and A. Choudhary, “Parallel I/O for Irregular Applications on Distributed Memory Machines,” IX Jornadas de Paralelismo, San Sebastian, SPAIN, Sept. 1998.

 

39.     J. Carretero, W. Zhu, X. Shen, and A. Choudhary, “MiPFS:  A Multimedia Integrated Parallel File System,” International Joint Conference on Information Systems, Raleigh, NC, Oct. 1998.

 

40.     S. Goil and A. Choudhary, “High Performance Multidimensional Analysis of Large Datasets,” Proc. ACM First International Workshop on Data Warehousing and OLAP (DOLAP’98) (in conjunction with CIKM’98), Washington, DC, Nov. 1998.

 

41.     S. Goil and A. Choudhary, “High Performance Multidimensional Analysis and Data Mining,” Proc. High Performance Networking and Computing Conference (SC’98), Orlando, FL, Nov. 1998.

 

42.     S. More and A. Choudhary, “Extended Collective I/O for Efficient Retrieval of Large Objects,” Proc. 5th International Conference on High Performance Computing (HiPC’98), Chennai, INDIA, Dec. 1998.

 

43.     J. No, J. Carretero, and A. Choudhary, “Optimizations to Provide High-Performance Parallel I/O for Irregular Applications,” Proc. Applied Informatics ’99, Innsbruck, AUSTRIA, Feb. 1999.

 

44.     J. Carretero, J. No, and A. Choudhary, “Optimizing I/O for Irregular Applications on Distributed-Memory Machines,” Proc. ACPC’99, Salzburg, AUSTRIA,  Feb. 1999.

 

45.     M. Kandemir and A. Choudhary, “System-Level Meta-Data for High Performance Data Management,” Proc. Third IEEE Meta-Data Conference, Bethesda, MD, April 1999.

 

46.      S. Goil and A. Choudhary, “Design and Implementation of a Scalable Parallel System for Multidimensional Analysis,” Proc. 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP’99), San Juan, Puerto Rico, April 1999.

 

47.     W. Liao, D. Weiner, P. Varshney, and A. Choudhary, “Multi-Threaded Design and Implementation of Parallel Pipelined STAP on Parallel Computers with SMP Nodes,” Proc. 13th International Parallel Processing Symposium, 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP 1999), San Juan, Puerto Rico, April 1999.

 

48.     J. No, J. Carretero, and A. Choudhary, “High Performance Parallel I/O Schemes for Irregular Applications on Clusters of Workstations,” Proc. 7th International Conference on High-Performance Computing and Networking (HPCN Europe 1999), Amsterdam, THE NETHERLANDS, April 1999.

 

49.     M. Kandemir, J. Ramanujam, and A. Choudhary, “Restructuring I/O-Intensive Computations for Locality,” Proc. 7th Intenantional Conference on High-Performance Computing and Networking (HPCN Europe 1999), Amsterdam, THE NETHERLANDS, April 1999.

 

50.     S. Goil and A. Choudhary, “An Infrastructure for Scalable Parallel Multidimensional Analysis,” Proc. 11th International Conference on Scientific and Statistical Databases (SSDBM’99), Cleveland, OH, July 1999.

 

51.     S. Goil and A. Choudhary, “A Parallel Scalable Infrastructure for OLAP,” Proc. International Database Engineering and Applications Symposium (IDEAS'99), Montreal, CANADA, August 1999.

 

52.     M. Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, R. Thakur, and A. Choudhary, “Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems,” Proc. 8th IEEE International Symposium on High Performance Distributed Computing, Redondo Beach, CA, Aug. 1999.

 

53.     S. Hauck, G. Borriello, C. Ebeling, "Mesh Routing Topologies for Multi-FPGA Systems", IEEE Transactions on VLSI Systems, Vol. 6, No. 3, September, 1998.

 

54.    S. Hauck, Z. Li, E. J. Schwabe, "Configuration Compression for the Xilinx XC6200 FPGA", to appear in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

 

55.    M. Enos, S. Hauck, M. Sarrafzadeh, "Evaluation and Optimization of Replication Algorithms for Logic Bipartitioning", to appear in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

 

56.     Z. Li, S. Hauck, "Don't Care Discovery for FPGA Configuration Compression", Proc. ACM/SIGDA Symposium on Field-Programmable Gate Arrays, pp 91-98, 1999.

 

57.     S. Hauck, W. D. Wilson, "Abstract: Runlength Compression Techniques for FPGA Configurations", Proc.  IEEE Symposium on FPGAs for Custom Computing Machines, 1999.

 

  1. L.W. Lee, P. Scheuermann and R. Vingralek, “File Assignment in Parallel I/O Systems with Minimal Variance of Service Time”, to appear in IEEE Transactions on Parallel and Distributed Systems.

 

59.     J. Shim, P.Scheuermann, R.Vingralek, "Proxy Cache Design: Algorithms, Implementation and Performance”, to appear in IEEE Transactions on Knowledge and Data Engineering, special issue on Web Technologies, Vol. 11, No. 4, July/August 1999.

 

  1. R. Vingralek, Y. Breitbart, M. Sayal and P. Scheuermann, “Web++: A System for Fast and Reliable Web Service”, Proc. USENIX 1999 Annual Conference, June 1999.

 

  1. R. Vingralek, Y. Breitbart, M. Sayal and P. Scheuermann, “A Transparent Replication of HTTP  Service”, Proc. Intern. Conf. On Data Engineering, Sydney, Australia, March 1999.

 

62.     L. Singh, B. Chen, R. Height, and P. Scheuermann, “An Algorithm for Constrained Association Rule Mining,” Proc. Pacific Asia Knowledge Discovery in Databases Conference, Beijing, CHINA, April 1999.

 

63.     L. Singh, M. Macie, J. Woytek, J. Pan, and P. Scheuermann, “Gaining Knowledge for Ill-Structured Data,” Proc. Of the AFCEA Federal Data Mining Symposium, McLean, VA, March 1999, pp. 155-161.

 

64.     L. Singh, B. Chen, and P. Scheuermann, “IRIS:  Our Prototype Rule Generation System,” Proceedings SPIE, Orlando, FL, 1999.

 

65.     J. Shim, R. Vingralek, and P. Scheuermann, “Dynamic Caching of Query Results for Decision Support Systems,” Proc. 11th International Conference on Statistical and Scientific Database Management, Cleveland, OH, July 1999, pp. 254-263.

 

66.     P. Zabback, I. Onyuksel, G. Weikum, and P. Scheuermann, “Database Reorganization in Parallel Disk Arrays with I/O Service Stealing,” IEEE Transactions on Data and Knowledge Engineering, Vol. 11, No. 4, September/October 1998, pp. 855-858.

 

67.    G. Tellez, and M. Sarrafzadeh, `Minimal Buffer Insertion in Clock Trees with Skew and Slew Rate Constraints,'' to appear in  IEEE Transactions on Computer-Aided Design.

 

68.     D. S. Chen, Gary Yeap, and M. Sarrafzadeh, ``State Encoding of Finite State Machines for Low Power Design'' to appear in VLSI Design.

 

69.     Wei-Liang Lin and M. Sarrafzadeh, `` On the Power of Re-Synthesis,'' To appear in SIAM Journal on Computing.

 

70.     Sara Nicoloso and M. Sarrafzadeh, `Sum Coloring Problem on Interval Graphs'', to appear in Algorithmica.

 

71.     Amir Farrahi, D. T. Lee and M. Sarrafzadeh, `Two-Way and Multi-Way Partitioning of a Set of Intervals for Clique-Width Maximization'' to appear in Algorithmica.

 

72.     Jun-Dong Cho, and S. Raje, and M. Sarrafzadeh, ` Fast Approximation Algorithms on Maxcut, k-coloring and k-color Ordering for VLSI Applications'', IEEE Transactions on Computers, Vol. 47, No. 11, November 1998, pp. 1253-1266.

 

73.     J. D. Cho and M. Sarrafzadeh, ``Mixed LP and Mincost Flow-based Four-bend Globar Routing in Two Dimensional Arrays'', IEEE Transactions on CAD, September 1998, pp. 793-802.

 

74.     Kia Bazargan, Sam Kim, and M. Sarrafzadeh, “Nostradamus: Floorplanner of Uncertain Designs'', IEEE Transactions on CAD, April 1999.

 

75.     Maogang Wang, Prith Banerjee, and M. Sarrafzadeh, ``Placement with Incomplete Data'', to appear in VLSI Design, Year 2000 special issue on Physical Design.

 

76.     Morgan Enos, Scott Hauck, and M. Sarrafzadeh, ``Logic Partitioning with Replication'', IEEE Transactions on CAD, to appear.

 

77.     Salil Raje and Majid Sarrafzadeh, ``Scheduling with Multiple Voltages'', Proceedings of 1999 International Symposium on Circuits and Systems (ISCAS), May 30 - June 2, 1999, Miami, Florida.

 

78.     Majid Sarrafzadeh and Toshihiko Takahashi, `A Fast Algorithm for Routability Testing'', Proceedings of 1999 International Symposium on Circuits and Systems (ISCAS), May 30 - June 2, 1999, Miami, Florida.

 

79.     Maogang Wang and Majid Sarrafzadeh, ``Congestion Minimization During Placement,” Proceedings of the 1999 International Symposium on Physical Design (ISPD 99), Monterey, CA.

 

80.     Abhi Ranjan, Kia Bazargan, and Majid Sarrafzadeh, "Floorplanning 1000 times Faster,” System Level Interconnect Prediction (SLIP 99).

 

81.     Majid Sarrafzadeh and Maogang Wang, ‘`Can Fast Algorithms Be Used as Good Predictors?''  System Level Interconnect Prediction (SLIP 99).

 

82.     Kia Bazargan, Ryan Kastner, and Majid Sarrafzadeh, “3-D Floorplanning: Simulated Annealing and Greedy Placement Methods for Reconfigurable Computing Systems'', Proc. 10th IEEE International Workshop on Rapid System Prototyping, June 16-18, 1999, Sheraton Sandkey, Clearwater, Florida, U.S.A .

 

83.     M. Sarrafzadeh, “Slack Equalization Algorithm:  Precise Slack Distribution for Low-level Synthesis and Optimization,” Workshop on Logic Synthesis (IWLS), Lake Tahoe, CA, June 1999.

 

84.     Kia Bazargan and Majid Sarrafzadeh, ``Fast Online Placement for Reconfigurable Computing Systems,'' FCCM: Symposium on Field Programmable Custom Computing Machines, April 21-23, Napa Valley, CA, 1999.

 

  1. M. Hribar, V.E. Taylor, D. E. Boyce, “Termination Detection for Parallel Shortest Path Algorithms,” Journal of Parallel and Distributed Computing, Vol. 55, pp. 153-165, 1998.

 

  1. Z. Ben Miled, J. A. B. Fortes, R. Eigenmann and V. Taylor, “A Simulation-based Cost-Efficiency Study of Hierarchical Heterogeneous Machines for Compiler and Hand Parallelized Applications,” International Journal of Parallel and Distributed Systems and Networks, Vol. 1:4, pp. 193-203, 1998.

 

87.     R. J. O. Figueiredo, A. B. Fortes, Z. B. Miled, V. Taylor, R. Eigenmann, “Impact of Computing-in-Memory on the Performance of Processor-and-Memory Hierarchies”, Proceedings of International Conference on Parallel and Distributed Computing Systems, September 1998.

 

88.     W. Smith, I. Foster and V. Taylor, “Predicting Application Run Times Using Historical Information,” 4th Workshop on Job Scheduling Strategies for Parallel Processing (in conjunction with IPPS 1998), April 1998.

 

89.     Z. B. Miled, J. A. B. Fortes, R. Eigenmann, V. Taylor, “On the Implementation of Broadcast, Scatter, and Gather in a Heterogeneous Architecture,” Hawaii International Conference on System Sciences, January 1998.

 

90.     J. Chen, V. Taylor, "ParaPART: Parallel Mesh Partitioning Tool for Distributed Systems", in IRREGULAR'99, Sixth International Workshop on Solving Irregularly Structured Problems in Parallel, in Conjunction with IEEE IPPS/SPDP'99 (13th International Parallel Processing Symposium), San Juan, Puerto Rico, April 1999.

 

91.    J. Chen, V. Taylor, "Mesh Partitioning for Distributed Systems: Exploring Optimal Number of Partitions with Local and Remote Communication", 1999 SIAM Conference on Parallel Processing.

 

92.     J. Chen, V. Taylor, "Mesh Partitioning for Distributed Systems", in Proceedings of Seventh IEEE International Symposium on High Performance Distributed Computing, Chicago, IL, July 1998.

 

93.    J. Geisler and V. Taylor, "Performance Coupling: A Methodology for Predicting Application Performance using Kernel Performance," 1999 SIAM Conference on Parallel Processing.

 

94.     W. Smith, V. Taylor and I. Foster, "Using Run-Time Predictions to Estimate Queue Wait Times and Improve Scheduler Performance," 5th Workshop on Job Scheduling Strategies for Parallel Processing  with IEEE IPPS/SPDP'99 (13th International Parallel Processing Symposium), San Juan, Puerto Rico, April 1999.

 

95.     V. Taylor, J. Fortes and R. Eigenmann, "HPAM Petaflop Point Design: Identifying Critical Research Issues for Petaflop Machines," Proceeding of the Third PetaFlop Workshop (TPF-3), in Conjunction with the Frontier Conference on Massively Parallel Computing, February, 1999.

 

96.     J. Nocedal, L. T. Biegler, C. Schmidt, and D. Ternet, “Numerical Experience with a Reduced Hessian Method for Large-Scale Constrained Optimization,” Computational Optimization and Applications, 1999.

 

97.     J. Nocedal and N. I. M. Gould, “The Modified Absolute-Value Factorization Norm for Trust-Region Minimization,” in High Performance Algorithms and Software in Nonlinear Optimization, pp. 225-241, eds. R. De Leone, A. Murli, P. M. Pardalos, and G. Toraldo, Kluwer, 1998.

 

98.     J. Nocedal and Ya-xiang Yuan, “Combining Trust Region and Line Search Techniques,” in Advances in Nonlinear Programming, ed. Y. Yuan, Kluwer, 1998, pp. 153-175.

 

99.     J. Nocedal and R. Byrd, “Active Set and Interior Point Methods for Nonlinear Optimization,” Documenta Mathematica, Vol. III, Journal de Deutschen Mathematiker-Vereinigung (Proceedings of the International Congress of Mathematicians), 1998.

 

100.  J. Nocedal, “An Interior Point Method for Large-Scale Nonlinear Programming,” SIAM. Journal of Optimization, 9, 4, 1999, pp. 877-900.

 

101.  J. Nocedal and S. Wright, Numerical Optimization, Springer Verlag, 1999, 650 pp.

 


 

7. SPONSORED RESEARCH GRANTS

 

The Center for Parallel and Distributed Computing had a total research expenditure of about $1.8 million during 1998-99.  The details are provided below for the most active members of the Center, P. Banerjee. A. Choudhary, V. Taylor, A. Moshovos, S. Hauck, M. Sarafzadeh, P. Scheurmann, V. Taylor, and D. T. Lee.

 

 

 “Efficient Compilation Issues in Distributed Memory Multicomputers”, Prith Banerjee, National Science Foundation, 1996-99, 0830-350-F812,  $79,278.

 

“A MATLAB Compilation Environment for Adaptive Computing Systems,”   P. Banerjee, A. Choudhary, S. Hauck, N. Shenoy, DARPA, 1998-2001, 0650-350-D400, $603,104.

 

“A High-Performance Distributed Computing Infrastructure,” P. Banerjee, A. Choudhary, S. Hauck, D.T. Lee, M. Sarrafzadeh, P. Scheurmann, V. Taylor, NSF, 1997-2002, 0830-350-D600, $245,746.

 

“Architectures, Compilers,  and Configuration Management  for Mass-Market Adaptive Computing,” S. Hauck, P. Banerjee, M. Sarrafzadeh, DARPA, 1997-2000, 0650-350-F482, $630,501.

 

“Large High-Performance Data Management, Access and Storage Techniques for Tera-scale Scientific Applications,” A. Choudhary, P. Banerjee, V. Taylor, Department of Energy, ASCI Level 2, 1998-2001, 0980-350-D201, $241,005.

 

“Compiler and Run-time Optimization Techniques for Parallel Programming,” A. Choudhary, NSF, 1993-2000, 0830-350-F811, $34,484.

 

“System Software for Input-Output in Parallel Computers,” A. Choudhary, NSF, 1996-99, 0830-350-F813, $46,286.

 

“Compiler and Runtime Support,” A. Choudhary, Intel Corporation, 0970-350-F702, 1996-99, $13,656.

 

“Modeling and Evaluation of I/O Architecture  in Servers, Intel Corporation, 0970-350-F703, 1996-99, $9,405.

 

“Design, Development, Benchmarking and Evaluation of Parallel Applications,” A. Choudhary, Air Force Systems Command (Rome Labs), 0980-350-F711, 1996-99, $117,353.

 

“Scalable I/O Initiative,” A. Choudhary, Advanced Research Projects Agency (Subcontract from California Institute Technology, 0980-350-F713, 1996-99, $53,651.

 

“Hybrid Technology Multi-threaded Computer Architecture for Petaflops Computing,” A. Choudhary, Department of Energy (Argonne National Laboratory), 0980-350-F716, 1997-99, $36,054.

 

“Interoperable Data Files for High-Performance Computing,” A. Choudhary, NSF (University of Wyoming), 0980-350-F237, 1997-99, $50,099.

 

“Prophesy: A Hierarchical Tool for Modeling and Analyzing Parallel  Scientific Application,” V. Taylor, NASA/AMES, 0720-350-F403, 1998-2000, $ 67,454.

 

“Prophesy: A Performance Network for Developing a Computer Algebra,” V. Taylor, NSF, 0830-350-F823, 1999-2000, $0.

 

“Young Investigator: A Tool for the Improvement of Uniprocessor Performance,” V. Taylor, NSF, 0830-350-F688, 1993-2000, $74, 677.

 

“Integration of Computer Architecture and Parallel Programming Tools Into Computer Science and Engineering,” V. Taylor, Subcontract from Purdue (NSF grant), 1998-2001, 0980-350-F728, $3,750.

 

“Alliance Team B: Distributed Computing,” V. Taylor, Subcontract from NCSA (NSF grant), 0980-350-F727, 1998-99, $36,032.

 

“Dynamically Reconfigurable Hardware for Digital Signal Processing,” S. Hauck, Motorola, University Partnership in Research Program, 0970-350-F718, 1997-2001, $7105.

 

Mapping Time Oriented Tools for Logic Emulation, “ S. Hauck, M. Sarrafzadeh, NSF, 0830-350-F817, 1997-2000, $22,868

 

“Research Experiences for Undergraduates,” S. Hauck and M. Sarrafzadeh, NSF, 0830-350-F817, 1997-2000, $21,000.

 

“Algorithm Design for VLSI,” M. Sarrafzadeh, NSF, 0830-350-F614, 1995-2000, $43,516.

 

“Multimedia Processor Synthesis,” M. Sarrafzadeh , Motorola, 0970-350-F715, 1997-1999, $29,305.

 

“Application Specific Processor Synthesis (ASPROS),” M. Sarrafzadeh, Motorola Center, 0970-350-AF04, 1998-99, $78,638.

 

“SCALAR: Scalable Architecture for Replication on the Web Workstations,” P. Scheuermann, 0830-350-F822, 1999-2000, $0.

 

“Scalable and Adaptive File Management in Networks of Workstations,” P.  Scheuermann, NASA, 0730-350-F477, 1996-2000, $25,162.

 

“Putting Log Data to Work:  Mass Storage Information Systems,” P. Scheuermann, NASA, 0980-350-F721,    1998-1999, $59,515.

 

“Geometric Algorithm Design and Visualization,” D.T. Lee, NSF, 0830-350-F616, 1998-2001, 0830-350-F616, $49,588.

 

“Development of Distributed Visualization Tools,” D.T. Lee, ONR, Mathematical Science Division, 0650-350-F471, 1995-00, $48,437.

 

“Design of Assembly Operations for Recycling Disassembly,” D. T. Lee, NU-Motorola Center, 0970-350-AF07, 1999, $20,131.

 

“Nonlinear Programming:  Algorithms and Software Environments,” J. Nocedal, DOE, 0680-350-Z400, 1998-2001, $73,800.

 

“Nonlinear Programming:  Algorithms and Software Environments,” J. Nocedal, DOE, 0680-350-F440, 1987-1999, $49,900.

 

“Challenges in CISE:  Metcomputing Environments for Optimization,” J. Nocedal, NSF, 0830-350-Z601, 1997-2000, $509,000.

 

“Climate Modeling on Parallel Computers,” J. Nocedal, Argonne National Lab, 0980-350-Z201, 1996-1998, $7,700.


 

8. CPDC SEMINAR SERIES

 

Oct. 5, 1998: Pramod Joisha, ECE Graduate Student, “An Experimental Evaluation of a New HPF Framework based on a Commercial Symbolic Analysis Package.”

 

Oct. 12, 1998: Sachin More, ECE Graduate Student, “Workload Characterization and Task Scheduling for Tape Resident Data.”

 

Oct. 19, 1998: Prof. Nagaraj Shenoy, ECE, Northwestern Univ., “Automatic Parallelization for Distributed Memory Parallel Machines – Issues, Challenges and State of the Art.”

 

Oct. 26, 1998: Chutimet Srinilta, ECE Grad Student, “Techniques for Improving Performance in Continuous Media Server Design.”

 

Oct. 26, 1998: Harsha Nagesh, ECE Grad Student, “On the Randomness of Random Numbers.”

 

Oct. 30, 1998: Prof. Jesus Carretero, Universida Politecnica de Madrid, “MiPFS: A Multimedia Integrated Parallel File System.”

 

Nov. 6, 1998: Prof. Hans Zima, University of Vienna, “The ESPRIT Project HPF+.”

 

Nov. 13, 1998: Dr. Rakesh Krishnaiyer, Argonne National Lab, “Efficient Collective Data Transfer Mechanisms for High-Performance Parallel and Distributed Computing Environments.”

 

Nov. 23, 1998:  Dr. Warren Smith, Argonne National Lab, “Metacomputing, Scheduling and Prediction.”

 

Dec. 1, 1998: Prof. Francine Berman, UC San Diego, “Achieving Application Performance on Distributed Resources.”

 

Jan. 12, 1999: Jian Chen, ECE Grad Student, “Issues in Mesh Partitioning for Distributed Systems.”

 

Jan. 19, 1999: Prof. Andreas Moshovos, ECE, Northwestern University, “Memory Dependence Prediction and Future Directions.”

 

Jan. 26, 1999: Dhruva Chakrabarti, ECE Grad Student, “A Compilation Framework for Hybrid Applications.”

 

Feb. 2, 1999: Prof. Jennifer Schopf, CS, Northwestern Univ, “Prediction and Scheduling for Shared Clusters of Workstations.”

 

Feb. 9, 1999: Prof. Babak Falsafi, Purdue Univ, “Intelligent Distributed Memory.”

 

Feb. 16, 1999: Pramod Joisha, ECE Grad Student, “Efficiently Exploiting Ownership Sets in HPF.”

 

Feb. 23, 1999: Mahmut Kandemir, ECE Grad Student, “Improving Locality Using Loop and Data Transformations in a Unified Framework.”

 

March 2, 1999: Jonathan Geisler, ECE Grad Student, “Performance Coupling: A Methodology for Analyzing Application Performance using kernel Performance.”

 

March 29, 1999: Prof. Andreas Moshovos, Northwestern Univ, “Organizational Meeting.”

 

Apr. 5, 1999: Malay Haldar, ECE Grad Student, “Innovative Cache Architectures.”

 

Apr. 12, 1999: Yelena Yesha, NASA Goddard, “Challenges in Global Electronic Commerce.”

 

Apr. 19, 1999: Olaf Lubeck, Los Alamos National Lab, “Performance Models and Scalability Analysis of Wavefront Algorithms.”

 

Apr. 26, 1999: Xiaohui Shen, ECE Grad Student, “Prediction in Cache Protocols.”

 

May 10, 1999: Antonio Gonzalez, UPC, Spain, “Register File Architectures.”

 

May 17, 1999: Witod Litwin, Univ. Paris, “High Availability in Scalable Distributed Data Structures.”

 

May 24, 1999: Swee Mok, ECE Grad student, “Part Mating Strategies Using Genetic Assembly.”


 

9. HIGHLIGHTS OF THE CENTER DURING 1998-99

 

·         The Center for Parallel and Distributed Computing completed its third year on Aug. 31, 1999.  It currently has 1 Director, 1 Assistant, 11 faculty members from Northwestern, 2 members from Argonne, 45 graduate students.

·         Researchers at the Center received a continuation grant for a $1.8 million grant “A MATLAB Compilation Environment for Adaptive Computing” from DARPA.

·         Researchers at the Center received a continuation grant for a $1.million grant “Architectures, Compilers, and Configuration Management for Mass Market Adaptive Computing” from DARPA.

·         Researchers at the Center received a continuation grant for a $906,000 grant “PANTHER: A High-Performance Distributed Computing Infrastructure” from the National Science Foundation.

·         Researchers at the Center (Choudhary, Banerjee, Taylor)  received a new grant for $800,000 on “Large High-Performance Data Management, from Department of Defense ASCI Level 2 program.