Seminars
Fall Quarter 1998-99

Every Monday, 4pm - 5pm, L324
or Friday, 11am - 12 pm





Seminar Abstracts


Title:

An Experimental Evaluation of a New HPF Compiler Framework based on a Commercial Symbolic Analysis Package.

DATE: Monday Oct. 5
TIME/PLACE: 4-5PM, L324
SPEAKER: Pramod Joisha

Abstract:
In the past few years, much research has been devoted to the problem of parallelizing sequential programs written in languages such as FORTRAN, with the objective of casting the input code into an equivalent and efficient parallel version with little or no manual intervention. Recently, many HPF compilers have emerged in the market, from such diverse vendors as APR, DEC, PGI, and IBM. Some - such as xlhpf - provide limited support only, in terms of the distributions that they can handle, while others - such as pghpf - claim to be capable of generating code for a more general class of user-specified data mappings.

In this talk, we present sample performance figures for a new linear algebra-based compilation framework implemented in a research HPF compiler called PARADIGM. The metrics considered include compilation times, execution times, and communication costs. We compare all of these metrics against commercial, industrial-strength compilers such as pghpf (v 2.2) and xlhpf (v 1.01) and show that PARADIGM (v 2.0) is superior on all of the metrics used. We also demonstrate how robustly our framework performs in the presence of arbitrary alignments and distributions. The framework's symbolic manipulation capability is derived from an off-the-shelf commercial symbolic analysis package called Mathematica. Metrics have been measured for a few popular benchmarks such as Jacobi, Automatic Differentiation and Integration (ADI), Euler Fluxes, Matrix Multiplication, TOMCATV and 2-D Explicit Hydrodynamics (EXPL).

Ultimately, we show that our system enables the compilation of input programs into SPMD versions that are better - both in terms of execution times and communication costs - than the parallel codes generated by pghpf. We present experimental data on these benchmarks which reveal, for instance, that our framework compiles input programs into SPMD codes whose speedups on the IBM SP2 are on average about 3 times those obtained with codes directly compiled (using the -Mautopar option alone) with pghpf. At the same time, the communication costs incurred by the codes generated by our framework are nearly always lower than those incurred by codes compiled with pghpf, often displaying savings of about two orders of magnitude. Even in the presence of the FORTRAN 90 FORALL construct and HPF's INDEPENDENT directive, code generated using pghpf is overall about 90% slower, and far more expensive communication-wise, than that generated using PARADIGM. All of this was accomplished without seriously impacting the compilation times, which on the whole were in fact - as we also demonstrate - comparable to those of pghpf.

With respect to xlhpf, comparisons with the PARADIGM system were more limited, primarily on account of the former's inability to handle the general CYCLIC(k) distribution. However, for benchmark samples that were successfully compiled by xlhpf into runnable codes and that used essentially the BLOCK distribution alone, the performance differences in relation to PARADIGM were much narrower than those observed in the case of pghpf. For example, though the communication costs of codes compiled by PARADIGM were consistently lower than those of xlhpf, the margin was considerably smaller, sometimes being as little as 7 bytes per send operation per processor pair on average.
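
For readers unfamiliar with the two HPF distributions named above, the following is a minimal sketch of how an array index maps to its owning processor under BLOCK and CYCLIC(k). It is illustrative only and is not taken from PARADIGM, pghpf or xlhpf; the function names are hypothetical, and indices and processor numbers are assumed to be 0-based (HPF itself uses Fortran array bounds).

```python
from math import ceil


def block_owner(i, n, p):
    """Owner of 0-based index i when n elements are split into p contiguous blocks."""
    block_size = ceil(n / p)  # every processor gets at most this many elements
    return i // block_size


def cyclic_owner(i, k, p):
    """Owner of 0-based index i under CYCLIC(k): chunks of k elements dealt round-robin."""
    return (i // k) % p


if __name__ == "__main__":
    n, p, k = 16, 4, 2  # hypothetical array size, processor count and chunk size
    print("BLOCK    :", [block_owner(i, n, p) for i in range(n)])
    print("CYCLIC(2):", [cyclic_owner(i, k, p) for i in range(n)])
```

Under BLOCK each processor owns one contiguous range, whereas under CYCLIC(k) ownership wraps around every k*p elements, which is what makes the general case harder for a compiler to handle.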


Title:

Workload Characterization and Task Scheduling for Tape-Resident Data

DATE: Monday Oct. 12
TIME: 4-5PM
SPEAKER: Sachin More

Abstract:

The Earth Observation System (EOS) satellites will soon be producing 0.6 terabytes of raw data per day, which will be processed by over 30,000 jobs per day. The enormous amount of computing resources needed necessitates sophisticated resource scheduling for efficient data processing. In order to design a resource scheduling policy, a good understanding of the workload characteristics is needed. The first part of the talk will present a statistical analysis of the EOS workload. To test the effectiveness of a given scheduling policy, we also built a task scheduling simulator. The second half of the talk will focus on the design of the simulator. We also present two new scheduling algorithms and compare them with conventional scheduling algorithms using the simulator.



Title:

Automatic Parallelization for Distributed Memory Parallel Machines - Issues, Challenges and State-of-the-art.

DATE: Monday Oct. 19
TIME/PLACE: 4-5PM, L324
SPEAKER: Dr. Nagaraj Shenoy

Abstract:

Distributed memory parallel machines offer a cost-effective alternative to the technology-intensive and expensive 'supercomputers' of yesteryear. However, the difficulties faced in programming these machines have always been an obstacle to their widespread use. For the past half decade, several researchers have looked into various issues in an attempt to change this scenario. But the challenges still remain, and progress has not been to the extent that a user of these machines would have expected.

In this brief talk I will touch upon various issues and challenges in the area of automatic parallelization for this class of machines. I will also try to provide pointers to some of the work done in this area by various researchers.



Title:

Techniques for Improving Performance in Continuous Media Server Design

DATE: Monday Oct. 26
TIME/PLACE: 4-5PM, L324
SPEAKER: Chutimet Srinilta

Abstract:

A multimedia server is a key component of a distributed multimedia information system. Continuous media are media whose recording and playback must be carried out continuously. This dissertation focuses on the design and implementation of a server that provides playback service for continuous media objects. Major design and implementation issues arise in the input-output area. Parallelism of data retrieval is achieved by striping the objects across multiple storage devices and accessing them in parallel. Batch retrieval allows many users to have access to different parts of the same copy of the object simultaneously. A number of techniques such as adaptive refilling, deadline shifting, multi-pool interval caching and user request assignment schemes were designed and implemented within the framework of the server model. Adaptive refilling helps prevent situations where a session buffer either overflows or underflows. Deadline shifting improves the total amount of data successfully delivered to users. Multi-pool interval caching makes use of data sharing between user requests to minimize the number of physical accesses to storage devices and the interaction between the server nodes. A number of user request assignment algorithms were introduced. Extensive experiments were conducted. Server performance was improved by a significant margin when a proper combination of parameters and their values was employed.
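
The striping idea mentioned above can be pictured with a small sketch. It assumes a plain round-robin layout, which the abstract does not specify; the function names are hypothetical and the code is not the server's actual placement scheme.

```python
def stripe_location(block_index, num_devices):
    """Return (device, offset on that device) for a block under round-robin striping."""
    return block_index % num_devices, block_index // num_devices


def blocks_per_device(num_blocks, num_devices):
    """Group the blocks of one media object by the device that stores them."""
    layout = {d: [] for d in range(num_devices)}
    for b in range(num_blocks):
        device, _ = stripe_location(b, num_devices)
        layout[device].append(b)
    return layout


if __name__ == "__main__":
    # Hypothetical object of 10 blocks striped over 4 storage devices.
    for device, blocks in blocks_per_device(10, 4).items():
        print("device", device, "holds blocks", blocks)
```

Because consecutive blocks land on different devices, a run of consecutive blocks can be fetched from several devices at once, which is the source of the retrieval parallelism.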



Title: On the randomness of Random Numbers

DATE: Monday Oct. 26
TIME/PLACE: 4-5PM, L324
SPEAKER: Harsha S. Nagesh

Abstract: Random numbers are the nuts and bolts of a Monte Carlo simulation. Problems such as capacitance extraction in VLSI circuits and the computation of quantum-mechanical energies and wavefunctions are very hard to solve by analytical techniques. The willingness to accept approximate solutions opens one door through which probability can enter, and Monte Carlo is one such approach. The key point is that one pours in randomness that was not there to begin with. What is purchased at the price of this extra complication is the applicability of probability theory, which enables us to quantify the quality of our solution.
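
As a small illustration of how probability theory quantifies the quality of a Monte Carlo answer, the sketch below estimates pi by sampling points in the unit square and reports a standard error alongside the estimate. The problem, the function name and the seed are assumptions chosen for illustration; they are not taken from the talk.

```python
import math
import random


def estimate_pi(n, seed=0):
    """Monte Carlo estimate of pi with its standard error, from n random points."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    p = hits / n  # estimated probability of a hit (true value is pi/4)
    estimate = 4.0 * p
    std_error = 4.0 * math.sqrt(p * (1.0 - p) / n)  # binomial standard error, scaled by 4
    return estimate, std_error


if __name__ == "__main__":
    est, err = estimate_pi(20000)
    print("pi is approximately", est, "+/-", err)
```

The standard error shrinks like one over the square root of n, so the error bar itself tells us how many samples a given accuracy requires.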

The crux of a Monte Carlo simulation lies in the random number generation. The better the random numbers are, the better the quality of the solution we get. Hence it is of utmost importance that we select random number generators whose statistical properties closely match the requirements of our simulation.

In my talk I shall present the implementation of a very efficient technique, the Alias Method, for the generation of pseudo-random numbers based on any arbitrary probability density function. I shall further present some techniques for generating random numbers and for testing the quality of the numbers produced by these generators. I shall conclude my talk by presenting techniques for testing parallel random number generators.
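
For orientation, here is a minimal sketch of the alias method in its commonly used form (Vose's table construction), assuming the target distribution is given as a finite list of probabilities that sum to 1; a continuous density would first have to be discretized. The function names are hypothetical and this is not the speaker's implementation.

```python
import random


def build_alias_table(probs):
    """Build (prob, alias) tables so that sampling takes O(1) time per draw."""
    n = len(probs)
    scaled = [p * n for p in probs]  # rescale so the average cell weight is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]  # cell s keeps its own outcome with this probability
        alias[s] = l         # otherwise it yields outcome l
        scaled[l] = scaled[l] + scaled[s] - 1.0  # l donated the missing weight
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:  # leftovers are full cells (up to rounding error)
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one outcome: pick a cell uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def implied_distribution(prob, alias):
    """Recover the distribution the tables encode (useful as a sanity check)."""
    n = len(prob)
    out = [0.0] * n
    for i in range(n):
        out[i] += prob[i] / n
        out[alias[i]] += (1.0 - prob[i]) / n
    return out


if __name__ == "__main__":
    prob, alias = build_alias_table([0.1, 0.2, 0.3, 0.4])
    print(implied_distribution(prob, alias))
    print([sample(prob, alias) for _ in range(10)])
```

Table construction takes time linear in the number of outcomes, after which each draw costs one uniform integer, one uniform real and one comparison.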


Special Seminar


Title: MiPFS: A Multimedia Integrated Parallel File System

DATE: Friday, Oct 30
TIME/PLACE: 3PM, E311 (Tech)
SPEAKER: Prof. Jesus Carretero
Universidad Politecnica de Madrid (UPM)

Abstract:

The presentation will show the design of MiPFS, a parallel file system intended to be used as a low-level platform on top of which more complex I/O entities can be developed. MiPFS is intended to provide multiple services, such as high-performance I/O, multimedia, or standard functionality, all through a reduced but very powerful user interface. The proposed interface includes basic I/O operations with scatter-gather addressing, file control operations, hints specification, file indexing, and quality of service negotiation.

This parallel file system relies on the idea that the user should execute I/O operations using the data types stored in each object, so that each file can be managed as a typed object. However, as we want to provide only mechanisms, and not policies, MiPFS includes functions to manage fixed- and variable-length records and their associated indexes. This approach, together with the quality of service functionality, allows MiPFS to be used as a continuous media parallel file system.


Special Seminar


Title: The Esprit Project HPF+

DATE: Friday, Nov. 06
TIME/PLACE: 11AM-12PM, A230 Tech (Civil Engineering Conference Room)
(Note room change)
SPEAKER: Prof. Hans P. Zima
Institute for Software Technology and Parallel Systems
University of Vienna, Liechtensteinstrasse 22,
A-1090 Vienna, Austria

Abstract:

High Performance Fortran (HPF) is a data-parallel language that was designed to provide users with a high-level interface for programming scientific applications, while delegating to the compiler the task of generating an explicitly parallel message-passing program. In this lecture, we will outline the developments that led to HPF and briefly explain its basic features. Following this, we identify a set of important language requirements for the efficient solution of a range of advanced applications in science and engineering. Dealing with such problems not only needs the flexible data and work distribution features included in the HPF-2 Approved Extensions, but also requires additional capabilities such as the explicit control of communication schedules.

Most of the discussion will be related to research and development conducted in the ESPRIT Long Term Research Project HPF+, which was funded by the European Union and successfully finished last April. The consortium of this project, which was led by the University of Vienna, included academic partners responsible for language, compiler and tool development as well as industrial partners whose task was to evaluate the software developed in the project, using benchmarks derived from their commercial application codes. The major results of the project included a revised language definition and the Vienna Fortran Compiler (VFC), a new research compiler focussing on the efficient handling of irregular problems. The lecture concludes with an outlook on future research and development in HPF-related languages and compilers.

All Technical Reports produced by the HPF+ project are publicly available and can be retrieved from the website http://www.par.univie.ac.at/hpf+



Title:

Efficient Collective Data-Transfer Mechanisms for High-Performance Parallel and Distributed Computing Environments.

DATE: Friday, Nov. 13
TIME/PLACE: 11AM-12PM, A230 (Tech.)
SPEAKER: Dr. Rakesh Krishnaiyer

Abstract:

Advances in computing and networking infrastructure have enabled an increasing number of applications to utilize resources that are geographically distributed. High-performance distributed computing environments present significant new problems and opportunities for programming language designers, compiler implementors, and runtime system developers. This talk describes how efficient collective data-transfer mechanisms are employed in two contexts to achieve high performance in application programs. The techniques are designed to provide efficiency, portability, and ease of programming.

Many scientific applications need to access data resources from multiple locations in a distributed computing environment. In this talk, I will present the design and implementation of a system called RIO that provides remote I/O functionality for parallel programs. Programs use familiar parallel I/O interfaces to access remote file systems. The RIO library offers significant performance, flexibility, and convenience advantages over existing techniques for remote data access such as manual staging and deployment of distributed file systems. Mechanisms provided by the Globus Metacomputing Toolkit are used to support the integration of the library in a distributed environment.

Programming models for distributed systems will need to exploit all types of parallelism within application domains. We propose a programming model where common parallel program structures can be represented, with only minor extensions to the HPF model, by using a coordination library based on the Message Passing Interface (MPI). This library allows data-parallel tasks to exchange distributed data structures using calls to simple communication functions. Experiments with microbenchmarks and applications characterize the performance of this library and quantify the impact of various optimizations.



Title:

Metacomputing, Scheduling, and Prediction

DATE: Monday Nov. 23
TIME/PLACE: 4-5PM, L324
SPEAKER: Warren Smith

Abstract:

There are a large number of applications that wish to use remote high-performance computers and/or make use of other unique devices (data storage systems, ground stations, scientific instruments, etc.) and can therefore be considered metacomputing applications. In this talk I will briefly describe several such applications and discuss how the Globus toolkit supports these applications. Further, I will describe the issues involved in selecting resources from the large resource pool and present methods for scheduling access to these resources. Predictions of properties such as application run times and queue wait times are useful in this process. I will present our approach to providing these predictions and evaluate their accuracy and usefulness.



Special Seminar

Title:

Achieving Application Performance On Distributed Resources

DATE: Tuesday Dec. 01
TIME/PLACE: 11AM-12PM, A230 (Note room change)
SPEAKER: Prof. Francine Berman
University of California, San Diego.

Abstract:

Applications target clustered resources and the computational grid in order to achieve a new level of performance. This performance may be measured as the ability to solve larger problem instances, reduction in execution or wall clock time, reduction in turnaround time (wait time + execution time), etc. By whatever criterion it is measured, performance is the goal for applications targeted to distributed environments, and the scheduling and execution of such applications must involve techniques which allow the application to optimize its performance potential in dynamic and distributed environments.

AppLeS (Application-Level Scheduler) is a project whose goal is to develop and deploy custom performance-oriented application schedulers for individual grid and cluster applications, as well as scheduling tools for common classes of grid and cluster applications. In this talk, we give a progress report on the AppLeS project. We focus on the principles underlying AppLeS schedulers, current AppLeS projects, and challenges which must be addressed in order to achieve application performance in clustered and grid environments.