DATE: Monday Oct. 5
TIME/PLACE: 4-5PM, L324
SPEAKER: Pramod Joisha
Abstract:
In this talk, we present sample performance figures for a new linear
algebra-based compilation framework implemented in a research HPF
compiler called PARADIGM. The metrics considered include compilation
times, execution times, and communication costs. We compare all of
these metrics against commercial, industrial-strength compilers such as
pghpf (v 2.2) and xlhpf (v 1.01) and show the advantages of
PARADIGM (v 2.0) on all of the metrics used. We also demonstrate how
robustly our framework performs in the presence of arbitrary alignments
and distributions. The framework's symbolic manipulation capability is
derived from Mathematica, an off-the-shelf commercial symbolic analysis
package. The metrics were measured for a few popular
benchmarks such as Jacobi, Automatic Differentiation and Integration
(ADI), Euler Fluxes, Matrix Multiplication, TOMCATV and 2-D Explicit
Hydrodynamics (EXPL).
Ultimately, we show that our system compiles input programs into SPMD
versions that are better - both in terms of execution times and
communication costs - than the parallel codes generated by pghpf. We
present experimentally evaluated data on these benchmarks which reveal,
for instance, that our framework compiles input programs into SPMD
codes whose speedups on the IBM SP2 are on average about 3 times
those obtainable with codes directly compiled (using the -Mautopar
option alone) with pghpf. At the same time, the communication costs
incurred by the codes generated by our framework are nearly always
lower than those incurred by codes compiled with pghpf, often
displaying savings of about two orders of magnitude. Even in the
presence of the FORTRAN 90 FORALL construct and HPF's INDEPENDENT
directive, code generated using pghpf is overall about 90% slower, and
far more expensive communication-wise, than that generated using
PARADIGM. All of this was accomplished without seriously impacting the
compilation times, which on the whole were in fact - as we also
demonstrate - comparable to those of pghpf.
With respect to xlhpf, comparisons with the PARADIGM system were more
limited, primarily on account of the former's inability
to handle the general CYCLIC(k) distribution. However, for benchmark
samples that xlhpf successfully compiled into runnable codes
and that used essentially the BLOCK distribution alone, the
performance differences relative to PARADIGM were much narrower
than those observed in the case of pghpf. For example, though the
communication costs of codes compiled by PARADIGM were consistently
lower than those of xlhpf, the margin was considerably smaller,
sometimes as little as 7 bytes per send operation per
processor pair on average.
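As a rough illustration of the two HPF distributions mentioned above, the
sketch below maps global array indices to owning processors under BLOCK and
CYCLIC(k). It is not taken from PARADIGM, pghpf, or xlhpf; the function names
are hypothetical, and it assumes 0-based indices and a one-dimensional
processor arrangement (HPF itself uses 1-based indices by default).

```python
import math


def block_owner(i, n, p):
    """Owner of 0-based global index i when n elements are BLOCK-distributed
    over p processors (block size is ceil(n / p))."""
    block_size = math.ceil(n / p)
    return i // block_size


def cyclic_owner(i, k, p):
    """Owner of 0-based global index i under CYCLIC(k): blocks of k
    consecutive elements are dealt to the p processors round-robin."""
    return (i // k) % p


def cyclic_local_index(i, k, p):
    """Local offset of global index i on its owning processor under CYCLIC(k)."""
    return (i // (k * p)) * k + (i % k)


# Hypothetical sizes chosen only for illustration.
n, p, k = 16, 4, 2
block_map = [block_owner(i, n, p) for i in range(n)]
cyclic_map = [cyclic_owner(i, k, p) for i in range(n)]
```

With these sizes, BLOCK gives each processor one contiguous run of four
elements, while CYCLIC(2) gives each processor two separate runs of two
elements, which is why the general CYCLIC(k) case needs more involved index
arithmetic in the generated code.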
In the past few years, much research has been devoted to the problem
of parallelizing sequential programs written in languages such as
FORTRAN, with the objective of casting the input code into an
equivalent and efficient parallel version with little or no manual
intervention. Recently, many HPF compilers have emerged in the market,
from such diverse vendors as APR, DEC, PGI, and IBM. Some - such as
xlhpf - support only a limited set of distributions,
while others - such as pghpf - claim to be
capable of generating code for a more general class of user-specified
data mappings.
DATE: Monday Oct. 12
TIME: 4-5PM
SPEAKER: Sachin More
Abstract:
The Earth Observation System (EOS) satellites will soon be producing 0.6
terabytes of raw data per day, which will be processed by over 30,000 jobs per day.
The enormous amount of computing resources needed necessitates sophisticated
resource scheduling for efficient data processing. Designing a resource
scheduling policy requires a good understanding of the workload
characteristics. The first part of the talk will present a statistical analysis of the EOS
workload. To test the effectiveness of a given scheduling policy, we also built
a task scheduling simulator. The second part of the talk will focus on the
design of the simulator. We also present two new scheduling algorithms and
compare them with conventional scheduling algorithms using the simulator.
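To make the idea of comparing policies in a simulator concrete, here is a
minimal sketch. It is not the simulator or either of the new algorithms from
the talk. It assumes a single resource, non-preemptive execution, and two
textbook policies (first-come-first-served and shortest-job-first); the
function name and job format are hypothetical.

```python
import heapq


def simulate(jobs, policy):
    """Simulate non-preemptive scheduling of jobs on one resource.

    jobs:   list of (arrival_time, run_time) tuples
    policy: "fcfs" (earliest arrival first) or "sjf" (shortest run time first)
    Returns the mean turnaround time (wait time + execution time).
    """
    if policy not in ("fcfs", "sjf"):
        raise ValueError("unknown policy: " + policy)
    pending = sorted(jobs)          # jobs ordered by arrival time
    ready = []                      # priority queue of jobs waiting to run
    clock = 0.0
    total_turnaround = 0.0
    next_job = 0
    finished = 0
    while finished < len(pending):
        # Admit every job that has arrived by the current time.
        while next_job < len(pending) and pending[next_job][0] <= clock:
            arrival, run = pending[next_job]
            key = arrival if policy == "fcfs" else run
            heapq.heappush(ready, (key, next_job, arrival, run))
            next_job += 1
        if not ready:
            # Resource is idle: jump ahead to the next arrival.
            clock = pending[next_job][0]
            continue
        _, _, arrival, run = heapq.heappop(ready)
        clock += run
        total_turnaround += clock - arrival
        finished += 1
    return total_turnaround / len(pending)


# Hypothetical workload: (arrival_time, run_time).
workload = [(0, 8), (1, 4), (2, 1)]
fcfs_mean = simulate(workload, "fcfs")
sjf_mean = simulate(workload, "sjf")
```

On this small workload shortest-job-first yields a lower mean turnaround than
first-come-first-served, which is the kind of comparison a simulator makes
possible before a policy is deployed.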
DATE: Monday Oct. 19
TIME/PLACE: 4-5PM, L324
SPEAKER: Dr. Nagaraj Shenoy
Abstract:
Distributed memory parallel machines offer a cost-effective alternative to
the technology-intensive and expensive 'supercomputers' of yesteryear.
However, the difficulty of programming these machines has always been
an obstacle to their widespread use. For the past half decade, several
researchers have looked into various issues in an attempt to change this
situation. But the challenges remain, and progress has fallen short of
what a user of these machines would have expected.
In this brief talk I will touch upon various issues and challenges in the
area of automatic parallelization for this class of machines. I will also
try to provide pointers to some of the work done in this area by various
researchers.
DATE: Monday Oct. 26
TIME/PLACE: 4-5PM, L324
SPEAKER: Chutimet Srinilta
Abstract:
A multimedia server is a key component of a distributed multimedia information system. Continuous media are media whose recording and playback must be carried out continuously. This dissertation focuses on the design and implementation of a server that provides playback service for continuous media objects. Major design and implementation issues arise in the input-output area. Parallelism of data retrieval is achieved by striping the objects across multiple storage devices and accessing them in parallel. Batch retrieval allows many users to access different parts of the same copy of an object simultaneously. A number of techniques, such as adaptive refilling, deadline shifting, multi-pool interval caching, and user request assignment schemes, were designed and implemented within the framework of the server model. Adaptive refilling helps prevent situations where a session buffer either overflows or underflows. Deadline shifting improves the total amount of data successfully delivered to users. Multi-pool interval caching makes use of data sharing between user requests to minimize the number of physical accesses to storage devices and the interaction between the server nodes. A number of user request assignment algorithms were introduced. Extensive experiments were conducted. Server performance improved by a significant margin when a proper combination of parameters and their values was employed.
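The striping idea can be sketched as follows. This is only an illustration
under assumed conventions (round-robin placement of fixed-size blocks, with
hypothetical function names), not the layout actually used by the server
described in the abstract.

```python
def stripe_location(block, num_devices, start_device=0):
    """Return (device, slot) for a logical block under round-robin striping.

    Consecutive blocks of an object go to consecutive devices, so a run of
    up to num_devices blocks can be fetched from distinct devices in parallel.
    """
    device = (start_device + block) % num_devices
    slot = block // num_devices     # position of the block on that device
    return device, slot


def devices_for_range(first_block, count, num_devices):
    """Group a run of consecutive blocks by the device that stores them."""
    plan = {}
    for block in range(first_block, first_block + count):
        device, slot = stripe_location(block, num_devices)
        plan.setdefault(device, []).append((block, slot))
    return plan


# Hypothetical request: 4 consecutive blocks starting at block 2, 4 devices.
plan = devices_for_range(2, 4, 4)
```

Here each of the four devices serves exactly one block of the request, so the
four reads can proceed in parallel.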
DATE: Monday Oct. 26
TIME/PLACE: 4-5PM, L324
SPEAKER: Harsha S. Nagesh
Abstract:
Random numbers are the nuts and bolts of a Monte Carlo simulation.
Problems like capacitance extraction in VLSI circuits, computing
quantum-mechanical energies and wavefunctions, etc., are very hard to
solve by analytical techniques. The willingness to accept approximate
solutions is one door through which probability enters, and Monte Carlo is one
such approach. The key point is that one pours in randomness
that was not there to begin with. What is purchased at the price of
this extra complication is the applicability of probability theory,
which enables us to quantify the quality of our solution.
The crux of a Monte Carlo simulation lies in the random
number generation. The better the random numbers, the better
the quality of the solution we get. Hence it is of utmost importance
that we select random number generators whose statistical
properties closely match the requirements of our simulation.
In my talk I shall present the implementation of a very
efficient technique, the Alias Method, for the generation of pseudo-random
numbers based on any arbitrary probability density function. I shall
further present some techniques for generating random numbers and for
testing the quality of the numbers generated by these generators. I
shall conclude my talk by presenting techniques for testing parallel
random number generators.
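For readers unfamiliar with the Alias Method, the following is a minimal
sketch of one common formulation (Vose's table construction), not the
speaker's implementation. It assumes the target distribution is discrete,
given as a list of non-negative weights; a continuous density would first
have to be discretized. The function names are hypothetical.

```python
import random


def build_alias_table(weights):
    """Build the probability and alias tables for the given weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]   # mean of scaled values is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]         # column s keeps its own mass...
        alias[s] = l                # ...and is topped up by outcome l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover columns are full (up to rounding error).
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_sample(prob, alias, rng=random):
    """Draw one outcome in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Hypothetical target distribution over three outcomes.
prob, alias = build_alias_table([0.5, 0.3, 0.2])
rng = random.Random(0)
draws = [alias_sample(prob, alias, rng) for _ in range(100000)]
```

Table construction takes time linear in the number of outcomes, after which
each sample costs one uniform integer, one uniform real, and one comparison.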
DATE: Friday, Oct 30
TIME/PLACE: 3PM, E311 (Tech)
SPEAKER:
Prof. Jesus Carretero
Universidad Politecnica de Madrid (UPM)
Abstract:
The presentation will show the design of MiPFS, a parallel file system
intended to be used as a low-level platform to develop more complex I/O
entities on top of it. MiPFS is intended to provide multiple services,
such as high-performance I/O, multimedia, or standard functionality,
all through a reduced but very powerful user interface. The proposed
interface includes basic I/O operations with scatter-gather addressing,
file control operations, hints specification, file indexing, and quality
of service negotiation.
This parallel file system relies on the idea that the user should execute
I/O operations using the data-types stored on each object, so that each
file can be managed as a typed object. However, as we want to provide
only mechanisms, and not policies, MiPFS includes functions to manage
fixed and variable length records, and their associated indexes. This
approach, together with the quality of service functionality, allows
MiPFS to be used as a continuous media parallel file system.
DATE: Friday, Nov. 06
TIME/PLACE: 11AM-12PM, A230 Tech (Civil Engineering Conference Room)
(Note room change)
SPEAKER: Prof. Hans P. Zima
Institute for Software Technology and Parallel Systems
University of Vienna, Liechtensteinstrasse 22,
A-1090 Vienna, Austria
Abstract:
High Performance Fortran (HPF) is a data-parallel language that was
designed to provide users with a high-level interface for programming
scientific applications, while delegating to the compiler the task of
generating an explicitly parallel message-passing program. In this
lecture, we will outline developments that led to HPF and briefly
explain its basic features. Following this, we identify a set of
important language requirements for the efficient solution of a range
of advanced applications in science and engineering. Dealing with
such problems requires not only the flexible data and work
distribution features included in the HPF-2 Approved Extensions, but
also additional capabilities such as the explicit control of
communication schedules.
Most of the discussion will be related to research and development
conducted in the ESPRIT Long Term Research Project HPF+ that was
funded by the European Union and successfully finished last April.
The consortium of this project, which was led by the University of
Vienna, included academic partners responsible for language, compiler
and tool development as well as industrial partners whose task was the
evaluation of the software developed in the project, using
benchmarks derived from their commercial application codes. The major
results of the project included a revised language definition and the
Vienna Fortran Compiler (VFC), a new research compiler focussing on
the efficient handling of irregular problems. The lecture concludes
with an outlook on future research and development in HPF-related
languages and compilers.
All Technical Reports produced by the HPF+ project are publicly
available and can be retrieved from the website
http://www.par.univie.ac.at/hpf+
DATE: Friday, Nov. 13
TIME/PLACE: 11AM-12PM, A230 (Tech.)
SPEAKER: Dr. Rakesh Krishnaiyer
Abstract:
Advances in computing and networking infrastructure have enabled an
increasing number of applications to utilize resources that are
geographically distributed. High-performance distributed computing
environments present significant new problems and opportunities for
programming language designers, compiler implementors, and runtime
system developers. This talk describes how efficient collective
data-transfer mechanisms are employed in two contexts to achieve high
performance in application programs. The techniques are designed to
provide efficiency, portability, and ease of programming.
Many scientific applications need to access data resources from
multiple locations in a distributed computing environment. In this
talk, I will present the design and implementation of a system called
RIO that provides remote I/O functionality for parallel
programs. Programs use familiar parallel I/O interfaces to access
remote file systems. The RIO library offers significant performance,
flexibility, and convenience advantages over existing techniques for
remote data access such as manual staging and deployment of
distributed file systems. Mechanisms provided by the Globus
Metacomputing Toolkit are used to support the integration of the
library in a distributed environment.
Programming models for distributed systems will need to exploit all
types of parallelism within application domains. We propose a
programming model where common parallel program structures can be
represented, with only minor extensions to the HPF model, by using a
coordination library based on the Message Passing Interface (MPI).
This library allows data-parallel tasks to exchange distributed data
structures using calls to simple communication functions. Experiments
with microbenchmarks and applications characterize the performance of
this library and quantify the impact of various optimizations.
DATE: Monday Nov. 23
TIME/PLACE: 4-5PM, L324
SPEAKER: Warren Smith
Abstract:
A large number of applications need to use remote high-performance computers or other unique devices (data storage systems, ground stations, scientific instruments, etc.) and can therefore be considered metacomputing applications. In this talk I will briefly describe several such applications and discuss how the Globus toolkit supports them. Further, I will describe the issues involved in selecting resources from the large resource pool and present methods for scheduling access to these resources. Predictions of properties such as application run times and queue wait times are useful in this process. I will present our approach to providing these predictions and evaluate their accuracy and usefulness.
DATE: Tuesday Dec. 01
TIME/PLACE: 11AM-12PM, A230 (Note room change)
SPEAKER: Prof. Francine Berman
University of California, San Diego.
Abstract:
Applications target clustered resources and the computational
grid in order to achieve a new level of performance. This performance
may be measured as the ability to solve larger problem instances,
reduction in execution or wall clock time, reduction in turnaround
time (wait time + execution time), etc. By whatever criterion it is
measured, performance is the goal for applications targeted to
distributed environments, and the scheduling and execution of
such applications must involve techniques which allow the application
to optimize its performance potential in dynamic and distributed
environments.
AppLeS (Application-Level Scheduler) is a project whose goal
is to develop and deploy custom performance-oriented application
schedulers for individual grid and cluster applications, as well as
scheduling tools for common classes of grid and cluster applications.
In this talk, we give a progress report on the AppLeS project.
We focus on the principles underlying AppLeS schedulers, current
AppLeS projects, and challenges which must be addressed in order
to achieve application performance in clustered and grid environments.