Giving a Demo for High-Performance Meta-Data Management System (HP-MDMS)
Slides for Super Computing 1999
Objectives:
To provide a demonstration of using the High-Performance Meta-Data Management
System.
The demonstration consists of two parts. The first is a graphical user
interface (GUI) through which users run applications and visualization tools
by interacting with HP-MDMS.
The second describes the concepts behind the HP-MDMS internal implementation.
0. Screen Dump of Current GUI Implementation (April 18, 2000)
1. On-line Demo using JAVA GUI
A graphical JAVA application is the main body of this demo.
It reads user input from interactive controls (e.g. buttons and text forms)
and executes a sequence of subroutines according to that input, updating or
retrieving the corresponding information in the database along the way.
The information stored in the database can later be used by parallel
applications or visualization tools that perform I/O operations.
The application contains one interface for executing parallel applications
and another for running visualization tools.
- JAVA graphical user interface --
It contains text forms and buttons through which users supply the name
of a parallel application, the machine on which to run it, its run-time
parameters, and other settings.
These inputs are used to create, update, or retrieve information
(table attributes) in the database.
The JAVA interface uses JDBC to invoke the subroutines that interact
with the database.
Parallel applications can also be launched on a parallel machine
(e.g. SP2 at NWU) through this interface.
Each parallel application is a stand-alone executable accessible from
the JAVA interface.
Its output is shown in an output window.
Image data can be displayed through this interface by invoking a
visualization tool on the files that contain the image data.
The physical path names of those files are obtained by querying the
database.
The visualization tool is also a stand-alone application, which can be
written in C++ with VTK, Tcl/Tk, JAVA, or any other tool.
- Parallel applications --
Users execute parallel applications through the JAVA GUI.
These applications can be I/O intensive, reading and writing data after
consulting the database through the MDMS library.
Using the MDMS utilities, an application can update or obtain meta-data
from the database and then perform I/O accordingly.
Meanwhile, its I/O operations are recorded in the database for later use
by other parallel or visualization applications.
- Data Analysis --
A JAVA GUI sub-window displays the contents of data sets for analysis.
Since parallel applications write their data arrays into files, a data
analysis program is used to find the maximum, minimum, and average
values of these arrays.
This window helps users choose an interesting range of data to display
in the visualization sub-window.
- Visualization -- Stand-alone visualization applications can be
launched from the JAVA GUI.
The image data produced by other parallel applications are also read
from files through the MDMS library.
The visualization applications can be written in JAVA, VTK, Tcl/Tk, or
other tools.
(For example, Mike's astroiso, which uses Mesa and VTK, reads data sets
produced by the Astro 3D parallel application and displays them in a
separate window.)
Here the MDMS library also provides information about data sets
generated by other parallel applications, so that visualization
applications can retrieve the desired data.
The visualization applications need not know the physical location of
the data sets to be displayed, since HP-MDMS provides all information
about data stored in the system.
In addition, users should provide the display parameters for the
visualization through the JAVA GUI.
Figure 1. A possible layout for the JAVA GUI part of the demonstration.
Figure 1 illustrates a possible layout for the JAVA GUI part of the
demonstration.
The application name (for both parallel and visualization applications) can be
a pull-down menu listing all applications that are accessible from the JAVA
GUI.
For new applications, the run-time parameter form should be able to generate a
new schema, letting users enter both the attribute names and attribute types.
The actual JAVA interface will replace this figure once it is implemented.
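The data analysis step described above (finding the maximum, minimum, and
average of a data array that a parallel application wrote to a file) can be
sketched as follows. This is only an illustration in Python, not the demo's
own analysis program: the function names, the flat binary file layout, and the
double-precision element type are all assumptions.

```python
import array
import os
import tempfile


def summarize(path, typecode="d"):
    """Return (minimum, maximum, average) of a flat binary array file.

    Assumption: the file holds raw native-endian values of one type
    ("d" = 8-byte floats) with no header.
    """
    values = array.array(typecode)
    with open(path, "rb") as f:
        values.frombytes(f.read())
    if not values:
        raise ValueError("empty data set")
    return min(values), max(values), sum(values) / len(values)


def demo():
    """Write a small hypothetical data set to a temp file and summarize it."""
    data = array.array("d", [3.0, -1.5, 8.0, 0.5])
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            data.tofile(f)
        return summarize(path)
    finally:
        os.remove(path)
```

A GUI sub-window could show these three values so that the user can pick a
data range before launching a visualization tool.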
2. Internal implementation:
The overall data management architecture of HP-MDMS is shown in Figure 2.
The design and implementation focus on the Meta-Data Management System
(MDMS), the High-Performance Storage Management System (HPSMS), and the Java
interface.
Figure 2. Architecture of the High Performance Meta-Data Management System.
- MDMS -- Its programming interface connects parallel applications to
the database and provides uniform access to diverse resources in a
heterogeneous storage system.
Communication between applications and the database is implemented
using the C programming interface of PostgreSQL.
An example of programming flow is provided to represent the typical I/O
operations in a parallel application.
Since all of an application's I/O actions are recorded by the MDMS,
other applications can retrieve its data by querying the meta-data in
MDMS.
Users only have to provide the name of the source application that
produced the data set in a previous run and the meaningful data set
names.
In addition, I/O operations are optimized by comparing the memory access
pattern with the file storage pattern to decide whether to use the
collective or the non-collective MPI I/O facility.
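One way to picture the collective versus non-collective decision is the
sketch below. It is an assumed heuristic, not the rule MDMS actually applies:
it supposes the global array is stored in row-major order in one file, and it
picks non-collective I/O only when every process's local block maps to a
single contiguous byte range. The function names are hypothetical.

```python
def is_contiguous(global_shape, local_shape):
    """True if a local block is one contiguous run in a row-major file.

    Scanning from the fastest-varying (last) dimension: dimensions must be
    fully covered until the first partial one, and every slower dimension
    after that must have extent 1.
    """
    seen_partial = False
    for g, l in zip(reversed(global_shape), reversed(local_shape)):
        if seen_partial:
            if l != 1:
                return False
        elif l != g:
            seen_partial = True
    return True


def choose_io_mode(global_shape, local_shapes):
    """Pick an I/O mode from the per-process block shapes (assumed rule)."""
    if all(is_contiguous(global_shape, s) for s in local_shapes):
        return "non-collective"
    return "collective"
```

For a 4 x 6 array split by rows into two 2 x 6 blocks, each process touches
one contiguous range and independent I/O is reasonable. Split by columns into
two 4 x 3 blocks, each process's data is interleaved in the file, which is the
case where collective I/O usually pays off.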
- HPSMS -- HPSMS allows efficient access to the data stored on
secondary storage devices.
We have implemented a library in the C programming language whose
routines can be called either by programmers or by the HP-MDMS system.
This library lets the user view the data as an n-dimensional matrix.
Data are then accessed by giving the start and end coordinates of the
desired region of the matrix.
To enhance the performance of data accesses, we employ a sub-filing
strategy, in which a large multi-dimensional tape-resident global array is
stored not as a single file but as a number of smaller sub-files, whose
existence is transparent to the user.
This strategy dramatically improves the response times of most I/O
calls.
The meta-data about the sub-files is kept in the Postgres95 database.
We have implemented the library on top of HPSS and MPI I/O.
SRB is used to access the files in HPSS.
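The sub-filing idea can be sketched as a mapping from a requested region to
the sub-files that hold it. The sketch below assumes the global array is cut
into regular chunks of a fixed shape, one chunk per sub-file, and that the end
coordinate is inclusive. The HPSMS library's real partitioning and its C
routine names are not given here, so everything named in the code is
hypothetical.

```python
from itertools import product


def subfiles_for_region(start, end, chunk_shape):
    """Return the chunk indices (one per sub-file) that a region touches.

    start, end  -- inclusive corner coordinates of the requested region
    chunk_shape -- extent of each sub-file along every dimension (assumed
                   regular partitioning)
    """
    ranges = [
        range(s // c, e // c + 1)
        for s, e, c in zip(start, end, chunk_shape)
    ]
    return list(product(*ranges))
```

A request for rows 0 to 5 and columns 0 to 3 of an array stored in 4 x 4
chunks touches only chunks (0, 0) and (1, 0), so only those two sub-files need
to be fetched rather than the whole array.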
- JAVA interface -- JDBC is used to connect to the database and to
retrieve and update the meta-data stored there.
The typical operations implemented through JDBC are creating tables,
inserting attribute values into tables, and querying attribute values
from tables.
The JAVA interface holds the names of all accessible parallel
applications that perform I/O operations through MDMS.
These external parallel applications can therefore be launched from the
JAVA GUI.
When a different application is chosen, the run-time parameter input
list changes to match that application.
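The three typical operations listed above (creating a table, inserting
attribute values, and querying them) can be sketched as follows. To stay
self-contained, the sketch uses Python's built-in sqlite3 module as a stand-in
for JDBC and PostgreSQL. The table name, column names, and file path are
hypothetical and are not the actual HP-MDMS schema.

```python
import sqlite3


def open_catalog():
    """Create an in-memory database with one hypothetical meta-data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset ("
        " app_name TEXT, dataset_name TEXT, file_path TEXT,"
        " PRIMARY KEY (app_name, dataset_name))"
    )
    return conn


def record_dataset(conn, app_name, dataset_name, file_path):
    """Insert (or overwrite) where an application stored a data set."""
    conn.execute(
        "INSERT OR REPLACE INTO dataset VALUES (?, ?, ?)",
        (app_name, dataset_name, file_path),
    )
    conn.commit()


def lookup_path(conn, app_name, dataset_name):
    """Return the physical file path of a data set, or None if unknown."""
    row = conn.execute(
        "SELECT file_path FROM dataset"
        " WHERE app_name = ? AND dataset_name = ?",
        (app_name, dataset_name),
    ).fetchone()
    return row[0] if row else None
```

A visualization tool would call something like lookup_path with only the
source application name and the data set name, which is how it can avoid
knowing the physical file location in advance.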
Last modified on Sep. 29, 1999
Please send comments to wkliao@ece.nwu.edu.