Giving a Demo for High-Performance Meta-Data Management System (HP-MDMS)


Slides for Super Computing 1999

Objectives:

To provide a demonstration of the High-Performance Meta-Data Management System. The demonstration consists of two parts. The first is a graphical user interface (GUI) through which users run applications and visualization tools by interacting with HP-MDMS. The second describes the concepts behind the internal implementation of HP-MDMS.

0. Screen Dump of Current GUI Implementation (April 18, 2000)

1. On-line Demo using JAVA GUI

A graphical JAVA application is the main body of this demo. It reads users' inputs from the interactive interface (e.g. buttons, text forms, etc.) and executes a sequence of subroutines according to these inputs, updating or retrieving the proper information in the data base along the way. The information stored in the data base can later be used by parallel applications or visualization tools that involve I/O operations.

This graphical JAVA application contains an interface to execute parallel applications and an interface to run visualization tools.

  1. JAVA graphical user interface -- It contains a list of text forms and buttons through which users supply the name of a parallel application, the name of the machine to run it on, the run-time parameters for the application, and other inputs. These inputs are used to create, update, or retrieve information (table attributes) in the data base. The JAVA interface uses JDBC to invoke the corresponding subroutines that interact with the data base.

    Parallel applications can also be launched on a parallel machine (e.g. SP2 at NWU) through this interface. Each parallel application is a stand-alone executable accessible by the JAVA interface. The output of the execution is shown in an output window.

    Image data can be displayed through this interface by invoking a visualization tool and querying for the files that contain the image data. The image data is retrieved by consulting the data base to obtain the physical file path name. The visualization tool is also a stand-alone application, which can be written in C++ with VTK, Tcl/Tk, JAVA, or any other tool.

  2. Parallel applications -- Users execute parallel applications through the JAVA GUI. These applications can be I/O intensive; they read and write data by consulting the data base through the MDMS library. Using the MDMS utilities, an application can update or obtain the meta-data from the data base and then perform I/O accordingly. Meanwhile, the I/O operations are recorded in the data base for later use by other parallel or visualization applications.
  3. Data Analysis -- A JAVA GUI sub-window displays the contents of data sets for analysis purposes. Since parallel applications write their data arrays into files, a data analysis program is used to find the maximum, minimum, and average values of these data arrays. This window helps users choose an interesting range of data to display in the visualization sub-window.
  4. Visualization -- Stand-alone visualization applications can be launched from the JAVA GUI. The image data produced by other parallel applications is also read from files through the MDMS library. The visualization applications can be written in JAVA, VTK, Tcl/Tk, or other tools. (For example, Mike's astroiso, which uses Mesa and VTK, reads data sets produced by the Astro 3D parallel application and displays them in a separate window.)

    The MDMS library also plays a role here: it provides information about the data sets generated by other parallel applications so that visualization applications can retrieve the desired data. The visualization applications need not know the physical location of the data sets to be displayed, since HP-MDMS provides all information about the data stored in the system. In addition, the display parameters for the visualization are supplied by the users through the JAVA GUI.
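The data analysis step in item 3 can be illustrated with a minimal JAVA sketch. It assumes the data array has already been read from its file into memory as doubles. The class name DataSetStats, the method summarize, and the sample values are hypothetical and are not part of the MDMS library.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

/** Hypothetical helper for illustration only; not part of HP-MDMS. */
final class DataSetStats {

    /** Returns {minimum, maximum, average} of a non-empty data array. */
    static double[] summarize(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data array must not be empty");
        }
        // One pass over the array collects min, max, and the running sum.
        DoubleSummaryStatistics s = DoubleStream.of(data).summaryStatistics();
        return new double[] { s.getMin(), s.getMax(), s.getAverage() };
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for a data array read from a file.
        double[] sample = { 2.0, 8.0, 5.0, 1.0 };
        double[] r = summarize(sample);
        System.out.println("min=" + r[0] + " max=" + r[1] + " avg=" + r[2]);
    }
}
```

A GUI sub-window could show these three values so that a user can pick a value range before launching the visualization.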

Possible GUI demo layout
Figure 1. A possible layout for the JAVA GUI part of the demonstration.

A possible layout for the JAVA GUI part of the demonstration is illustrated in Figure 1. The application name (for both parallel and visualization applications) can be chosen from a pull-down menu that lists all applications accessible by the JAVA GUI. For new applications, the run-time parameter form should be able to generate a new schema, with users entering both the attribute names and the attribute types. The actual implemented JAVA interface will replace this figure in the future.
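As a rough illustration of how user-entered attribute names and types might be turned into a new schema, the following JAVA sketch builds a CREATE TABLE statement as a string. The class RunTimeSchema, the table name astro3d_run, the attribute names, and the short list of allowed types are all assumptions made for this example; the actual HP-MDMS schema may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration only; not part of HP-MDMS. */
final class RunTimeSchema {

    // Assumed whitelist of attribute types; a real form could offer more.
    private static final List<String> ALLOWED_TYPES =
            Arrays.asList("integer", "real", "text");

    /** Builds a CREATE TABLE statement from (attribute name, attribute type) pairs. */
    static String createTableSql(String table, Map<String, String> attributes) {
        if (attributes.isEmpty()) {
            throw new IllegalArgumentException("at least one attribute is required");
        }
        StringJoiner columns = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (!ALLOWED_TYPES.contains(e.getValue())) {
                throw new IllegalArgumentException("unsupported type: " + e.getValue());
            }
            columns.add(requireIdentifier(e.getKey()) + " " + e.getValue());
        }
        return "CREATE TABLE " + requireIdentifier(table) + " " + columns;
    }

    /** Rejects names that are not plain identifiers, since they come from user input. */
    private static String requireIdentifier(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("invalid identifier: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the attributes in the order the user entered them.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("nprocs", "integer");
        attrs.put("dt", "real");
        attrs.put("label", "text");
        // prints: CREATE TABLE astro3d_run (nprocs integer, dt real, label text)
        System.out.println(createTableSql("astro3d_run", attrs));
    }
}
```

The resulting string could then be passed to the executeUpdate method of a JDBC Statement. Names are validated because they are spliced into the statement text rather than bound as parameters.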

2. Internal implementation:

The overall data management of the HP-MDMS architecture is shown in Figure 2. The design and implementation focus on the Meta-Data Management System (MDMS), High-Performance Storage Management System (HPSMS), and the Java interface.
the demo architecture
Figure 2. Architecture of the High Performance Meta-Data Management System.

  1. MDMS -- Its programming interface connects parallel applications to the data base and provides uniform access to diverse resources in a heterogeneous storage system. The communication between applications and the data base is implemented using the C programming interface of PostgreSQL. An example of programming flow is provided to represent the typical I/O operations in a parallel application.

    Since all of an application's I/O actions are recorded by the MDMS, different applications can retrieve data by querying meta-data from the MDMS. Users only have to provide the name of the source application that produced the data set in a previous run and the meaningful data set names. In addition, I/O operations are optimized by comparing the memory access pattern with the file storage pattern to decide whether to use the collective or non-collective MPI I/O facility.

  2. HPSMS -- HPSMS allows efficient access to the data stored on secondary storage devices. We have implemented a library in the C programming language whose routines can be called either by programmers or by the HP-MDMS system. This library lets the user view the data as an n-dimensional matrix; data is then accessed by giving the start and end coordinates of the data in the matrix.

    To enhance the performance of data accesses, we employ a sub-filing strategy, in which a large multi-dimensional tape-resident global array is stored not as a single file but as a number of smaller sub-files whose existence is transparent to the user. This strategy dramatically improves the response times of most I/O calls. The meta-data about the sub-files is kept in the Postgres95 database. We have implemented the library on top of HPSS and MPI I/O. SRB is used to access the files in HPSS.

  3. JAVA interface -- JDBC is used to connect to the data base and to retrieve and update the meta-data stored in it. The typical operations implemented through JDBC are creating tables, inserting attribute values into the tables, and querying attribute values from tables.

    This JAVA interface contains the names of all accessible parallel applications that perform I/O operations through MDMS, so these external parallel applications can be launched from the JAVA GUI. When a different application is chosen, the run-time parameter input list changes to match that application.
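The sub-filing idea described under HPSMS can be sketched in JAVA as follows. The sketch assumes the global array is cut into regular blocks of a fixed shape, one block per sub-file, and it lists the block indices of every sub-file that overlaps a request given by start and end coordinates. The text above does not state how HPSMS actually partitions the array, so this regular decomposition, the class SubFileMapper, and the sample numbers are assumptions for illustration only (the actual library is written in C).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not the HPSMS library. */
final class SubFileMapper {

    /**
     * Lists the block index of every sub-file overlapping the region
     * [start, end] (inclusive coordinates), assuming regular blocks.
     */
    static List<int[]> subFilesFor(int[] start, int[] end, int[] blockShape) {
        int n = start.length;
        if (n == 0 || end.length != n || blockShape.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        int[] lo = new int[n];
        int[] hi = new int[n];
        for (int d = 0; d < n; d++) {
            if (start[d] < 0 || end[d] < start[d] || blockShape[d] <= 0) {
                throw new IllegalArgumentException("bad range in dimension " + d);
            }
            lo[d] = start[d] / blockShape[d]; // first block touched in dimension d
            hi[d] = end[d] / blockShape[d];   // last block touched in dimension d
        }
        // Enumerate every block index between lo and hi, last dimension fastest.
        List<int[]> result = new ArrayList<>();
        int[] cur = lo.clone();
        while (true) {
            result.add(cur.clone());
            int d = n - 1;
            while (d >= 0 && cur[d] == hi[d]) {
                cur[d] = lo[d];
                d--;
            }
            if (d < 0) {
                break;
            }
            cur[d]++;
        }
        return result;
    }

    public static void main(String[] args) {
        // A 2-D request on an array stored as 128 x 128 sub-files (made-up numbers).
        List<int[]> blocks = subFilesFor(
                new int[] { 100, 250 }, new int[] { 299, 260 }, new int[] { 128, 128 });
        for (int[] b : blocks) {
            System.out.println(Arrays.toString(b));
        }
    }
}
```

In this example only six sub-files are touched, so a request for a small region does not need to read the whole global array. Looking up each block index in the meta-data would then yield the physical sub-file to open.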


Last modified on Sep. 29, 1999

Please send comments to wkliao@ece.nwu.edu.