I/O Optimizations for Hierarchical Storage Systems
Gokhan Memik
Abstract
Over the last decade, processor speeds have increased enormously,
but the speeds of secondary and tertiary storage devices have not
kept pace. As a result, secondary and tertiary storage access times
dominate the execution time of data-intensive computations, so
efficient access to data stored in secondary and tertiary storage is
essential. Hierarchical storage systems (HSS) like HPSS are becoming
increasingly popular due to their large storage capabilities and high
data transfer rates. In this work, we have devised APRIL, a parallel
runtime library that provides an easy-to-use interface to the data
residing on a hierarchical storage system. The library uses three
different optimizations for improving the response times for several
I/O patterns: sub-filing, multiple collective I/O, and selective
collective I/O. In sub-filing, a large multi-dimensional data set
residing on the lower levels of the HSS is stored not as a single file
but as a number of smaller sub-files. Multiple collective I/O is
used to optimize accesses to several files. Selective
collective I/O uses the collective I/O calls selectively, thereby
increasing the performance when the synchronization inherent in
collective I/O is not needed. In order to evaluate the performance of
the library, we have tested it against several access patterns. The
experiments show that these optimizations can reduce the I/O time by
an order of magnitude in several cases.
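
The sub-filing idea can be illustrated with a minimal sketch. It assumes
a regular grid of equally shaped sub-files and a half-open rectangular
request, and it only computes which sub-files a request touches. The
function names, the grid layout, and the file-naming scheme are
hypothetical and are not taken from APRIL's interface.

```python
from itertools import product


def subfiles_for_region(array_shape, subfile_shape, start, stop):
    """Return grid indices of the sub-files that overlap a request.

    The request is the half-open region [start, stop) in every dimension.
    A regular grid of equally shaped sub-files is an assumption made
    for this sketch.
    """
    for dim, (n, lo, hi) in enumerate(zip(array_shape, start, stop)):
        if not 0 <= lo < hi <= n:
            raise ValueError(f"invalid range in dimension {dim}")
    # In each dimension, the request spans sub-files lo // size
    # through (hi - 1) // size, inclusive.
    ranges = [
        range(lo // size, (hi - 1) // size + 1)
        for size, lo, hi in zip(subfile_shape, start, stop)
    ]
    return list(product(*ranges))


def subfile_name(base, index):
    """Hypothetical naming scheme: one file per grid index."""
    return base + "." + "_".join(str(i) for i in index)


# A 1024 x 1024 array split into 256 x 256 sub-files (16 in total).
# Reading a strip of columns 300..399 over all rows touches only
# the four sub-files in grid column 1.
needed = subfiles_for_region((1024, 1024), (256, 256), (0, 300), (1024, 400))
names = [subfile_name("field.dat", idx) for idx in needed]
```

In this example only 4 of the 16 sub-files would have to be retrieved
from the lower levels of the hierarchy, whereas a single-file layout
would require staging the whole array.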
Gzipped Postscript version of the paper