I/O Optimizations for Hierarchical Storage Systems

Gokhan Memik

Abstract

Over the last decade, processors have made enormous gains in speed, but the speed of secondary and tertiary storage devices has not kept pace. As a result, secondary and tertiary storage access times dominate the execution time of data-intensive computations, so efficient access to data stored on secondary and tertiary storage is a must. Hierarchical storage systems (HSS) such as HPSS are becoming increasingly popular due to their large storage capacities and high data transfer rates. In this work, we have devised APRIL, a parallel runtime library that provides an easy-to-use interface to data residing on a hierarchical storage system. The library uses three optimizations to improve response times for several I/O patterns: sub-filing, multiple collective I/O, and selective collective I/O. In sub-filing, a large multi-dimensional data set residing on the lower levels of the HSS is stored not as a single file but as a number of smaller sub-files. Multiple collective I/O optimizes accesses to several files. Selective collective I/O uses collective I/O calls selectively, thereby improving performance when the synchronization inherent in collective I/O is not needed. To evaluate the performance of the library, we have tested it against several access patterns. The experiments show that these optimizations can reduce I/O time by an order of magnitude in several cases.
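
As a rough illustration of the sub-filing idea only, the following Python sketch shows how a request for a rectangular region of a multi-dimensional array might be mapped to the sub-files that overlap it, so that only those sub-files need to be fetched from the lower levels of the storage hierarchy. It is not APRIL's interface, which the abstract does not describe. The function names, the fixed-size chunk grid, and the file-naming scheme are all assumptions made for this example.

```python
from itertools import product


def subfiles_for_region(chunk_shape, start, stop):
    """Return grid indices of the sub-files overlapping the region [start, stop).

    Hypothetical helper: assumes the array is split into equally sized
    chunks of shape `chunk_shape`, one chunk per sub-file.
    """
    ranges = []
    for chunk, lo, hi in zip(chunk_shape, start, stop):
        if hi <= lo:
            return []  # empty region touches no sub-file
        first = lo // chunk          # chunk holding the first element
        last = (hi - 1) // chunk     # chunk holding the last element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def subfile_name(base, index):
    """Build an illustrative sub-file name such as 'data.0_1' (assumed scheme)."""
    return base + "." + "_".join(str(i) for i in index)


if __name__ == "__main__":
    # Rows 50..119 and columns 150..249 of an array stored in 100 x 100 chunks.
    for idx in subfiles_for_region((100, 100), (50, 150), (120, 250)):
        print(subfile_name("data", idx))
```

With a layout like this, a small request touches only a few sub-files instead of forcing a transfer of the whole array, which is the motivation for sub-filing given above.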
