Storage Optimization for Large Multidimensional Datasets

Sachin More and Alok Choudhary

Abstract

Large multidimensional datasets arise in diverse application areas such as data warehousing, satellite data processing, and high energy physics. Because of the enormous volume of data to be stored, tertiary storage devices are the only cost-effective option. These devices provide a sequential access interface, so optimizing the storage pattern is essential for avoiding high access latency and sustaining streaming data bandwidth during query processing. This paper examines the issues involved in determining efficient storage patterns for large multidimensional datasets in a database environment. We propose and evaluate a heuristic-based online algorithm for tertiary storage management under the assumption of limited secondary storage.
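To illustrate why the storage pattern matters on a sequential-access device, the sketch below counts how many contiguous runs must be read to answer a rectangular subarray query under a given linearization order. Each run roughly corresponds to one repositioning of the device. This is only an illustration under a simplified cost model (one seek per run, data already linearized by dimension order). It is not the heuristic algorithm proposed in the paper, and the helper names `linear_offset` and `count_runs` are hypothetical.

```python
from itertools import product


def linear_offset(index, shape, order):
    """Linear position of a multidimensional index when dimensions are laid
    out in `order`, listed from slowest- to fastest-varying."""
    offset = 0
    for dim in order:
        offset = offset * shape[dim] + index[dim]
    return offset


def count_runs(shape, query, order):
    """Count contiguous runs needed to read a rectangular query region.

    `query` gives a half-open (lo, hi) range per dimension. Under the
    assumed cost model, each run costs one repositioning of a
    sequential-access device such as a tape.
    """
    offsets = sorted(
        linear_offset(idx, shape, order)
        for idx in product(*(range(lo, hi) for lo, hi in query))
    )
    runs = 0
    prev = None
    for off in offsets:
        # A gap between consecutive offsets forces a new run (a seek).
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


shape = (100, 100)
column_query = [(0, 100), (5, 6)]  # one full column
row_query = [(10, 12), (0, 100)]   # two full rows

print(count_runs(shape, column_query, (0, 1)))  # row-major: 100 runs
print(count_runs(shape, column_query, (1, 0)))  # column-major: 1 run
print(count_runs(shape, row_query, (0, 1)))     # row-major: 1 run
print(count_runs(shape, row_query, (1, 0)))     # column-major: 100 runs
```

Neither order is best for every query: row-major storage serves row queries with a single run but scatters a column query across 100 runs, and column-major storage does the reverse. This is why the storage pattern has to be chosen with the expected query workload in mind.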
