Scheduling Queries on Tape-resident Data
Sachin More and Alok Choudhary
Abstract
Advances in storage technology have made near-line tertiary storage a viable
alternative for database and data warehouse systems.
Tertiary storage systems are employed in cases where secondary storage can not
satisfy the data handling requirements or tertiary storage is more cost
effective option. Tertiary storage devices have traditionally been used as
archival storage. The new application domains require on-demand retrieval of
data from these devices.
This paper investigates issues in optimizing I/O time for a query whose data
resides on automated tertiary storage containing multiple storage devices.
We model the problem as a limited storage parallel two-machine flow-shop
scheduling problem with additional constraints. Given a query, we establish
an upper bound on the number of storage devices for an optimal I/O schedule and
provide experimental proof for it. For queries that access small amounts of
data from multiple media, we derive an optimal schedule analytically. For
queries that access large amount of data we derive a heuristics based scheduling
algorithm using analytically proved results. We also discuss
practical aspects of about using the theoretical results in a real environment
and demonstrate our applicability of our work using an accurate robotic tape
library simulator.
Gzipped Postscript version of the paper