BESS: Sparse Data Storage of Multidimensional Data for OLAP and Data Mining

Sanjay Goil and Alok Choudhary

Abstract

Decision support systems use On-Line Analytical Processing and data mining to find interesting information from large databases. Queries posed on such systems are quite complex and require different views of data. Analytical models need to capture the multi-dimensionality of the underlying data, a task for which multi-dimensional databases are well suited. Traditional multi-dimensional databases store data in multi-dimensional arrays on which analytical operations are performed. Multi-Dimensional arrays are good to store dense data. Since most datasets are sparse, sparse storage schemes are required for efficient storage of data. However, there is a trade-off between storage and access of data. Complex operations, such as required for knowledge discovery and data mining can be very expensive in terms of data access time if efficient data structures are not used.

In this paper we introduce the bit-encoded sparse structure(BESS) for storing multi-dimensional sparse data. A comparison with other sparse data structures is presented in terms of storage efficiencies and data access times for the typical accesses required for OLAP and data mining workloads. The gain from reduced storage requirements is usually offset by overhead required in data accesses. We present an analysis of some typical OLAP operations using sparse storage schemes for higher dimensional data. This helps in highlighting the benefits of BESS. Performance results are presented which show good performance for OLAP queries using BESS over other storage schemes. We evaluate BESS with a space $\times$ time metric and find that its performance is better than the other structures. Furthermore, the analysis and performance results presented in this paper can be used by a data modeler to compare and contrast the storage versus data access performance trade-offs in choosing a suitable multi-dimensional structure for a given application.

Gzipped Postscript version of the paper