Parallel Data Cube Construction for High Performance On-Line Analytical Processing

Sanjay Goil and Alok Choudhary

Abstract

Decision support systems use On-Line Analytical Processing (OLAP) to provide analysis of data. Queries posed on such systems are quite complex and require different views of data. Traditionally, a relational approach (relational OLAP) has been taken to build such systems. More recently, multidimensional database techniques (multidimensional OLAP) have been applied to decision-support applications. Data is stored in multidimensional arrays, which is a more natural way to express the multidimensionality of enterprise data and is better suited to analysis.

Precomputed aggregate calculations in a Data Cube can provide efficient query processing for OLAP applications. Large amounts of data are taken from a data warehouse for analysis, and high-performance computing is required to provide reasonable response times for analytical queries. In this paper, we present algorithms for constructing the data cube on a distributed-memory parallel computer. Data is loaded from a relational database into a multidimensional array. We present two methods, sort-based and hash-based, for loading the base data cube and compare their performance. We present results for in-memory data cube construction on the IBM SP2.
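As a rough illustration of the loading step, the sketch below places relational tuples into a small dense array using either a sort-based or a hash-based mapping from attribute values to array indices, and then precomputes the aggregates. It is a sequential, in-memory toy under stated assumptions, not the parallel algorithms of the paper: the fact table `ROWS`, the two dimensions, and all function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical fact table: (product, region, sales). Not data from the paper.
ROWS = [
    ("pen", "east", 5),
    ("ink", "west", 7),
    ("pen", "west", 2),
    ("pen", "east", 1),
]

def sort_based_index(values):
    """Sort the distinct values; a value's index is its rank, found by binary search."""
    ordered = sorted(set(values))
    return ordered, (lambda v: bisect_left(ordered, v))

def hash_based_index(values):
    """Map each distinct value to an index through a hash table (a dict).

    The values are sorted here only so that both methods produce the same layout.
    """
    ordered = sorted(set(values))
    table = {v: i for i, v in enumerate(ordered)}
    return ordered, table.__getitem__

def load_base_cube(rows, make_index):
    """Place each tuple's measure into a dense 2-D array, summing duplicates."""
    products, product_idx = make_index([r[0] for r in rows])
    regions, region_idx = make_index([r[1] for r in rows])
    cube = [[0] * len(regions) for _ in products]
    for product, region, sales in rows:
        cube[product_idx(product)][region_idx(region)] += sales
    return products, regions, cube

def aggregates(cube):
    """Precompute the group-bys of a 2-D base cube: by product, by region, and the total."""
    by_product = [sum(row) for row in cube]
    by_region = [sum(col) for col in zip(*cube)]
    return by_product, by_region, sum(by_product)
```

Calling `load_base_cube(ROWS, sort_based_index)` and `load_base_cube(ROWS, hash_based_index)` yields the same base array; the two differ only in how a value's index is located (binary search over sorted values versus a hash lookup), which is the kind of trade-off the comparison in the paper concerns.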
