Motivation
Data is being collected at an unprecedented rate in almost all fields of human endeavor, creating an emerging economic and scientific need to extract useful information from it. Data warehouses are filling up with huge amounts of data. Data mining, also known as knowledge discovery, attempts to develop automatic procedures that search these enormous data sets for useful information that would otherwise remain undiscovered. Such knowledge can take the form of patterns, rules, clusters, or anomalies in the data. These discoveries can be of great significance to scientific or business organizations. Given the enormous size and dimensionality of the data sets, high performance (parallel, distributed, grid-based) algorithms are crucial to any successful data mining solution.
Recently, major processor manufacturers have announced a dramatic shift in how they plan to increase computing power over the coming years: instead of focusing on faster clock speeds and more powerful single-core CPUs, the trend is clearly toward multi-core systems. Together with supercomputers, large-scale distributed computing infrastructures, and grid-based computing environments, these systems provide new opportunities for high performance data mining. Research on the corresponding algorithms must therefore stay at the forefront of this fast-evolving field so that data mining applications can keep meeting growing performance requirements.
Goals
The goal of this workshop is to bring researchers and practitioners together in a setting where they can discuss the design, implementation, and deployment of large-scale, parallel, distributed, or grid-based data mining systems, which can manipulate data obtained from very large enterprise or scientific databases, regardless of whether the data are located centrally or are globally distributed.