Parallel K-Means Data Clustering


This software package, parallel-kmeans.tar.gz (4.6 MB), implements parallel K-means data clustering. For large data support (more than 2 billion data points), see this page for an MPI implementation that uses 8-byte integers.
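The 2 billion figure matches the largest value a signed 4-byte integer can hold (2,147,483,647), which is presumably why the large-data version switches to 8-byte integers. The following is a minimal sketch of that limit, not code from the package; the function names and the parameter names numObjs and numCoords are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a count of numObjs data points can be stored in a
 * signed 4-byte integer, 0 otherwise. */
int fits_in_int32(int64_t numObjs)
{
    return numObjs >= 0 && numObjs <= INT32_MAX;
}

/* Total number of coordinate values for numObjs points with numCoords
 * dimensions each. The product is computed in 8-byte arithmetic, so it
 * does not wrap at the 4-byte limit. */
int64_t total_coords(int64_t numObjs, int64_t numCoords)
{
    assert(numObjs >= 0 && numCoords >= 0);
    return numObjs * numCoords;
}
```

For example, fits_in_int32(3000000000LL) returns 0, while total_coords(3000000000LL, 4) still yields the correct count of 12,000,000,000 values.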

Algorithm:

To compile:

Although I used the Intel C compiler (icc), version 7.1, during code development, the code requires no particular compiler features other than OpenMP support. Thus, the implementation should be fairly portable. Please modify the Makefile to change the compiler if needed.
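To illustrate the kind of OpenMP usage a compiler needs to support, here is a minimal sketch of the nearest-center assignment step of K-means with a parallel loop. It is not the package's source code; the function names, parameter names, and the row-major array layout are assumptions made for this example. If the compiler is run without OpenMP enabled, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <float.h>

/* Squared Euclidean distance between two points of numCoords dimensions. */
static float sq_dist(int numCoords, const float *a, const float *b)
{
    float sum = 0.0f;
    for (int j = 0; j < numCoords; j++) {
        float d = a[j] - b[j];
        sum += d * d;
    }
    return sum;
}

/* Assigns every point to its nearest cluster center.
 * objects:    numObjs x numCoords values, row-major (assumed layout)
 * centers:    numClusters x numCoords values, row-major (assumed layout)
 * membership: on return, membership[i] is the center index of point i
 * Returns the number of points whose membership changed. */
int assign_clusters(int numObjs, int numCoords, int numClusters,
                    const float *objects, const float *centers,
                    int *membership)
{
    assert(numObjs >= 0 && numCoords > 0 && numClusters > 0);
    int changed = 0;

    /* Each iteration touches only membership[i], so iterations are
     * independent; the change counter is combined with a reduction. */
    #pragma omp parallel for reduction(+:changed)
    for (int i = 0; i < numObjs; i++) {
        int best = 0;
        float best_dist = FLT_MAX;
        for (int k = 0; k < numClusters; k++) {
            float d = sq_dist(numCoords, &objects[i * numCoords],
                              &centers[k * numCoords]);
            if (d < best_dist) {
                best_dist = d;
                best = k;
            }
        }
        if (membership[i] != best)
            changed++;
        membership[i] = best;
    }
    return changed;
}
```

A caller would typically repeat this step, recomputing the centers in between, until the returned count of changed points drops to zero or below a chosen threshold.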

To run:

Input file format:

The executables read an input file that stores the data points to be clustered. A few example files are provided in the sub-directory ./Image_data. The input files can be in two formats: ASCII text and raw binary.
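As an illustration of reading a raw binary input, the sketch below assumes a hypothetical layout: a 4-byte integer giving the number of data points, a 4-byte integer giving the number of coordinates per point, then all coordinates as 4-byte floats in native byte order. This layout and the function name read_binary_points are assumptions for the example only; check the files in ./Image_data and the package source for the actual format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Reads points from an open binary stream using the ASSUMED layout:
 *   int32 numObjs, int32 numCoords, then numObjs*numCoords floats.
 * On success returns a malloc'ed array (caller frees) and fills in
 * *numObjs and *numCoords. Returns NULL on any read or size error. */
float *read_binary_points(FILE *fp, int32_t *numObjs, int32_t *numCoords)
{
    if (fread(numObjs, sizeof(int32_t), 1, fp) != 1)
        return NULL;
    if (fread(numCoords, sizeof(int32_t), 1, fp) != 1)
        return NULL;
    if (*numObjs <= 0 || *numCoords <= 0)
        return NULL;

    /* Compute the element count in size_t to avoid 4-byte overflow. */
    size_t count = (size_t)*numObjs * (size_t)*numCoords;
    float *data = malloc(count * sizeof(float));
    if (data == NULL)
        return NULL;

    if (fread(data, sizeof(float), count, fp) != count) {
        free(data);
        return NULL;
    }
    return data;
}
```

The returned array is row-major under this assumption, so coordinate j of point i is at data[i * numCoords + j].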

Output files:

There are two output files:

Performance results:


Limitations:

Derived Work:

Related Links:


Wei-keng Liao
Electrical Engineering and Computer Science Department
Northwestern University
Please send comments to
Software available since Sep. 17, 2005.
Page last modified date: Dec. 5, 2013.