clusterdata (Statistics Toolbox)

Construct clusters from data

Syntax

T = clusterdata(X, cutoff)
T = clusterdata(X,'param1',val1,'param2',val2,...)

Description

T = clusterdata(X, cutoff) uses the pdist, linkage, and cluster functions to construct clusters from the data X. X is an m-by-n matrix, treated as m observations of n variables. cutoff is a threshold for cutting the hierarchical tree generated by linkage into clusters. When 0 < cutoff < 2, clusterdata forms clusters when inconsistent values are greater than cutoff (see the inconsistent function). When cutoff is an integer and cutoff >= 2, clusterdata interprets cutoff as the maximum number of clusters to keep in the hierarchical tree generated by linkage. The output T is a vector of length m containing a cluster number for each observation.

T = clusterdata(X,cutoff) is the same as

Y = pdist(X,'euclidean'); 
Z = linkage(Y,'single'); 
T = cluster(Z,'cutoff',cutoff);
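
As a quick check of this equivalence, the following sketch runs both forms on the same data and compares the results. The sample data and the cutoff value 1.0 are illustrative assumptions; the value is chosen inside the range 0 < cutoff < 2 so that it is treated as a threshold on the inconsistent values.

```matlab
% Illustrative sample data (an assumption): three shifted groups of 10 points
rand('seed',12);
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5];

c = 1.0;   % hypothetical threshold in the range 0 < cutoff < 2

% One-call form
T1 = clusterdata(X,c);

% Step-by-step form
Y = pdist(X,'euclidean');      % pairwise Euclidean distances
Z = linkage(Y,'single');       % single-linkage hierarchical tree
T2 = cluster(Z,'cutoff',c);    % cut the tree at the inconsistency threshold

% Compare the two cluster assignments
same = isequal(T1,T2)
```

If the two forms agree, same is true. Both T1 and T2 have one entry per row of X.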

T = clusterdata(X,'param1',val1,'param2',val2,...) provides more control over the clustering through a set of parameter/value pairs. Valid parameters are:

'distance'
Any of the distance metric names allowed by pdist (follow the 'minkowski' option by the value of the exponent p).

'linkage'
Any of the linkage methods allowed by the linkage function

'cutoff'
Cutoff threshold for the inconsistent or distance measure (see 'criterion')

'maxclust'
Maximum number of clusters to form

'criterion'
Either 'inconsistent' or 'distance'

'depth'
Depth for computing inconsistent values
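
A minimal sketch of the parameter/value form follows. The sample data, the choice of 'cityblock' and 'average', and the distance threshold 0.9 are illustrative assumptions, not recommended settings.

```matlab
% Illustrative sample data (an assumption): three shifted groups of 10 points
rand('seed',12);
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5];

% City block distance, average linkage, at most 3 clusters
T1 = clusterdata(X,'distance','cityblock','linkage','average','maxclust',3);

% Default distance and linkage, but cut the tree by linkage distance
% instead of by inconsistent value (0.9 is a hypothetical threshold)
T2 = clusterdata(X,'criterion','distance','cutoff',0.9);

% Cluster numbers run from 1 up, so the largest one is the cluster count
numClusters1 = max(T1)
numClusters2 = max(T2)
```

With 'maxclust', the number of clusters is bounded by the value given. With 'cutoff', the number of clusters depends on the data and the threshold.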

Example

The example first creates a sample dataset of random numbers. It then uses clusterdata to compute the distances between items in the dataset and create a hierarchical cluster tree from the dataset. Finally, the clusterdata function groups the items in the dataset into three clusters. The example uses the find function to list all the items in cluster 2.

rand('seed',12); 
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5]; 
T = clusterdata(X,'maxclust',3); 
find(T==2)
ans =
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
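
To inspect every cluster rather than only cluster 2, the assignments in T can be tabulated. The sketch below is one possible way to do this, repeating the example's data so that it runs on its own; the variable names counts and k are illustrative.

```matlab
% Same data and clustering as in the example above
rand('seed',12);
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5];
T = clusterdata(X,'maxclust',3);

% Count observations per cluster; element k holds the size of cluster k
counts = accumarray(T,1)

% List the members of each cluster on one line
for k = 1:max(T)
    fprintf('Cluster %d: %s\n', k, mat2str(find(T==k)'));
end
```

The entries of counts sum to the number of rows of X, because every observation receives exactly one cluster number.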

See Also

cluster, inconsistent, kmeans, linkage, pdist
