Statistics Toolbox    
linkage

Create hierarchical cluster tree

Syntax

Z = linkage(Y)
Z = linkage(Y,'method')

Description

Z = linkage(Y) creates a hierarchical cluster tree using the single linkage algorithm. The input, Y, is a distance vector of length m(m-1)/2, where m is the number of objects in the original dataset. You can generate such a vector with the pdist function. Y can also hold more general dissimilarities, provided it conforms to the output format of pdist.
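
As a quick check of the expected input shape, the following sketch builds Y with pdist and confirms that it holds one entry per pair of objects. The data matrix X and the variable names are made up for illustration.

```matlab
% Hypothetical data: m = 5 objects (rows) with 2 variables each.
X = [1 2; 2 2; 8 9; 9 9; 5 5];
m = size(X,1);

% pdist returns the pairwise distances as a vector
% (Euclidean distance by default).
Y = pdist(X);

% One entry per unordered pair of objects: m*(m-1)/2 = 10 here.
numPairs = m*(m-1)/2;
isequal(numel(Y), numPairs)   % true when Y has the expected length

% squareform converts between the vector form and the full
% m-by-m symmetric distance matrix.
D = squareform(Y);

Z = linkage(Y);               % single linkage by default
```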

Z = linkage(Y,'method') computes a hierarchical cluster tree using the algorithm specified by 'method', where 'method' can be any of the following character strings that identify ways to create the cluster hierarchy. Their definitions are explained in Mathematical Definitions.

'single'      Shortest distance (default)
'complete'    Largest distance
'average'     Average distance
'centroid'    Centroid distance. The output Z is meaningful only if Y contains Euclidean distances.
'ward'        Incremental sum of squares
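
The method string is passed as the second argument. A minimal sketch, using made-up data and a hypothetical variable methodNames, that builds a tree with each method from the same distance vector:

```matlab
X = [1 2; 2 2; 8 9; 9 9; 5 5];   % hypothetical data
Y = pdist(X);                    % Euclidean distances (the pdist default)

methodNames = {'single','complete','average','centroid','ward'};
for k = 1:numel(methodNames)
    Z = linkage(Y, methodNames{k});
    % The last row of Z holds the distance at which the final two
    % clusters are merged; it generally differs between methods.
    fprintf('%-9s final merge distance: %g\n', methodNames{k}, Z(end,3));
end
```

Because Y here contains Euclidean distances, the 'centroid' result is meaningful as well.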

The output, Z, is an (m-1)-by-3 matrix containing cluster tree information. The leaf nodes in the cluster hierarchy are the objects in the original dataset, numbered from 1 to m. They are the singleton clusters from which all higher clusters are built. Each newly formed cluster, corresponding to row i in Z, is assigned the index m+i, where m is the total number of initial leaves.

Columns 1 and 2, Z(i,1:2), contain the indices of the two clusters that were linked to form a new cluster. Each index refers either to an original object (1 to m) or to a cluster formed in an earlier row (greater than m). There are m-1 higher clusters, which correspond to the interior nodes of the hierarchical cluster tree.

Column 3, Z(i,3), contains the linkage distance between the two clusters joined in row i.

For example, consider a case with 30 initial nodes. If the tenth cluster formed by the linkage function combines object 5 and object 7 and their distance is 1.5, then row 10 of Z will contain the values (5, 7, 1.5). This newly formed cluster will have the index 10+30=40. If cluster 40 shows up in a later row, that means this newly formed cluster is being combined again into some bigger cluster.
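
The same indexing can be traced on a tiny dataset. The sketch below uses four hypothetical points on a line so that the merges can be checked by hand; the order of the two indices within a row of Z may differ between versions.

```matlab
% Four hypothetical points on a line, so the distances are easy to check.
X = [0 0; 1 0; 3 0; 7 0];
m = size(X,1);            % m = 4 leaves, numbered 1 to 4

% Pairwise distances for pairs (1,2),(1,3),(1,4),(2,3),(2,4),(3,4):
% 1, 3, 7, 2, 6, 4
Y = pdist(X);
Z = linkage(Y,'single');

% Single linkage merges the closest pair at each step:
%   row 1: leaves 1 and 2 at distance 1             -> new cluster m+1 = 5
%   row 2: leaf 3 and cluster 5 at min(3,2) = 2     -> new cluster m+2 = 6
%   row 3: leaf 4 and cluster 6 at min(7,6,4) = 4   -> new cluster m+3 = 7
for i = 1:size(Z,1)
    fprintf('row %d: clusters %d and %d merge at %g -> cluster %d\n', ...
            i, Z(i,1), Z(i,2), Z(i,3), m+i);
end
```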

Mathematical Definitions

The 'method' argument is a character string that specifies the algorithm used to generate the hierarchical cluster tree information. These linkage algorithms are based on various measurements of proximity between two groups of objects. If n_r is the number of objects in cluster r, n_s is the number of objects in cluster s, and x_ri is the ith object in cluster r, the definitions of these various measurements are as follows:
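
The conventional definitions of the five measures can be sketched as below. The notation is assumed here rather than taken from this page: d(r,s) is the proximity between clusters r and s, dist is the distance between two objects, and x̄_r is the centroid of cluster r. The Ward expression is the increase in the total within-cluster sum of squares caused by merging r and s; implementations may report a rescaled or square-rooted version of it.

```latex
\begin{align*}
d_{\text{single}}(r,s)   &= \min_{i,j}\ \mathrm{dist}(x_{ri}, x_{sj}),
    \qquad i \in \{1,\dots,n_r\},\ j \in \{1,\dots,n_s\} \\
d_{\text{complete}}(r,s) &= \max_{i,j}\ \mathrm{dist}(x_{ri}, x_{sj}) \\
d_{\text{average}}(r,s)  &= \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s}
    \mathrm{dist}(x_{ri}, x_{sj}) \\
d_{\text{centroid}}(r,s) &= \lVert \bar{x}_r - \bar{x}_s \rVert_2,
    \qquad \bar{x}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} x_{ri} \\
d_{\text{ward}}(r,s)     &= \frac{n_r n_s}{n_r + n_s}\,
    \lVert \bar{x}_r - \bar{x}_s \rVert_2^{2}
\end{align*}
```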

Example
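
A minimal sketch of a typical workflow, using made-up data. It assumes that the related functions listed under See Also are available and that cluster accepts the 'maxclust' option in the installed version.

```matlab
% Hypothetical data: two loose groups of points in the plane.
X = [1.0 1.0; 1.2 0.8; 0.9 1.3; 5.0 5.1; 5.2 4.8; 4.9 5.3];

Y = pdist(X);                 % pairwise Euclidean distances
Z = linkage(Y,'average');     % average-linkage cluster tree

dendrogram(Z);                % plot the tree

% Cophenetic correlation: values closer to 1 mean the tree
% distances track the original distances more faithfully.
c = cophenet(Z,Y);

% Cut the tree into two clusters; T(k) is the cluster of object k.
T = cluster(Z,'maxclust',2);
```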

See Also

cluster, clusterdata, cophenet, dendrogram, inconsistent, kmeans, pdist, silhouette, squareform

