Statistics Toolbox    
pdist

Pairwise distance between observations

Syntax

Y = pdist(X)
Y = pdist(X,'metric')
Y = pdist(X,distfun,p1,p2,...)
Y = pdist(X,'minkowski',p)

Description

Y = pdist(X) computes the Euclidean distance between pairs of objects in the m-by-n matrix X, which is treated as m vectors of size n. For a dataset made up of m objects, there are m(m-1)/2 pairs.

The output, Y, is a vector of length m(m-1)/2 containing the distance information. The distances are arranged in the order (1,2), (1,3), ..., (1,m), (2,3), ..., (2,m), ..., (m-1,m). Y is also commonly known as a similarity matrix or dissimilarity matrix.

To save space and computation time, Y is formatted as a vector. However, you can convert this vector into a square matrix using the squareform function so that element i,j in the matrix, where i < j, corresponds to the distance between objects i and j in the original dataset.
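
The ordering of Y and its conversion with squareform can be illustrated with a small sketch. The data matrix X below is made up for illustration, and the index formula for k is not stated on this page; it is derived here from the pair ordering described above.

```matlab
% Four observations (rows) in two dimensions (columns); illustrative data.
X = [0 0; 3 4; 6 8; 0 5];
m = size(X,1);

% Pairwise Euclidean distances, ordered (1,2),(1,3),(1,4),(2,3),(2,4),(3,4).
Y = pdist(X);            % row vector of length m*(m-1)/2 = 6

% Expand the vector into a symmetric m-by-m matrix with zeros on the diagonal.
Z = squareform(Y);

% Position in Y of the pair (i,j) with i < j, derived from the ordering above.
i = 2; j = 4;
k = (i-1)*(m - i/2) + j - i;   % k = 5 for this pair

% Y(k) and Z(i,j) hold the same distance, here sqrt(10).
```

Working with Z is convenient for lookups by object index, while Y is the form expected by functions such as linkage.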

Y = pdist(X,'metric') computes the distance between objects in the data matrix, X, using the method specified by 'metric', where 'metric' can be any of the following character strings that identify ways to compute the distance.

'euclidean'
Euclidean distance (default)
'seuclidean'
Standardized Euclidean distance. Each coordinate in the sum of squares is inverse weighted by the sample variance of that coordinate.
'mahalanobis'
Mahalanobis distance
'cityblock'
City Block metric
'minkowski'
Minkowski metric
'cosine'
One minus the cosine of the included angle between points (treated as vectors)
'correlation'
One minus the sample correlation between points (treated as sequences of values).
'hamming'
Hamming distance, the percentage of coordinates that differ
'jaccard'
One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ

Y = pdist(X,distfun,p1,p2,...) accepts a function handle to a distance function of the form

d = distfun(XI,XJ,p1,p2,...)

taking as arguments two q-by-n matrices XI and XJ, each of which contains rows of X, plus zero or more additional arguments. The function returns a q-by-1 vector of distances d, whose kth element is the distance between the observations XI(k,:) and XJ(k,:). The arguments p1,p2,... are passed directly to the function distfun.
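
A minimal sketch of a custom distance function follows, using a weighted Euclidean distance as the example. The names wdist and w and the data are hypothetical. The weights are captured in an anonymous function rather than passed as p1, which is a design choice made here so the sketch does not depend on how extra arguments are forwarded, and bsxfun is used so that the function accepts either two matrices of equal size or a single row paired with several rows.

```matlab
% Three observations in two dimensions; illustrative data.
X = [1 2; 4 6; 1 6];

% Hypothetical per-coordinate weights, captured by the anonymous function.
w = [1 0.25];

% Weighted Euclidean distance: each row of XI is compared with the
% corresponding row of XJ, and one distance per row is returned as a column.
wdist = @(XI,XJ) sqrt(bsxfun(@minus, XI, XJ).^2 * w(:));

Y = pdist(X, wdist);
% Pairs (1,2), (1,3), (2,3) give sqrt(13), 2, and 3.
```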

Y = pdist(X,'minkowski',p) computes the distance between objects in the data matrix, X, using the Minkowski metric. p is the exponent used in the Minkowski computation which, by default, is 2.
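
The effect of the exponent can be seen in a short sketch with made-up data: with p = 1 the Minkowski metric coincides with the city block metric, and with p = 2 it coincides with the Euclidean distance.

```matlab
% Three observations in two dimensions; illustrative data.
X = [0 0; 3 4; 6 0];

Y1 = pdist(X,'minkowski',1);   % same values as pdist(X,'cityblock')
Y2 = pdist(X,'minkowski',2);   % same values as pdist(X,'euclidean')
Y3 = pdist(X,'minkowski',3);

% For the pair (1,2) the coordinate differences are 3 and 4, so
% Y1(1) = 3 + 4 = 7, Y2(1) = sqrt(9 + 16) = 5, Y3(1) = (27 + 64)^(1/3).
```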

Mathematical Definitions of Methods

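Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors x1, x2, ..., xm, the various distances between the vectors xr and xs are defined as follows, where ' denotes the transpose:

Euclidean distance

$$d_{rs}^2 = (x_r - x_s)(x_r - x_s)'$$

Standardized Euclidean distance

$$d_{rs}^2 = (x_r - x_s) D^{-1} (x_r - x_s)'$$

where D is the diagonal matrix whose jth diagonal element is the sample variance of the jth coordinate over the m objects.

Mahalanobis distance

$$d_{rs}^2 = (x_r - x_s) V^{-1} (x_r - x_s)'$$

where V is the sample covariance matrix.

City Block metric

$$d_{rs} = \sum_{j=1}^{n} |x_{rj} - x_{sj}|$$

Minkowski metric

$$d_{rs} = \left( \sum_{j=1}^{n} |x_{rj} - x_{sj}|^{p} \right)^{1/p}$$

For p = 1, the Minkowski metric gives the City Block metric, and for p = 2 it gives the Euclidean distance.

Cosine distance

$$d_{rs} = 1 - \frac{x_r x_s'}{(x_r x_r')^{1/2} (x_s x_s')^{1/2}}$$

Correlation distance

$$d_{rs} = 1 - \frac{(x_r - \bar{x}_r)(x_s - \bar{x}_s)'}{[(x_r - \bar{x}_r)(x_r - \bar{x}_r)']^{1/2} \, [(x_s - \bar{x}_s)(x_s - \bar{x}_s)']^{1/2}}$$

where

$$\bar{x}_r = \frac{1}{n} \sum_{j=1}^{n} x_{rj} \quad \text{and} \quad \bar{x}_s = \frac{1}{n} \sum_{j=1}^{n} x_{sj}$$

Hamming distance

$$d_{rs} = \frac{\#(x_{rj} \neq x_{sj})}{n}$$

Jaccard distance

$$d_{rs} = \frac{\#[(x_{rj} \neq x_{sj}) \wedge ((x_{rj} \neq 0) \vee (x_{sj} \neq 0))]}{\#[(x_{rj} \neq 0) \vee (x_{sj} \neq 0)]}$$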
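
As a rough check of two of these metrics, the sketch below computes the standardized Euclidean and Mahalanobis distances for one pair by hand and compares them with the output of pdist. It assumes the usual definitions: squared coordinate differences divided by the per-column sample variance var(X) for 'seuclidean', and a quadratic form with the inverse of the sample covariance matrix cov(X) for 'mahalanobis'. The data and variable names are illustrative.

```matlab
% Four observations in two dimensions; illustrative data.
X = [1 2; 1 3; 2 2; 3 1];

% Difference between observations r = 1 and s = 4.
r = 1; s = 4;
d = X(r,:) - X(s,:);

% Standardized Euclidean: squared differences scaled by the sample variances.
dSeuc = sqrt(sum(d.^2 ./ var(X)));

% Mahalanobis: d * inv(cov(X)) * d', written with a matrix right division.
dMahal = sqrt(d / cov(X) * d');

% With m = 4, the pair (1,4) is the third entry of the pdist output.
Yseuc  = pdist(X,'seuclidean');
Ymahal = pdist(X,'mahalanobis');
agreeSeuc  = abs(dSeuc  - Yseuc(3))  < 1e-10;
agreeMahal = abs(dMahal - Ymahal(3)) < 1e-10;
```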

Examples
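
As a small illustration of the 'hamming' and 'jaccard' metrics, the sketch below uses a made-up binary data matrix. The values in the comments follow from the descriptions above: the Hamming distance counts differing coordinates out of all n coordinates, while the Jaccard distance counts them only among coordinates where at least one of the two observations is nonzero.

```matlab
% Three binary observations with four coordinates each; illustrative data.
X = [1 0 1 0;
     1 1 0 0;
     0 0 1 1];

Yh = pdist(X,'hamming');
% Pairs (1,2), (1,3), (2,3): 2/4, 2/4, 4/4 coordinates differ.
% Yh = [0.5 0.5 1]

Yj = pdist(X,'jaccard');
% Pairs (1,2), (1,3), (2,3): 2/3, 2/3, 4/4 of the nonzero coordinates differ.
% Yj = [0.6667 0.6667 1]

% View the Jaccard distances as a symmetric 3-by-3 matrix.
Zj = squareform(Yj);
```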

See Also

cluster, clusterdata, cmdscale, cophenet, dendrogram, inconsistent, linkage, silhouette, squareform

