Statistics Toolbox
Classical multidimensional scaling
Syntax
Y = cmdscale(D)
[Y,e] = cmdscale(D)
Description
Y = cmdscale(D) takes an n-by-n distance matrix D and returns an n-by-p configuration matrix Y. Rows of Y are the coordinates of n points in p-dimensional space for some p < n. When D is a Euclidean distance matrix, the distances between those points are given by D. p is the dimension of the smallest space in which the n points whose interpoint distances are given by D can be embedded.
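The description above can be made concrete with a minimal sketch of classical (Torgerson) scaling written in base MATLAB. It assumes the textbook construction: square the distances, double-center them to get B = -1/2*J*D.^2*J with J = eye(n) - ones(n)/n, and scale the eigenvectors of B by the square roots of its positive eigenvalues. This is not a copy of the cmdscale source, and the round-off tolerance (eps^(3/4) relative to the largest eigenvalue) is an illustrative assumption. The hypothetical points lie in a 2-D plane inside 3-D space, so p should come out as 2.

```matlab
% Hypothetical data: 8 points lying in a 2-D plane inside 3-D space
X = randn(8,2) * [1 0 2; 0 1 -1];
n = size(X,1);

% Full n-by-n matrices of squared and plain Euclidean distances
G  = X * X';
sq = diag(G);
D2 = max(bsxfun(@plus, sq, sq') - 2*G, 0);   % clip tiny negatives from round-off
D  = sqrt(D2);

% Double centering: B = -1/2 * J * D.^2 * J, with J = I - ones(n)/n
J = eye(n) - ones(n)/n;
B = -0.5 * J * D2 * J;
B = (B + B') / 2;                            % remove round-off asymmetry

% Eigenvalues of B in descending order, with matching eigenvectors
[V, E] = eig(B);
[e, idx] = sort(diag(E), 'descend');
V = V(:, idx);

% Keep eigenvalues that are positive beyond round-off (assumed tolerance)
keep = e > max(abs(e)) * eps^(3/4);
Y = V(:, keep) * diag(sqrt(e(keep)));

% p is the embedding dimension; maxerr checks that Y reproduces D
p = size(Y, 2)
GY  = Y * Y';
sqY = diag(GY);
DY  = sqrt(max(bsxfun(@plus, sqY, sqY') - 2*GY, 0));
maxerr = max(abs(DY(:) - D(:)))
```

Calling [Y,e] = cmdscale(D) on the same D should give a configuration with the same interpoint distances, although the coordinates may differ from this sketch by a rotation or reflection.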
[Y,e] = cmdscale(D) also returns a vector e of eigenvalues in descending order; its positive elements are the nonzero eigenvalues of Y*Y'. When D is Euclidean, the first p elements of e are positive and the rest are zero, up to round-off. If the first k elements of e are much larger than the remaining n-k, you can use the first k columns of Y as k-dimensional points whose interpoint distances approximate D. This can provide a useful dimension reduction for visualization, e.g., for k = 2.
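A minimal sketch of this dimension reduction, on hypothetical data with two dominant directions and three small ones. The 95% cutoff on the cumulative share of the positive eigenvalues is an arbitrary illustrative rule, not something cmdscale applies; the plot shows the first two columns of Y as 2-D points.

```matlab
% Hypothetical data: 30 points with two dominant directions and three small ones
X = [randn(30,2), 0.05*randn(30,3)];
D = pdist(X);                          % upper-triangle vector form

[Y, e] = cmdscale(D);

% Cumulative share of the positive eigenvalues carried by the first k dimensions
pos  = e(e > 0);
frac = cumsum(pos) / sum(pos);

% Assumption: 0.95 is an arbitrary illustrative cutoff, not a cmdscale rule
k = find(frac >= 0.95, 1)

% Use the first two columns of Y as 2-D points and plot them
Y2 = Y(:, 1:2);
plot(Y2(:,1), Y2(:,2), 'o')
axis equal

% Largest error of the 2-D interpoint distances relative to D
maxerr2 = max(abs(D - pdist(Y2)))
```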
D need not be a Euclidean distance matrix. If it is non-Euclidean or a more general dissimilarity matrix, then some elements of e are negative, and cmdscale chooses p as the number of positive eigenvalues. In this case, the reduction to p or fewer dimensions provides a reasonable approximation to D only if the negative elements of e are small in magnitude.
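One way to judge whether the negative eigenvalues are small in magnitude is to compare their total magnitude with that of all eigenvalues. The sketch below does this for city-block distances on hypothetical random data; the negShare ratio is an assumed, informal diagnostic rather than a quantity cmdscale computes.

```matlab
% Hypothetical data; city-block distances between such points are generally non-Euclidean
X = randn(15,3);
D = pdist(X, 'cityblock');

[Y, e] = cmdscale(D);

% cmdscale keeps one column of Y per positive eigenvalue
p = size(Y, 2)

% Assumed informal diagnostic: share of total eigenvalue magnitude that is negative
negShare = sum(abs(e(e < 0))) / sum(abs(e))

% How well the p-dimensional configuration reproduces the city-block distances
maxerr = max(abs(D - pdist(Y)))
```

A negShare close to 0 suggests the reduction to p or fewer dimensions is reasonable; a large value suggests that no low-dimensional Euclidean configuration reproduces D closely.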
You can specify D either as a full dissimilarity matrix or in the upper-triangle vector form output by pdist. A full dissimilarity matrix must be real and symmetric, with zeros along the diagonal and positive elements everywhere else. A dissimilarity matrix in upper-triangle form must have real, positive entries. You can also specify D as a full similarity matrix, with ones along the diagonal and all other elements less than one. cmdscale transforms a similarity matrix to a dissimilarity matrix in such a way that distances between the points returned in Y equal or approximate sqrt(1-D). To use a different transformation, transform the similarities before calling cmdscale.
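The sketch below, on hypothetical random data, shows two of these input forms: the upper-triangle vector from pdist and the full matrix from squareform, which should yield configurations with the same interpoint distances. It then applies a different similarity transformation before calling cmdscale. Using row correlations as similarities and 1 - S as the transformation are assumptions made for illustration; the diagonal and symmetry are reset explicitly so the result meets the requirements for a full dissimilarity matrix.

```matlab
% Hypothetical data, one point per row
X = randn(12,4);

% Upper-triangle vector form from pdist and full square form from squareform
dv = pdist(X);
Df = squareform(dv);
Yv = cmdscale(dv);
Yf = cmdscale(Df);
diffForms = max(abs(pdist(Yv) - pdist(Yf)))   % expected to be at round-off level

% Assumption: rows are compared by correlation, and 1 - S is the chosen transformation
S  = corrcoef(X');                    % 12-by-12 similarities, ones on the diagonal
Dc = 1 - S;
Dc = (Dc + Dc') / 2;                  % enforce exact symmetry
Dc(1:size(Dc,1)+1:end) = 0;           % enforce exact zeros on the diagonal
[Yc, ec] = cmdscale(Dc);
```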
Examples
Generate some points in 4-dimensional space, but "close" to 3-dimensional space, then reduce them to distances only.

    X = [normrnd(0,1,10,3), normrnd(0,.1,10,1)];
    D = pdist(X,'euclidean');

Find a configuration with those interpoint distances.

    [Y,e] = cmdscale(D);

    % Four dimensions, but the fourth one is small
    dim = sum(e > eps^(3/4))

    % Poor reconstruction
    maxerr2 = max(abs(pdist(X) - pdist(Y(:,1:2))))

    % Good reconstruction
    maxerr3 = max(abs(pdist(X) - pdist(Y(:,1:3))))

    % Exact reconstruction
    maxerr4 = max(abs(pdist(X) - pdist(Y)))

    % D is now non-Euclidean
    D = pdist(X,'cityblock');
    [Y,e] = cmdscale(D);

    % One eigenvalue is large and negative
    min(e)

    % Poor reconstruction of the city-block distances
    maxerr = max(abs(D - pdist(Y)))
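The example compares interpoint distances rather than coordinates because Y reproduces X only up to translation, rotation, and reflection: the configuration is centered at the origin, with its columns along the principal axes. A minimal sketch of checking this on the same kind of data uses procrustes to align Y with X before comparing them; reading a near-zero procrustes dissimilarity as a match is an interpretive assumption.

```matlab
% Same kind of data as in the example: 4-D points close to a 3-D subspace
X = [normrnd(0,1,10,3), normrnd(0,.1,10,1)];
D = pdist(X);
Y = cmdscale(D);

% Columns of Y have (numerically) zero mean: the configuration is centered
colMeans = mean(Y)

% Align Y with X by translation, rotation/reflection, and scaling before comparing;
% procrustes returns a standardized dissimilarity, near 0 for a near-exact match
d = procrustes(X, Y)
```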
See Also