Statistics Toolbox
Example: Principal Components Analysis
Let us look at a sample application that uses nine different indices of the quality of life in 329 U.S. cities. These are climate, housing, health, crime, transportation, education, arts, recreation, and economics. For each index, higher is better; so, for example, a higher index for crime means a lower crime rate.
We start by loading the data in cities.mat.

    load cities
    whos
      Name             Size         Bytes  Class
      categories       9x14           252  char array
      names          329x43         28294  char array
      ratings        329x9          23688  double array

The whos command generates a table of information about all the variables in the workspace.
The cities data set contains three variables:
categories, a string matrix containing the names of the indices.
names, a string matrix containing the 329 city names.
ratings, the data matrix with 329 rows and 9 columns.
Let's look at the value of the categories variable.

    categories
    categories =
       climate
       housing
       health
       crime
       transportation
       education
       arts
       recreation
       economics
Now, let's look at the first several rows of the names variable.
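One way to display those rows is to index into the character matrix. This is a minimal sketch that assumes five rows are enough for a first look; the row count and the variable name `first5` are illustrative choices, not taken from the text.

```matlab
load cities               % provides categories, names, and ratings
first5 = names(1:5,:)     % rows 1 through 5, all columns; no semicolon, so the result is displayed
```

Because names is a character matrix, each row is one city name padded with blanks to a common width.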
To get a quick impression of the ratings data, make a box plot.
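A box plot of the ratings can be drawn with the toolbox's boxplot function. The sketch below is one possible call, assuming a toolbox release that accepts the 'orientation' and 'labels' parameter names; the horizontal layout is a choice made here so that the category names are easy to read.

```matlab
load cities               % provides categories, names, and ratings
% One box per column of ratings, labeled with the matching row of categories
boxplot(ratings, 'orientation', 'horizontal', 'labels', categories)
```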
The resulting box plot shows substantially more variability in the ratings of the arts and housing than in the ratings of crime and climate.
Ordinarily you might also graph pairs of the original variables, but there are 36 two-variable plots. Perhaps principal components analysis can reduce the number of variables we need to consider.
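The count of 36 is the number of ways to choose 2 of the 9 variables, which can be checked directly. The name `nPairs` is illustrative.

```matlab
nPairs = nchoosek(9, 2)   % 9*8/2 = 36 distinct pairs of variables
```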
Sometimes it makes sense to compute principal components for raw data. This is appropriate when all the variables are in the same units. Standardizing the data is reasonable when the variables are in different units or when the variances of the different columns differ substantially (as in this case).
You can standardize the data by dividing each column by its standard deviation.
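A minimal sketch of that scaling follows. The variable names `stdr` and `sr` are illustrative, and repmat is used here so the division works on releases without implicit expansion.

```matlab
load cities                                          % provides categories, names, and ratings
stdr = std(ratings);                                 % 1-by-9 row of column standard deviations
sr = ratings ./ repmat(stdr, size(ratings, 1), 1);   % divide every column by its own standard deviation
```

After this step each column of `sr` has unit standard deviation. The columns are scaled but not centered; princomp subtracts the column means itself.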
Now we are ready to find the principal components.
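The call can be sketched as below, assuming the standardized ratings are passed in and all four outputs are requested. The output names `pcs`, `newdata`, `variances`, and `t2` are illustrative labels. On recent releases, where princomp may no longer be available, the pca function serves the same purpose.

```matlab
load cities                                                    % provides categories, names, and ratings
sr = ratings ./ repmat(std(ratings), size(ratings, 1), 1);     % standardized ratings
[pcs, newdata, variances, t2] = princomp(sr);                  % request all four outputs
```

For the 329-by-9 input, `pcs` is 9-by-9, `newdata` is 329-by-9, `variances` has 9 elements, and `t2` has 329 elements.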
The following sections explain the four outputs from princomp.