Multivariate Statistics (Statistics Toolbox)

The Component Scores (Second Output)

The second output, newdata, is the data in the new coordinate system defined by the principal components. This output is the same size as the input data matrix.
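As a rough illustration of what "data in the new coordinate system" means, the following sketch computes component scores by hand for a small synthetic matrix. The names X, Xc, V, and scores and the random data are assumptions made for illustration and are not part of the ratings example. The sketch uses the core function svd rather than the toolbox function, so the signs of individual columns may differ from what the toolbox returns.

```matlab
% Synthetic stand-in for a data matrix (hypothetical, not the ratings data)
rng(0);
X = randn(50,4) * diag([3 2 1 0.5]);

% Center each column by subtracting its mean
Xc = X - repmat(mean(X), size(X,1), 1);

% The columns of V are the principal component directions
[~, ~, V] = svd(Xc, 'econ');

% Scores: the centered data expressed in the principal component coordinates
scores = Xc * V;

size(scores)   % same size as X: 50-by-4
```

The score columns come out ordered by decreasing variance and are uncorrelated with one another, which is why plotting the first two columns gives the most informative two-dimensional view.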

A plot of the first two columns of newdata shows the ratings data projected onto the first two principal components.

```
plot(newdata(:,1),newdata(:,2),'+')
xlabel('1st Principal Component');
ylabel('2nd Principal Component');
```

Note the outlying points in the lower right corner.

The function gname is useful for graphically identifying a few points in a plot like this. You can call gname with a string matrix containing as many case labels as there are points in the plot. The string matrix names serves this purpose, labeling points with the city names.

```
gname(names)
```

Move your cursor over the plot and click once near each outlying point. As you click each point, MATLAB labels it with the corresponding row of the names string matrix. When you are finished labeling points, press the Return key.

Here is the resulting plot.

The labeled cities are the biggest population centers in the United States. Perhaps we should consider them as a completely separate group. If we call gname without arguments, it labels each point with its row number.
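Where clicking is impractical, row-number labels can also be drawn without interaction. The sketch below assumes a small made-up score matrix and a hypothetical threshold rule for choosing which points to label. It uses the core MATLAB function text rather than gname, so it is an alternative technique and not what gname does internally.

```matlab
% Hypothetical scores for six cases (not the ratings data)
scores = [0.2 0.1; -1.0 0.4; 4.1 2.8; 0.5 -0.7; 3.6 3.2; -0.3 0.9];

% Assumed selection rule: cases far out along the first component
pick = find(scores(:,1) > 3);

plot(scores(:,1), scores(:,2), '+')
xlabel('1st Principal Component');
ylabel('2nd Principal Component');

% Label each selected point with its row number
text(scores(pick,1), scores(pick,2), cellstr(num2str(pick)))
```

The vector pick then plays the same role as a hand-built list of row numbers.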

We can create an index variable containing the row numbers of all the metropolitan areas we chose.

```
metro = [43 65 179 213 234 270 314];
names(metro,:)

ans =
   Boston, MA
   Chicago, IL
   Los Angeles, Long Beach, CA
   New York, NY
   Philadelphia, PA-NJ
   San Francisco, CA
   Washington, DC-MD-VA
```

To remove these rows from the ratings matrix, type the following.

```
rsubset = ratings;
nsubset = names;
nsubset(metro,:) = [];
rsubset(metro,:) = [];
size(rsubset)

ans =
   322     9
```
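An equivalent way to drop rows, sketched here on a small made-up matrix, is to build a logical mask and index with it. This leaves the original variable untouched and avoids the copy-then-delete step. The names data, drop, keep, and subset are illustrative only.

```matlab
data = magic(5);                 % hypothetical 5-by-5 stand-in for ratings
drop = [2 4];                    % row numbers to exclude

keep = true(size(data,1), 1);    % start by keeping every row
keep(drop) = false;              % mark the excluded rows

subset = data(keep,:);           % rows 1, 3, and 5 of data
size(subset)                     % 3-by-5
```

The same mask can be applied to a matching label matrix, which keeps the data rows and their labels aligned.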

To practice, repeat the analysis using the variable rsubset as the new data matrix and nsubset as the string matrix of labels.
