Statistics Toolbox
The Component Scores (Second Output)
The second output, newdata, is the data in the new coordinate system defined by the principal components. This output is the same size as the input data matrix.
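To make the change of coordinates concrete, here is a minimal sketch that does not use the toolbox. It assumes the scores are obtained by centering the data and multiplying by the matrix of principal component coefficients, and it uses the singular value decomposition as a stand-in for the toolbox routine. The matrix X and all variable names are hypothetical and are not part of the ratings example.

```matlab
% Hypothetical 10-by-4 data matrix (not the ratings data).
t = (1:10)';
X = [t, t.^2, sin(t), cos(t)];

% Center each column, as principal components analysis does.
Xc = X - repmat(mean(X), size(X,1), 1);

% Columns of V play the role of the principal components here.
[U, S, V] = svd(Xc, 'econ');

% The scores are the centered data expressed in the new coordinates.
scores = Xc * V;

size(X)        % size of the input
size(scores)   % same size as the input
```

Because V is orthogonal, scores*V' recovers the centered data, so the scores carry the same information as the input in a rotated coordinate system.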
A plot of the first two columns of newdata shows the ratings data projected onto the first two principal components.
    plot(newdata(:,1),newdata(:,2),'+')
    xlabel('1st Principal Component');
    ylabel('2nd Principal Component');
Note the outlying points in the lower right corner.
The function gname is useful for graphically identifying a few points in a plot like this. You can call gname with a string matrix containing as many case labels as points in the plot. The string matrix names works for labeling points with the city names.
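A minimal sketch of that call, assuming the scatter plot above is the current figure and that names is the string matrix of city names created earlier in the example:

```matlab
% Assumes the scatter plot of newdata is the current figure and that
% names has one row per plotted point.
gname(names)   % interactive: click near a point to label it
```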
Move your cursor over the plot and click once near each of the outlying points. As you click on each point, MATLAB labels it with the corresponding row of the names string matrix. When you are finished labeling points, press the Return key.
The labeled cities are the biggest population centers in the United States. Perhaps we should consider them as a completely separate group. If we call gname without arguments, it labels each point with its row number.
We can create an index variable containing the row numbers of all the metropolitan areas we chose.
    metro = [43 65 179 213 234 270 314];
    names(metro,:)

    ans =
    Boston, MA
    Chicago, IL
    Los Angeles, Long Beach, CA
    New York, NY
    Philadelphia, PA-NJ
    San Francisco, CA
    Washington, DC-MD-VA
To remove these rows from the ratings matrix, type the following.
    rsubset = ratings;
    nsubset = names;
    nsubset(metro,:) = [];
    rsubset(metro,:) = [];
    size(rsubset)

    ans =
       322     9
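Assigning the empty matrix to selected rows deletes them, which is why the row count drops by the number of entries in metro. The sketch below illustrates the same idiom on a small hypothetical matrix and shows an equivalent approach that keeps the complementary rows instead. The names A, labels, drop, and keep are illustrative only.

```matlab
% Hypothetical 5-by-5 data matrix with one label row per data row.
A = magic(5);
labels = ['a';'b';'c';'d';'e'];
drop = [2 4];                        % rows to remove

% Deletion by assigning the empty matrix, as in the text.
Asub = A;
Asub(drop,:) = [];

% Equivalent: index the rows that are kept.
keep = setdiff(1:size(A,1), drop);
lsub = labels(keep,:);

size(Asub)                           % two fewer rows than A
isequal(Asub, A(keep,:))             % both approaches agree
```

Using the same index for the data and the labels keeps the two matrices aligned row for row.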
To practice, repeat the analysis using the variable rsubset as the new data matrix and nsubset as the string matrix of labels.
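One way the repeated analysis might look is sketched below. It assumes that the original analysis scaled each column by its standard deviation and called the Statistics Toolbox function princomp; neither step appears in this section, so treat both as assumptions. In newer releases, pca is the recommended replacement for princomp.

```matlab
% Assumes rsubset and nsubset exist as created above, and that the
% original analysis scaled each column by its standard deviation.
stdr = std(rsubset);
sr = rsubset ./ repmat(stdr, size(rsubset,1), 1);

% Principal components analysis of the reduced data set.
[pcs, newdata, variances, t2] = princomp(sr);

% Plot the scores on the first two components.
plot(newdata(:,1), newdata(:,2), '+')
xlabel('1st Principal Component');
ylabel('2nd Principal Component');

% Label points interactively with the remaining city names.
gname(nsubset)
```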