Demos (Statistics Toolbox)

Example: aoctool with Sample Data

The Statistics Toolbox has a small data set named carsmall with information about cars. It is a good sample data set to use with aoctool. You can also use aoctool with your own data.

To start the demonstration, load the data set.

load carsmall
who
Your variables are:
Acceleration   Horsepower     Model_Year     
Cylinders      MPG            Origin         
Displacement   Model          Weight

Suppose we want to study the relationship between the weight of a car and its mileage, and whether this relationship has changed over the years.

Next, start up the tool.

[h,atab,ctab,stats] = aoctool(Weight,MPG,Model_Year);
Note: 6 observations with missing values have been removed.
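The note about removed observations can be checked directly. The following is a minimal sketch, assuming that aoctool drops every row in which any of the three variables passed to it is NaN; the variable name missing is introduced here for illustration only.

```matlab
load carsmall

% Flag rows where any of the three variables passed to aoctool is NaN
% (assumption: this is the rule aoctool uses to discard observations)
missing = isnan(Weight) | isnan(MPG) | isnan(Model_Year);

fprintf('%d of %d observations have missing values\n', ...
        nnz(missing), numel(missing));
```

The count printed should agree with the number reported in the note.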

The graphical output consists of a main window, titled ANOCOVA Prediction Plot, plus a table of coefficient estimates and an analysis of variance table.

The group of each data point is coded by its color and symbol, and the fit for each group has the same color as the data points.

The initial fit models the y variable, MPG, as a linear function of the x variable, Weight. Each group has a separate line. The coefficients of the three lines appear in the figure titled ANOCOVA Coefficients. You can see that the slopes are roughly -0.0078, with a small deviation for each group:

Model year 70:

Model year 76:

Model year 82:
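
The slope and intercept for each model year can also be recovered outside the tool. The sketch below assumes that the point estimates of the separate-lines model coincide with an ordinary least-squares line fitted to each group on its own (the standard errors differ, because aoctool fits all groups jointly). The names ok, years, slopes, and intercepts are illustrative and do not come from the toolbox.

```matlab
load carsmall

% Keep only complete observations, as aoctool does
ok = ~(isnan(Weight) | isnan(MPG) | isnan(Model_Year));
years = unique(Model_Year(ok));

slopes = zeros(numel(years), 1);
intercepts = zeros(numel(years), 1);
for k = 1:numel(years)
    idx = ok & Model_Year == years(k);
    % First-degree polynomial fit: p(1) is the slope, p(2) the intercept
    p = polyfit(Weight(idx), MPG(idx), 1);
    slopes(k) = p(1);
    intercepts(k) = p(2);
    fprintf('Model year %d: MPG = %.4f + (%.6f)*Weight\n', ...
            years(k), intercepts(k), slopes(k));
end
```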

Notice that the three fitted lines have slopes that are roughly similar. Could they really be the same? The Model_Year*Weight interaction expresses the difference in slopes, and the ANOVA table shows a test for the significance of this term. With an F statistic of 5.23 and a p-value of 0.0072, the slopes are significantly different.
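
The same ANOVA table is available without opening any windows. This is a sketch that assumes the extended calling form aoctool(x,y,group,alpha,xname,yname,gname,displayopt,model), with displayopt set to 'off'; the labels passed for the variable names are arbitrary.

```matlab
load carsmall

% Fit the default separate-lines model with graphics suppressed;
% the second output is the ANOVA table as a cell array
[~, atab] = aoctool(Weight, MPG, Model_Year, 0.05, ...
                    'Weight', 'MPG', 'Model_Year', 'off');

% Look for the row describing the Model_Year*Weight interaction
disp(atab)
```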

To examine the fits when the slopes are constrained to be the same, return to the ANOCOVA Prediction Plot window and use the Model pop-up to select a Parallel Lines model. The window updates to show the parallel-lines fit.

Though this fit looks reasonable, we know it is significantly worse than the Separate Lines model. Use the Model pop-up again to return to the original model.
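
The Parallel Lines model can also be requested from the command line rather than through the pop-up. A minimal sketch, assuming the ninth argument of aoctool accepts the model name 'parallel lines' and that 'off' suppresses the display; atabPar and ctabPar are illustrative names.

```matlab
load carsmall

% Fit a common slope with a separate intercept per model year
[~, atabPar, ctabPar] = aoctool(Weight, MPG, Model_Year, 0.05, ...
                                'Weight', 'MPG', 'Model_Year', ...
                                'off', 'parallel lines');

% Coefficient table: one slope, plus an intercept term for each group
disp(ctabPar)
```

Comparing ctabPar with the coefficient table from the separate-lines fit shows which terms the constraint removes.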

The following sections focus on two other interesting aspects of aoctool: confidence bounds and multiple comparisons.

Confidence Bounds

Now we have estimates of the relationship between MPG and Weight for each Model_Year, but how accurate are they? We can superimpose confidence bounds on the fits by examining them one group at a time. In the Model_Year menu at the lower right of the figure, change the setting from All Groups to 82. The data and fits for the other groups are dimmed, and confidence bounds appear around the 82 fit.

The dashed lines form an envelope around the fitted line for model year 82. Under the assumption that the true relationship is linear, these bounds provide a 95% confidence region for the true line. Note that the fits for the other model years are well outside these confidence bounds for Weight values between 2000 and 3000.

Sometimes it is more valuable to be able to predict the response value for a new observation, not just estimate the average response value. Like the polytool function, the aoctool function has a Bounds menu to change the definition of the confidence bounds. Use that menu to change from Line to Observation. The resulting wider intervals reflect the uncertainty in the parameter estimates as well as the randomness of a new observation.
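
The difference between the Line and Observation settings can be illustrated numerically with polyfit and polyconf. This is only an approximation of what aoctool displays: it fits the model year 82 group on its own, whereas aoctool estimates the error variance from all groups together, so the half-widths will not match the tool exactly. The names idx, w, dLine, and dObs are illustrative.

```matlab
load carsmall

% Complete observations for model year 82 only
idx = Model_Year == 82 & ~isnan(MPG) & ~isnan(Weight);
[p, S] = polyfit(Weight(idx), MPG(idx), 1);

% Grid of weights spanning the observed range for this group
w = linspace(min(Weight(idx)), max(Weight(idx)), 50)';

% 95% bounds for the fitted line (mean response)
[fitLine, dLine] = polyconf(p, w, S, 'predopt', 'curve');

% 95% bounds for a new observation at the same weights
[~, dObs] = polyconf(p, w, S, 'predopt', 'observation');

fprintf('Largest half-width, line:        %.3f\n', max(dLine));
fprintf('Largest half-width, observation: %.3f\n', max(dObs));
```

The observation half-widths exceed the line half-widths at every weight, because they add the variance of a single new response to the uncertainty in the fitted line.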

Also like the polytool function, the aoctool function has crosshairs you can use to manipulate the Weight and watch the estimate and confidence bounds along the y-axis update. These values appear only when a single group is selected, not when All Groups is selected.

Multiple Comparisons

We can perform a multiple comparison test by using the stats output from aoctool as input to the multcompare function. The multcompare function can test slopes, intercepts, or population marginal means (the heights of the three lines evaluated at the mean X value). In this example, we have already determined that the slopes are not all the same, but could it be that two are the same and only the other one is different? We can test that hypothesis.

multcompare(stats,0.05,'on','','s')
ans =
    1.0000    2.0000   -0.0012    0.0008    0.0029
    1.0000    3.0000    0.0013    0.0051    0.0088
    2.0000    3.0000    0.0005    0.0042    0.0079

This matrix shows that the estimated difference between the slopes of groups 1 and 2 (1970 and 1976) is 0.0008, and a confidence interval for the difference is [-0.0012, 0.0029]. Because this interval contains zero, there is no significant difference between the two. There are significant differences, however, between the slope for 1982 and each of the other two. The graph produced by multcompare shows the same information.
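
The reading of the matrix can be made explicit in code. The sketch below copies the values printed above and assumes the column layout they imply: the two group indices, then the lower bound, the estimated difference, and the upper bound. A pair differs significantly when its interval excludes zero. The names c and sig are illustrative.

```matlab
% Comparison matrix as printed above:
% [group A, group B, lower bound, estimate, upper bound]
c = [1 2 -0.0012 0.0008 0.0029
     1 3  0.0013 0.0051 0.0088
     2 3  0.0005 0.0042 0.0079];

% A difference is significant when the interval lies wholly on one side of zero
sig = c(:,3) > 0 | c(:,5) < 0;

for k = 1:size(c, 1)
    fprintf('Groups %d and %d: significant = %d\n', c(k,1), c(k,2), sig(k));
end
```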

Note that the stats structure was created in the initial call to the aoctool function, so it is based on the initial model fit (typically a separate-lines model). If you change the model interactively and want to base your multiple comparisons on the new model, you need to run aoctool again to get another stats structure, this time specifying your new model as the initial model.
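
As a sketch of that workflow, the following refits with Parallel Lines as the initial model and then compares intercepts, since separate slopes no longer exist under that model. It assumes a release in which multcompare accepts the name-value arguments 'Estimate' and 'Display'; in older releases the positional form used earlier would be needed instead. The names statsPar and cPar are illustrative.

```matlab
load carsmall

% Refit with parallel lines as the initial model, graphics suppressed
[~, ~, ~, statsPar] = aoctool(Weight, MPG, Model_Year, 0.05, ...
                              'Weight', 'MPG', 'Model_Year', ...
                              'off', 'parallel lines');

% Compare intercepts between model years using the new stats structure
cPar = multcompare(statsPar, 'Estimate', 'intercept', 'Display', 'off');
disp(cPar)
```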
