Statistics Toolbox    
multcompare

Multiple comparison test of means or other estimates

Syntax

c = multcompare(stats)
c = multcompare(stats,alpha)
c = multcompare(stats,alpha,'displayopt')
c = multcompare(stats,alpha,'displayopt','ctype')
c = multcompare(stats,alpha,'displayopt','ctype','estimate')
c = multcompare(stats,alpha,'displayopt','ctype','estimate',dim)
[c,m] = multcompare(...)
[c,m,h] = multcompare(...)

Description

c = multcompare(stats) performs a multiple comparison test using the information in the stats structure, and returns a matrix c of pairwise comparison results. It also displays an interactive figure presenting a graphical representation of the test.

In a one-way analysis of variance, you compare the means of several groups to test the hypothesis that they are all the same, against the general alternative that they are not all the same. Sometimes this alternative may be too general. You may need information about which pairs of means are significantly different, and which are not. A test that can provide such information is called a "multiple comparison procedure."

When you perform a simple t-test of one group mean against another, you specify a significance level that determines the cutoff value of the t statistic. For example, you can specify the value alpha = 0.05 to ensure that when there is no real difference, you will incorrectly find a significant difference no more than 5% of the time. When there are many group means, there are also many pairs to compare. If you applied an ordinary t-test in this situation, the alpha value would apply to each comparison separately, so the chance of incorrectly finding a significant difference would increase with the number of comparisons. Multiple comparison procedures are designed to provide an upper bound on the probability that any comparison will be incorrectly found significant.
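
As a rough illustration (an approximate calculation that treats the pairwise tests as independent, which they are not exactly):

k = 5;                            % number of group means
npairs = k*(k-1)/2;               % 10 pairwise comparisons
p_any = 1 - (1 - 0.05)^npairs     % roughly a 40% chance of at least one false positive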

The output c contains the results of the test in the form of a five-column matrix. Each row of the matrix represents one test, and there is one row for each pair of groups. The first two entries in a row are the indices of the two groups being compared, the third and fifth entries are the lower and upper bounds of a confidence interval for the difference of their means, and the fourth entry is the estimated difference itself.

For example, suppose one row contains the following entries.

2.0000  5.0000  1.9442  8.2206  14.4971

These numbers indicate that the mean of group 2 minus the mean of group 5 is estimated to be 8.2206, and a 95% confidence interval for the true difference of the means is [1.9442, 14.4971].

In this example the confidence interval does not contain 0.0, so the difference is significant at the 0.05 level. If the confidence interval did contain 0.0, the difference would not be significant at the 0.05 level.
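
For example, the significant pairs can be picked out directly from c (a small sketch, relying on the five-column layout described above):

sig = c(:,3) > 0 | c(:,5) < 0;    % confidence interval lies entirely above or below zero
c(sig, 1:2)                       % group indices of the significantly different pairs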

The multcompare function also displays a graph with each group mean represented by a symbol and an interval around the symbol. Two means are significantly different if their intervals are disjoint, and are not significantly different if their intervals overlap. You can use the mouse to select any group, and the graph will highlight any other groups that are significantly different from it.

c = multcompare(stats,alpha) determines the confidence levels of the intervals in the c matrix and in the figure. The confidence level is 100*(1-alpha)%. The default value of alpha is 0.05.

c = multcompare(stats,alpha,'displayopt') enables the graph display when 'displayopt' is 'on' (default) and suppresses the display when 'displayopt' is 'off'.
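
For example (a sketch, assuming stats was returned by one of the supported functions):

c = multcompare(stats, 0.01, 'off');   % 99% simultaneous intervals, no figure display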

c = multcompare(stats,alpha,'displayopt','ctype') specifies the critical value to use for the multiple comparison, which can be any of the following.

'hsd' or 'tukey-kramer'
    Use Tukey's honestly significant difference criterion. This is the default, and it is based on the Studentized range distribution. It is optimal for balanced one-way ANOVA and similar procedures with equal sample sizes. It has been proven to be conservative for one-way ANOVA with different sample sizes. According to the unproven Tukey-Kramer conjecture, it is also accurate for problems where the quantities being compared are correlated, as in analysis of covariance with unbalanced covariate values.

'lsd'
    Use Tukey's least significant difference procedure. This procedure is a simple t-test. It is reasonable if the preliminary test (say, the one-way ANOVA F statistic) shows a significant difference. If it is used unconditionally, it provides no protection against multiple comparisons.

'bonferroni'
    Use critical values from the t distribution, after a Bonferroni adjustment to compensate for multiple comparisons. This procedure is conservative, but usually less so than the Scheffé procedure.

'dunn-sidak'
    Use critical values from the t distribution, after an adjustment for multiple comparisons that was proposed by Dunn and proved accurate by Šidák. This procedure is similar to, but less conservative than, the Bonferroni procedure.

'scheffe'
    Use critical values from Scheffé's S procedure, derived from the F distribution. This procedure provides a simultaneous confidence level for comparisons of all linear combinations of the means, and it is conservative for comparisons of simple differences of pairs.
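
For example, to use the Bonferroni adjustment instead of the default criterion (a sketch):

c = multcompare(stats, 0.05, 'on', 'bonferroni');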

c = multcompare(stats,alpha,'displayopt','ctype','estimate') specifies the estimate to be compared. The allowable values of estimate depend on the function that was the source of the stats structure, according to the following table.

Source           Allowable Values of Estimate

'anova1'         Ignored. Always compare the group means.
'anova2'         Either 'column' (the default) or 'row' to compare column or row means.
'anovan'         Ignored. Always compare the population marginal means as specified by the dim argument.
'aoctool'        Either 'slope', 'intercept', or 'pmm' to compare slopes, intercepts, or population marginal means. If the analysis of covariance model did not include separate slopes, then 'slope' is not allowed. If it did not include separate intercepts, then no comparisons are possible.
'friedman'       Ignored. Always compare average column ranks.
'kruskalwallis'  Ignored. Always compare average group ranks.
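
For example, if stats came from anova2, the row means can be compared instead of the column means (a sketch):

c = multcompare(stats, 0.05, 'on', 'tukey-kramer', 'row');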

c = multcompare(stats,alpha,'displayopt','ctype','estimate',dim) specifies the population marginal means to be compared. This argument is used only if the input stats structure was created by the anovan function. For n-way ANOVA with n factors, you can specify dim as a scalar or a vector of integers between 1 and n. The default value is 1.

For example, if dim = 1, the estimates that are compared are the means for each value of the first grouping variable, adjusted by removing effects of the other grouping variables as if the design were balanced. If dim = [1 3], population marginal means are computed for each combination of the first and third grouping variables, removing effects of the second grouping variable. If you fit a singular model, some cell means may not be estimable and any population marginal means that depend on those cell means will have the value NaN.

Population marginal means are described by Milliken and Johnson (1992) and by Searle, Speed, and Milliken (1980). The idea behind population marginal means is to remove any effect of an unbalanced design by fixing the values of the factors specified by dim, and averaging out the effects of other factors as if each factor combination occurred the same number of times. The definition of population marginal means does not depend on the number of observations at each factor combination. For designed experiments where the number of observations at each factor combination has no meaning, population marginal means can be easier to interpret than simple means ignoring other factors. For surveys and other studies where the number of observations at each combination does have meaning, population marginal means may be harder to interpret.
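
As a small numerical illustration (hypothetical cell means, not taken from any example in this section): suppose factor 1 has levels A and B, and factor 2 has levels X and Y. The population marginal mean for level A averages the (A,X) and (A,Y) cell means with equal weight, regardless of how many observations fall in each cell.

m_AX = 10; m_AY = 14;             % hypothetical cell means for (A,X) and (A,Y)
pmm_A = (m_AX + m_AY)/2           % population marginal mean for level A: 12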

[c,m] = multcompare(...) returns an additional matrix m. The first column of m contains the estimated values of the means (or whatever statistics are being compared) for each group, and the second column contains their standard errors.
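
For instance (a sketch, assuming stats came from anova1, so the compared estimates are group means):

[c, m] = multcompare(stats);
estimates = m(:,1);               % estimated mean of each group
stderrs   = m(:,2);               % standard error of each estimate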

[c,m,h] = multcompare(...) returns a handle h to the comparison graph. Note that the title of this graph contains instructions for interacting with the graph, and the x-axis label contains information about which means are significantly different from the selected mean. If you plan to use this graph for presentation, you may want to omit the title and the x-axis label. You can remove them using interactive features of the graph window, or you can use the following commands.

title('')
xlabel('')

Example

Let's revisit the anova1 example testing the material strength in structural beams. From the anova1 output we found significant evidence that the three types of beams are not equivalent in strength. Now we can determine where those differences lie. First we create the data arrays and we perform one-way ANOVA.
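
A sketch of this step (the variable names strength and alloy are illustrative placeholders; the actual measurement values from the anova1 example are not repeated here):

% strength: vector of strength measurements, one per beam
% alloy:    cell array of group labels, one per beam (steel or one of the two alloy groups)
[p, tbl, stats] = anova1(strength, alloy);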

Among the outputs is a structure that we can use as input to multcompare.
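
A sketch of the call; the matrix c it returns is the one discussed below.

[c, m, h] = multcompare(stats);   % pairwise 95% comparisons, plus the interactive graph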

The third row of the output matrix shows that the difference in strength between the two alloys is not significant. A 95% confidence interval for the difference is [-5.6, 1.6], so we cannot reject the hypothesis that the true difference is zero.

The first two rows show that both comparisons involving the first group (steel) have confidence intervals that do not include zero. In other words, those differences are significant. The graph shows the same information.

See Also
anova1, anova2, anovan, aoctool, friedman, kruskalwallis

References

[1]  Hochberg, Y., and A. C. Tamhane, Multiple Comparison Procedures, 1987, Wiley.

[2]  Milliken, G. A., and D. E. Johnson, Analysis of Messy Data, Volume 1: Designed Experiments, 1992, Chapman & Hall.

[3]  Searle, S. R., F. M. Speed, and G. A. Milliken, "Population marginal means in the linear model: an alternative to least squares means," American Statistician, 1980, pp. 216-221.

