Statistics Toolbox    
kruskalwallis

Kruskal-Wallis nonparametric one-way Analysis of Variance (ANOVA)

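Syntax

p = kruskalwallis(X)
p = kruskalwallis(X,group)
p = kruskalwallis(X,group,'displayopt')
[p,table] = kruskalwallis(...)
[p,table,stats] = kruskalwallis(...)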

Description

p = kruskalwallis(X) performs a Kruskal-Wallis test to compare the locations (medians) of the columns of the m-by-n matrix X, where each column represents an independent sample containing m mutually independent observations. The Kruskal-Wallis test is a nonparametric version of the classical one-way ANOVA. The function returns the p-value for the null hypothesis that all samples in X are drawn from the same population (or from different populations with the same distribution).

If the p-value is near zero, this casts doubt on the null hypothesis and suggests that at least one sample median is significantly different from the others. The choice of a critical p-value to determine whether the result is judged "statistically significant" is left to the researcher. It is common to declare a result significant if the p-value is less than 0.05 or 0.01.

The kruskalwallis function displays two figures. The first figure is a standard ANOVA table, calculated using the ranks of the data rather than their numeric values. Ranks are found by ordering the data from smallest to largest across all groups, and taking the numeric index of this ordering. The rank for a tied observation is equal to the average rank of all observations tied with it. For example, the following table shows the ranks for a small sample.

X value    1.4   2.7   1.6   1.6   3.3   0.9   1.1
Rank       3     6     4.5   4.5   7     1     2
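
As a minimal sketch, the ranks in the table above can be reproduced with the Statistics Toolbox function tiedrank, which gives tied observations their average rank. The variable names are illustrative only.

```matlab
% Sample values from the table above
x = [1.4 2.7 1.6 1.6 3.3 0.9 1.1];

% tiedrank orders the data from smallest to largest and assigns
% tied values the average of the ranks they occupy
r = tiedrank(x);

disp(r)   % 3  6  4.5  4.5  7  1  2
```

The two observations equal to 1.6 occupy positions 4 and 5 in the ordering, so each receives rank 4.5.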

The entries in the ANOVA table are the usual sums of squares, degrees of freedom, and other quantities calculated on the ranks. The usual F statistic is replaced by a chi-square statistic. The p-value measures the significance of the chi-square statistic.
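
To make this concrete, here is a rough sketch of how a chi-square statistic of this kind can be computed by hand: the between-group sum of squares of the ranks divided by the total mean square of the ranks, which is the standard tie-corrected Kruskal-Wallis statistic. The data values are made up for illustration, and the sketch assumes (rather than confirms) that this matches the quantity the function reports.

```matlab
% Illustrative data (made up): 3 samples (columns), 4 observations each
X = [6.1 7.4 9.0;
     5.8 7.9 8.2;
     6.5 7.1 9.4;
     6.0 8.3 8.8];

[m, k] = size(X);
N = m * k;

% Rank all observations together, averaging ties
R = reshape(tiedrank(X(:)), m, k);

% Sums of squares computed on the ranks instead of the raw values
grandMean = (N + 1) / 2;                          % mean of the ranks 1..N
SSgroups  = m * sum((mean(R, 1) - grandMean).^2); % between-group SS
SStotal   = sum((R(:) - grandMean).^2);           % total SS

% Chi-square statistic and its p-value on k-1 degrees of freedom
H = SSgroups / (SStotal / (N - 1));
p = 1 - chi2cdf(H, k - 1);

fprintf('chi-square = %.4f, p = %.4f\n', H, p);
```

Calling kruskalwallis on the same X should give a matching p-value, which is a convenient sanity check on the hand computation.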

The second figure displays box plots of each column of X (not the ranks of X).

p = kruskalwallis(X,group) uses the values in group (a character array or cell array) as labels for the box plot of the samples in X, when X is a matrix. Each row of group contains the label for the data in the corresponding column of X, so group must have length equal to the number of columns in X.
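
A short sketch of the matrix form with labels follows. The data and label names are hypothetical and serve only to show the shape of the inputs.

```matlab
% Illustrative data (made up): each column is one sample
X = [6.1 7.4 9.0;
     5.8 7.9 8.2;
     6.5 7.1 9.4;
     6.0 8.3 8.8];

% One label per column of X (one row of group per column of X)
group = {'low'; 'medium'; 'high'};

% Displays the ANOVA table and the labeled box plot, returns the p-value
p = kruskalwallis(X, group);
```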

When X is a vector, kruskalwallis performs a Kruskal-Wallis test on the samples contained in X, as indexed by input group (a vector, character array, or cell array). Each element in group identifies the group (i.e., sample) to which the corresponding element in vector X belongs, so group must have the same length as X. The labels contained in group are also used to annotate the box plot.
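
The vector form might look like the sketch below. The measurements and group names are made up for illustration. In this form the samples do not need to contain the same number of observations.

```matlab
% Illustrative measurements (made up) stored in a single vector
x = [6.1 5.8 6.5 6.0 7.4 7.9 7.1 9.0 8.2 9.4];

% group{i} names the sample that x(i) belongs to, so it has the same length as x
group = {'a','a','a','a','b','b','b','c','c','c'};

% 'off' suppresses the table and box plot figures
p = kruskalwallis(x, group, 'off');
```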

It is not necessary to label samples sequentially (1, 2, 3, ...). For example, if X contains measurements taken at three different temperatures, -27°, 65°, and 110°, you could use these numbers as the sample labels in group. If a row of group contains an empty cell or empty string, that row and the corresponding observation in X are disregarded. NaNs in either input are similarly ignored.
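
The sketch below illustrates both points with made-up measurements: the group labels are the temperatures themselves, and one observation is NaN. If NaNs are ignored as described, removing that observation by hand should give the same p-value.

```matlab
% Illustrative measurements (made up) at three temperatures
x    = [12.1 11.8 12.6 14.0 13.7 14.4 15.2 15.9 NaN 15.5];
temp = [-27  -27  -27   65   65   65  110  110  110  110];

% Numeric group labels need not be 1, 2, 3, ...
p1 = kruskalwallis(x, temp, 'off');

% Drop the NaN observation explicitly and repeat the test
keep = ~isnan(x);
p2 = kruskalwallis(x(keep), temp(keep), 'off');

fprintf('with NaN: p = %.4f, NaN removed: p = %.4f\n', p1, p2);
```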

p = kruskalwallis(X,group,'displayopt') enables the table and box plot displays when 'displayopt' is 'on' (default) and suppresses the displays when 'displayopt' is 'off'.

[p,table] = kruskalwallis(...) returns the ANOVA table (including column and row labels) in cell array table. (You can copy a text version of the ANOVA table to the clipboard by using the Copy Text item on the Edit menu.)
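
A brief sketch of capturing the table programmatically, using made-up data. The output is named tbl here (an arbitrary choice) to avoid shadowing the table data type found in newer MATLAB releases.

```matlab
% Illustrative data (made up)
x = [6.1 5.8 6.5 6.0 7.4 7.9 7.1 9.0 8.2 9.4];
group = {'a','a','a','a','b','b','b','c','c','c'};

% Second output is the ANOVA table as a cell array, labels included
[p, tbl] = kruskalwallis(x, group, 'off');

disp(tbl)
```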

[p,table,stats] = kruskalwallis(...) returns a stats structure that you can use to perform a follow-up multiple comparison test. The Kruskal-Wallis test evaluates the hypothesis that all samples come from the same distribution, against the alternative that they do not all share the same location. Sometimes it is preferable to perform a test to determine which pairs of samples are significantly different, and which are not. You can use the multcompare function to perform such tests by supplying the stats structure as input.
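
A follow-up comparison could be sketched as follows, again with made-up data. The 'display','off' option is used so that multcompare does not open its interactive figure.

```matlab
% Illustrative data (made up): three groups
x = [6.1 5.8 6.5 6.0 7.4 7.9 7.1 9.0 8.2 9.4];
group = {'a','a','a','a','b','b','b','c','c','c'};

% Third output carries what multcompare needs
[p, tbl, stats] = kruskalwallis(x, group, 'off');

% One row per pair of groups; the first two columns identify the pair
c = multcompare(stats, 'display', 'off');

disp(c)
```

With three groups there are three pairs to compare, so c has three rows.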

Assumptions

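The Kruskal-Wallis test makes the following assumptions about the data in X:

1. All sample populations have the same continuous distribution, apart from a possibly different location.
2. All observations are mutually independent.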

The classical one-way ANOVA test replaces the first assumption with the stronger assumption that the populations have normal distributions.

Example

Let's revisit the same material strength study that we used with the anova1 function, to see if the nonparametric Kruskal-Wallis procedure leads to the same conclusion. Recall we are studying the strength of beams made from three alloys:

This time we try both classical and Kruskal-Wallis ANOVA, omitting displays:
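
The comparison might be run as in the sketch below. The strength values and alloy labels are assumed to be those of the anova1 example; treat them as illustrative if your data differ. The second half replaces a single observation with an extreme value (an arbitrary choice of 120) to show how each test reacts to a change in a small portion of the data.

```matlab
% Beam strength measurements and alloy labels (assumed values)
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st', ...
         'al1','al1','al1','al1','al1','al1', ...
         'al2','al2','al2','al2','al2','al2'};

% Classical and Kruskal-Wallis tests with displays suppressed
pAnova = anova1(strength, alloy, 'off');
pKW    = kruskalwallis(strength, alloy, 'off');
fprintf('original data: anova1 p = %.4g, kruskalwallis p = %.4g\n', pAnova, pKW);

% Replace one observation with an extreme value and repeat both tests
strength(20) = 120;
pAnovaOut = anova1(strength, alloy, 'off');
pKWOut    = kruskalwallis(strength, alloy, 'off');
fprintf('one outlier:   anova1 p = %.4g, kruskalwallis p = %.4g\n', pAnovaOut, pKWOut);
```

With these values both tests find a significant difference among the alloys on the original data (p of roughly 1.5e-4 and 0.0018). After the single change, the anova1 p-value rises to about 0.25, while the kruskalwallis p-value stays near 0.006.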

Now, with a small portion of the data changed, the classical ANOVA test does not find a significant difference, but the nonparametric procedure does. This illustrates one of the properties of nonparametric procedures: they are often not severely affected by changes in a small portion of the data.

Reference

[1]  Hollander, M., and D. A. Wolfe, Nonparametric Statistical Methods, Wiley, 1973.

See Also
anova1, boxplot, multcompare

