Statistics Toolbox    
robustfit

Robust regression

Syntax

    b = robustfit(X,Y)
    [b,stats] = robustfit(X,Y)
    [b,stats] = robustfit(X,Y,'wfun',tune,'const')

Description

b = robustfit(X,Y) uses robust regression to fit Y as a function of the columns of X, and returns the vector b of coefficient estimates. The robustfit function uses an iteratively reweighted least squares algorithm, with the weights at each iteration calculated by applying the bisquare function to the residuals from the previous iteration. This algorithm gives lower weight to points that do not fit well. The results are less sensitive to outliers in the data as compared with ordinary least squares regression.
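The reweighting idea can be sketched as follows. This is a simplified illustration using the bisquare weights, not the actual robustfit implementation (which also adjusts the residuals for leverage, as described below):

```matlab
% Simplified sketch of iteratively reweighted least squares with
% bisquare weights.  The data here are illustrative placeholders.
x = (1:10)';
y = 10 - 2*x + randn(10,1);          % straight line plus noise
X = [ones(10,1) x];                  % design matrix with constant term
tune = 4.685;                        % default bisquare tuning constant
b = X \ y;                           % start from the least squares fit
for iter = 1:20
    res = y - X*b;
    s = median(abs(res - median(res))) / 0.6745;   % MAD-based scale
    r = res / (tune*s);                            % scaled residuals
    w = (abs(r) < 1) .* (1 - r.^2).^2;             % bisquare weights
    sw = sqrt(w);
    b = (X .* repmat(sw,1,2)) \ (sw .* y);         % weighted LS step
end
```

Each pass refits with weights computed from the previous residuals, so points with large residuals contribute progressively less to the fit.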

[b,stats] = robustfit(X,Y) also returns a stats structure with the following fields:

    stats.ols_s      sigma estimate (rmse) from least squares fit
    stats.robust_s   robust estimate of sigma
    stats.mad_s      estimate of sigma computed using the median
                     absolute deviation of the residuals; used for
                     scaling residuals during the iterative fitting
    stats.s          final estimate of sigma, the larger of robust_s
                     and a weighted average of ols_s and robust_s
    stats.se         standard error of coefficient estimates
    stats.t          ratio of b to stats.se
    stats.p          p-values for stats.t
    stats.coeffcorr  estimated correlation of coefficient estimates
    stats.w          vector of weights for robust fit
    stats.h          vector of leverage values for least squares fit
    stats.dfe        degrees of freedom for error

The robustfit function estimates the variance-covariance matrix of the coefficient estimates as V = inv(X'*X)*stats.s^2. The standard errors and correlations are derived from V.
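This relationship can be checked directly; a sketch, using placeholder data rather than anything from the text:

```matlab
% Deriving standard errors from the covariance estimate V above.
x = (1:10)';
y = 10 - 2*x + randn(10,1);
[b,stats] = robustfit(x,y);
X1 = [ones(size(x)) x];           % design matrix including the constant term
V  = inv(X1'*X1) * stats.s^2;     % estimated variance-covariance of b
se = sqrt(diag(V));               % compare with stats.se
```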

[b,stats] = robustfit(X,Y,'wfun',tune,'const') specifies a weight function, a tuning constant, and the presence or absence of a constant term. The weight function 'wfun' can be any of the names listed in the following table.

Table 12-2: Weight Functions

    Weight function   Meaning                           Tuning constant
    ---------------   -------------------------------   ---------------
    'andrews'         w = (abs(r)<pi) .* sin(r) ./ r    1.339
    'bisquare'        w = (abs(r)<1) .* (1 - r.^2).^2   4.685
    'cauchy'          w = 1 ./ (1 + r.^2)               2.385
    'fair'            w = 1 ./ (1 + abs(r))             1.400
    'huber'           w = 1 ./ max(1, abs(r))           1.345
    'logistic'        w = tanh(r) ./ r                  1.205
    'talwar'          w = 1 * (abs(r)<1)                2.795
    'welsch'          w = exp(-(r.^2))                  2.985

The value r in the weight function expression is equal to

    r = resid / (tune * s * sqrt(1-h))

where resid is the vector of residuals from the previous iteration, tune is the tuning constant, h is the vector of leverage values from a least squares fit, and s is an estimate of the standard deviation of the error term given by s = MAD/0.6745.

The quantity MAD is the median absolute deviation of the residuals from their median. The constant 0.6745 makes the estimate unbiased for the normal distribution. If there are p columns in the X matrix (including the constant term, if any), the smallest p-1 absolute deviations are excluded when computing their median.

In addition to the function names listed above, 'wfun' can be 'ols' to perform unweighted ordinary least squares.

The argument tune overrides the default tuning constant from the table. A smaller tuning constant downweights large residuals more severely; a larger one downweights them less severely. The default tuning constants, shown in the table, yield coefficient estimates that are approximately 95% as efficient as least squares estimates when the response has a normal distribution with no outliers.

The value of 'const' can be 'on' (the default) to include a constant term or 'off' to omit it. If you want a constant term, set 'const' to 'on' rather than adding a column of ones to your X matrix.

As an alternative to specifying one of the named weight functions shown above, you can write your own weight function that takes a vector of scaled residuals as input and produces a vector of weights as output. You can specify 'wfun' using @ (for example, @myfun) or as an inline function.
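For instance, a function handle that reproduces the built-in 'fair' weight might be passed together with its tuning constant (the data here are placeholders):

```matlab
x = (1:10)';
y = 10 - 2*x + randn(10,1);
wfun = @(r) 1 ./ (1 + abs(r));      % same form as the built-in 'fair' weight
b = robustfit(x, y, wfun, 1.400);   % custom weight function plus tuning constant
```

An inline function, e.g. inline('1 ./ (1 + abs(r))'), could be used in place of the handle.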

Example

Let's see how a single erroneous point affects least squares and robust fits. First we generate a simple dataset following the equation y = 10-2*x plus some random noise. Then we change one y value to simulate an outlier that could be an erroneous measurement.
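Following that description, the dataset might be generated as:

```matlab
x = (1:10)';
y = 10 - 2*x + randn(10,1);   % straight line plus random noise
y(10) = 0;                    % change one value to simulate an outlier
```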

We use both ordinary least squares and robust fitting to estimate the equations of a straight line fit.

A scatter plot with both fitted lines shows that the robust fit (solid line) fits most of the data points well but ignores the outlier. The least squares fit (dotted line) is pulled toward the outlier.
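The fits and the plot might be reproduced as follows. This is a sketch: the line styles are choices made here, and regress requires an explicit column of ones while robustfit adds the constant term by default.

```matlab
x = (1:10)';                         % simulated data as described above
y = 10 - 2*x + randn(10,1);
y(10) = 0;                           % the outlier
bls  = regress(y, [ones(10,1) x]);   % ordinary least squares coefficients
brob = robustfit(x, y);              % robust coefficients
scatter(x, y); hold on
plot(x, bls(1) + bls(2)*x, ':')      % least squares fit (dotted)
plot(x, brob(1) + brob(2)*x, '-')    % robust fit (solid)
```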

See Also
regress, robustdemo

References

[1]  DuMouchel, W.H., and F.L. O'Brien (1989), "Integrating a Robust Option into a Multiple Regression Computing Environment," Computer Science and Statistics: Proceedings of the 21st Symposium on the Interface, Alexandria, VA: American Statistical Association.

[2]  Holland, P.W., and R.E. Welsch (1977), "Robust Regression Using Iteratively Reweighted Least-Squares," Communications in Statistics: Theory and Methods, A6, 813-827.

[3]  Huber, P.J. (1981), Robust Statistics, New York: Wiley.

[4]  Street, J.O., R.J. Carroll, and D. Ruppert (1988), "A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares," The American Statistician, 42, 152-154.

