
Trust-Region Methods for Nonlinear Minimization

Many of the methods used in the Optimization Toolbox are based on trust regions, a simple yet powerful concept in optimization.

To understand the trust-region approach to optimization, consider the unconstrained minimization problem, minimize $f(x)$, where the function takes vector arguments and returns scalars. Suppose you are at a point $x$ in $n$-space and you want to improve, i.e., move to a point with a lower function value. The basic idea is to approximate $f$ with a simpler function $q$, which reasonably reflects the behavior of function $f$ in a neighborhood $N$ around the point $x$. This neighborhood is the trust region. A trial step $s$ is computed by minimizing (or approximately minimizing) over $N$. This is the trust-region subproblem,

$$\min_{s} \left\{\, q(s) \;:\; s \in N \,\right\} \tag{4-1}$$

The current point is updated to be $x + s$ if $f(x + s) < f(x)$; otherwise, the current point remains unchanged and $N$, the region of trust, is shrunk and the trial step computation is repeated.
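To make these ideas concrete, the following is a minimal MATLAB sketch of the generic accept/shrink logic, not the toolbox algorithm: the trial step simply moves along the negative gradient to the boundary of a spherical $N$, and the objective, starting point, and radius-update factors are illustrative assumptions.

```matlab
% Generic trust-region iteration (minimal sketch, not the toolbox algorithm).
f     = @(x) x(1)^2 + 10*x(2)^2;     % example smooth objective (assumed)
gradf = @(x) [2*x(1); 20*x(2)];      % its gradient
x     = [3; 1];                      % current point in n-space
Delta = 1;                           % radius of the spherical trust region N

for k = 1:100
    g = gradf(x);
    if norm(g) < 1e-8, break; end    % gradient negligible: stop
    s = -Delta * g / norm(g);        % trial step: steepest descent to the
                                     % boundary of N (a crude subproblem solve)
    if f(x + s) < f(x)               % lower function value: accept the step
        x = x + s;
        Delta = 2 * Delta;           % and be more trusting
    else
        Delta = Delta / 2;           % otherwise shrink N and retry
    end
end
```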

The key questions in defining a specific trust-region approach to minimizing $f(x)$ are how to choose and compute the approximation $q$ (defined at the current point $x$), how to choose and modify the trust region $N$, and how accurately to solve the trust-region subproblem. This section focuses on the unconstrained problem. Later sections discuss additional complications due to the presence of constraints on the variables.

In the standard trust-region method ([8]), the quadratic approximation $q$ is defined by the first two terms of the Taylor approximation to $f$ at $x$; the neighborhood $N$ is usually spherical or ellipsoidal in shape. Mathematically, the trust-region subproblem is typically stated as

$$\min_{s} \left\{\, \tfrac{1}{2}\, s^T H s + s^T g \;:\; \|Ds\| \le \Delta \,\right\} \tag{4-2}$$

where $g$ is the gradient of $f$ at the current point $x$, $H$ is the Hessian matrix (the symmetric matrix of second derivatives), $D$ is a diagonal scaling matrix, $\Delta$ is a positive scalar, and $\|\cdot\|$ is the 2-norm. Good algorithms exist for solving Eq. 4-2 (see [8]); such algorithms typically involve the computation of a full eigensystem and a Newton process applied to the secular equation

$$\frac{1}{\Delta} - \frac{1}{\|s\|} = 0.$$

Such algorithms provide an accurate solution to Eq. 4-2. However, they require time proportional to several factorizations of $H$. Therefore, for large-scale problems a different approach is needed. Several approximation and heuristic strategies, based on Eq. 4-2, have been proposed in the literature ([2],[10]). The approximation approach followed in the Optimization Toolbox is to restrict the trust-region subproblem to a two-dimensional subspace $S$ ([1],[2]). Once the subspace $S$ has been computed, the work to solve Eq. 4-2 is trivial even if full eigenvalue/eigenvector information is needed (since in the subspace the problem is only two-dimensional). The dominant work has now shifted to the determination of the subspace.
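To illustrate how trivial the reduced problem is, the following MATLAB sketch solves a two-dimensional instance of Eq. 4-2 exactly (taking $D = I$). It computes the full eigensystem and then, instead of a Newton process, applies simple bisection to the secular equation; the data are assumptions, and the degenerate "hard case" ($g$ orthogonal to the eigenspace of the smallest eigenvalue) is ignored.

```matlab
% Exact solution of a 2-D trust-region subproblem: min 0.5*s'*H2*s + g2'*s
% subject to norm(s) <= Delta. Example data below are assumptions.
H2 = [2 0; 0 -1];  g2 = [1; 1];  Delta = 1;

lam_min = min(eig(H2));               % full eigensystem is cheap in 2-D
sN = -(H2 \ g2);                      % unconstrained (Newton) step
if lam_min > 0 && norm(sN) <= Delta
    s = sN;                           % solution lies inside the region
else
    % Otherwise the solution lies on the boundary: find lambda with
    % norm((H2 + lambda*I) \ -g2) = Delta, i.e., solve the secular
    % equation 1/Delta - 1/norm(s(lambda)) = 0, here by bisection.
    step = @(lam) (H2 + lam*eye(2)) \ (-g2);
    lo = max(0, -lam_min) + 1e-12;    % step is too long near lo
    hi = lo + 1;
    while norm(step(hi)) > Delta, hi = 2*hi; end   % short enough at hi
    for it = 1:100
        mid = (lo + hi) / 2;
        if norm(step(mid)) > Delta, lo = mid; else, hi = mid; end
    end
    s = step(hi);                     % boundary solution, norm(s) ~= Delta
end
```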

The two-dimensional subspace $S$ is determined with the aid of a preconditioned conjugate gradient process described below. The toolbox assigns $S = \langle s_1, s_2 \rangle$, where $s_1$ is in the direction of the gradient $g$, and $s_2$ is either an approximate Newton direction, i.e., a solution to

$$H \cdot s_2 = -g, \tag{4-3}$$

or a direction of negative curvature,

$$s_2^T \cdot H \cdot s_2 < 0. \tag{4-4}$$

The philosophy behind this choice of $S$ is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).
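A small MATLAB sketch of forming the subspace follows. In the toolbox, $s_2$ comes from the preconditioned conjugate gradient process described below; here, purely for illustration, a Cholesky factorization supplies the approximate Newton direction, and an eigenvector supplies a direction of negative curvature when the factorization fails. The matrices are assumptions.

```matlab
% Form the two-dimensional subspace S = span(s1, s2) for example data.
H = [4 1; 1 -2];  g = [1; -1];        % assumed Hessian and gradient

s1 = -g;                              % first direction: along the gradient
[R, p] = chol(H);                     % p == 0 iff H is positive definite
if p == 0
    s2 = R \ (R' \ (-g));             % approximate Newton direction (Eq. 4-3)
else
    [Q, E] = eig(H);                  % H indefinite: take an eigenvector of
    [~, i] = min(diag(E));            % the most negative eigenvalue, so that
    s2 = Q(:, i);                     % s2'*H*s2 < 0 (Eq. 4-4)
end
V = orth([s1, s2]);                   % orthonormal basis for S
```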

A framework for the Optimization Toolbox approach to unconstrained minimization using trust-region ideas is now easy to describe:

1. Formulate the two-dimensional trust-region subproblem.
2. Solve Eq. 4-2 to determine the trial step $s$.
3. If $f(x + s) < f(x)$, then set $x = x + s$.
4. Adjust $\Delta$.

These four steps are repeated until convergence. The trust-region dimension $\Delta$ is adjusted according to standard rules. In particular, it is decreased if the trial step is not accepted, i.e., if $f(x + s) \ge f(x)$. See [6],[9] for a discussion of this aspect.
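A toy MATLAB rendering of the four steps, run on the Rosenbrock function, is sketched below. A crude search over the boundary of the subspace region stands in for an exact solver of Eq. 4-2, and the doubling/halving update of $\Delta$ is a simplification, not the toolbox rule.

```matlab
% The four-step framework on the Rosenbrock function (illustrative only).
f  = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
gf = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];
Hf = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1); -400*x(1), 200];
x = [-1.9; 2];  Delta = 1;

for k = 1:500
    g = gf(x);  H = Hf(x);
    if norm(g) < 1e-6, break; end
    % Step 1: formulate the 2-D subproblem over S = span(g, s2).
    [R, p] = chol(H);
    if p == 0
        s2 = R \ (R' \ (-g));                    % Newton direction (Eq. 4-3)
    else
        [Q, E] = eig(H);  [~, i] = min(diag(E));
        s2 = Q(:, i);                            % negative curvature (Eq. 4-4)
    end
    V  = orth([g, s2]);                          % n-by-2 basis for S (here
                                                 % n = 2, so S is the whole
                                                 % space; mechanics are the same)
    H2 = V' * H * V;  g2 = V' * g;               % reduced quadratic model
    % Step 2: solve Eq. 4-2 in S; try the boundary, then the interior step.
    t  = linspace(0, 2*pi, 361);
    Sb = Delta * [cos(t); sin(t)];               % boundary candidates
    qb = 0.5 * sum(Sb .* (H2 * Sb), 1) + g2' * Sb;
    [qmin, j] = min(qb);  s2d = Sb(:, j);
    [R2, p2] = chol(H2);
    if p2 == 0
        sn = R2 \ (R2' \ (-g2));                 % interior Newton step
        if norm(sn) <= Delta && 0.5*sn'*H2*sn + g2'*sn < qmin
            s2d = sn;
        end
    end
    s = V * s2d;                                 % lift the step to n-space
    % Steps 3 and 4: accept the step if f decreases, then adjust Delta.
    if f(x + s) < f(x), x = x + s;  Delta = 2*Delta;
    else,               Delta = Delta/2;
    end
end
```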

The Optimization Toolbox treats a few important special cases of $f$ with specialized functions: nonlinear least squares, quadratic functions, and linear least squares. However, the underlying algorithmic ideas are the same as for the general case. These special cases are discussed in later sections.

