Instructor: Hasan A. Poonawala
Mechanical and Aerospace Engineering
University of Kentucky, Lexington, KY, USA
Topics:
Wolfe Conditions
Backtracking Line Search
Convergence properties
A function $f$ is Lipschitz on a set $\mathcal{D}$ if there is a constant $L > 0$ such that for any $x, y \in \mathcal{D}$, $\|f(x) - f(y)\| \le L\,\|x - y\|$.
For twice differentiable functions, the gradient being $L$-Lipschitz is equivalent to $\|\nabla^2 f(x)\| \le L$ for all $x$.
Descent Lemma
Let $f$ be a twice differentiable function whose gradient is $L$-Lipschitz continuous over some convex set $\mathcal{D}$. Then, for all $x, y \in \mathcal{D}$, by Taylor's Theorem,
$$f(y) \le f(x) + \nabla f(x)^\top (y - x) + \frac{L}{2}\|y - x\|^2.$$
This Lemma says that we can define a local quadratic function that is an upper bound for $f$ near a point $x$.
If $y = x_{k+1}$ and $x_{k+1} = x_k - \alpha \nabla f(x_k)$, we get
$$f(x_{k+1}) \le f(x_k) - \alpha\left(1 - \frac{L\alpha}{2}\right)\|\nabla f(x_k)\|^2.$$
To ensure decrease, we need $\alpha\left(1 - \frac{L\alpha}{2}\right) > 0$, that is, $0 < \alpha < \frac{2}{L}$.
If $\alpha \le \frac{1}{L}$, then $f(x_{k+1}) \le f(x_k) - \frac{\alpha}{2}\|\nabla f(x_k)\|^2$.
If $\frac{1}{L} < \alpha < \frac{2}{L}$, then $f(x_{k+1}) < f(x_k)$ still holds, but the guaranteed decrease is smaller.
Descent Lemma:
The smallest bound on $f(x_{k+1})$ occurs when $\alpha = \frac{1}{L}$:
$$f(x_{k+1}) \le f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2.$$
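As an illustration (not part of the original notes), here is a minimal gradient-descent sketch using the fixed step $\alpha = 1/L$; the quadratic test function and starting point are assumptions chosen for the example.

```python
import numpy as np

# Assumed quadratic test function: f(x) = 0.5 * x^T Q x, with gradient Q x.
# The gradient is Lipschitz with constant L = largest eigenvalue of Q.
Q = np.array([[3.0, 0.0],
              [0.0, 1.0]])
grad = lambda x: Q @ x
L = np.linalg.eigvalsh(Q).max()   # Lipschitz constant of the gradient

x = np.array([4.0, -2.0])         # assumed starting point
for _ in range(50):
    x = x - (1.0 / L) * grad(x)   # fixed step alpha = 1/L from the descent lemma

print(x)                          # approaches the minimizer at the origin
```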
But what happens when we don't know $L$?
This is a preview of backtracking
Replaces the curvature condition with a reducing sequence of step lengths $\alpha$.
Example values (typical choices): $\bar{\alpha} = 1$, $\rho = 0.5$, $c = 10^{-4}$.
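A sketch of backtracking line search enforcing the sufficient-decrease (Armijo) condition; the default parameters mirror the example values above, and the quadratic test function is an assumption for illustration.

```python
import numpy as np

def backtracking(f, grad_f, x, p, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the sufficient-decrease (Armijo) condition holds:
    f(x + alpha p) <= f(x) + c * alpha * grad_f(x)^T p."""
    alpha = alpha0
    fx, gx = f(x), grad_f(x)
    while f(x + alpha * p) > fx + c * alpha * (gx @ p):
        alpha *= rho
    return alpha

# Assumed example: a simple quadratic with a steepest-descent direction.
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
x0 = np.array([2.0, -3.0])
p = -grad_f(x0)
print(backtracking(f, grad_f, x0, p))   # alpha = 1.0 is accepted here
```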
Informally
Will $x_k \to x^*$?
Weaker
Will $\|\nabla f(x_k)\| \to 0$?
Still weaker
Will $\liminf_{k\to\infty} \|\nabla f(x_k)\| = 0$?
Theorem (Zoutendijk)
Consider any iteration of the form $x_{k+1} = x_k + \alpha_k p_k$, where $p_k$ is a descent direction and $\alpha_k$ satisfies the Wolfe conditions. Suppose that $f$ is bounded below in $\mathbb{R}^n$ and that $f$ is continuously differentiable in an open set $\mathcal{N}$ containing the level set $\mathcal{L} = \{x : f(x) \le f(x_0)\}$. Assume also that the gradient $\nabla f$ is Lipschitz continuous on $\mathcal{N}$. Then
$$\sum_{k \ge 0} \cos^2\theta_k\, \|\nabla f_k\|^2 < \infty, \qquad \text{where } \cos\theta_k = \frac{-\nabla f_k^\top p_k}{\|\nabla f_k\|\,\|p_k\|}.$$
If we choose $p_k$ such that $\cos\theta_k \ge \delta > 0$ for all $k$, then the Zoutendijk condition implies that $\lim_{k\to\infty}\|\nabla f_k\| = 0$.
Similar results apply to chosen using backtracking line search.
Informally
How fast does
Less informally
The convergence rate measures how fast the error in the solution decreases. The error can be in terms of the distance to the optimum $\|x_k - x^*\|$, the optimal value $f(x_k) - f(x^*)$, or the gradient norm $\|\nabla f(x_k)\|$.
Example
If $\|x_{k+1} - x^*\| \le c\,\|x_k - x^*\|$ for a constant $c \in (0,1)$, then the convergence is linear. Moreover, the number of iterations needed to reach an error of $\epsilon$ grows like $O(\log(1/\epsilon))$.
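A quick arithmetic check of that iteration count (the contraction factor, initial error, and tolerance below are assumed values): if the error shrinks by a factor $c$ each iteration, then $c^k \epsilon_0 \le \epsilon$ after roughly $\log(\epsilon_0/\epsilon)/\log(1/c)$ iterations.

```python
import math

c, e0, eps = 0.5, 1.0, 1e-8   # assumed contraction factor, initial error, target error
k = math.ceil(math.log(e0 / eps) / math.log(1.0 / c))
print(k)                      # about 27 iterations, since c**k * e0 <= eps
```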
We know that whatever point we are looking for must be a stationary one; it must satisfy $\nabla f(x^*) = 0$.
Newton's method applies the Newton root-finding algorithm to $\nabla f$. Apply root finding to $\nabla f(x) = 0$:
$$x_{k+1} = x_k - \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k).$$
When $\nabla^2 f(x_k) \succ 0$, we are minimizing the quadratic approximation of $f$ at $x_k$, which need not have its minimum at $x^*$.
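A minimal sketch of the pure Newton iteration above, with no safeguards (no line search or Hessian modification); the test function is an assumption chosen so that the Hessian stays positive definite along the iterates.

```python
import numpy as np

def newton(grad, hess, x, iters=20):
    """Pure Newton iteration: x <- x - [hess(x)]^{-1} grad(x)."""
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Assumed test function: f(x, y) = exp(x) - x + y^2, minimized at (0, 0).
grad = lambda v: np.array([np.exp(v[0]) - 1.0, 2.0 * v[1]])
hess = lambda v: np.array([[np.exp(v[0]), 0.0],
                           [0.0, 2.0]])
print(newton(grad, hess, np.array([3.0, 2.0])))   # approaches the minimizer (0, 0)
```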
Example: Quadratic Function
Consider $f(x) = \frac{1}{2} x^\top Q x - b^\top x$, where $Q$ is symmetric.
What happens when $Q$ is not positive definite?
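A quick numerical check of the quadratic case (the matrices and starting point are assumed for illustration): the Newton step lands on the stationary point $Q^{-1}b$ in one iteration, and when $Q$ is indefinite that point is a saddle rather than a minimizer.

```python
import numpy as np

def newton_step(Q, b, x):
    # For f(x) = 0.5 x^T Q x - b^T x: gradient = Q x - b, Hessian = Q.
    return x - np.linalg.solve(Q, Q @ x - b)

b = np.array([1.0, 1.0])                        # assumed data
x0 = np.array([10.0, -7.0])                     # arbitrary starting point

Q_pd = np.array([[2.0, 0.0], [0.0, 1.0]])       # positive definite
Q_indef = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite

print(newton_step(Q_pd, b, x0))     # [0.5, 1.0]: the minimizer Q^{-1} b, reached in one step
print(newton_step(Q_indef, b, x0))  # [0.5, -1.0]: a saddle point, not a minimum
```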
Theorem 3.5
Suppose that $f$ is twice differentiable and that the Hessian $\nabla^2 f(x)$ is Lipschitz continuous in a neighborhood of a solution $x^*$ at which the second-order sufficient conditions are satisfied. Consider the iteration $x_{k+1} = x_k + p_k$, where $p_k = -\left[\nabla^2 f(x_k)\right]^{-1}\nabla f(x_k)$. Then: if the starting point $x_0$ is sufficiently close to $x^*$, the sequence of iterates converges to $x^*$; the rate of convergence of $\{x_k\}$ is quadratic; and the sequence of gradient norms $\{\|\nabla f(x_k)\|\}$ converges quadratically to zero.
Here, we directly look at $x_{k+1} - x^*$, using $x_{k+1} - x^* = x_k + p_k - x^* = x_k - x^* - \left[\nabla^2 f(x_k)\right]^{-1}\nabla f(x_k)$.
Under the assumption that $\nabla^2 f$ is $L$-Lipschitz, using Taylor's theorem, we have
$$\left\|\nabla f(x_k) - \nabla f(x^*) - \nabla^2 f(x_k)(x_k - x^*)\right\| \le \frac{L}{2}\|x_k - x^*\|^2.$$
For $\nabla f(x^*) = 0$, we can rewrite this expression using the bound above as
$$\|x_{k+1} - x^*\| \le \frac{L}{2}\left\|\left[\nabla^2 f(x_k)\right]^{-1}\right\| \|x_k - x^*\|^2.$$
If $x_0$ is sufficiently close to $x^*$ (so that $\left\|\left[\nabla^2 f(x_k)\right]^{-1}\right\|$ stays bounded), then we get quadratic convergence.
Although Newton's method is very attractive in terms of its convergence properties near the solution, it requires modification before it can be used at points that are far from the solution.
The following modifications are typically used:
A search parameter $\alpha_k$ is introduced, so that $x_{k+1} = x_k + \alpha_k p_k$, where $\alpha_k$ is selected to minimize $f(x_k + \alpha p_k)$.
A popular selection method is backtracking line search.
A general class of algorithms is given by $x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f(x_k)$, where $B_k$ is a symmetric positive definite matrix.
For small $\alpha_k$, it can be shown that $f(x_{k+1}) < f(x_k)$ whenever $\nabla f(x_k) \ne 0$.
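A sketch of this general iteration with a backtracking choice of $\alpha_k$; taking $B_k = I$ recovers steepest descent and $B_k = \nabla^2 f(x_k)$ recovers a damped Newton method. The test function and parameter defaults are assumptions.

```python
import numpy as np

def general_descent(f, grad_f, B_fn, x, iters=100, rho=0.5, c=1e-4):
    """Iterate x <- x - alpha * B(x)^{-1} grad f(x), with alpha from backtracking."""
    for _ in range(iters):
        g = grad_f(x)
        p = -np.linalg.solve(B_fn(x), g)                         # search direction
        alpha = 1.0
        while f(x + alpha * p) > f(x) + c * alpha * (g @ p):     # Armijo sufficient decrease
            alpha *= rho
        x = x + alpha * p
    return x

# Assumed test function: f(x, y) = (x - 1)^2 + 10 y^2, minimized at (1, 0).
f = lambda v: (v[0] - 1.0) ** 2 + 10.0 * v[1] ** 2
grad_f = lambda v: np.array([2.0 * (v[0] - 1.0), 20.0 * v[1]])

print(general_descent(f, grad_f, lambda v: np.eye(2), np.array([5.0, 3.0])))             # B = I: steepest descent
print(general_descent(f, grad_f, lambda v: np.diag([2.0, 20.0]), np.array([5.0, 3.0])))  # B = Hessian: Newton
```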
In practice, Newton's method must be modified to accommodate the possible non-positive definiteness of $\nabla^2 f(x)$ in regions far from the solution.
Common approach: $B_k = \nabla^2 f(x_k) + \mu_k I$ for some $\mu_k \ge 0$.
This can be regarded as a compromise between SD ($\mu_k$ very large) and Newton's method ($\mu_k = 0$).
Levenberg-Marquardt performs a Cholesky factorization of $\nabla^2 f(x_k) + \mu_k I$ for a given value of $\mu_k$, as follows.
This factorization checks implicitly for positive definiteness.
If the factorization fails (matrix not PD), $\mu_k$ is increased.
The step direction is found by solving $\left(\nabla^2 f(x_k) + \mu_k I\right) p_k = -\nabla f(x_k)$.
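A sketch of this modification (the initial $\mu$ and the growth factor are assumptions): attempt the Cholesky factorization, increase $\mu$ on failure, then solve for the step.

```python
import numpy as np

def modified_newton_step(hess, grad, mu=1e-4, factor=10.0, max_tries=50):
    """Increase mu until hess + mu*I is positive definite (Cholesky succeeds),
    then solve (hess + mu*I) p = -grad for the step direction p."""
    n = hess.shape[0]
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(hess + mu * np.eye(n))   # raises LinAlgError if not PD
        except np.linalg.LinAlgError:
            mu *= factor                                # not PD: increase mu and retry
            continue
        return np.linalg.solve(hess + mu * np.eye(n), -grad)
    raise RuntimeError("could not make the matrix positive definite")

H = np.array([[1.0, 0.0], [0.0, -2.0]])   # assumed indefinite Hessian
g = np.array([1.0, 1.0])                  # assumed gradient
p = modified_newton_step(H, g)
print(p, g @ p)                           # g^T p < 0: p is a descent direction
```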
Computes $B_{k+1}$ as a solution to the secant equation:
$$B_{k+1} s_k = y_k, \qquad s_k = x_{k+1} - x_k, \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).$$
Then solve $B_k p_k = -\nabla f(x_k)$ for the search direction $p_k$.
One dimension: the secant equation gives $B_{k+1} = \dfrac{\nabla f(x_{k+1}) - \nabla f(x_k)}{x_{k+1} - x_k}$ (the secant method).
Algorithms for updating $B_k$:
Broyden, Fletcher, Goldfarb, and Shanno (BFGS):
$$B_{k+1} = B_k - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k} + \frac{y_k y_k^\top}{y_k^\top s_k}.$$
Symmetric Rank-One (SR1):
$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^\top}{(y_k - B_k s_k)^\top s_k}.$$
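The two updates written out in code; here `s` and `y` are the step and gradient-difference vectors from the secant equation, and the test values are assumptions used only to verify that each update satisfies $B_{k+1} s_k = y_k$.

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS: B+ = B - (B s s^T B)/(s^T B s) + (y y^T)/(y^T s)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

def sr1_update(B, s, y):
    """SR1: B+ = B + (y - B s)(y - B s)^T / ((y - B s)^T s)."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

# Assumed test data; both updates should satisfy the secant equation B+ s = y.
B = np.eye(2)
s = np.array([1.0, 0.5])
y = np.array([2.0, 1.5])
print(bfgs_update(B, s, y) @ s, y)   # first vector matches y
print(sr1_update(B, s, y) @ s, y)    # first vector matches y
```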
Algorithms for updating $B_k$:
Initialization: start with $B_0 = I$ (a scaled identity is also common).
Step size: a poorly chosen $\alpha_k$ for all $k$ may interfere with approximating the Hessian; a line search satisfying the Wolfe conditions ensures $y_k^\top s_k > 0$, so the BFGS update remains positive definite.
You can now minimize $f$ (when unconstrained) iteratively. Find a local minimum by generating iterates $x_{k+1} = x_k + \alpha_k p_k$ until $\|\nabla f(x_k)\|$ is sufficiently small.