Algorithms for Minimization Without Derivatives

Richard P. Brent
About This Book

This outstanding text for graduate students and researchers proposes improvements to existing algorithms, extends their related mathematical theories, and offers details on new algorithms for approximating local and global minima. None of the algorithms requires an evaluation of derivatives; all depend entirely on sequential function evaluation, a highly practical scenario in the frequent event of difficult-to-evaluate derivatives.
Topics include the use of successive interpolation for finding simple zeros of a function and its derivatives; an algorithm with guaranteed convergence for finding a minimum of a function of one variable; global minimization given an upper bound on the second derivative; and a new algorithm for minimizing a function of several variables without calculating derivatives. Many numerical examples augment the text, along with a complete analysis of rate of convergence for most algorithms and error bounds that allow for the effect of rounding errors.


Information

Year: 2013
ISBN: 9780486143682

1

INTRODUCTION AND SUMMARY

Section 1

INTRODUCTION

Consider the problem of finding an approximate zero or minimum of a function of one real variable, using limited-precision arithmetic on a sequential digital computer. The function f may not be differentiable, or the derivative f′ may be difficult to compute, so a method which uses only computed values of f is desirable. Since an evaluation of f may be very expensive in terms of computer time, a good method should guarantee to find a correct solution, to within some prescribed tolerance, using only a small number of function evaluations. Hence, we study algorithms which depend on evaluating f at a small number of points, and for which certain desirable properties are guaranteed, even in the presence of rounding errors.
Slow, safe algorithms are seldom preferred in practice to fast algorithms which may occasionally fail. Thus, we want algorithms which are guaranteed to succeed in a reasonable time even for the most “difficult” functions, yet are as fast as commonly used algorithms for “easy” functions. For example, bisection is a safe method for finding a zero of a function which changes sign in a given interval, but from our point of view it is not an acceptable method, because it is just as slow for any function, no matter how well behaved, as it is in the worst possible case (ignoring the possibility that an exact zero may occasionally be found by chance). As a contrasting example, consider the method of successive linear interpolation, which converges superlinearly to a simple zero of a C^1 function, provided that the initial approximation is good and rounding errors are unimportant. This method is not acceptable either, for in practice there may be no way of knowing in advance if the zero is simple, if the initial approximation is sufficiently good to ensure convergence, or if the effect of rounding errors is important.
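As a concrete illustration of the second method, a minimal Python sketch of successive linear interpolation (the secant iteration) follows; the function name, tolerance, and iteration limit are ours, and this is not one of the ALGOL procedures given later in the book.

```python
def secant_zero(f, x0, x1, tol=1e-10, max_iter=50):
    """Successive linear interpolation (the secant iteration), sketched.

    Converges superlinearly to a simple zero when x0 and x1 are close
    enough to it, but offers no guarantee otherwise: the iteration may
    wander, diverge, or break down when f(x0) == f(x1).
    """
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            raise ZeroDivisionError("secant is horizontal; no progress possible")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # zero of the line through the two points
        if abs(x2 - x1) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    raise RuntimeError("no convergence within max_iter evaluations")
```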
In Chapter 4 we describe an algorithm which, by combining some of the desirable features of bisection and successive linear interpolation, does come close to satisfying our requirements: it is guaranteed to converge (i.e., halt) after a reasonably small number of function evaluations, and the rate of convergence for well-behaved functions is so fast that a less reliable algorithm is unlikely to be preferred on grounds of speed.
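The idea of combining a guaranteed bracketing method with fast interpolation can be sketched as follows (Python, our naming; a schematic only, since the Chapter 4 algorithm additionally uses inverse quadratic interpolation and tolerance tests that allow for rounding errors).

```python
def safeguarded_zero(f, a, b, tol=1e-10):
    """Bracketing zero-finder mixing linear interpolation with bisection.

    Requires f(a) and f(b) of opposite sign.  A schematic of the idea only:
    the Chapter 4 algorithm also uses inverse quadratic interpolation and
    rounding-error-aware tolerance tests, which are ignored here.
    """
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while abs(b - a) > tol:
        width = abs(b - a)
        # Tentative secant step from the two bracket endpoints.
        x = b - fb * (b - a) / (fb - fa) if fb != fa else 0.5 * (a + b)
        # Safeguard: keep the candidate strictly inside the bracket.
        lo, hi = min(a, b), max(a, b)
        if not (lo + 0.25 * tol < x < hi - 0.25 * tol):
            x = 0.5 * (a + b)
        fx = f(x)
        # Retain the sub-interval on which the sign change persists.
        if fa * fx <= 0:
            b, fb = x, fx
        else:
            a, fa = x, fx
        # If interpolation failed to halve the bracket, bisect once as well,
        # so the bracket shrinks by at least a factor of two per pass.
        if abs(b - a) > 0.5 * width:
            m = 0.5 * (a + b)
            fm = f(m)
            if fa * fm <= 0:
                b, fb = m, fm
            else:
                a, fa = m, fm
    return 0.5 * (a + b)
```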
An analogous algorithm, which finds a local minimum of a function of one variable by a combination of golden section search and successive parabolic interpolation, is described in Chapter 5. This algorithm fails to satisfy one of our requirements: in certain applications where repeated one-dimensional minimizations are required, and where accuracy is not very important, a faster (though less reliable) method is preferable. One such application, finding local minima of functions of several variables without calculating derivatives, is discussed in Chapter 7. (Note that wherever we consider minima we could equally well consider maxima.)
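The two ingredients of that algorithm can be illustrated separately. The following Python sketch (our notation, not the ALGOL procedure of Chapter 5) shows a single successive-parabolic-interpolation step and a plain golden section search.

```python
import math

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0        # about 0.618

def parabolic_min_step(a, fa, b, fb, c, fc):
    """Abscissa of the minimum of the parabola through (a,fa), (b,fb), (c,fc).

    One step of successive parabolic interpolation; the denominator vanishes
    when the three points are collinear, in which case the caller should
    fall back to a golden section step.
    """
    p = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    q = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * p / q

def golden_section_min(f, a, b, tol=1e-8):
    """Plain golden section search on [a, b] for a unimodal f (sketch only)."""
    x1 = b - GOLDEN * (b - a)
    x2 = a + GOLDEN * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                  # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - GOLDEN * (b - a)
            f1 = f(x1)
        else:                        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + GOLDEN * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)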
Most algorithms for minimizing a nonlinear function of one or more variables find, at best, a local minimum. For a function with several local minima, there is no guarantee that the local minimum found is the global (i.e., true or lowest) minimum. Since it is the global minimum which is of interest in most applications, this is a serious practical disadvantage of most minimization algorithms, and our algorithm given in Chapter 5 is no exception. The usual remedy is to try several different starting points and, perhaps, vary some of the parameters of the minimization procedure, in the hope that the lowest local minimum found is the global minimum. This approach is inefficient, as the same local minimum may be found several times. It is also unreliable, for, no matter how many starting points are tried, it is impossible to be quite sure that the global minimum has been found.
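The trial-and-error remedy just described amounts to a multi-start loop of the following kind (a Python sketch using an interface of our own choosing for the local minimizer; nothing here comes from the book's procedures).

```python
import random

def multistart_minimize(local_min, f, bounds, n_starts=20, seed=0):
    """Restart a local minimizer from several random points and keep the
    lowest minimum found.  There is no guarantee that the result is the
    global minimum.  `local_min` is any local minimizer with the assumed
    interface local_min(f, x0) -> (x, f(x)).
    """
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = local_min(f, x0)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```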
In Chapter 6 we discuss the problem of finding the global minimum to within a prescribed tolerance. It is possible to give an algorithm for solving this problem, provided that a little a priori information about the function to be minimized is known. We describe an efficient algorithm, applicable if an upper bound on f″ is known, and we show how this algorithm can be used recursively to find the global minimum of a function of several variables. Unfortunately, because the amount of computation involved increases exponentially with the number of variables, the recursive method is practical only for functions of less than four variables. For functions of more variables, we still have to resort to the unreliable “trial and error” method, unless special information about the function to be minimized is available.
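To indicate why a bound on f″ helps: if f″ ≤ M on [a, b], then f(x) − Mx²/2 is concave and so lies above its chord, which yields a computable lower bound for f on [a, b] from the endpoint values alone; a sub-interval can be discarded as soon as this bound exceeds the best function value found so far. The following Python sketch (our notation, illustrating the principle rather than the Chapter 6 algorithm itself) computes that bound.

```python
def lower_bound(fa, fb, a, b, M):
    """Lower bound for f on [a, b], given f(a) = fa, f(b) = fb and f'' <= M.

    Since f(x) - M*x**2/2 is concave, it lies above its chord; adding
    M*x**2/2 back gives a quadratic under-estimator q of f whose minimum
    over [a, b] is returned.  A sketch of the principle only.
    """
    phi_a = fa - 0.5 * M * a * a
    phi_b = fb - 0.5 * M * b * b
    slope = (phi_b - phi_a) / (b - a)

    def q(x):                        # chord of phi plus M*x**2/2
        return phi_a + slope * (x - a) + 0.5 * M * x * x

    if M <= 0:                       # f is concave: minimum at an endpoint
        return min(q(a), q(b))
    x_star = min(max(-slope / M, a), b)   # minimizer of q, clamped to [a, b]
    return q(x_star)
```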
Thus, we are led to consider practical methods for finding local (unconstrained) minima of functions of several variables. As before, we consider methods which depend on evaluating the function at a small number of points. Unfortunately, without imposing very strict conditions on the functions to be minimized, it is not possible to guarantee that an n-dimensional minimization algorithm produces results which are correct to within some prescribed tolerance, or that the effect of rounding errors has been taken into account. We have to be satisfied with algorithms which nearly always give correct results for the functions likely to arise in practical applications.
As suggested by the length of our bibliography, there has recently been considerable interest in the unconstrained minimization problem. Thus, we can hardly expect to find a good method which is completely unrelated to the known ones. In Chapter 7 we take one of the better methods which does not use derivatives, that of Powell (1964), and modify it to try to overcome some of the difficulties observed in the literature. Numerical tests suggest that our proposed method is faster than Powell's original method, and just as reliable. It also compares quite well with a different method proposed by Stewart (1967), at least for functions of less than ten variables. (We have few numerical results for non-quadratic functions of ten or more variables.)
ALGOL implementations of all the above algorithms are given. Most testing was done with ALGOL W (Wirth and Hoare (1966)) on IBM 360/67 and 360/91 computers. As ALGOL W is not widely used, we give ALGOL 60 procedures (Naur (1963)), except for the n-dimensional minimization algorithm. FORTRAN subroutines for the one-dimensional zero-finding and minimization algorithms are given in the Appendix.
To recapitulate, we describe algorithms, and give ALGOL procedures, for solving the following problems efficiently, using only function (not derivative) evaluations:
1. Finding a zero of a function of one variable if an interval in which the function changes sign is given;
2. Finding a local minimum of a function of one variable, defined on a given interval;
3. Finding, to within a prescribed tolerance, the global minimum of a function of one or more variables, given upper bounds on the second derivatives;
4. Finding a local minimum of a function of several variables.
For the first three algorithms, rigorous bounds on the error and the number of function evaluations required are established, taking the effect of rounding errors into account. Some results concerning the order of convergence of the first two algorithms, and preliminary results on interpolation and divided differences, are also of interest.

Section 2

SUMMARY

In this section we summarize the main results of the following chapters. A more detailed discussion is given at the appropriate places in each chapter. This summary is intended to serve as a guide to the reader who is interested in some of our results, but not in others. To assist such a reader, an attempt has been made to keep each chapter as self-contained as possible.

Chapter 2

In Chapter 2 we collect some results on Taylor series, Lagrange interpolation, and divided differences. Most of these results are needed in Chapter 3, and the casual reader might prefer to skip Chapter 2 and refer back to it when necessary. Some of the results are similar to classical ones, but instead of assuming that f has n + 1 continuous derivatives, we only assume that f^(n) is Lipschitz continuous, and the term f^(n+1)(ξ) in the classical results is replaced by a number which is bounded in absolute value by a Lipschitz constant. For example, Lemmas 2.3.1, 2.3.2, 2.4.1, and 2.5.1 are of this nature. Since a Lipschitz continuous function is differentiable almost everywhere, these results are not surprising, although they have not been found in the literature, except where references are given. (Sometimes Lipschitz conditions are imposed on the derivatives of functions of several variables: see, for example, Armijo (1966) and McCormick (1969).) The proofs are mostly similar to those for the classical results.
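For orientation, the contrast can be written roughly as follows (our notation; the lemmas cited above give the precise statements and hypotheses). Here P_n interpolates f at the distinct points x_0, …, x_n.

```latex
% Classical form, assuming f has n+1 continuous derivatives:
f(x) - P_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\,\prod_{i=0}^{n}(x - x_i)

% Form used here, assuming only that f^{(n)} is Lipschitz continuous
% with constant L (theta depends on x):
f(x) - P_n(x) = \frac{\theta}{(n+1)!}\,\prod_{i=0}^{n}(x - x_i),
\qquad |\theta| \le L
```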
Theorem 2.6.1 is a slight generalization of some results of Ralston (1963, 1965) on differentiating the error in Lagrange interpolation. It is included both for its independent interest, and because it may be used to prove a slightly weaker form of Lemma 3.6.1 for the important case q = 2. (A proof along these lines is sketched in Kowalik and Osborne (1968).)
An interesting result of Chapter 2 is Theorem 2.6.2, which gives an expression for the derivative of the error in Lagrange interpolation at the points of interpolation. It is well ...
