Nonlinear Algebra in an ACORN
eBook - ePub


With Applications to Deep Learning

Martin J Lee, Ken Tsang

92 pages · English · ePUB (mobile friendly)

About This Book


A simple algorithm for solving a set of nonlinear equations by matrix algebra has been discovered recently: the equations are first transformed into an equivalent matrix equation, whose solution is then found analytically in terms of the inverse matrix of this equation. With this newly developed ACORN (Adaptive Constrained Optimal Robust Nonlinear) algorithm, it is possible to minimize the objective function [constructed from the functions in the nonlinear set of equations] without computing its derivatives.

This book presents the details of the ACORN algorithm and shows how it is used to solve large-scale nonlinear equations with an innovative approach, ACORN Magic [minimization algorithms gathered in a cloud].

The ultimate motivation of this work is its application to optimization. In recent years, with the advances in big data, optimization has become an even more powerful tool in knowledge discovery. ACORN Magic is well suited to this kind of application because it is fast, robust, and simple enough to be embedded in any type of machine learning program.

Sample Chapter(s):
Foreword
Chapter 1: An Overview of Optimization

Contents:

  • An Overview of Optimization
  • The ACORN Approach to Optimization
  • Application to Pedagogical Examples
  • A Data-Modeling Example in Accelerator Physics
  • Applications to Machine Learning and Neural Networks
  • Appendices:
    • Code for Plotting the Basin of Attraction (BOA) in Octave
    • ACORN Code Implementation in Octave
    • Details for Optimizing the Objective Function in NN8
    • Details for Optimizing the Objective Function in NN481

Readership: Students and researchers of optimization in solving nonlinear equations in scientific and engineering problems.
Keywords: Optimization; ACORN; Nonlinear Equations; Neural Networks


Information

Publisher: WSPC
Year: 2018
ISBN: 9789813271531
Chapter 1
An Overview of Optimization
1.1 Two Simple Optimization Schemes
A simple and intuitive technique to find the minimum of a function is the “gradient descent” method. It is simple because it is an iterative optimization algorithm involving just the first derivative. To find a local minimum of an objective function, one starts with an initial guess (point) and takes successive steps with step size proportional to the negative of the gradient of the objective function at the current point. In other words, the user initializes the procedure with a guess and takes a small step down the local slope to the next point, where the objective function should be smaller provided the step taken is sufficiently small. The procedure is repeated until the local minimum is reached, or stopped when the maximum number of steps is exceeded.
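The fixed-step update rule just described can be sketched in a few lines. The following is an illustrative sketch in Python rather than the Octave used in the book's appendices; the quadratic test function, the step-size constant eta, and the stopping tolerance are all arbitrary choices made for this example:

```python
import math

def gradient_descent(grad, x0, eta=0.1, tol=1e-8, max_steps=1000):
    """Minimize an objective by stepping against its gradient with a fixed step size."""
    x = list(x0)
    for _ in range(max_steps):
        step = [eta * g for g in grad(x)]            # step proportional to the gradient
        x = [xi - si for xi, si in zip(x, step)]     # slide down the local slope
        if math.sqrt(sum(si * si for si in step)) < tol:  # steps are now negligible
            break
    return x

# Example: f(x, y) = (x - 1)^2 + 2*(y + 3)^2, with gradient (2(x - 1), 4(y + 3)).
xmin = gradient_descent(lambda p: [2 * (p[0] - 1), 4 * (p[1] + 3)], [0.0, 0.0])
# xmin lands close to the true minimizer (1, -3)
```

Note the fixed constant eta multiplying the gradient; the discussion that follows explains why this choice is problematic.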
In practice, there is one potential flaw with the gradient descent rule: we should take smaller steps down the gradient where the gradient is large, so as not to overshoot the minimum, and larger steps where the gradient is small, so that the algorithm converges to a minimum faster. With a fixed constant of proportionality in the gradient descent rule, however, we do just the opposite of what we should.
Another issue is that the curvature of the objective function may differ from one direction to another in an N-dimensional problem with N > 1. For example, if the N-dimensional surface that serves as a graphical visualization of the objective function contains a long, narrow valley, the component of the gradient pointing along the base of the valley is small while the component along the valley wall is quite large. The successive iteration points therefore move mostly down the wall, even though reaching the minimum requires a long move along the base and only a small one along the wall.
In the “gradient descent” method we assume the objective function is differentiable, and we use its first derivatives to guide us to the optimal point.
The second common iterative numerical method to optimize an objective function is to apply Newton’s method to the first derivatives of a twice-differentiable objective function, in order to find the roots of the first derivative. Newton’s method, which can be found in any elementary collection of numerical recipes, is a popular root-finder for solving nonlinear equations. In this algorithm, the second derivative of the objective function, i.e. the curvature, is needed. The main advantage of this technique is its rapid convergence. However, the rate of convergence is sensitive to the initial guess, and the method may sometimes miss the local minimum entirely.
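As a sketch of this second scheme, assuming a one-dimensional, twice-differentiable objective function (the quartic below is an arbitrary example, not one from the book), Newton root-finding applied to f′ looks like this in Python:

```python
def newton_minimize(fprime, fsecond, x0, tol=1e-10, max_iter=50):
    """Locate a stationary point of f by Newton root-finding applied to f'."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)   # the Newton step needs the curvature f''
        x -= step
        if abs(step) < tol:             # successive iterates practically coincide
            break
    return x

# f(x) = x^4 - 3x^2 + 2, so f'(x) = 4x^3 - 6x and f''(x) = 12x^2 - 6.
# Starting at x = 1 converges rapidly to the local minimum at sqrt(3/2) ~ 1.2247.
xstar = newton_minimize(lambda x: 4 * x**3 - 6 * x, lambda x: 12 * x**2 - 6, 1.0)
```

Starting instead near x = 0, where f″ < 0, the same iteration is drawn to the local maximum at the origin, illustrating the sensitivity to the initial guess noted above.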
1.2 The Levenberg-Marquardt Optimization Algorithm
Levenberg [Levenberg (1944)] observed that the convergence rates of the simple gradient descent method and the Newton iterations are somewhat complementary to each other. To take advantage of both, he proposed an algorithm whose update rule is a blend of the two, with an adjustable relative weighting factor favoring one of the algorithms depending on the situation. In effect, the Levenberg rule follows Newton’s method closely if the objective function is reduced after an update. If the objective function goes up, the weighting factor is changed so that the step size is determined mostly by gradient descent.
However, Levenberg’s original algorithm still did not address the problem caused by a narrow, gentle valley with steep walls in the N-dimensional surface of the objective function. It was Marquardt’s insight [Marquardt (1963)] to modify the relative weighting factor to include the curvatures of the objective function (i.e. the diagonal terms of the Hessian matrix), so that the gradient descent step size is reduced along the steep valley wall and increased along the narrow, gentle valley basin; the Hessian matrix was already calculated and used in the Newtonian part of the algorithm anyway.
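For concreteness, here is a minimal Python sketch of the Levenberg-Marquardt damping strategy for a least-squares objective, restricted to a two-parameter problem so the linear solve can be written out by hand. The test problem (Rosenbrock's function in least-squares form), the initial damping factor lam, and the factor-of-10 update schedule are conventional but arbitrary choices, not the book's own implementation:

```python
def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def levenberg_marquardt(residual, jacobian, p, lam=1e-3, max_iter=500):
    """Minimize the sum of squared residuals with the LM damping strategy."""
    cost = sum(r * r for r in residual(p))
    for _ in range(max_iter):
        r, J = residual(p), jacobian(p)
        # Normal equations: JtJ approximates the Hessian, g = J^T r is the gradient
        JtJ = [[sum(J[k][i] * J[k][j] for k in range(len(r))) for j in range(2)]
               for i in range(2)]
        g = [sum(J[k][i] * r[k] for k in range(len(r))) for i in range(2)]
        # Marquardt scaling: damp each direction by its own curvature (diagonal of JtJ)
        A = [[JtJ[0][0] * (1 + lam), JtJ[0][1]],
             [JtJ[1][0], JtJ[1][1] * (1 + lam)]]
        step = solve2(A, g)
        trial = [p[0] - step[0], p[1] - step[1]]
        new_cost = sum(r * r for r in residual(trial))
        if new_cost < cost:          # accept: behave more like Newton next time
            p, cost, lam = trial, new_cost, lam / 10
            if cost < 1e-18:
                break
        else:                        # reject: lean toward (scaled) gradient descent
            lam *= 10
    return p

# Rosenbrock's function in least-squares form: r = (10*(y - x^2), 1 - x), minimum at (1, 1).
res = lambda p: [10 * (p[1] - p[0] ** 2), 1 - p[0]]
jac = lambda p: [[-20 * p[0], 10.0], [-1.0, 0.0]]
p_opt = levenberg_marquardt(res, jac, [-1.2, 1.0])
```

The single parameter lam realizes Levenberg's blend: small lam makes the step nearly a Gauss-Newton step, large lam shrinks it toward a curvature-scaled gradient descent step.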
From that point on, the Levenberg-Marquardt (LM) minimization algorithm became an industry standard. The LM method is by no means optimal and is somewhat heuristic, but it works extremely well in practice. For moderately sized models (of a few hundred parameters), the LM method is much faster than the gradient descent method alone. The LM algorithm is also more robust than Newton’s method: in many cases it finds a solution even when it starts so far from the final minimum that Newton’s method would fail.
However, there are a number of drawbacks associated with this kind of algorithm that make this popular method neither satisfying nor reliable. A few of them can be listed here:
(1) Computation of first and second derivatives is required, which is time-consuming, especially in high dimensions; moreover, the derivatives may not exist;
(2) Matrix inversion is needed as part of the update, so the cost of each update becomes prohibitive once the model size grows to a few thousand parameters;
(3) At most one local minimum can be located at a time; there is no exhaustive search of all the minima and hence no knowledge of the global minimum;
(4) When there are multiple minima within the domain of study, gradient descent in general, and Levenberg-Marquardt in particular, will likely be trapped in a local minimum and be unable to locate the global minimum.
1.3 General Characteristics of Optimization Algorithms
Starting from these two archaic approaches to optimization, a plethora of algorithms have been developed in the last fifty years for optimization of smooth objective functions under various constraints. We refer serious readers to an extensive text Numerical Optimization by Nocedal and Wright [Nocedal and Wright (2006)] for this purpose.
At this point we would like to take note of the general characteristics of any numerical optimization algorithm in N-dimensional space, because this will help our readers reach a deeper understanding of the chapters below. Optimization and root-finding are mathematical problems that share many features in common, as illustrated by the discussion above of how Levenberg and Marquardt were motivated to come up with the LM algorithm.
Solving an optimization or root-finding problem requires identifying a particular object (i.e. a point) in a space of possibilities (i.e. the N-dimensional space of interest) that satisfies certain requirements. In the former case the search is for the point associated with a minimum of the objective function; in the latter, for a point where the objective function is zero. The space being searched can be, and usually is, enormous because of its high dimensionality.
Consequently, optimization and root-finding problems solved this way can be seen as search problems. Any numerical algorithm serving this goal is just a recipe, a search routine that we hope will lead us to the object we are after in this vast search space.
Such a search routine starts from an initial point in the search space, usually supplied by the user as an educated guess. From that point on, the routine takes control of its own destiny by entering an iterative loop that generates successive answers to the problem at hand, each supposedly a bit better than the one from the previous iteration. The set of rules used to move from the current iteration to the next distinguishes one algorithm from another. The process ends either when the maximum number of iterations is reached, or when the absolute difference between the last two successive answers is so small that they are regarded as practically the same. In the latter case, the search routine has converged to a final answer for our problem; otherwise, it has failed to converge within the number of iterations specified by the user.
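The iterative loop just described can be expressed as a generic skeleton. The sketch below is in Python, with the update rule, tolerance, and iteration cap as placeholder choices; any of the algorithms discussed in this chapter fits this shape once its own update rule is plugged in:

```python
def iterate_until_converged(update, x0, tol=1e-9, max_iter=100):
    """Generic search loop: stop when successive answers nearly coincide."""
    x = x0
    for k in range(max_iter):
        x_next = update(x)
        if abs(x_next - x) < tol:         # successive answers practically the same
            return x_next, True, k + 1    # converged
        x = x_next
    return x, False, max_iter             # did not converge within max_iter steps

# Example update rule: the Babylonian iteration for sqrt(2).
root, ok, iters = iterate_until_converged(lambda x: 0.5 * (x + 2 / x), 1.0)
# ok is True and root is close to 1.41421356...
```

The boolean flag distinguishes the two exit conditions named above, so the caller knows whether the answer is a converged result or merely the last iterate produced.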
When the search routine fails to converge to an answer, we can either increase the maximum number of iterations and try again, or give up and start over with a different algorithm. The reality is that there is no guarantee that any of the optimization algorithms presented in the literature will definitely lead to the solution of our specific problem. The arguments that motivate an algorithm are usually heuristic and must not be confused with a mathematical proof of convergence.
Most optimization algorithms require, besides the initial guess, the specification of certain parameters that control the rate of convergence. In the case of the LM algorithm, it is the parameter that controls the switching between Newton’s method and the gradient descent method. To ensure good convergence, this control parameter has to be tuned from time to time, depending on the problem of interest.
Regardless of the detailed inner workings of an algorithm, a user ultimately judges its usefulness by the following properties, as suggested by Nocedal and Wright [Nocedal and Wright (2006)]:
(1) Robustness: It should converge to a solution on a variety of problems, for all reasonable values of the initial guess.
(2) Efficiency: It should not require excessive computational time (i.e. it should converge within a reasonable number of iterations) or storage.
(3) Accuracy: It should not be overly sensitive to the arithmetic rounding errors that occur when the algorithm is implemented on a computer.
1.4 Optimization Algorithms and Dynamical Systems
From the discussion above, it becomes obvious that an optimization algorithm bears a strong resemblance to a dynamical system. Mathematically, a dynamical system is defined as a system with a mapping that governs the time evolution of a point in N-dimensional space, referred to as the state space because it represents the totality of all possible states of the dynamical system. Often the mapping is deterministic: given a fixed time step, only one possible future state follows from the current state.
A physical example of a continuous dynamical system is the orbital motion of the Earth around the Sun. At any instant in time the Earth moves from its previous location to a new location ...
