Nonlinear Algebra in an ACORN

With Applications to Deep Learning

Martin J Lee, Ken Tsang

  1. 92 pages
  2. English
  3. ePUB (mobile friendly)

About the book

A simple algorithm for solving a set of nonlinear equations by matrix algebra has been discovered recently: the equations are first transformed into an equivalent matrix equation, whose solution is then found analytically in terms of the inverse matrix of that equation. With this newly developed ACORN (Adaptive Constrained Optimal Robust Nonlinear) algorithm, it is possible to minimize the objective function [constructed from the functions in the nonlinear set of equations] without computing its derivatives.

This book presents the details of the ACORN algorithm and shows how it is used to solve large-scale sets of nonlinear equations with an innovative approach, ACORN Magic [minimization algorithms gathered in a cloud].

The ultimate motivation of this work is its application to optimization. In recent years, with the advances in big data, optimization has become an even more powerful tool in knowledge discovery. ACORN Magic is the perfect choice for this kind of application because it is fast, robust, and simple enough to be embedded in any type of machine learning program.

Sample Chapter(s)
Foreword
Chapter 1: An Overview of Optimization

Contents:

  • An Overview of Optimization
  • The ACORN Approach to Optimization
  • Application to Pedagogical Examples
  • A Data-Modeling Example in Accelerator Physics
  • Applications to Machine Learning and Neural Networks
  • Appendices:
    • Code for Plotting the Basin of Attraction (BOA) in Octave
    • ACORN Code Implementation in Octave
    • Details for Optimizing the Objective Function in NN8
    • Details for Optimizing the Objective Function in NN481

Readership: Students and researchers interested in optimization and in solving nonlinear equations arising in scientific and engineering problems.
Keywords: Optimization; ACORN; Nonlinear Equations; Neural Networks

Book information

Publisher: WSPC
Year: 2018
ISBN: 9789813271531
Categories: Mathematics; Optimization

Chapter 1
An Overview of Optimization
1.1 Two Simple Optimization Schemes
A simple and intuitive technique to find the minimum of a function is the “gradient descent” method. It is simple because it is an iterative optimization algorithm involving just the first derivative. To find a local minimum of an objective function, one starts with an initial guess (point) and takes successive steps with step-size proportional to the negative of the gradient of the objective function at the current position (point). In other words, the user initializes the procedure with a guess and takes a small step to slide down along the local slope to the next point, where the objective function should be smaller if the step taken is sufficiently small. The procedure is repeated until the local minimum is reached, or stopped when the maximum number of steps is exceeded.
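As a minimal sketch of this update rule (written in Octave, the language of the book's appendices; the function name gradient_descent, the fixed step-size eta, and the stopping tests are our own illustrative choices, not the book's code):

% Gradient descent: repeatedly step in the direction of the negative gradient.
% grad_f : function handle returning the gradient at a point x
% x0     : initial guess supplied by the user
% eta    : fixed constant of proportionality (step-size)
function [x, converged] = gradient_descent(grad_f, x0, eta, max_steps, tol)
  x = x0;
  converged = false;
  for k = 1:max_steps
    step = eta * grad_f(x);     % step proportional to the gradient
    x = x - step;               % slide down the local slope
    if norm(step) < tol         % stop when the update is negligibly small
      converged = true;
      return;
    end
  end
end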
In practice, there is one potential flaw with the gradient descent rule: we should take smaller steps down the gradient where the gradient is large, so as not to overshoot the minimum. On the other hand, we would like to take larger steps at locations where the gradient is small, so that the algorithm converges to a minimum faster. However, with a fixed constant of proportionality in the gradient descent rule, we do just the opposite of what we should.
Another issue is that the curvature of the objective function may not be the same in all directions in an N-dimensional problem with N > 1. For example, if there is a long and narrow valley in the N-dimensional surface that serves as a graphical visualization of the objective function, the component of the gradient in the direction that points along the base of the valley is small while the component along the valley wall is quite large. This pushes the path of successive iteration points mostly down the wall, even though reaching the minimum requires moving a long way along the base and only a small bit along the wall.
In the “gradient descent” method we assume the objective function is differentiable, and we use its first derivatives to guide us to the optimal point.
The second common iterative numerical method to optimize an objective function is to apply Newton’s method to the first derivatives of a twice-differentiable objective function, in order to find the roots of the first derivative. Newton’s method, which can be found in any elementary collection of numerical recipes, is a popular root-finder for solving nonlinear equations. In this algorithm, the second derivative of the objective function, i.e. the curvature, is needed. The main advantage of this technique is its rapid convergence. However, the rate of convergence is sensitive to the initial guess, and the method may sometimes miss the local minimum entirely.
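In one dimension this amounts to iterating x ← x − f′(x)/f″(x), i.e. applying the Newton root-finding update to the first derivative. A hypothetical Octave sketch (the names and stopping tests below are ours, for illustration only):

% Newton's method applied to the first derivative of the objective:
% a root of df is a stationary point of the objective function.
% df, d2f : function handles for the first and second derivatives
function [x, converged] = newton_minimize(df, d2f, x0, max_iter, tol)
  x = x0;
  converged = false;
  for k = 1:max_iter
    dx = df(x) / d2f(x);        % Newton step uses the curvature d2f
    x = x - dx;
    if abs(dx) < tol            % successive iterates practically the same
      converged = true;
      return;
    end
  end
end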
1.2 The Levenberg-Marquardt Optimization Algorithm
Levenberg [Levenberg (1944)] observed that the convergence rates of the simple gradient descent method and the Newton iterations are somewhat complementary to each other. To take advantage of both, he proposed an algorithm whose update rule is a blend of the two, with an adjustable relative weighting factor favoring one of the algorithms depending on the situation. In effect, the Levenberg rule follows Newton’s method closely if the objective function is reduced after such an update. If the objective function goes up, then the weighting factor is changed so that the step-size is determined mostly by gradient descent.
However, Levenberg’s original algorithm still did not address the problem caused by a narrow and gentle valley with steep walls in the N-dimensional surface of the objective function. It was Marquardt’s insight [Marquardt (1963)] to modify the relative weighting factor by including the curvatures of the objective function (i.e. the diagonal terms of the Hessian matrix), so that the gradient descent step-size is reduced along the steep valley wall and increased along the narrow and gentle valley basin; the Hessian matrix was already calculated and used in the Newtonian part of the algorithm anyway.
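To make the blend explicit, one LM-style update for a gradient g and Hessian H at the current point, with lambda as the adjustable weighting factor, can be sketched in Octave as follows (an illustrative sketch of the standard LM rule, not the book's implementation):

% One Levenberg-Marquardt style step.
% g      : gradient (column vector) at the current point x
% H      : Hessian matrix at the current point x
% lambda : weighting factor; lambda -> 0 recovers the Newton step, while a
%          large lambda gives a gradient descent step whose size is reduced
%          along directions of high curvature (steep valley walls)
function x_new = lm_step(x, g, H, lambda)
  A = H + lambda * diag(diag(H));   % Marquardt's scaling of the diagonal
  x_new = x - A \ g;                % solve A * dx = g rather than invert A
end

In practice lambda is decreased after a step that lowers the objective (moving toward Newton behaviour) and increased, with the step retried, after one that raises it.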
From that point on, the Levenberg-Marquardt (LM) minimization algorithm became an industry standard. The LM method is by no means optimal and is somewhat heuristic, but it works extremely well in practice. For moderately sized models (of a few hundred parameters), the LM method is much faster than the gradient descent method alone. The LM algorithm is also more robust than Newton’s method, which means that in many cases it finds a solution even if it starts so far from the final minimum that Newton’s method would fail.
However, there are a number of drawbacks associated with this kind of algorithm which make this popular method neither satisfying nor reliable. A few of them are listed here:
(1) Computation of first and second derivatives is required, which is time consuming, especially in high dimensions; moreover, the derivatives may not exist;
(2) Matrix inversion is needed as part of the update, so the cost of an update becomes prohibitive once the model size grows to a few thousand parameters;
(3) At most one local minimum can be located each time; there is no exhaustive search of all the minima and hence no knowledge of the global minimum;
(4) In cases where there are multiple minima within the domain of study, gradient descent in general, and Levenberg-Marquardt in particular, would likely be trapped in a local minimum and not be able to locate the global minimum.
1.3 General Characteristics of Optimization Algorithms
Starting from these two archaic approaches to optimization, a plethora of algorithms has been developed in the last fifty years for the optimization of smooth objective functions under various constraints. We refer serious readers to the extensive text Numerical Optimization by Nocedal and Wright [Nocedal and Wright (2006)] for this purpose.
At this point we would like to take note of the general characteristics of any numerical optimization algorithm in N-dimensional space, because this will help our readers achieve a deeper level of understanding in the chapters below. Optimization and root-finding are mathematical problems that share many features in common, as illustrated in the discussion above of how Levenberg and Marquardt were motivated to come up with the LM algorithm.
Solving the optimization or root-finding problem requires the identification of a particular object (i.e. point) in a space of possibilities (i.e. the N-dimensional space of interest) that satisfies certain requirements. In the former we search for the object associated with a minimum of the objective function, and in the latter for one associated with a zero of the objective function. The space that we are searching can be, and usually is, enormous due to its high dimensionality.
Consequently, the optimization or root-finding problems that are to be solved this way can be seen as search problems. Any numerical algorithm serving this goal is just a recipe, a searching routine that we hope will lead us to the object we are after in this vast search space.
This kind of searching routine starts from an initial point in the search space, usually supplied by the user as an educated guess. From that point on, the searching routine takes control of its own destiny by entering an iterative loop that generates successive answers to the problem at hand. Each answer is supposedly a bit better than the one from the previous iteration. The set of rules used to move from the current iteration to the next distinguishes one algorithm from another. The process ends either when the maximum number of iterations is reached, or when the absolute difference between the last two successive answers is so small that they are regarded as practically the same. In the latter case, the searching routine has converged to a final answer for our problem. Otherwise, the searching routine has not converged within the number of iterations specified by the user.
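A hedged skeleton of such a routine in Octave (the handle name update and the variable names are our own) might read as follows, where update encodes the algorithm-specific rule for moving from one iterate to the next:

% Generic iterative search routine.
% update : function handle mapping the current iterate to the next one
% x0     : initial point (the user's educated guess)
function [x, converged, iters] = iterative_search(update, x0, max_iter, tol)
  x = x0;
  converged = false;
  for iters = 1:max_iter
    x_next = update(x);            % one step of the chosen algorithm
    if norm(x_next - x) < tol      % successive answers practically the same
      converged = true;
      x = x_next;
      return;
    end
    x = x_next;
  end
end

For example, update = @(x) x - eta * grad_f(x) reduces this skeleton to the gradient descent loop of Section 1.1, while an LM step of the previous section would slot in the same way.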
When the searching routine fails to converge to an answer, we can either increase the maximum number of iterations and try again, or give up and start all over with a different algorithm. The reality is that there is no guarantee that any of the optimization algorithms presented in the literature will definitely lead to the solution of our specific problem. The arguments that motivate an algorithm are usually heuristic and should not be confused with a mathematical proof of convergence.
Most optimization algorithms require, besides the initial guess, the specification of certain parameters that control the rate of convergence. In the case of the LM algorithm, it is the parameter that controls the switching between Newton’s method and the gradient descent method. To assure good convergence, the control parameter has to be tuned from time to time, depending on the problem of interest.
Regardless of all the detailed inner workings of an algorithm, a user ultimately judges the usefulness of an algorithm by the following properties, as suggested by Nocedal and Wright [Nocedal and Wright (2006)]:
(1) Robustness: It should converge to a solution on a variety of problems, for all reasonable choices of the initial guess.
(2) Efficiency: It should not require excessive computational time (i.e. the algorithm should converge within a reasonable number of iterations) or storage.
(3) Accuracy: It should not be overly sensitive to the arithmetic rounding errors that occur when the algorithm is implemented on a computer.
1.4 Optimization Algorithms and Dynamical Systems
From the discussion above, it becomes obvious that an optimization algorithm bears a strong resemblance to a dynamical system. Mathematically, a dynamical system is defined as a system with a mapping that governs the time evolution of a point in the N-dimensional space, which is referred to as the state space because it represents the totality of all possible states of the dynamical system. Often, the mapping is deterministic, that is, given a fixed time-step, only one possible future state follows from the current state.
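Seen this way, an iterative optimizer is itself a discrete-time dynamical system: its update rule is the mapping, the iterates are the states, and the minima it settles into are fixed points. The tiny Octave illustration below (the quartic objective and step-size are our own example, chosen only to make the correspondence concrete) iterates a gradient descent step as a map x_{k+1} = F(x_k):

% A gradient descent update viewed as a deterministic map on state space.
grad_f = @(x) 4*x.^3 - 4*x;        % gradient of f(x) = x^4 - 2*x^2
F = @(x) x - 0.05 * grad_f(x);     % one fixed time-step of the "dynamics"
x = 0.3;                           % initial state
for k = 1:200
  x = F(x);                        % deterministic time evolution
end
disp(x)                            % settles near the fixed point x = 1

Different initial states can flow to different fixed points, which is the kind of structure that basin-of-attraction (BOA) plots, such as those produced by the code listed in the appendices, make visible.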
A physical example of a continuous dynamical system is the orbital motion of the Earth around the Sun. At any instant in time the Earth moves from its previous location to a new location ...
