Originally published in 1996 as a special journal issue, Artificial Intelligence Applications on Wall Street presents a series of articles derived from papers at the Third International Conference on Artificial Intelligence Applications on Wall Street. The volume addresses how artificial intelligence can be used to tackle the variety of issues that arise in the world of investments, such as synthetic instruments, forecasting, and surveillance. It examines the potential problems surrounding the economic assumption of rationality in a global market, and how artificial intelligence can push the bounds of rationality.
Information Systems Department, Stern School of Business, New York University, New York, USA
In the analysis and prediction of real-world systems, two of the key problems are nonstationarity (often in the form of switching between regimes) and overfitting (particularly serious for noisy processes). This article addresses these problems using gated experts, consisting of a (nonlinear) gating network and several (also nonlinear) competing experts. Each expert learns to predict the conditional mean, and each expert adapts its width to match the noise level in its regime. The gating network learns to predict the probability of each expert given the input. This article focuses on the case where the gating network bases its decision on information from the inputs. This can be contrasted with hidden Markov models, where the decision is based on the previous state(s) (i.e., on the output of the gating network at the previous time step), as well as with averaging over several predictors. In contrast, gated experts soft-partition the input space. This article discusses the underlying statistical assumptions, derives the weight update rules, and compares the performance of gated experts to standard methods on three time series: (1) a computer-generated series, obtained by randomly switching between two nonlinear processes; (2) a time series from the Santa Fe Time Series Competition (the light intensity of a laser in a chaotic state); and (3) the daily electricity demand of France, a real-world multivariate problem with structure on several timescales. The main results are that (1) the gating network correctly discovers the different regimes of the process, (2) the widths associated with each expert are important for the segmentation task (and they can be used to characterize the subprocesses), and (3) there is less overfitting compared to single networks (homogeneous multilayer perceptrons), since the experts learn to match their variances to the (local) noise levels. This can be viewed as matching the local complexity of the model to the local complexity of the data.
INTRODUCTION
Different Regimes with Different Noise Levels: The Need for Gated Experts
Conventional time series models are global models. They can be linear, assuming that the next value is a linear superposition of preceding values (Yule, 1927), or they can be nonlinear, conveniently described in the quite general language of neural networks with hidden units (Rumelhart et al., 1986; Lapedes & Farber, 1987). Such single, global, and traditionally univariate models are well suited to problems with stationary dynamics.
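The linear case described above, where the next value is a linear superposition of preceding values, can be sketched in a few lines. The following is a minimal illustration only; the least-squares fit and the function names are our own choices, not from the article:

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model by least squares: x_t ~ sum_i a_i * x_{t-i}."""
    n = len(series)
    # Column i holds the lag-(i+1) values aligned with targets series[p:].
    X = np.column_stack([series[p - i - 1 : n - i - 1] for i in range(p)])
    y = series[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_next(series, coeffs):
    """Predict the next value as a linear superposition of the last p values."""
    p = len(coeffs)
    return float(np.dot(coeffs, series[-1 : -p - 1 : -1]))

# Example: recover the coefficient of a simulated AR(1) process x_t = 0.9 x_{t-1} + noise.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.9 * x[t - 1] + 0.1 * rng.standard_normal()
a = fit_ar(x, p=1)
```

A nonlinear (neural network) model replaces the linear map from lagged values to the next value with a network with hidden units; the global character of the model is the same in both cases.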
However, the assumption of stationarity is violated in many real-world time series. An important subclass of nonstationarity is piecewise stationarity (also called stationarity by parts, and multistationarity), where the series switches between different regimes. For example, regimes of electricity demand depend on the seasons, and regimes of financial forecasts depend on the economy (e.g., expansion and contraction, also called growth and recession) (Granger, 1994; Diebold & Rudebusch, 1996). Although a single global model can, in principle, emulate any function, including regime switching, it is often very hard to extract such an unstructured, global model from the data. In particular, trying to learn regimes with different noise levels by a single network is a mismatch, since the network will extract features that do not generalize well in some regime (local overfitting) before it has learned all it potentially could in another regime (local underfitting). A final motivation for different experts in different regions is that they can individually focus on that subset of input variables relevant for their specific region. This turns out to be particularly advantageous in modeling multivariate problems where different variables are important in different regimes.
Addressing these problems, we present a class of models for time series prediction that we call gated experts. They were introduced into the connectionist community as the mixture of experts (Jacobs et al., 1991) and are also called society of experts (Rumelhart et al., 1995). We use the term gated experts for nonlinearly gated nonlinear experts. The input space can be split nonlinearly through the hidden units of the gating network, and the subprocesses can be nonlinear through the hidden units of the expert networks.
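As a concrete sketch of this architecture, the forward pass of a gated-experts model with one-hidden-layer tanh networks for both the gate and the experts might look as follows. This is a simplified illustration under our own assumptions (class and parameter names are ours; training is omitted):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class GatedExperts:
    """Minimal mixture: K nonlinear scalar experts and one nonlinear softmax gate."""
    def __init__(self, n_in, n_hidden, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.K = n_experts
        # One-hidden-layer gating network producing K logits.
        self.Wg1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.Wg2 = rng.normal(0.0, 0.5, (n_hidden, n_experts))
        # One-hidden-layer expert networks, each a scalar predictor.
        self.We1 = rng.normal(0.0, 0.5, (n_experts, n_in, n_hidden))
        self.We2 = rng.normal(0.0, 0.5, (n_experts, n_hidden, 1))

    def forward(self, x):
        """Return (gate probabilities g_k(x), expert outputs y_k(x), blended mean)."""
        g = softmax(np.tanh(x @ self.Wg1) @ self.Wg2)                # (N, K)
        y = np.stack([(np.tanh(x @ self.We1[k]) @ self.We2[k])[:, 0]
                      for k in range(self.K)], axis=1)               # (N, K)
        return g, y, (g * y).sum(axis=1)
```

Note that both the split (through the gate's hidden units) and the subprocess models (through the experts' hidden units) are nonlinear, which is what distinguishes gated experts from linearly gated or linear-expert variants.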
The basic idea behind gated experts is simple: rather than using a single global model, we learn several local models (the experts) from the data. Simultaneously, we learn to split the input space. The problem is that the splitting of the input space is unknown because the only information available is the next value of the series. This requires blending supervised and unsupervised learning: the supervised component learns to predict the (observed) next value, and the unsupervised component discovers the (hidden) regimes. Since the only observable is the combination of the gate and the experts, many different ways of splitting the input space and fitting local models are possible. This trade-off between flexibility in the gates and flexibility in the experts is an important degree of freedom in this model class.
Summarizing, the key elements of gated experts are as follows:
nonlinear gate and experts
soft-partitioning the input space
adaptive noise levels (variances) of the experts.
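The soft partition in the second item can be made concrete with the posterior "responsibilities" familiar from mixture models. The sketch below assumes Gaussian experts with per-expert widths; the notation is ours, not the article's:

```python
import numpy as np

def responsibilities(gate_probs, expert_means, sigmas, target):
    """Posterior probability h_k that expert k generated the target,
    combining the gate's prior g_k(x) with each expert's Gaussian likelihood
    N(target; mean_k, sigma_k^2)."""
    gate_probs = np.asarray(gate_probs, float)
    means = np.asarray(expert_means, float)
    sigmas = np.asarray(sigmas, float)
    lik = np.exp(-0.5 * ((target - means) / sigmas) ** 2) / (
        np.sqrt(2.0 * np.pi) * sigmas)
    post = gate_probs * lik
    return post / post.sum()

# A narrow expert (small sigma) whose mean is close to the target dominates
# the soft assignment, even when the gate's prior is uniform.
h = responsibilities([0.5, 0.5], [1.0, 0.0], [0.1, 1.0], target=0.95)
```

Because the assignment is a probability distribution rather than a hard label, every expert receives some gradient everywhere, but each is pulled most strongly toward the data it is most responsible for.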
In contrast to related work (e.g., Hamilton, 1994; Jordan & Jacobs, 1994) we allow the noise-level parameter associated with each individual expert to adapt separately to the data. In our experience, expert-specific variances are important for two reasons: to facilitate the segmentation (areas of different predictability are grouped together), and to prevent overfitting (different regimes are approximated with different accuracy). This is a new approach to the problem of overfitting.
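One way to see how expert-specific variances adapt is through the responsibility-weighted variance update familiar from the EM view of mixture models. The following is a sketch with our own variable names, not a transcription of the article's update rules:

```python
import numpy as np

def update_variances(h, targets, expert_means, floor=1e-6):
    """Per-expert noise-level update: each sigma_k is the square root of the
    responsibility-weighted mean squared error of expert k. Experts serving
    noisy regimes thus acquire wide variances, and experts serving clean
    regimes acquire narrow ones."""
    h = np.asarray(h, float)                                    # (N, K) responsibilities
    err2 = (np.asarray(targets, float)[:, None]
            - np.asarray(expert_means, float)) ** 2             # (N, K) squared errors
    var = (h * err2).sum(axis=0) / np.maximum(h.sum(axis=0), floor)
    return np.sqrt(np.maximum(var, floor))
```

Since the Gaussian likelihood penalizes a narrow expert heavily for large errors, an expert with a small variance effectively stops fitting noisy points, which is the mechanism behind the reduced local overfitting mentioned above.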
Related Work
Gated experts have a solid statistical basis. This can be compared to prior connectionist work addressing segmentation of temporal sequences. Elman (1990) uses the size of the errors, and Doutriaux and Zipser (1990) use large changes in the activations of the hidden units to indicate segmentation. Levin (1991) adds a set of auxiliary input units that encode a (discrete) state, set to fixed values in training (supervised) and estimated in testing. In this architecture the single network has the difficult task of learning two potentially quite different mappings across the same set of units. Gated experts can also be compared and contrasted to connectionist architectures with local basis functions: whereas the architecture of radial basis functions (Broomhead & Lowe, 1988; Casdagli, 1989; Poggio & Girosi, 1990) does split up the input space into local regions (as opposed to global sigmoids), there is no incentive in the learning algorithm to find regions defined by similar structure, noise level, or dynamics.
In the time series community, the idea of splitting an input space into subspaces is not new. One of the first examples is the threshold autoregressive (TAR) model (Tong & Lim, 1980). In contrast to gated experts, the splits there are very simple and ad hoc; there is no probabilistic interpretation. TAR models are still quite popular in economics and econometrics. Typically, the regime is determined by a hard cut (threshold) in one of the input variables.
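The contrast between a TAR model's hard cut and a soft gate can be illustrated in a few lines. All coefficients and thresholds below are made up for illustration:

```python
import numpy as np

def tar_predict(x_prev, threshold=0.0, a_low=0.5, a_high=-0.3):
    """Two-regime threshold autoregressive step: the regime is chosen by a
    hard cut on the previous value (illustrative coefficients)."""
    return a_low * x_prev if x_prev <= threshold else a_high * x_prev

def soft_gate_predict(x_prev, threshold=0.0, steepness=10.0,
                      a_low=0.5, a_high=-0.3):
    """Gated-experts analogue: a smooth sigmoid gate blends the two linear
    experts instead of switching between them discontinuously."""
    g = 1.0 / (1.0 + np.exp(-steepness * (x_prev - threshold)))  # P(high regime)
    return (1.0 - g) * a_low * x_prev + g * a_high * x_prev
```

Far from the threshold the two predictors agree; near it, the soft gate interpolates smoothly, and (unlike the TAR cut) its location and sharpness can be learned by gradient descent.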
Table of contents
ACKNOWLEDGMENT
INTRODUCTION TO THE SPECIAL ISSUE
ASSESSING ALTERNATIVE TECHNOLOGIES FOR THE COST-EFFECTIVE COMPUTATION OF DERIVATIVES
DESIGNING FINANCIAL SWAPS WITH CLP(R)
EMBEDDING TECHNICAL ANALYSIS INTO NEURAL NETWORK BASED TRADING SYSTEMS
FINANCIAL FORECASTING USING GENETIC ALGORITHMS
FORECASTING FOREIGN EXCHANGE RATES USING RECURRENT NEURAL NETWORKS
TIME SERIES ANALYSIS AND PREDICTION USING GATED EXPERTS WITH APPLICATION TO ENERGY DEMAND FORECASTS
ALCOD IDSS: ASSISTING THE AUSTRALIAN STOCK MARKET SURVEILLANCE TEAM'S REVIEW PROCESS
Frequently asked questions
Yes, you can access Artificial Intelligence Applications on Wall Street by Stephen Slade in PDF and/or ePUB format, as well as other popular books in Business & Business General. We have over one million books available in our catalogue for you to explore.