
Big Data and Machine Learning in Quantitative Investment

ISBN: 9781119522218

Tony Guida

English


About this book

Get to know the 'why' and 'how' of machine learning and big data in quantitative investment

Big Data and Machine Learning in Quantitative Investment is not just about demonstrating the maths or the coding. Instead, it's a book by practitioners for practitioners, covering the why and the how of applying machine learning and big data to quantitative finance.

The book is split into 13 chapters, each written by a different author and focused on a specific case. The chapters are ordered by level of complexity: beginning with the big picture and taxonomy, moving on to practical applications of machine learning, and finishing with innovative approaches using deep learning.

• Gain a solid reason to use machine learning

• Frame your question using the laws of financial markets

• Know your data

• Understand how machine learning is becoming ever more sophisticated

Machine learning and big data are not a magical solution, but appropriately applied, they are extremely effective tools for quantitative investment — and this book shows you how.


CHAPTER 1
Do Algorithms Dream About Artificial Alphas?

Michael Kollo

1.1 INTRODUCTION

The core of most financial practice, whether drawn from equilibrium economics, behavioural psychology, or agency models, is traditionally formed through the marriage of elegant theory and a kind of ‘dirty’ empirical proof. As I learnt from my years on the PhD programme at the London School of Economics, elegant theory is the hallmark of a beautiful intellect, one that can discern the subtle trade‐offs in agent‐based models, form complex equilibrium structures and point to the sometimes conflicting paradoxes at the heart of conventional truths. ‘Dirty’ empirical work, by contrast, is often viewed with suspicion, even scoffed at, yet reluctantly acknowledged as necessary to give theory substance and real‐world application. I recall many conversations in the windy courtyards and narrow passageways with brilliant PhD students wrangling over the question, ‘But how can I find a test for my hypothesis?’

Many pseudo‐mathematical frameworks have come and gone in quantitative finance, usually borrowed from nearby sciences: thermodynamics from physics, Itô's lemma from stochastic calculus, information theory, network theory, assorted parts of number theory and, occasionally, the less high‐tech but reluctantly acknowledged social sciences such as psychology. They have come, and they have gone, absorbed (not defeated) by the markets.

Machine learning and extreme pattern recognition offer a strong focus on large‐scale empirical data, transformed and analyzed at a scale never seen before, in search of patterns that were undetectable to previous inspection. Interestingly, machine learning offers very little in the way of a conceptual framework. In some circles, its proponents boast that the absence of a conceptual framework is its strength, because it removes the human bias that would otherwise limit a model. Whether or not you feel it is a good tool, you have to respect the fact that processing is only getting faster and more powerful. We may call it neural networks today and something else tomorrow, but we will eventually reach a point where most, if not all, permutations of patterns can be discovered and examined in close to real time, at which point the focus will be almost exclusively on defining the objective function rather than the structure of the framework.

The rest of this chapter is a set of observations and examples of how machine learning could help us learn more about financial markets, and how it is already doing so. It draws not only on my own experience but on many conversations with academics, practitioners and computer scientists, and on volumes of books, articles, podcasts and the vast sea of intellect that is now engaged in these topics.

It is an incredible time to be intellectually curious and quantitatively minded, and at best we can be effective conduits for future generations to think about these problems in a considered and scientific manner, even as they wield these monolithic technological tools.

1.2 REPLICATION OR REINVENTION

The quantification of the world is again a fascination of humanity. Quantification here is the idea that we can break down patterns that we observe as humans into component parts and replicate them over much larger sets of observations, and much faster. Quantitative finance is rooted in investment principles, or observations, made by generations of astute investors who recognized these ideas without the help of large‐scale data.

The early ideas of factor investing and quantitative finance were replications of these insights; they did not themselves invent investment principles. The ideas of value investing (component valuation of assets and companies) are concepts that have been studied and understood for many generations. Quantitative finance took these ideas, broke them down, took the observable and scalable elements and spread them across a large number of (comparable) companies.
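To make the idea of ‘spreading an insight across many comparable companies’ concrete, here is a minimal sketch, assuming NumPy, that uses book‐to‐price as a stand‐in for the value insight. The function names, the book‐to‐price inputs and the 20% quantile cut‐off are hypothetical choices for illustration, not the author's method: each company is scored relative to its peers, and the cheapest names are held long against the most expensive.

```python
import numpy as np

def value_scores(book_to_price):
    """Cross-sectional z-score of a value characteristic (higher = cheaper)."""
    x = np.asarray(book_to_price, dtype=float)
    return (x - x.mean()) / x.std()

def long_short_weights(scores, quantile=0.2):
    """Equal-weight long the highest-scoring names, short the lowest-scoring ones."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    k = max(1, int(round(n * quantile)))  # number of names in each leg
    order = np.argsort(scores)            # ascending: most expensive first
    weights = np.zeros(n)
    weights[order[-k:]] = 1.0 / k         # long the cheapest quantile
    weights[order[:k]] = -1.0 / k         # short the most expensive quantile
    return weights

# Hypothetical book-to-price ratios for ten comparable companies.
b2p = [0.2, 0.5, 0.9, 1.4, 0.3, 0.7, 1.1, 0.4, 0.6, 1.0]
scores = value_scores(b2p)
weights = long_short_weights(scores)
print(np.round(weights, 2))
```

The company‐specific nuance is deliberately discarded here: only the observable, scalable ratio survives, which is exactly the trade‐off the next paragraph describes.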

The cost of achieving scale is still the complexity and nuance of applying a specific investment insight to a specific company; these nuances were assumed to diversify away in a larger‐scale portfolio, and were, and still are, largely overlooked.¹ The relationship between investment insights and future returns was replicated as a linear relationship between exposure and returns, with little attention paid to non‐linear dynamics or complexities; the focus instead was on diversification and large‐scale application, which were regarded as better outcomes for modern portfolios.
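The linear exposure‐to‐return relationship described above is commonly estimated with period‐by‐period cross‐sectional regressions whose slopes are then averaged, in the style of Fama and MacBeth. The sketch below assumes NumPy and uses a synthetic panel with a planted premium; the function names, panel sizes and noise level are hypothetical, and the t‐statistic ignores autocorrelation in the slopes.

```python
import numpy as np

def cross_sectional_slope(exposures, next_returns):
    """OLS slope of next-period returns on a factor exposure, for one date."""
    X = np.column_stack([np.ones(len(exposures)), exposures])
    coef, *_ = np.linalg.lstsq(X, np.asarray(next_returns, dtype=float), rcond=None)
    return coef[1]  # estimated return per unit of exposure in that period

def average_premium(exposure_panel, return_panel):
    """Fama-MacBeth-style average of per-period slopes (rows are periods)."""
    slopes = np.array([cross_sectional_slope(x, r)
                       for x, r in zip(exposure_panel, return_panel)])
    t_stat = slopes.mean() / (slopes.std(ddof=1) / np.sqrt(len(slopes)))
    return slopes.mean(), t_stat

# Synthetic panel: 60 periods x 200 stocks, planted premium of 0.5% per unit exposure.
rng = np.random.default_rng(0)
exposures = rng.standard_normal((60, 200))
returns = 0.005 * exposures + 0.05 * rng.standard_normal((60, 200))
premium, t_stat = average_premium(exposures, returns)
print(f"average premium {premium:.4f}, t-stat {t_stat:.1f}")
```

Nothing in this model can respond to a non‐linear or context‐dependent effect: by construction, one slope per period summarises the whole cross‐section.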

There was, however, a subtle recognition of co‐movement and correlation that emerged from the early factor work, and it is now at the core of modern risk management techniques. The idea is that stocks that share common characteristics (let's call them a quantified investment insight) also tend to be correlated, potentially through a shared dependence on macro‐style factors.
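A standard way to express this co‐movement is a linear factor risk model, in which the covariance of stock returns is the exposure matrix B times the factor covariance F times Bᵀ, plus a diagonal matrix D of stock‐specific variances. The sketch below assumes NumPy and one hypothetical shared characteristic with made‐up variances; it shows that two stocks with no business link become correlated purely because they share an exposure.

```python
import numpy as np

def factor_covariance(exposures, factor_cov, specific_var):
    """Covariance implied by a linear factor model: B F B^T + D."""
    B = np.asarray(exposures, dtype=float)              # n stocks x k factors
    F = np.asarray(factor_cov, dtype=float)             # k x k factor covariance
    D = np.diag(np.asarray(specific_var, dtype=float))  # stock-specific variances
    return B @ F @ B.T + D

def to_correlation(cov):
    """Convert a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hypothetical example: stocks A and B share a characteristic, C does not.
B = [[1.0], [1.0], [0.0]]
cov = factor_covariance(B, [[0.04]], [0.02, 0.02, 0.02])
corr = to_correlation(cov)
print(np.round(corr, 3))
```

With these illustrative numbers, A and B have correlation 0.04 / 0.06, about 0.67, while C is uncorrelated with both.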

This small observation, in my opinion, is actually a reinvention of the investment world, which up until then (and in many circles still) thought about stocks in isolation, valuing and appraising them as if they were standalone private equity investments. It was a reinvention because it moved the object of focus from an individual stock to a common ‘thread’ or factor that linked many stocks that individually had no direct business relationship but shared a characteristic that could mean they would be bought and sold together. The ‘factor’ link became the object of the investment process, and its identification and improvement became the goal of many investment processes; now (in the late 2010s) it is enjoying another renaissance of interest. Importantly, we began to see the world as a series of factors: some transient, some long‐standing; some forecasting over the short term, some over the long term; some providing risk to be removed, and some providing risky returns.

Factors represented the invisible (but detectable) threads that wove the tapestry of global financial markets. While we (quantitative researchers) searched to discover and understand these threads, much of the world focused on the visible world of companies, products and periodic earnings. We painted the world as a network, where connections and nodes were the most important, while others painted it as a series of investment ideas and events.

The reinvention lay in a shift in the object of interest, from individual stocks to a series of network relationships and their ebb and flow through time. It was as subtle as it was severe, and is probably still not fully understood.² Good factor timing models are rare, and there is an active debate about how to think about timing at all. Contextual factor models are rarer still and pose especially interesting areas for empirical and theoretical work.

1.3 REINVENTION WITH MACHINE LEARNING

Machine learning offers a similar opportunity to reinvent the way we think about financial markets, I think both in how we identify the investment object and in how we think about financial networks.

Allow me a simple analogy as a thought exercise. In handwriting or facial recognition, we as humans look for certain patterns to help us understand the world. On a conscious, perceptive level, we look to see patterns in the face of a person, in their nose, their eyes and their mouth. In this example, the objects of perception are those units, and we appraise their similarity to others that we know. Our pattern recognition then functions on a fairly low dimension in terms of components. We have broken down the problem into a finite set of grouped information (in this case, the features of the face), and we appraise those categories. In modern machine learning techniques, the face or a handwritten number is broken down into much smaller and therefore more numerous components. In the case of a handwritten number, for example, the pixels of the picture are converted to numeric representations, and the patterns in the pixels are sought using a deep learning algorithm.
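The contrast between a few human‐chosen components and many ‘atomic’ ones can be sketched in a few lines. This assumes NumPy and a hypothetical 5x5 binary image of a handwritten ‘1’; the three handcrafted summaries (ink density and the vertical and horizontal centre of mass) are illustrative stand‐ins for the ‘nose, eyes and mouth’ view, while flattening the image gives one feature per pixel, which is the kind of input a deep learning model would receive.

```python
import numpy as np

# Hypothetical 5x5 image of a handwritten "1" (0 = blank, 1 = ink).
digit = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
], dtype=float)

def handcrafted_features(img):
    """A few human-chosen summaries: a low-dimensional view of the image."""
    rows, cols = np.indices(img.shape)
    ink = img.sum()
    return np.array([
        ink / img.size,             # ink density
        (rows * img).sum() / ink,   # vertical centre of mass
        (cols * img).sum() / ink,   # horizontal centre of mass
    ])

def pixel_features(img):
    """One numeric feature per pixel: the 'atomic' representation."""
    return img.reshape(-1)

print(handcrafted_features(digit))  # 3 features
print(pixel_features(digit).shape)  # 25 features
```

Even at this toy size, the pixel view has roughly eight times as many inputs, and none of them is chosen by a human.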

We have incredible tools to take large‐scale data and look for patterns at the sub‐atomic level of our sample. In the case of human faces or numbers, and many other things, these tools find patterns through complex representations that are no longer intuitive or (consciously) understandable to us; they do not identify a nose or an eye, but look for patterns in deep folds of the information.³ Sometimes the tools can be much more efficient, finding patterns better and faster than we can, without our intuition being able to keep up.

Taking this analogy to finance, much of asset management concerns itself with financial (fundamental) data, like income statements, balance sheets and earnings. These items effectively characterize a company, in the same way the major features of a face may characterize a person. If we take these items (perhaps a few hundred of them) and feed them into a large‐scale algorithm such as a machine learning model, we may find that we are already constraining ourselves heavily before we have begun.

The ‘magic’ of neural networks lies in their ability to recognize patterns in atomic (e.g. pixel‐level) information, and by feeding them higher‐level constructs, we may already be constraining their ability to find new patterns, that is, patterns beyond those we have already identified in linear frameworks. Reinvention lies in our ability to find new constructs and more ‘atomic’ representations of investments that allow these algorithms to find patterns more easily. This may mean moving away from the reported quarterly or annual financial accounts, perhaps using higher‐frequency indicators of sales and revenue (relying on alternative data sources), as a way to find higher‐frequency and, potentially, more connected patterns with which to forecast price movements.
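As a minimal sketch of the frequency argument, assume a hypothetical daily sales proxy (for example, card‐transaction totals) over a stylised 364‐day year, so that it divides evenly into 52 weeks or 4 quarters of 91 days. The data are random and the helper `quarter_to_date` is a hypothetical name; the point is only that the same underlying activity can be observed 13 times per quarter rather than once.

```python
import numpy as np

# Hypothetical daily sales proxy for one company over a stylised 364-day year.
rng = np.random.default_rng(1)
daily = 100 + 10 * rng.standard_normal(364)

weekly = daily.reshape(52, 7).sum(axis=1)     # 52 observations per year
quarterly = daily.reshape(4, 91).sum(axis=1)  # 4 observations, like reported accounts

def quarter_to_date(weekly, quarter, weeks_observed):
    """Proxy total for the first `weeks_observed` weeks of a quarter (0-indexed)."""
    start = 13 * quarter
    return weekly[start:start + weeks_observed].sum()

# Six weeks into the first quarter, part of its total is already observable.
print(len(weekly), len(quarterly), round(quarter_to_date(weekly, 0, 6), 1))
```

Whether the proxy actually tracks reported revenue is an empirical question the sketch does not address.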

Reinvention through machine learning may also mean turning our attention to modelling financial markets as a complex (or just expansive) network, where the dimensionality of the problem is potentially explosively high and prohibitive for our minds to work with. To estimate a single dimension of a network is effectively to estimate an n × n covariance matrix. Once we make this system endogenous, many of the links within the 2D matrix become functions of other links, in which case the model is recursive and iterative. And this is only in two dimensions. Modelling the financial markets as a neural network has been attempted, with limited application, and more recently the idea of supply chains has been gaining popularity as a way of detecting the fine strands between companies. Alternative data may well open up new, explicitly observable links between companies, in terms of their business dealings, that can form the basis of a network, but it is more likely that prices will move too fast, and too much, to be determined simply by average supply contracts.
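The scale of the n × n problem is easy to quantify. A symmetric covariance matrix has n(n + 1)/2 distinct entries; for comparison, the sketch below also counts the parameters of a k‐factor model (n·k exposures, k(k + 1)/2 factor covariances and n specific variances). The universe sizes and the choice of 10 factors are hypothetical.

```python
def covariance_parameters(n):
    """Distinct entries of a symmetric n x n covariance matrix."""
    return n * (n + 1) // 2

def factor_model_parameters(n, k):
    """Exposures (n*k) + factor covariance (k(k+1)/2) + specific variances (n)."""
    return n * k + k * (k + 1) // 2 + n

for n in (100, 1_000, 3_000):
    print(n, covariance_parameters(n), factor_model_parameters(n, k=10))
```

For 3,000 stocks, the full matrix has 4,501,500 distinct entries, against 33,055 for a 10‐factor model, before any links are made endogenous functions of one another.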

1.4 A MATTER OF TRUST

The reality is that patterns that escape our human attention will be too subtle, too numerous, or too fast in the data. Our inability to identify with them in an intuitive way, or to construct stories around them, will naturally cause us to mistrust them. Some patterns in the data will not be useful for investment (e.g. noise, illiquid and/or uninvestable), so these will quickly end up on the ‘cutting room floor’.⁴ But many others will be robust and useful, yet entirely unintuitive, and perhaps obfuscated to us. Our natural reaction will be to question ourselves and, if we are to use them, to ensure that they are part of a very large cohort of signals, so as to diversify away questions about any particular signal in isolation.
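How much a large cohort of signals helps depends on how correlated the signals are. Under the simplifying assumption that each of k signals has the same noise variance σ² and every pair has the same correlation ρ, the variance of their equal‐weight average is (1/k²)[kσ² + k(k − 1)ρσ²] = σ²[ρ + (1 − ρ)/k]. The function name below is hypothetical; the formula is a standard variance identity.

```python
def composite_noise_variance(k, sigma2=1.0, rho=0.0):
    """Variance of an equal-weight average of k signals, each with variance
    sigma2 and common pairwise correlation rho: sigma2 * (rho + (1 - rho) / k)."""
    return sigma2 * (rho + (1.0 - rho) / k)

for rho in (0.0, 0.2):
    print(rho, [round(composite_noise_variance(k, rho=rho), 3) for k in (1, 10, 100)])
```

With uncorrelated signals, the noise variance of the average shrinks as 1/k; with ρ = 0.2 it can never fall below 0.2 of a single signal's variance, however many signals are added.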

So long as our clients are humans as well, we will face communication challenges, especially during times of weak performance. When performance is strong, opaque investment processes are less questioned, and complexity can even be considered a positive, differentiating characteristic. However, on most occasions, an opaque investment process that underperforms is quickly mistrusted. In many examples of modern investment history, the ‘quants’ struggled to explain their models in poor performance periods and were quickly abandoned by investors. The same merits of intellectual superiority bestowed upon them rapidly became weaknesses and points of ridicule.

Storytelling, the art of wrapping complexity in comfortable and familiar anecdotes and analogies, feels like a necessary cost of using technical models. However, it can also be a large barrier to innovation in finance. Investment beliefs, and our capability to generate comfortable anecdotal stories, are often there to reconfirm commonly held intuitive investment truths, which in turn are supported by ‘sensible’ patterns in data.

If innovation means moving to ‘machine patterns’ in finance, with greater complexity and dynamic characteristic...

Cover
Table of Contents
CHAPTER 1: Do Algorithms Dream About Artificial Alphas?
CHAPTER 2: Taming Big Data
CHAPTER 3: State of Machine Learning Applications in Investment Management
CHAPTER 4: Implementing Alternative Data in an Investment Process
CHAPTER 5: Using Alternative and Big Data to Trade Macro Assets
CHAPTER 6: Big Is Beautiful: How Email Receipt Data Can Help Predict Company Sales
CHAPTER 7: Ensemble Learning Applied to Quant Equity: Gradient Boosting in a Multifactor Framework
CHAPTER 8: A Social Media Analysis of Corporate Culture
CHAPTER 9: Machine Learning and Event Detection for Trading Energy Futures
CHAPTER 10: Natural Language Processing of Financial News
CHAPTER 11: Support Vector Machine‐Based Global Tactical Asset Allocation
CHAPTER 12: Reinforcement Learning in Finance
CHAPTER 13: Deep Learning in Finance: Prediction of Stock Returns with Long Short‐Term Memory Networks
Biography
End User License Agreement