Advances in Financial Machine Learning

Marcos Lopez de Prado

About This Book

Learn to understand and implement the latest machine learning innovations to improve your investment performance

Machine learning (ML) is changing virtually every aspect of our lives. Today, ML algorithms accomplish tasks that, until recently, only expert humans could perform. And finance is ripe for disruptive innovations that will transform how the following generations understand money and invest.

In the book, readers will learn how to:

  • Structure big data in a way that is amenable to ML algorithms
  • Conduct research with ML algorithms on big data
  • Use supercomputing methods and backtest their discoveries while avoiding false positives

Advances in Financial Machine Learning addresses real-life problems faced by practitioners every day, and explains scientifically sound solutions using math, supported by code and examples. Readers become active users who can test the proposed solutions in their individual setting.

Written by a recognized expert and portfolio manager, this book will equip investment professionals with the groundbreaking tools needed to succeed in modern finance.

Information

Publisher: Wiley
Year: 2018
ISBN: 9781119482109

PART 1
Data Analysis

  • Chapter 2 Financial Data Structures
  • Chapter 3 Labeling
  • Chapter 4 Sample Weights
  • Chapter 5 Fractionally Differentiated Features

CHAPTER 2
Financial Data Structures

2.1 MOTIVATION

In this chapter we will learn how to work with unstructured financial data, and from that to derive a structured dataset amenable to ML algorithms. In general, you do not want to consume someone else’s processed dataset, as the likely outcome will be that you discover what someone else already knows or will figure out soon. Ideally your starting point is a collection of unstructured, raw data that you are going to process in a way that will lead to informative features.

2.2 ESSENTIAL TYPES OF FINANCIAL DATA

Financial data comes in many shapes and forms. Table 2.1 shows the four essential types of financial data, ordered from left to right in terms of increasing diversity. Next, we will discuss their different natures and applications.
TABLE 2.1 The Four Essential Types of Financial Data
Fundamental Data
  • Assets
  • Liabilities
  • Sales
  • Costs/earnings
  • Macro variables
  • . . .
Market Data
  • Price/yield/implied volatility
  • Volume
  • Dividend/coupons
  • Open interest
  • Quotes/cancellations
  • Aggressor side
  • . . .
Analytics
  • Analyst recommendations
  • Credit ratings
  • Earnings expectations
  • News sentiment
  • . . .
Alternative Data
  • Satellite/CCTV images
  • Google searches
  • Twitter/chats
  • Metadata
  • . . .

2.2.1 Fundamental Data

Fundamental data encompasses information that can be found in regulatory filings and business analytics. It is mostly accounting data, reported quarterly. A particular aspect of this data is that it is reported with a lag. You must confirm exactly when each data point was released, so that your analysis uses that information only after it was publicly available. A common beginner’s error is to assume that this data was published at the end of the reporting period. That is never the case.
For example, fundamental data published by Bloomberg is indexed by the last date included in the report, which precedes the date of the release (often by 1.5 months). In other words, Bloomberg is assigning those values to a date when they were not known. It is hard to believe how many papers are published every year using misaligned fundamental data, especially in the factor-investing literature. Once you align the data correctly, a substantial number of findings in those papers cannot be reproduced.
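To make the alignment concrete, here is a minimal sketch using only the Python standard library. It assumes each fundamental record stores both its period-end date and its actual release date; the dates and EPS figures are hypothetical, and `latest_known` is an illustrative helper, not part of any vendor API. It performs a backward "as-of" lookup, the same kind of join pandas offers as `merge_asof(..., direction="backward")`.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical quarterly EPS figures. Each record carries both the period-end
# date (how some vendors index the value) and the date it was actually released.
FUNDAMENTALS = [
    # (period_end, release_date, eps)
    (date(2017, 3, 31), date(2017, 5, 15), 1.10),
    (date(2017, 6, 30), date(2017, 8, 14), 1.25),
]


def latest_known(obs_date, records, key_index):
    """Return the EPS of the last record whose key date is on or before obs_date.

    key_index=1 keys on the release date (point-in-time correct);
    key_index=0 keys on the period-end date (introduces look-ahead bias).
    """
    ordered = sorted(records, key=lambda r: r[key_index])
    keys = [r[key_index] for r in ordered]
    i = bisect_right(keys, obs_date)  # number of records with key <= obs_date
    return ordered[i - 1][2] if i > 0 else None


observation_dates = [date(2017, 4, 3), date(2017, 5, 15), date(2017, 7, 3), date(2017, 8, 15)]
point_in_time = [latest_known(d, FUNDAMENTALS, key_index=1) for d in observation_dates]
naive = [latest_known(d, FUNDAMENTALS, key_index=0) for d in observation_dates]

print(point_in_time)  # [None, 1.1, 1.1, 1.25]
print(naive)          # [1.1, 1.1, 1.25, 1.25]
```

Keying on the period-end date makes the first-quarter figure available on April 3, six weeks before anyone could have acted on it; keying on the release date does not.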
A second aspect of fundamental data is that it is often backfilled or reinstated. “Backfilling” means that missing data is assigned a value, even if that value was unknown at the time. A “reinstated value” is a corrected value that amends an incorrect initial release. A company may issue multiple corrections for a past quarter’s results long after the first publication, and data vendors may overwrite the initial values with their corrections. The problem is that the corrected values were not known on the first release date. Some data vendors circumvent this problem by storing multiple release dates and values for each variable. For example, we typically have three values for a single quarterly GDP release: the original released value and two monthly revisions. Still, it is very common to find studies that use the final released value and assign it to the time of the first release, or even to the last day in the reporting period. We will revisit this mistake, and its implications, when we discuss backtesting errors in Chapter 11.
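A point-in-time store that keeps every release date and value can be sketched as follows. This is a minimal illustration, assuming the vendor supplies each release date alongside its value; `VintageSeries`, `add_release`, `as_of` and `final` are hypothetical names, and the three figures are illustrative, not actual GDP data. The key design choice is that a revision is appended rather than overwriting the earlier value, so a backtest can ask what was known on any given date.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VintageSeries:
    """Hypothetical point-in-time store: each release is appended, never overwritten."""

    # period label -> list of (release_date, value), kept sorted by release date
    releases: dict = field(default_factory=dict)

    def add_release(self, period, release_date, value):
        vintages = self.releases.setdefault(period, [])
        vintages.append((release_date, value))
        vintages.sort(key=lambda v: v[0])

    def as_of(self, period, asof_date):
        """Value of `period` as known on asof_date, or None if not yet released."""
        vintages = self.releases.get(period, [])
        i = bisect_right([d for d, _ in vintages], asof_date)
        return vintages[i - 1][1] if i > 0 else None

    def final(self, period):
        """Latest revision. Stamping it on earlier dates leaks future information."""
        return self.releases[period][-1][1]


# Illustrative figures only (not actual GDP data): an initial release
# followed by two monthly revisions of the same quarter.
gdp = VintageSeries()
gdp.add_release("2017Q4", date(2018, 1, 26), 2.0)
gdp.add_release("2017Q4", date(2018, 2, 28), 2.2)
gdp.add_release("2017Q4", date(2018, 3, 28), 2.1)

print(gdp.as_of("2017Q4", date(2018, 1, 15)))  # None: nothing released yet
print(gdp.as_of("2017Q4", date(2018, 2, 1)))   # 2.0: only the first release is known
print(gdp.as_of("2017Q4", date(2018, 3, 1)))   # 2.2: first revision
print(gdp.final("2017Q4"))                     # 2.1
```

Assigning `final("2017Q4")` to January 26, 2018 would reproduce exactly the error described above; `as_of` returns only what had been published by the query date.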
Fundamental data is extremely regularized and low freque...
