Advances in Financial Machine Learning

Marcos Lopez de Prado

About This Book

Learn to understand and implement the latest machine learning innovations to improve your investment performance

Machine learning (ML) is changing virtually every aspect of our lives. Today, ML algorithms accomplish tasks that – until recently – only expert humans could perform. And finance is ripe for disruptive innovations that will transform how the following generations understand money and invest.

In the book, readers will learn how to:

  • Structure big data in a way that is amenable to ML algorithms
  • Conduct research with ML algorithms on big data
  • Use supercomputing methods and backtest their discoveries while avoiding false positives

Advances in Financial Machine Learning addresses real-life problems faced by practitioners every day, and explains scientifically sound solutions using math, supported by code and examples. Readers become active users who can test the proposed solutions in their individual setting.

Written by a recognized expert and portfolio manager, this book will equip investment professionals with the groundbreaking tools needed to succeed in modern finance.



Data Analysis

  1. Chapter 2 Financial Data Structures
  2. Chapter 3 Labeling
  3. Chapter 4 Sample Weights
  4. Chapter 5 Fractionally Differentiated Features

Financial Data Structures


In this chapter we will learn how to work with unstructured financial data, and from that to derive a structured dataset amenable to ML algorithms. In general, you do not want to consume someone else’s processed dataset, as the likely outcome will be that you discover what someone else already knows or will figure out soon. Ideally your starting point is a collection of unstructured, raw data that you are going to process in a way that will lead to informative features.


Financial data comes in many shapes and forms. Table 2.1 shows the four essential types of financial data, ordered from left to right in terms of increasing diversity. Next, we will discuss their different natures and applications.

TABLE 2.1 The Four Essential Types of Financial Data

Fundamental Data
  • Assets
  • Liabilities
  • Sales
  • Costs/earnings
  • Macro variables
  • . . .

Market Data
  • Price/yield/implied volatility
  • Volume
  • Dividend/coupons
  • Open interest
  • Quotes/cancellations
  • Aggressor side
  • . . .

Analytics
  • Analyst recommendations
  • Credit ratings
  • Earnings expectations
  • News sentiment
  • . . .

Alternative Data
  • Satellite/CCTV images
  • Google searches
  • Twitter/chats
  • Metadata
  • . . .

2.2.1 Fundamental Data

Fundamental data encompasses information that can be found in regulatory filings and business analytics. It is mostly accounting data, reported quarterly. A particular aspect of this data is that it is reported with a lag. You must confirm exactly when each data point was released, so that your analysis uses that information only after it became publicly available. A common beginner’s error is to assume that this data was published at the end of the reporting period. That is never the case.
For example, fundamental data published by Bloomberg is indexed by the last date included in the report, which precedes the date of the release (often by 1.5 months). In other words, Bloomberg is assigning those values to a date when they were not known. You would not believe how many papers are published every year using misaligned fundamental data, especially in the factor-investing literature. Once you align the data correctly, a substantial number of findings in those papers cannot be reproduced.
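
As a minimal sketch of the alignment described above, the following Python snippet joins each observation date to the most recent figure that had already been released by that date, and contrasts it with a naive join on the end of the reporting period. The column names (`period_end`, `release_date`, `eps`), the dates, and the values are all hypothetical and chosen only for illustration; they do not come from any vendor's schema. The sketch also assumes that a figure released on a given day may be used on that same day, which may be too generous for intraday work.

```python
import pandas as pd

# Hypothetical quarterly figures: period_end is the last date covered by the
# report, release_date is when the figure became public (illustrative values).
fundamentals = pd.DataFrame({
    "period_end": pd.to_datetime(["2020-03-31", "2020-06-30"]),
    "release_date": pd.to_datetime(["2020-05-15", "2020-08-14"]),
    "eps": [1.10, 1.25],
})

# Hypothetical dates on which a model would consume the fundamentals.
observations = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-04-30", "2020-05-15", "2020-07-31", "2020-08-31"]
    ),
})


def align_on(obs, fund, key):
    """Attach to each observation the latest row of `fund` whose `key` date
    is on or before the observation date."""
    return pd.merge_asof(
        obs.sort_values("date"),
        fund.sort_values(key),
        left_on="date",
        right_on=key,
        direction="backward",
    )


# Point-in-time alignment: only figures already released are visible.
aligned = align_on(observations, fundamentals, "release_date")

# Naive alignment on the reporting period end: leaks unreleased figures.
naive = align_on(observations, fundamentals, "period_end")

print(aligned[["date", "eps"]])
print(naive[["date", "eps"]])
```

In this toy example the naive join already shows a first-quarter figure on 2020-04-30, two weeks before its assumed release, while the point-in-time join leaves that observation empty.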
A second aspect of fundamental data is that it is often backfilled or restated. “Backfilling” means that missing data is assigned a value, even if that value was unknown at that time. A “restated value” is a corrected value that amends an incorrect initial release. A company may issue multiple corrections for a past quarter’s results long after the first publication, and data vendors may overwrite the initial values with their corrections. The problem is that the corrected values were not known on that first release date. Some data vendors circumvent this problem by storing multiple release dates and values for each variable. For example, we typically have three values for a single quarterly GDP release: the original released value and two monthly revisions. Still, it is very common to find studies that use the final released value and assign it to the time of the first release, or even to the last day in the reporting period. We will revisit this mistake, and its implications, when we discuss backtesting errors in Chapter 11.
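
One way to picture the "multiple release dates and values" approach is a table with one row per published value, queried by the date on which the analysis is assumed to run. The sketch below is only an illustration under that assumption: the table layout, the helper name `value_as_of`, the dates, and the numbers are hypothetical and are not actual GDP figures or any vendor's API.

```python
import pandas as pd

# Hypothetical vintage table: one row per published value of the same
# reporting period (an initial release followed by two revisions).
vintages = pd.DataFrame({
    "period_end": pd.to_datetime(["2020-03-31"] * 3),
    "release_date": pd.to_datetime(
        ["2020-04-29", "2020-05-28", "2020-06-25"]
    ),
    "value": [2.0, 2.3, 2.1],
})


def value_as_of(table, period_end, as_of):
    """Return the latest value for `period_end` that had been published on
    or before `as_of`, or None if nothing had been published yet."""
    period_end = pd.Timestamp(period_end)
    as_of = pd.Timestamp(as_of)
    known = table[
        (table["period_end"] == period_end)
        & (table["release_date"] <= as_of)
    ]
    if known.empty:
        return None
    return float(known.sort_values("release_date")["value"].iloc[-1])


# Before the first release nothing is known; later dates see later vintages.
print(value_as_of(vintages, "2020-03-31", "2020-04-15"))
print(value_as_of(vintages, "2020-03-31", "2020-05-01"))
print(value_as_of(vintages, "2020-03-31", "2020-06-01"))
print(value_as_of(vintages, "2020-03-31", "2020-12-31"))
```

Assigning the final value (2.1 in this toy table) to the first release date, or to the period end, would be the mistake described above: that number was not available until the last revision.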
Fundamental data is extremely regularized and low freque...

Lopez de Prado, M. (2018). Advances in Financial Machine Learning (1st ed.). Wiley.