1
Introduction
What is Reproducible Finance?
Reproducible finance is a philosophy about how to do quantitative, data science-driven financial analysis. The root of this philosophy is that the data and code that lead to a decision or conclusion should be able to be understood and then replicated in an efficient way. The code itself should tell a clear story when read by a human, just as it tells a clear story when read by a computer. This book applies the reproducible philosophy to R code for portfolio management.
That reproducible philosophy will manifest itself in how we tackle problems throughout this book. More specifically, instead of looking for the most clever code or smartest algorithm, this book prioritizes readable, reusable, reproducible workflows using a variety of R packages and functions. We will frequently solve problems in different ways, writing code from different packages and using different data structures to arrive at the exact same conclusion. To repeat, we will solve the same problems in a multitude of ways with different coding paradigms.
The motivation for this masochistic approach is for the reader to become comfortable working with different coding paradigms and data structures. Our goal is to be fluent or at least conversational to the point that we can collaborate with a variety of R coders, understand their work and make our work understandable to them. It’s not enough that our work be reproducible for ourselves and other humans who possess our exact knowledge and skills. We want our work to be reproducible and reusable by a broad population of data scientists and quants.
Three Universes
This book focuses on three universes or paradigms for portfolio analysis with R. There are probably more than three fantastic paradigms but these are the three I encounter most frequently in industry.
xts
The first universe is what I call the xts world. xts is both a package and a type of object. xts stands for extensible time series. Most of our work in this book will be with time series, and indeed most financial work involves time series. An xts object is a matrix that always carries a time index giving the order of the data. It holds a time series, meaning it holds the observations and the times at which they occurred. An interesting feature of an xts object is that it holds dates in an index column. In fact, that index column is considered column number zero, meaning it is not really a column at all. If we have an object called financial_data and want to access the dates, we use index(financial_data).
Why is the date index not given its own column? Because it is impossible to have an xts object but not have a date index. If the date index were its own column, that would imply that it could be deleted or removed.
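To make the index idea concrete, here is a minimal sketch that builds a tiny xts object by hand. The prices and dates are made up for illustration, and the column name price is my own choice; only financial_data and index() come from the discussion above.

```r
library(xts)

# Hypothetical data: three daily closing prices (values are made up).
dates <- as.Date(c("2018-01-02", "2018-01-03", "2018-01-04"))

financial_data <- xts(
  x = matrix(c(100.0, 101.5, 99.8), ncol = 1, dimnames = list(NULL, "price")),
  order.by = dates
)

# The dates live in the index, not in a regular column.
index(financial_data)

# The index is not counted as a column: only "price" is reported.
ncol(financial_data)
colnames(financial_data)
```

Notice that we never created a date column. The dates were supplied through order.by and can only be reached through index().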
In the xts world, there are two crucial packages that we will use: quantmod and PerformanceAnalytics. quantmod is how we will access the internet and pull in pricing data. That data will arrive to us formatted as an xts object.
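As a rough sketch of what a quantmod import can look like, the helper below wraps getSymbols(). The helper name fetch_prices, the Yahoo! Finance source and the default start date are assumptions made for illustration, not choices dictated by the text.

```r
library(quantmod)

# Hypothetical helper: download daily prices for one ticker and return
# them as an xts object. Calling it requires an internet connection.
fetch_prices <- function(symbol, from = "2013-01-01") {
  getSymbols(
    symbol,
    src = "yahoo",        # assumed data source
    from = from,          # assumed start date
    auto.assign = FALSE   # return the object instead of assigning it by name
  )
}
```

With a connection available, a call such as fetch_prices("SPY") (the ticker is just an example) should return an xts object whose dates can be inspected with index().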
PerformanceAnalytics, as the name implies, has several useful functions for analyzing portfolio performance in an xts object, such as StdDev(), SharpeRatio(), SortinoRatio() and CAPM.beta(). We will make use of this package in virtually all of the chapters.
To learn more, have a look at the documentation:
cran.r-project.org/web/packages/PerformanceAnalytics/index.html
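Here is a minimal sketch of two of those functions applied to an xts object. The six monthly returns are made up, and the risk-free rate of zero is an assumption chosen only to keep the example short.

```r
library(xts)
library(PerformanceAnalytics)

# Hypothetical monthly portfolio returns (values are made up).
dates <- as.Date(c("2018-01-31", "2018-02-28", "2018-03-31",
                   "2018-04-30", "2018-05-31", "2018-06-30"))

portfolio_returns <- xts(
  x = matrix(c(0.021, -0.013, 0.008, 0.015, -0.004, 0.011),
             ncol = 1, dimnames = list(NULL, "returns")),
  order.by = dates
)

# Standard deviation of the return series.
sd_result <- StdDev(portfolio_returns)

# Sharpe Ratio with standard deviation as the risk measure
# and an assumed risk-free rate of 0.
sharpe_result <- SharpeRatio(portfolio_returns, Rf = 0, FUN = "StdDev")

sd_result
sharpe_result
```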
tidyverse
The second universe is known throughout the R community as the ‘tidyverse’. The tidyverse is a collection of R packages for doing data science in a certain way. It is not specific to financial services and is not purpose built for time series analysis.
Within the tidyverse, we will make heavy use of the dplyr package for data wrangling, transformation and organizing. dplyr does not have built-in functions for our statistical calculations, but it does allow us to write our own functions or apply some other package’s functions to our data.
In this world, our data will be in a data frame, also called a tibble. Throughout this book, I will use those two interchangeably: data frame = tibble in this book.
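The sketch below illustrates the point about dplyr: it has no standard deviation function of its own, but summarise() lets us apply base R's sd() to a column of a tibble. The returns are made up and the column names are my own choices.

```r
library(dplyr)

# Hypothetical monthly returns stored in a tibble (values are made up).
returns_tbl <- tibble(
  date = as.Date(c("2018-01-31", "2018-02-28", "2018-03-31", "2018-04-30")),
  returns = c(0.021, -0.013, 0.008, 0.015)
)

# dplyr handles the wrangling; mean() and sd() come from base R.
return_stats <- returns_tbl %>%
  summarise(
    mean_return = mean(returns),
    sd_return = sd(returns)
  )

return_stats
```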
Why is it called the tidyverse? Because it expects and wants data to be tidy, which means:
(1) each variable has its own column
(2) each observation is a row
(3) each value is a cell
Learn more here:
tidyr.tidyverse.org/
We will explore how to make data tidy versus non-tidy throughout the book.
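As a small preview, here is one possible way to move from a non-tidy (wide) layout to a tidy (long) one. I use pivot_longer() from tidyr, which is an assumption about the tool and requires tidyr 1.0.0 or later; the tickers and returns are made up.

```r
library(tibble)
library(tidyr)

# Non-tidy (wide): the variable "asset" is spread across column names.
wide_returns <- tibble(
  date = as.Date(c("2018-01-31", "2018-02-28")),
  SPY = c(0.021, -0.013),
  EFA = c(0.017, -0.020)
)

# Tidy (long): one column per variable, one row per observation.
tidy_returns <- pivot_longer(
  wide_returns,
  cols = -date,
  names_to = "asset",
  values_to = "returns"
)

tidy_returns
```

In the tidy version, each row is one observation: a single asset's return on a single date.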
tidyquant
The third universe is tidyquant, which includes the tidyquant, timetk and tibbletime packages. This universe takes a lot of the best features of xts, PerformanceAnalytics and the tidyverse and lets them play well together. For example, tidyquant allows us to apply a function from PerformanceAnalytics to a tidy data frame, without having to convert it to an xts object.
Learn more here:
business-science.io/r-packages.html
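To illustrate that last point, the sketch below passes SharpeRatio() from PerformanceAnalytics to tq_performance() and applies it directly to a tibble. The returns are made up, and the risk-free rate of zero is an assumption for illustration.

```r
library(tidyquant)
library(tibble)

# Hypothetical monthly returns in a tidy data frame (values are made up).
returns_tbl <- tibble(
  date = as.Date(c("2018-01-31", "2018-02-28", "2018-03-31",
                   "2018-04-30", "2018-05-31", "2018-06-30")),
  returns = c(0.021, -0.013, 0.008, 0.015, -0.004, 0.011)
)

# tq_performance() applies a PerformanceAnalytics function to the tibble;
# we never convert to xts ourselves.
sharpe_tq <- tq_performance(
  data = returns_tbl,
  Ra = returns,
  performance_fun = SharpeRatio,
  Rf = 0,          # assumed risk-free rate
  FUN = "StdDev"
)

sharpe_tq
```

The result comes back as a data frame rather than an xts object, so it can flow straight into further tidyverse code.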
Those three universes will provide the structure to our code as we work through calculations. As a result, where possible, each chapter or substantive task will follow a similar pattern: solve it via xts, solve it via tidyverse, solve it via tidyquant and verify that the results are the same. In this way, we will become familiar with data in different formats and using different paradigms.
For some readers, it might become tedious to solve each of our tasks in three different ways and if you decide you are interested in just one paradigm, feel free to read just that code flow for each chapter. The code flow for each universe can stand on its own.
Data Visualization
Data visualization is where we translate numbers into shapes and colors, and it will get a lot of attention in this book. We do this work so that humans who do not wish to dig into our data and code can still derive value from what we do. This human communication is how our quiet quantitative toiling becomes a transcendent revenue generator or alpha-producing strategy. Even if we plan to implement algorithms and never share our work outside of our own firm, the ability to explain and communicate is hugely important.
To the extent that clients, customers, partners, bosses, portfolio managers and anyone else want actionable insights from us, data visualizations will most certainly be more prominent in the discussion than the nitty gritty of code, data or even statistics. I will emphasize data visualization throughout the book and implore you to spend as much or more time on data visualizations as you do on the rest of quantitative finance.
When we visualize our results, object structure will again play a role. We will generally chart xts objects using the highcharter package and tidy objects using the ggplot2 package.
highcharter is an R package but Highcharts is a JavaScript library - the R package is a hook into the JavaScript library. Highcharts is fantastic for visualizing time series and it comes with great built-in widgets for viewing different time frames. I highly recommend it for visualizing financial time series but you do need to buy a license to use it in a commercial setting.
Learn more at:
www.highcharts.com and
cran.r-project.org/web/packages/highcharter/highcharter.pdf
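Here is a minimal sketch of charting an xts object with highcharter. The prices, the title and the series name are made up for illustration.

```r
library(xts)
library(highcharter)

# Hypothetical daily prices in an xts object (values are made up).
dates <- as.Date(c("2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"))
prices <- xts(
  x = matrix(c(100.0, 101.5, 99.8, 102.3), ncol = 1,
             dimnames = list(NULL, "price")),
  order.by = dates
)

# type = "stock" requests the time series chart with its built-in
# widgets for selecting time frames.
price_chart <- highchart(type = "stock") %>%
  hc_title(text = "Hypothetical price series") %>%
  hc_add_series(prices$price, name = "price")

# Printing price_chart in an interactive session renders the chart.
```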
ggplot2 is itself part of the tidyverse and, as such, it works best when data is tidy in the sense described above. It is one of the most popular data visualization packages in the R world.
Learn more at:
ggplot2.tidyverse.org/
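A minimal sketch of a ggplot2 chart built from a tidy data frame follows. The tickers, returns and labels are made up for illustration.

```r
library(ggplot2)

# Hypothetical tidy returns: one row per asset per date (values are made up).
dates <- as.Date(c("2018-01-31", "2018-02-28", "2018-03-31"))
tidy_returns <- data.frame(
  date = rep(dates, times = 2),
  asset = rep(c("SPY", "EFA"), each = 3),
  returns = c(0.021, -0.013, 0.008, 0.017, -0.020, 0.004)
)

# Because the data is tidy, each aesthetic maps to exactly one column.
returns_plot <- ggplot(tidy_returns, aes(x = date, y = returns, color = asset)) +
  geom_line() +
  labs(title = "Hypothetical monthly returns", x = "date", y = "monthly return")

# Printing returns_plot in an interactive session draws the chart.
```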
Shiny Applications
Each of our chapters will conclude with the building of a Shiny application, so that by book’s end, you will have the tools to build a suite of Shiny apps and dashboards for portfolio analysis. What is Shiny?
Shiny is an R package that was created by Joe Cheng. It wraps R code into interactive web applications so R coders do not need to learn HTML, CSS or JavaScript.
Shiny applications are immeasurably useful for sharing our work with end users who might not want to read code or open an IDE. For example, a portfolio manager might want to build a portfolio and see how a dollar would have grown in that portfolio or how volatility has changed over time, but he or she does not want to see the code, data and functions used for the calculation. Or, another PM might love the work we did on Portfolio 1, and have a desire to apply that work to Portfolios 2, 3 and 4 but under different economic assumptions.
Shiny allows that PM to change input parameters on the fly, run R code under the hood for new analytic results (without knowing it is R code), and build new data visualizations.
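To give a flavor of what that looks like, here is a minimal sketch of a Shiny app, not one of the apps we build in this book. The input name annual_return, the fixed ten-year horizon and the constant-return assumption are all hypothetical simplifications.

```r
library(shiny)

# User interface: one input the end user can change, one chart.
ui <- fluidPage(
  titlePanel("Growth of a dollar (sketch)"),
  sliderInput("annual_return", "Assumed annual return (%)",
              min = 0, max = 15, value = 7),
  plotOutput("growth_plot")
)

# Server: R code that re-runs whenever the input changes.
server <- function(input, output) {
  output$growth_plot <- renderPlot({
    years <- 0:10
    growth <- (1 + input$annual_return / 100)^years
    plot(years, growth, type = "l",
         xlab = "Years", ylab = "Value of $1")
  })
}

# Build the app object; runApp(app) launches it in a browser.
app <- shinyApp(ui = ui, server = server)
```

The end user only sees a slider and a chart. Moving the slider triggers the code inside renderPlot() again, with no R code visible to them.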
After completing this book you will be able to build several portfolio management-focused Shiny apps. You will not be an expert on the theory that underlies Shiny or its reactive framework, but you will have the practical knowledge to code functional and useful apps.
We will build the following Shiny applications:
1) Portfolio Returns
2) Portfolio Standard Deviation
3) Skewness and Kurtosis of Returns
4) Sharpe Ratio
5) CAPM Beta
6) Fama-French Factor Model
7) Asset Contribution to Portfolio Standard Deviation
8) Monte Carlo Simulation
You can see all of those applications at the Reproducible Finance website:
www.reproduciblefinance.com/shiny
The full source code for every app is also available at that site. It is not necessary to view the apps live on the internet, but doing so will make it easier to understand what the code is doing.
Packages
The following are the packages that we will be using in this book.
To install a package on your computer, run install.packag...