1
The Nature of Financial Data
In this chapter, I discuss the operation of financial markets and the nature of financial data. The emphasis is on data, especially data on returns of financial assets. Most of the data discussed are taken from various US markets in the late twentieth and early twenty-first centuries. The emphasis, however, is not on the specific data that I used, or data that I have put on the website for the book, but rather on current data that you can get from standard sources on the internet.
The objectives of the chapter are to
• provide a general description of financial assets, the markets and mechanisms for trading them, and the kinds of data we use to understand them;
• describe and illustrate graphical and computational statistical methods for exploratory analysis of financial data;
• display and analyze historical financial data, and identify and describe key properties of various types of financial data, such as their volatility and frequency distributions;
• discuss the basics of risk and return.
The statistical methods of this chapter are exploratory data analysis, or EDA. We address EDA methods in Chapter 2.
I introduce and describe a number of concepts and terms used in finance (“PE”, “book value”, “GAAP”, “SKEW”, “EBITDA”, “call option”, and so on). I will generally italicize these terms at the point of first usage, and I list most of them in the Index. Many of these terms will appear in later parts of the book.
The computations and the graphics for the examples were produced using R. There is, however, no discussion in this chapter of computer code or of internet data sources. In the appendix to this chapter beginning on page 139, I describe some basics of R and then how to use R to obtain financial data from the internet and bring it into R for analysis. In that appendix, I refer to specific datasets used in the chapter. The R code for all of the examples in this chapter is available at the website for the book.
Financial data come in a variety of forms. The data may be prices at which transactions occur, they may be quoted rates at which interest accrues, they may be reported earnings over a specified time, and so on. Financial data often are derived quantities computed from observed and/or reported quantities, such as the ratio of the price of a share of stock to the annual earnings attributable to that share.
Financial data are associated either with specific points in time or with specific intervals of time. The data themselves may thus be prices at specified times in some specified currency, they may be some kind of average of many prices at a specified time (such as an index), they may be percentages (such as interest rates at specified times or rates of change of prices during specified time intervals), they may be nonnegative integers (such as the number of shares traded during specified time intervals), or they may be ratios of instantaneous quantities and quantities accumulated over specified intervals of time (such as price-to-earnings ratios).
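As a rough illustration of two of the derived quantities just mentioned, the following sketch computes a rate of change of price over an interval and a price-to-earnings ratio. It is written in Python rather than the R used for the book's examples, and every number and function name in it (`rate_of_change`, `price_to_earnings`, the prices, and the earnings figure) is hypothetical, chosen only to make the arithmetic concrete.

```python
# All values below are hypothetical and serve only as an illustration.
price_start = 100.0   # price at the start of the interval (USD)
price_end = 105.0     # price at the end of the interval (USD)
annual_eps = 4.20     # earnings per share accumulated over one year (USD)


def rate_of_change(p0: float, p1: float) -> float:
    """Proportional change in price from p0 to p1 over a time interval."""
    return (p1 - p0) / p0


def price_to_earnings(price: float, eps: float) -> float:
    """Ratio of a price at a point in time to earnings accumulated over an interval."""
    return price / eps


r = rate_of_change(price_start, price_end)
pe = price_to_earnings(price_end, annual_eps)

print(f"rate of change over the interval: {r:.2%}")
print(f"price-to-earnings ratio: {pe:.1f}")
```

Note that the first quantity is attached to an interval of time, while the second combines a price observed at a point in time with earnings accumulated over an interval.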
When the data are prices of assets or of commodities, we often refer generically to the unit of currency as the numeraire. Returns and derivative assets are valued in the same numeraire. Most of the financial data in this book are price data measured in United States dollars (USD).
Our interest in data is usually to understand the underlying phenomenon that gave rise to the data, that is, the data-generating process. In the case of financial data, the data-generating process consists of the various dynamics of “the market”.
The purpose of the statistical methods discussed in this book is to gain a better understanding of data-generating processes, particularly financial data-generating processes.
An analysis of financial data often begins with a plot or graph in which the horizontal axis represents time. Although the data may be associated with specific points in time, as indicated in the plot on the left in Figure 1.1, we often connect the points with line segments as in the plot on the right in Figure 1.1. This is merely a visual convenience, and does not indicate anything about the quantity between two points at which it is observed or computed. The actual data are a discrete, finite set.
Models and Data; Random Variables and Realizations: Notation
We will often discuss random variables and we will discuss data, or realizations of random variables. Random variables are mathematical abstractions useful in statistical models. Data are observed values. We often use upper-case letters to denote random variables, X, Y, X_i, and so on. We use corresponding lower-case letters to denote observed values or realizations of the random variables, x, y, x_i, and so on.
I prefer to be very precise in making distinctions between random variables and their realizations (in notation, the conventional distinction is upper-case letters for random variables and lower case for realizations), but the resulting notation often is rather cumbersome. Generally, therefore, in this book I will use simple notation and may use either upper- or lower-case whether I refer to a random variable or a realization of one.
We form various functions of both random variables and data that have similar interpretations in the context of the model and the corresponding dataset. These functions are means, variances, covariances, and so on. Sometimes, but not always, in the case of observed data, we will refer to these computed values as “sample means”, “sample variances”, and so on.
An important concept involving a random variable is its expected value or expectation, for which I use the symbol E(·). It is assumed that the reader is familiar with this term. (It will be formally defined in Section 3.1.)
The expected value of a random variable is its mean, which we often denote by μ, and for a random variable X, we may write
$$\mathrm{E}(X) = \mu. \tag{1.1}$$
Another important concept involving a random variable is its variance, often denoted by σ² and for which I use the symbol V(·). The variance is defined as the expected value of the squared difference of the random variable from its mean, and for the random variable X, we may write
$$\mathrm{V}(X) = \mathrm{E}\left((X - \mathrm{E}(X))^2\right). \tag{1.2}$$
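The sample counterparts of the mean and the variance can be computed directly from a set of realizations. The following is a minimal sketch in Python (the book's own examples use R). The data values are hypothetical, and the divisor n − 1 in the sample variance is a common convention that is assumed here rather than taken from the text.

```python
import statistics

# Hypothetical realizations x_1, ..., x_n (for example, five daily returns).
x = [0.012, -0.004, 0.007, -0.010, 0.003]
n = len(x)

# Sample mean: the average of the observed values.
sample_mean = sum(x) / n

# Sample variance: average squared deviation from the sample mean,
# using the divisor n - 1 (an assumed, but common, convention).
sample_var = sum((xi - sample_mean) ** 2 for xi in x) / (n - 1)

# Cross-check against the standard library, which uses the same divisor.
assert abs(sample_mean - statistics.mean(x)) < 1e-12
assert abs(sample_var - statistics.variance(x)) < 1e-12

print(f"sample mean:     {sample_mean:.6f}")
print(f"sample variance: {sample_var:.8f}")
```

The computed values describe the observed dataset. The mean μ and the variance σ² are properties of the random variable in the model, and the sample values are generally not equal to them.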
The square root of the variance is called...