In the introduction to this textbook, you read about how data are used in different situations, what data might be and how we come across data, and examples ranging from obesity to competitive eating to ticket sales and pricing. This chapter introduces more technical material: analytics, data and data types, key statistical concepts, and various statistical analyses. It is meant to establish a baseline before the later chapters cover the application of analytics in the functional areas of sport. Please remember that this chapter is not intended to replace a statistics textbook; it is a brief summary of relevant statistical concepts and key analyses. If needed, please refer to a statistics textbook for more detailed information.
Every organization benefits from running its business efficiently, and sport organizations are no exception. Because the market is saturated, it is especially important for sport organizations to function with maximum efficiency and to make smart business decisions. Today, business decisions are based not on hunches but on analytics. Davenport and Harris (2007) defined analytics as “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions” (p. 7).
Sport organizations with an analytical mindset generate and collect data from various internal and external sources, analyze business performance to derive insights, and make fact-based decisions that create competitive advantage and increase the effectiveness and efficiency of the organization. Sport organizations can use analytics in specific functional areas or organization-wide. Most commonly, they use analytics to:
• analyze athlete performance to make decisions on the starting line-up, game plans and which players to sign/draft/trade;
• predict and prevent player injuries;
• assess the value of athletes to their brand;
• examine the effectiveness of various marketing activities;
• segment existing fans and estimate their value;
• predict retention of fans; and
• develop an incident prevention model based on past incidents.
While the benefit of analytics to a sport organization is obvious, what “data” is might not be so clear to all. Let’s turn our focus to understanding what data is and some important concepts about it.
Data is information in a variety of forms, such as numbers, words, pictures, video, measurements, and observations. It can be raw and unorganized, or it can be transformed into a usable format. Today, sport organizations have access to a vast amount of data, including transactional data (e.g., sales, cost, and inventory), non-operational data (e.g., industry sales and macroeconomic data), and metadata (e.g., data definitions) (Frand, n.d.). To benefit from analytics and make good decisions, organizations should ask questions about data before jumping into data collection, and should use systematically assembled data (Davenport & Harris, 2007). These questions concern what type of data is needed, where it can be obtained, how much of it is needed, and how good its quality is.
The type of data needed depends on the objectives of the organization and the questions it wants to answer in pursuit of those objectives. Every organization sets business objectives to gain a competitive advantage in the marketplace, and these objectives are based on the current status of the sport property in the market, what the organization wants to accomplish, and its resources and competencies. Based on these components, organizations set functional objectives and identify operational metrics to measure their performance against those objectives. For example, if a fitness facility aims to have a set number of active memberships each month, it can simply count the memberships. While that count shows whether the goal was met, it might not give administrators enough information, especially if the facility missed the goal. Looking at the retention rate for current members and the number of new memberships acquired would provide more detailed insight into why the goal was missed. As you can see, a variety of data is relevant to answering one question, and data will provide insights only if you start with a question and collect relevant data.
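The fitness facility example can be sketched in a few lines of Python. This is a minimal illustration only: the membership figures and the `retention_rate` helper are hypothetical and do not come from a real facility.

```python
def retention_rate(members_at_start, members_retained):
    """Share of members active at the start of the period who are still active at the end."""
    if members_at_start <= 0:
        raise ValueError("members_at_start must be positive")
    return members_retained / members_at_start


# Hypothetical figures for one month (assumed for illustration).
goal = 1000               # target number of active memberships
members_at_start = 950    # active memberships on the first day of the month
members_retained = 855    # of those, still active on the last day
new_members = 90          # memberships acquired during the month

active_at_end = members_retained + new_members
rate = retention_rate(members_at_start, members_retained)
shortfall = goal - active_at_end

print(f"Active memberships: {active_at_end} (goal {goal})")
print(f"Retention rate: {rate:.1%}")
print(f"New memberships: {new_members}")
print(f"Shortfall against goal: {shortfall}")
```

The simple count only says that the goal was missed. The retention rate and the number of new memberships show whether the gap comes from losing existing members, from weak acquisition, or from both.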
The source of data depends on the type of data needed. Sport organizations can obtain data from both internal and external sources. Internal data could be gathered from finance, manufacturing, research and development, and human resources departments. Marketing departments can also provide internal data such as Return on Investment (ROI) metrics of advertisements (more details are provided in Chapter 4). External data could be gathered from suppliers and customers, and could also be purchased from a third party, such as Nielsen TV ratings or Scarborough customer data.
Once the type of data needed and how to acquire it are decided, the next question is how much data is needed. The answer is “it depends.” In some cases, a sport organization might have data on an entire population, whereas in other cases it can access only a sample. How much data one needs is especially important when the analysis relies on a sample, because a power analysis is then required to identify an adequate sample size. In addition to sample size, the representativeness of the sample is an important concern for the accuracy of findings. These concepts are covered in more detail in the “Some key statistical concepts” section of this chapter.
The next step that requires attention is the quality of the data. A large data set is not always the answer. Quality data is needed to achieve valid and reliable results. Some of the important aspects of data quality are completeness, accuracy, consistency, and currency.
• Completeness: Availability of all necessary and relevant data.
• Accuracy: Reflecting real-life situations and being precise.
• Consistency: Being consistent between systems with common definitions and standardization, and avoiding duplicate records in data.
• Currency: Being updated periodically – daily, weekly, monthly.
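The four quality aspects above can be turned into simple checks. The sketch below uses only the Python standard library. The records, the field names, and the 30-day freshness threshold are all assumptions made for illustration; a real organization would define its own rules.

```python
from datetime import date

# Hypothetical ticket-sales records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "seats": 2, "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "seats": 4, "updated": date(2024, 5, 2)},
    {"id": 3, "email": "c@example.com", "seats": -1, "updated": date(2023, 1, 15)},
    {"id": 1, "email": "a@example.com", "seats": 2, "updated": date(2024, 5, 1)},
]


def quality_report(rows, as_of, max_age_days=30):
    """Return the record ids that fail each of four simple quality checks."""
    # Completeness: every field must have a value.
    incomplete = [r["id"] for r in rows if any(v is None for v in r.values())]
    # Accuracy: a seat count cannot be negative in real life.
    inaccurate = [r["id"] for r in rows if r["seats"] is not None and r["seats"] < 0]
    # Consistency: the same id should not appear more than once.
    seen, duplicates = set(), []
    for r in rows:
        if r["id"] in seen:
            duplicates.append(r["id"])
        seen.add(r["id"])
    # Currency: records must have been updated recently enough.
    stale = [r["id"] for r in rows if (as_of - r["updated"]).days > max_age_days]
    return {
        "incomplete": incomplete,
        "inaccurate": inaccurate,
        "duplicates": duplicates,
        "stale": stale,
    }


report = quality_report(records, as_of=date(2024, 5, 10))
for check, ids in report.items():
    print(f"{check}: {ids}")
```

Records flagged by any check are candidates for correction or removal before analysis.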
Most often, data is extracted from its source in raw format and needs to be cleaned by removing incorrect, incomplete, and duplicate information, and then transformed to be usable. Once data has gone through the cleansing and transformation processes and is stored in a database, it is ready for analysis. Here, understanding the type of data becomes important, because the type of data and the level of measurement dictate the analyses one can perform. Data can be qualitative (descriptive information) or quantitative (numeric information), and quantitative data can be further classified as discrete or continuous. Discrete data can take only certain separate values, typically whole-number counts, for which decimals and negative values are not possible. Continuous data, on the other hand, can take any value within a range, with no gaps (e.g., 1.1, 1.135, 1.2, and 2.367) (Lomax, 2007). For example, the number of tickets sold is discrete because ticket sales are counted in whole numbers, whereas the height of athletes or the time in a race is continuous.
Another important concept in quantitative data is the level of measurement, of which there are four:
• Nominal: At the nominal level of measurement, numbers are used to classify data. Most of you are familiar with this type of data from the coding of gender, such as assigning 1 to males and 2 to females in a data set. In this type of classification, the numbers show only group membership and have no order. In the gender example, 2 is not higher or better than 1 in any way; the numbers are used solely to classify groups.
• Ordinal: This type of scale displays an order between the numbers with respect to the characteristic being measured. For example, at a road race, the runner who completes the course in the shortest time is ranked first, and the runners who finish after the winner are ranked second, third, and so on based on their times. Although the ranks 1, 2, and 3 seem to have equal distance between them, the differences they represent are approximate and unequal. In our example, the difference between the times of the first and second runners is not expected to be the same as the difference between the times of the second and third runners. Therefore, an ordinal scale communicates an order but does not claim equal distance between the points on the scale.
• Interval: Like an ordinal scale, an interval scale orders the measurements, but it also provides equal distances between the points on the scale. A common example of an interval scale is IQ scores. The average IQ score is 100, and the difference between IQ scores of 80 and 90 is equal to the difference between scores of 100 and 110. In addition, lower scores show lower IQ levels and higher scores show higher IQ levels. An important aspect of an interval scale is that it has no true zero point: a zero on an interval scale does not indicate an absence of the property being measured. Therefore, it cannot be said that an individual with an IQ score of 140 is twice as smart as an individual with an IQ score of 70.
• Ratio: The ratio scale has all the characteristics of an interval scale and also has a true zero, which indicates the absence of the quality being measured. Going back to the runner example, the clock is set to zero minutes and seconds at the beginning of the race, so if the winner finished a 5-kilometer road race in 18 minutes, he could be said to be twice as fast as a runner who finished the race in 36 minutes.
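The four levels of measurement can be illustrated by looking at which summaries are meaningful at each level. The sketch below uses Python's standard `statistics` module. The data for five runners are hypothetical, and pairing each level with one summary is a common rule of thumb rather than a rule stated in this chapter.

```python
from statistics import mean, median, mode

# Hypothetical data for five runners; all values are illustrative.
gender_code = [1, 2, 2, 1, 2]                    # nominal: 1 = male, 2 = female
finish_rank = [1, 2, 3, 4, 5]                    # ordinal: order only
iq_score = [95, 100, 105, 110, 90]               # interval: equal distances, no true zero
finish_minutes = [18.0, 21.5, 24.0, 30.0, 36.0]  # ratio: true zero

# Nominal: only counts and the most frequent category are meaningful.
most_common_code = mode(gender_code)

# Ordinal: the median rank is meaningful, but the gaps behind the ranks are unequal.
median_rank = median(finish_rank)
time_gaps = [later - earlier for earlier, later in zip(finish_minutes, finish_minutes[1:])]

# Interval: differences and the mean are meaningful, but ratios are not.
mean_iq = mean(iq_score)

# Ratio: ratios are meaningful because zero means "no elapsed time".
time_ratio = finish_minutes[-1] / finish_minutes[0]

print("Most common gender code:", most_common_code)
print("Median rank:", median_rank)
print("Minutes between consecutive finishers:", time_gaps)
print("Mean IQ score:", mean_iq)
print("Slowest time divided by fastest time:", time_ratio)
```

The ranks are one step apart, but the time gaps behind them are not equal, which is the point of the ordinal example. The final ratio of 2 over the same distance corresponds to the "twice as fast" statement in the ratio example.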
Before moving into various analyses, it is important to remember some key statistical concepts. Statistics in general is divided into two types: descriptive statistics and inferential statistics. Descriptive statistics summarize and describe data via frequencies, measures of central tendency, measures of dispersion, and distribution characteristics. Examples from the sport world include batting average in baseball, the number of turnovers or steals in basketball, and the demographic characteristics of a team’s fan base in percentages or counts. When such measures are calculated from a sample they are called statistics; when they are calculated for an entire population they are called parameters. A sample is “a subset of a population,” and a population is defined as “consisting of all members of a well-defined group” (Lomax, 2007, p.6).
Traditionally, analyses rely on a sample, and inferences about the population are made from the sample data via inductive reasoning; this is called inferential statistics. In this process, how the sample is acquired is extremely important, because inferential statistics assume that sampling is done randomly. Simple random sampling is selecting a sample from a population with a process that gives each observation an equal and independent chance of being selected (Lomax, 2007). The importance of simple random sampling rests on the idea that the sample will be representative of the population, so the results of inferential statistics will be generalizable to the population. For example, if we were to ask our season ticket holders about their experience at our games, we could reach out to all of them or survey a sample. For the sake of this example, let’s assume that we decided to collect data from a sample randomly selected from the entire pool of season ticket holders.
If our sample were large enough, the results derived from it would be generalizable to all of our season ticket holders. This brings us to the topic of adequate sample size and the limitations of small samples in inferential statistics. The main idea is that as sample size increases, we sample a larger portion of the population, and the sample therefore becomes more representative of the population (Lomax, 2007).
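The season ticket holder example can be simulated to show how sample size affects representativeness. The sketch below is a rough illustration in Python. The population of 5,000 satisfaction scores is randomly generated and entirely hypothetical, and the sample sizes (10, 100, 1,000) and the number of repetitions are arbitrary choices. `random.Random.sample` draws without replacement, giving each member an equal chance of selection, in the spirit of simple random sampling.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: satisfaction scores (1-10) for 5,000 season ticket holders.
population = [rng.randint(1, 10) for _ in range(5000)]
parameter = mean(population)  # the population mean is a parameter


def sample_mean(n):
    """Mean of one simple random sample of size n (a statistic)."""
    return mean(rng.sample(population, k=n))


def average_error(n, repeats=200):
    """Average absolute gap between the sample mean and the population mean."""
    return mean(abs(sample_mean(n) - parameter) for _ in range(repeats))


errors = {n: average_error(n) for n in (10, 100, 1000)}

print(f"Population mean (parameter): {parameter:.3f}")
for n, err in errors.items():
    print(f"n = {n:>4}: average error of the sample mean = {err:.3f}")
```

In this simulation, the average gap between the sample mean and the population mean shrinks as the sample grows, which matches the idea that larger random samples tend to represent the population better.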
Hypothesis testing is another concept to cover before moving into types of analyses. Hypothesis testing is a decision-making method where two competing decisions, which are known as null hypothe...