1 Statistical modelling and sport business analytics
Vanessa Ratten and Petrus Usmanij
Introduction
This chapter focuses on the distinctive subject area of statistical modelling and data analytics in sport. Statistical modelling has made a major contribution to our understanding of sport business (Ratten, 2017). The sport industry has traditionally used data to understand performance and its resulting outcomes (Jones et al., 2017). However, advances in computing technology have given rise to new statistical techniques and data analysis methods (Akter et al., 2016). As a result, current sport analytics can be contrasted with traditional analytics in two main ways. Firstly, sport analytics now incorporates more advanced informatics that enable a deeper analysis of data. More sport organisations are using cloud computing to store data, and this has led to increasingly complex analytics. In the past it was hard to store data, but this has changed as more longitudinal data has become available, making it easier to compare current and past performance statistics. In addition, the way data is collected has changed, with more athletes and clubs measuring on- and off-field performance. Sport apps have emerged in the marketplace as a way for both amateur and professional athletes to track their statistics (Miragaia et al., 2017). This has increased interest in data analytics and the benefits that derive from statistical analysis. At the same time, the video game industry has grown into one of the largest entertainment industries, and sport games, in particular fantasy sport, have become popular. Online sport gambling has also emerged as a new source of entertainment, and because many sport events are global, data analytics is increasingly used for sport betting.
Secondly, social media has created a need for comparative, real-time data that can be individualised to personal needs. The number of views of social media posts and the demographics of viewers are increasingly used to track impact. Thus, social media has become a source of data that was previously not considered useful. Consequently, new businesses have emerged in the marketplace that track online sentiment and behaviour (Douglas et al., 2020). This has benefited sport businesses, which can use this data to design better marketing campaigns (Ratten & Ratten, 2011).
The chapters in this book begin to address the need for more research on statistical modelling and sport business analytics. Specific conceptualisations of sport data can advance our understanding of analytics by broadening its perspective. The purpose of this chapter is to describe in more detail why a sport focus on statistics and analytics is necessary, as well as the kinds of theoretical and practical questions that need to be addressed. This chapter is based on the observation that, whilst there is an abundance of research on statistics and analytics, the sport context has largely been excluded from mainstream management and organisation scholarship. Although there are exceptions in the sport business field, comprehensive analysis of the distinguishing features of sport business analytics is still lacking. Addressing these issues is therefore likely to have important managerial implications for progressing innovation in the sport industry.
Data analytics
The sport industry is one of the largest global industries and affects many other areas of society, including health and education (Zhang, 2015). There are many ways to derive and analyse sport data, depending on its content. Some data will be more important than others, so appropriate statistical techniques are needed to analyse it. This means people can interpret sport data and its implications for business in different ways. Because more people are involved in sport, both as a leisure activity and professionally, sport data has evolved (Ratten, 2011). There is a wide array of themes in sport business analytics, including finance, management and psychology. Therefore, there are a variety of ways to examine sport business analytics.
Data analytics has grown rapidly and become an important way for people and organisations to gain a competitive advantage. The economic and cultural foundations of many communities are built around sport (Zhang et al., 2014). Although there is a growing perceived need for more complex forms of statistical modelling, in reality basic statistics can often produce good results. There are three broad categories of data analytics methods: descriptive, prescriptive and predictive (Hazen et al., 2018). Descriptive data analytics involves understanding the processes and information contained within the data, so that the system under analysis can be described in a way that makes sense to sport managers. Visualising the data may be needed to see trends, which enables inferences to be made about what information the data contains. Prescriptive data analytics focuses on what should be done in response to the data: the causes behind the data and the policies that could address them are considered, and specific policies can then be developed from the results of the analysis. Predictive data analytics involves forecasting future events, which is useful for understanding both what is currently happening and what is likely to occur in the future.
Different statistical procedures and methodological techniques are used in sport, and these techniques continue to advance with new technological innovations. Both qualitative and quantitative analysis techniques are used in sport. Traditionally, qualitative research was analysed using less sophisticated methods than quantitative research, reflecting a reluctance among qualitative researchers to use statistical packages because they preferred other methods. This has changed as more researchers have become computer literate and comfortable using computers for analysis. As a result, qualitative researchers make greater use of software such as NVivo, and of techniques such as qualitative comparative analysis, to analyse interview transcripts. This reflects qualitative researchers' need to explore hidden meanings in their data in more depth, as computer programs can often pick up patterns in a way that is easier to analyse. This marks a change in the way qualitative researchers view data analytics. In addition, the use of new data analytics techniques has altered perceptions of the sample sizes needed to produce good research results.
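Software such as NVivo supports this kind of work interactively; the sketch below is not how NVivo works but a minimal, assumed illustration of the underlying idea of computer-assisted coding. A hypothetical codebook links each qualitative code to indicative keywords, and the program counts how often each code appears across invented interview excerpts.

```python
from collections import Counter

# Hypothetical codebook: each qualitative code is linked to indicative keywords.
CODEBOOK = {
    "motivation": {"goal", "drive", "win"},
    "wellbeing": {"stress", "sleep", "injury"},
    "community": {"fans", "club", "local"},
}

def code_transcript(text: str) -> Counter:
    """Count keyword matches for each code in one transcript."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    counts = Counter()
    for code, keywords in CODEBOOK.items():
        counts[code] = sum(1 for w in words if w in keywords)
    return counts

# Invented interview excerpts.
transcripts = [
    "My goal is to win, but injury and stress affect my sleep.",
    "The fans and the local club give me drive to win.",
]

# Aggregate across interviews to surface patterns in the data.
totals = sum((code_transcript(t) for t in transcripts), Counter())
print(totals.most_common())
```

Keyword matching is far cruder than interpretive coding; it only shows how software can surface candidate patterns for the researcher to examine.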
Kim and Lee (2019) suggest that common statistical mistakes in sport research can be categorised according to the pre-, peri- and post-analysis stages. In the pre-analysis stage, mistakes can be considered in terms of statistical assumptions, random sampling and causality (Kim & Lee, 2019). Most statistical analyses share a set of assumptions that enable standard procedures to be followed. These include normality, linearity and homoscedasticity (Kim & Lee, 2019). Most statistics textbooks state that these assumptions must be met for the analysis to be conducted correctly. In practice, however, they are often taken for granted rather than checked. This places considerable trust in the researcher having taken the right approach to the statistical analysis. Such trust is common in any statistical analysis that focuses on the output rather than on how the data was entered into the statistical package. Kim and Lee (2019) give the example of multivariate normality, an underlying assumption that is rarely tested in the sport management literature.
Data quality
Data needs to be evaluated in terms of its quality in order for researchers to assess its overall impact in the marketplace. Clarke (2016) suggests there are seven main ways to assess data quality. Firstly, syntactical validity means confirming that the data conforms to the definition of the data item. Secondly, appropriate identity association concerns whether the data refers to what it is intended to refer to. Thirdly, appropriate attribute association focuses on whether the data is linked to its real-world meaning. Fourthly, appropriate attribute signification means there is no ambiguity about what the data represents. Fifthly, accuracy refers to whether the data has been measured in the right way. Sixthly, precision refers to the level of detail recorded in the data. Seventhly, temporal applicability means that the data has an associated time and date to which it applies.
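Several of Clarke's dimensions can be operationalised as automated checks. The sketch below assumes a hypothetical GPS tracking record with `speed_kmh` and `timestamp` fields; the plausible speed range (0 to 45 km/h) and the two-decimal-place precision rule are illustrative assumptions, and accuracy is only approximated by a range check, since true accuracy requires comparison with a reference measurement. The association and signification dimensions depend on domain knowledge and are not captured.

```python
from datetime import datetime

def assess_record(record: dict) -> dict:
    """Check one hypothetical tracking record against several quality dimensions."""
    checks = {}
    speed = record.get("speed_kmh")
    # Syntactical validity: the value matches the defined numeric data type.
    checks["syntactic_validity"] = isinstance(speed, (int, float)) and not isinstance(speed, bool)
    # Accuracy (proxy only): the value lies in an assumed plausible range.
    checks["accuracy"] = checks["syntactic_validity"] and 0 <= speed <= 45
    # Precision: recorded to the assumed level of detail (at most two decimals).
    checks["precision"] = checks["syntactic_validity"] and round(speed, 2) == speed
    # Temporal applicability: the observation carries a valid ISO 8601 timestamp.
    try:
        datetime.fromisoformat(record.get("timestamp", ""))
        checks["temporal_applicability"] = True
    except (TypeError, ValueError):
        checks["temporal_applicability"] = False
    return checks

print(assess_record({"speed_kmh": 31.25, "timestamp": "2021-03-14T15:30:00"}))
print(assess_record({"speed_kmh": "fast", "timestamp": "yesterday"}))
```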
Clarke (2016) suggests that information quality can be assessed in six main ways. Firstly, theoretical relevance, in terms of the information having an impact on current thinking. Secondly, practical relevance, in terms of the information having a real-life impact. Thirdly, currency, in terms of the information being current. Fourthly, completeness, in terms of the information being reliable and not open to misinterpretation. Fifthly, controls, referring to how the information has been analysed. Sixthly, auditability, meaning the information can be traced to its source of origin. Data can be captured in numerous ways, including via sensors and biometric sensing. Given these various ways of assessing information quality, it is also useful to focus on the benefits of data analytics.
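Three of the information quality criteria (completeness, currency and auditability) lend themselves to simple dataset-level proportions. In this sketch the field names, the 30-day currency window and the player records are hypothetical, and each row is assumed to carry a `recorded_on` date; theoretical relevance, practical relevance and controls require human judgement and are not scored.

```python
from datetime import date

def information_quality(rows, required, today, max_age_days=30):
    """Return the share of rows meeting three information quality criteria."""
    total = len(rows)
    # Completeness: every required field is present and not None.
    complete = sum(all(r.get(f) is not None for f in required) for r in rows)
    # Currency: the row was recorded within the allowed age window.
    current = sum((today - r["recorded_on"]).days <= max_age_days for r in rows)
    # Auditability: the row can be traced to a named source.
    auditable = sum(bool(r.get("source")) for r in rows)
    return {
        "completeness": complete / total,
        "currency": current / total,
        "auditability": auditable / total,
    }

# Hypothetical player-load records.
rows = [
    {"player": "A", "minutes": 90, "recorded_on": date(2021, 3, 1), "source": "club_gps"},
    {"player": "B", "minutes": None, "recorded_on": date(2021, 3, 10), "source": "club_gps"},
    {"player": "C", "minutes": 75, "recorded_on": date(2021, 1, 5), "source": ""},
    {"player": "D", "minutes": 60, "recorded_on": date(2021, 2, 1), "source": "manual_entry"},
]
scores = information_quality(rows, ["player", "minutes"], today=date(2021, 3, 15))
print(scores)
```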
Benefits of data analytics
There are numerous benefits associated with the use of data analytics, including infrastructure, operational, organisational, managerial and strategic benefits (Wang et al., 2018). Infrastructure benefits refer to ways of reducing the costs associated with information technology usage. This is useful for eliminating system redundancy and making processes work better. To achieve this, data should be transferable quickly between different entities in an organisation. In addition, an organisation’s stakeholders should be able to receive and transmit information quickly. This will enable better use of technology systems and reduce maintenance costs.
Operational benefits refer to advances in the way decisions and actions are taken. These help to provide better services and speed up internal processes. Because sport organisations receive large quantities of data, they need to identify and act upon the most important information. This can be difficult when large quantities of information are continually being transmitted. Once information is received, it needs to result in better processes.
Organisational benefits refer to ways in which organisations can improve based on data analytics. This means improving cross-functional communication through better information flows. Sport organisations increasingly need to collaborate with other organisations on joint projects, so it is important that they have the right internal structures in place to make this happen. This enables data to be shared in order to provide new content sources.
Managerial benefits refer to ways leaders can use information obtained from data analysis to improve their services. This is particularly important in highly competitive markets where timing is of the essence. Entrepreneurial managers will want to use information from data analytics to benefit their organisation, but to do this they need the support of organisational members. This support helps to optimise decisions about changes that will affect the day-to-day running of the organisation. More strategic decisions need to be considered in detail in order to identify viable options.
Strategic benefits concern the long-term consequences of information derived from data analytics. These include ways to forecast changes and to use scenario planning. By considering future needs, an organisation can commit to finding the right resources and people. This can improve the long-run competitiveness of sport organisations using big data.
Importance of big data to sport
Big data has been viewed as the fourth paradigm of science (Strawn, 2012). This is due to the way big data has revolutionised the global economy. Mishra et al. (2017: 555) state that “data are no more measured in terms of gigabyte or terabyte (TB) but in petabyte (PB), exabyte (EB) and zettabyte (ZB)”. The amount of data generated in society is increasing as a result of technological advancements in the way information is communicated. Big data has the capability to transform existing business processes by facilitating innovation (Brown et al., 2011). Business ecosystems are changing as a result of big data and analytics. The amount of data is expected to keep increasing, so new analytics techniques are needed to cope with this change. The main types of data are structured, semi-structured and unstructured (Mishra et al., 2017). Structured data refers to spreadsheets and other forms of information that are in a fixed format. Unstructured data is audio, images, text and video (Mishra et al., 2017). Most data is in an unstructured format. Semi-structured data involves data that is formatted but has some degree of flexibility.
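The three types of data can be contrasted in a short Python example using only the standard library. The match statistics, injury records and commentary text are invented: structured data has fixed columns, semi-structured data has labelled fields that vary between records, and unstructured text has no schema, so structure must be imposed during analysis (here, crudely, by splitting on whitespace).

```python
import csv
import io
import json

# Structured: fixed columns, e.g. a match statistics spreadsheet exported as CSV.
structured = io.StringIO("player,goals,assists\nA,2,1\nB,0,3\n")
rows = list(csv.DictReader(structured))

# Semi-structured: labelled fields, but records differ in which fields they hold.
semi = json.loads('[{"player": "A", "injury": {"type": "hamstring"}}, {"player": "B"}]')
injuries = [r.get("injury", {}).get("type") for r in semi]

# Unstructured: free text with no schema; structure is imposed by tokenising.
commentary = "A scores again! B provides the assist."
tokens = commentary.lower().split()

print(rows, injuries, tokens)
```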
Lee (2017) suggests there are six main challenges in big data: quality, security, privacy, investment justification, management and a shortage of data scientists. The quality of big data is important as it enables good decisions to be made and provides confidence that the right kind of data has been collected and analysed. Quality can be assessed through a number of metrics, including usefulness and timeliness. High-quality data is important in order to reduce errors.
Data security refers to how well data is protected in terms of its storage and confidentiality. It is important that data is not misappropriated, as its contents could benefit competitors. Proper security measures are therefore needed to prevent corporate espionage, including mechanisms to ensure data is not inadvertently given to others. To achieve this, sport organisations need safeguards such as encryption to secure their data.
Data often includes private information about individuals and organisations, such as time, location and usage patterns. Because so much data is being collected, personal information can sometimes be included inadvertently. Sport organisations therefore need to ensure they have informed consent from individuals to use their information. Increasing emphasis has been placed on privacy concerns, such as those around purchasing preferences, because of the ways this information can be used. Thus, whilst sport organisations need data, they also need to consider privacy issues.
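One common way to reduce privacy risk is to analyse only records from people who have consented and to replace direct identifiers with pseudonyms. The sketch below assumes a hypothetical record layout (`email`, `consent`, `purchases`, `location`) and uses a keyed hash (HMAC-SHA256) from Python's standard library; the key shown is a placeholder. Pseudonymisation reduces, but does not remove, the risk of re-identification, so it complements rather than replaces consent and security measures.

```python
import hashlib
import hmac

# Placeholder key (hypothetical); a real key would be stored separately from the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def prepare_for_analysis(records):
    """Keep only consenting individuals and retain analytic fields only."""
    prepared = []
    for r in records:
        if not r.get("consent", False):
            continue  # no informed consent: exclude the record entirely
        prepared.append({
            "fan_id": pseudonymise(r["email"]),  # same email -> same pseudonym
            "purchases": r["purchases"],         # location and other fields are dropped
        })
    return prepared

fans = [
    {"email": "a@example.com", "consent": True, "purchases": 3, "location": "Melbourne"},
    {"email": "b@example.com", "consent": False, "purchases": 5, "location": "Sydney"},
]
prepared = prepare_for_analysis(fans)
print(prepared)
```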
Investment justification refers to how much money is spent on data analytics. As computer programs are constantly being updated, the resources spent on data analytics need to be evaluated against their potential return. Buying the right software and hardware can require substantial financial resources. Thus, projected financial and non-financial returns need to be considered.
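Projected financial returns are often evaluated with net present value (NPV), which discounts each cash flow CF_t received in year t at a rate r: NPV = sum over t of CF_t / (1 + r)^t. The cash flows and the 8% discount rate below are hypothetical, and the non-financial returns mentioned above are not captured by this calculation.

```python
def npv(rate, cash_flows):
    """Net present value: cash_flows[0] occurs now (year 0), then one per year."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Hypothetical analytics investment: upfront software and hardware cost,
# followed by three years of estimated net financial benefits.
flows = [-100_000, 40_000, 40_000, 40_000]
value = npv(0.08, flows)

# Simple (undiscounted) return on investment for comparison.
simple_roi = sum(flows[1:]) / -flows[0] - 1

print(round(value, 2), simple_roi)
```

A positive NPV suggests the projected financial benefits outweigh the cost at the assumed discount rate, although the result is only as reliable as the projected cash flows.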
Data management concerns how data is organised, from its collection through to its analysis. Analysing the content of data can help to establish whether it is useful. This can be achieved through streaming analytics, which help to assess real-time impact. There is also a need to explore the data to see how it can be used.
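Streaming analytics processes data as it arrives rather than after it has been stored. A minimal sketch of the idea, assuming a hypothetical live feed of athlete heart-rate readings, keeps a rolling window of recent values and flags when their average crosses an alert threshold; the window size and the threshold of 175 beats per minute are illustrative assumptions, not recommended values.

```python
from collections import deque

class RollingMonitor:
    """Rolling mean over the most recent readings from a live data stream."""

    def __init__(self, window: int, threshold: float):
        self.readings = deque(maxlen=window)  # oldest reading is dropped automatically
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Add one reading; return True if the rolling mean exceeds the threshold."""
        self.readings.append(value)
        return sum(self.readings) / len(self.readings) > self.threshold

# Hypothetical heart-rate readings (beats per minute) arriving one at a time.
monitor = RollingMonitor(window=3, threshold=175)
alerts = [monitor.update(bpm) for bpm in [150, 160, 170, 185, 190]]
print(alerts)
```

Because the `deque` is bounded, memory use stays constant however long the stream runs.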
Theoretical implications
The majority of sport management theory has been tested using quantitative methods that rely on large sample sizes. As a result, many sport management scholars wanting to publish in the top journals have tended to use quantitative data analysis techniques. The complexity of this statistical analysis has also increased: early studies used basic regression analysis, whereas later work has progressed to structural equation modelling that tests for moderation and mediation between variables. Traditional statistical tec...