Chapter 1
A Brief Overview of Data Mining and Analytics in Games
Günter Wallner
Contents
1.1 Introduction
1.2 Applications
1.2.1 Data Analytics to Improve Design and Player Experience
1.2.2 Data Analytics to Inform Business Decisions
1.2.3 Data Analytics to Innovate and Optimize Game Technology
1.2.4 Data Analytics to Empower Players and Foster Community Building
1.2.5 Summary
1.3 Limitations
1.4 Visual Analytics
1.5 Conclusions
References
1.1 Introduction
The twenty-first century has been repeatedly proclaimed to be the century of data. The increasing processing capabilities of computers and the proliferation of Internet-enabled devices have made it easier than ever to gather more and more data about every aspect of our daily lives. The massive amount of data produced every day, however, needs to be transformed into actual information and knowledge to be of real value and of practical use for decision-making. This requires adequate techniques and tools that help uncover hidden and valuable information or patterns within the collected data. Otherwise, the large volumes of data are just that: data bare of any deeper meaning. Uncovering such information is the goal of data mining (Kantardzic, 2011). While data mining has received, and continues to receive, considerable attention as a means of coping with ever-increasing data volumes, the term itself only started to appear in the late 1980s and early 1990s (Coenen, 2011; Dong & Pei, 2007). Data mining is not a single technique but rather a conglomerate of methods, techniques, and algorithms, usually applied in an iterative or explorative process.
The twenty-first century is, however, not only considered to be the age of data mining but has also been dubbed the ludic century by game designer Eric Zimmerman (2015), an age characterized by play. It may thus be only fitting that during the last decade or so, data mining has found its way into game production and has become a crucial part of game development and maintenance. This has led to the emergence of the new field of “game analytics”: broadly speaking, the application of analytics to game development and research (Drachen, El-Nasr, & Canossa, 2013). It is “the practice of analyzing recorded game information to facilitate future design decisions” (Medler, 2009, p. 188). Game analytics uses data mining techniques to discover patterns and to extract information from game-related data, especially player behavioral data. As is often the case with new fields, the establishment of game analytics can hardly be tied to a specific point in time or ascribed to a single factor. Instead, the emergence of game data mining and analytics can be attributed to a confluence of several developments.
The first steps toward game data mining were presumably made at the turn of the century, when online games such as EverQuest (Sony Online Entertainment, 1999) slowly started to track data about gameplay (cf. Weber, 2018). A few years later, in 2003, one of the first articles on how to improve game design through data mining was published (Kennerly, 2003). Although it focused specifically on massively multiplayer online games such as the aforementioned EverQuest, many of the techniques it described also apply to other types of games. Again a few years later, the first articles appeared in popular media, one of the earlier examples being a Wired article (Thompson, 2007) on how Microsoft relied on scientific methods to inform games user research. In 2008, Microsoft Game Studios published one of the first research articles (Kim et al., 2008) on how tracking user behavior in games can contribute greatly to the design of video games. In 2009, Medler (2009) observed that it was hard to find a digital game that did not allow recording of gameplay in some way or another, but also noted that analyzing the recorded information to inform design was still in its infancy. However, it was not until about 2010 that data mining and analytics really started to gain momentum (Weber, 2018). That year, Zoeller (2010) presented the telemetry suite that BioWare was using to analyze tracked behavioral data of players, and Schoenblum (2010) presented the data collection backend developed at Epic Games. From then on, the field developed very quickly, and game analytics has by now become prevalent across the industry and a major aspect of games research. It has seen substantial growth in the last 10 years and is still evolving rapidly.
This growing interest in data mining and analytics has been spurred by several developments and technical advances. First, the wide adoption of Internet-enabled gaming devices nowadays allows developers to track the behavior of large numbers of players remotely and unobtrusively. Before that, playtesting usually happened by bringing customers in-house and observing them in a laboratory-style setting while they played the game. Consequently, it happened at a much smaller scale, and the invited players may not have been representative of the game's overall player population. However, as games have become a mainstream phenomenon played by an increasingly diverse audience, creating games that appeal to a wide range of players has become a matter of particular interest. In this sense, data mining can be a valuable tool for acquiring representative data. The possibilities offered by the Internet and modern mobile devices, as well as advances in web technology, have also paved the way for new types of games, such as massively multiplayer online games or social network games played on social media platforms such as Facebook, which, in turn, attract new audiences. These games are played by hundreds to thousands of players simultaneously, who may even interact with each other. This complexity makes such games challenging to develop, requiring extensive testing with a large player base to properly balance the game, to ensure a satisfying player experience, and to resolve and avoid technical issues. Remote data collection offers a natural and convenient way to gather such large-scale and long-term datasets. Moreover, production budgets of video games have risen considerably in recent years, and budgets of tens to hundreds of millions of dollars are no longer uncommon.
For example, Grand Theft Auto V (Rockstar North, 2013) had an estimated development budget of $137.5 million (Sinclair, 2013), and development costs for Gran Turismo 5 (Polyphony Digital, 2010) were reported to be $60 million (Remo, 2009). Even if these are extreme examples, and not all budgets are this high, the required investments pose a great financial risk for developers should a game fail. By gathering actual in-game data, developers gain a means to better meet audience expectations and, in turn, to improve their chances of financial success. The ever-increasing production budgets have also led developers to seek ways to extend the lifespan of games and to look for new business models that mitigate the associated risks. Among these are subscription-based services, downloadable content, micro-transactions (purchases of virtual goods for very small amounts of money), and free-to-play games (games that are free to play, with monetization happening through micro-transactions). Some developers have started to view games as a service rather than as a one-time purchase. A recent report from DFC Intelligence (Cole, 2018) suggests that the growth of EA and Activision, two of the biggest publishers in the industry, can to a large extent be attributed to this service model. In such a model, revenue continues to be generated after the initial release, for instance through subscriptions or by releasing new content over a longer period of time in order to sustain audience interest. Retention of players is essential for such business models, and data mining and analytics offer a valuable approach to monitoring and studying player engagement. All these developments have been fueled by advances in data storage and processing capabilities (Coenen, 2011), which allow analysts to efficiently process the large volumes of data that game development produces today.
1.2 Applications
Consequently, data mining and analytics have been applied for a variety of purposes within game production and research. Four broad and common application areas are briefly discussed in the following.
1.2.1 Data Analytics to Improve Design and Player Experience
Game development is a highly creative process that ideally undergoes continuous and critical evaluation to ensure that the final game is engaging and offers a satisfying player experience. This is the primary goal of games user research (GUR), which aims to “help game designers reach their design goals by applying scientific and UX [User Experience] design principles, and by understanding players” (IGDA GRUX, 2018). Like data mining, GUR is not a single technique but rather a collection of qualitative and quantitative methods, such as playtesting (Fullerton, Swain, & Hoffman, 2004; Mirza-Babaei, Moosajee, & Drenikow, 2016), biometrics (Nacke, 2015), interviews (Bromley, 2018), and surveys (Brühlmann & Mekler, 2018). Over the years, analytics has become a valuable addition and by now constitutes an essential component of GUR (cf. El-Nasr, Desurvire, Aghabeigi, & Drachen, 2013). Analytics offers many benefits as a complement to existing methodologies, as telemetry data promises a large-scale and objective view of player behavior (i.e., the data is not biased by players’ subjective opinions), which would be difficult, or even impossible, to obtain through other methods. Unsurprisingly, data analytics has found broad application in GUR, ranging from developing behavioral profiles of player activity (Drachen, Thurau, Sifa, & Bauckhage, 2013) and the study of virtual economies (Castronova et al., 2009; Morrison & Fontenla, 2013) to all aspects of balancing, such as extracting recurring behavioral patterns to detect dominant strategies (Bosc, Kaytoue, Raïssi, & Boulicaut, 2013; Wallner, 2015). Apart from that, there is also a large body of work focusing on spatial and spatio-temporal aspects of gameplay (Drachen et al., 2014; Kang, Kim, Park, & Kim, 2013; Wallner & Kriglstein, 2012), which is of particular importance as movement forms one of the most important mechanics in nearly all games.
Analytics may also be combined with qualitative and observational GUR methods (Desurvire & El-Nasr, 2013) so that the different data sources provide context for each other, although triangulating them is not straightforward (Mirza-Babaei, Wallner, McAllister, & Nacke, 2014).
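Spatial analyses of gameplay frequently start by aggregating logged positions (for example, where players died or where they spent their time) into a grid, producing a heatmap of the level. The following Python sketch illustrates only that aggregation step. The function name `position_heatmap`, the square-cell layout, and the sample coordinates are illustrative assumptions, not a method taken from the works cited above.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def position_heatmap(
    positions: Iterable[Tuple[float, float]], cell_size: float
) -> Counter:
    """Count logged (x, y) positions per square grid cell (hypothetical helper).

    A position is assigned to cell (floor(x / cell_size), floor(y / cell_size)),
    so negative world coordinates map to negative cell indices.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    heatmap: Counter = Counter()
    for x, y in positions:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        heatmap[cell] += 1
    return heatmap


# Hypothetical death locations from telemetry, in world units.
deaths = [(1.0, 2.0), (3.5, 0.5), (12.0, 4.0), (1.5, 1.5)]
heatmap = position_heatmap(deaths, cell_size=10.0)
print(heatmap.most_common(1))  # → [((0, 0), 3)]
```

In practice, the per-cell counts would be rendered as colors over the level map; the cell size trades spatial detail against noise from sparsely visited areas.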
1.2.2 Data Analytics to Inform Business Decisions
As discussed previously, the game industry is actively seeking alternatives to the traditional pay-once business model in order to reach new customers. These models, such as free-to-play or subscription-based services, rely on keeping customers engaged over extended periods of time and on providing attractive spending opportunities to generate revenue. At the same time, the number of games released each year, and thus the number of games competing for customers, is steadily increasing. For instance, as of December 17, 2018, the statistics site SteamSpy counted 4,696 games released on the digital distribution platform Steam¹ in 2016, compared with 7,047 games in 2017 and 8,882 in 2018 (Galyonkin, 2017). Inevitably, it has become more challenging to acquire new customers and to stand out from the plethora of games already on the market. As competition grows fiercer, analytics-based solutions provide ample opportunities to support business decisions. Analytics can provide insights into player retention and churn, such as how long players keep playing or at which point they quit (Bauckhage et al., 2012; Hadiji et al., 2014; Xie, Devlin, Kudenko, & Cowling, 2015). It can help answer questions concerning conversion rates, that is, what makes a player of a free-to-play game convert into a paying customer (Fields & Cotton, 2011; Hanner & Zarnekow, 2015), and about the purchasing behavior of players, for example, which in-game content players are willing to pay for and why (Hamari et al., 2017; Lehdonvirta, 2009). In addition, analytics is vital for predicting customer lifetime value (the total amount of revenue earned from a player) (Chen, Guitart, del Río, & Periáñez, 2018; Sifa et al., 2015) and for customer acquisition, for instance, to help plan marketing campaigns (Williams, 2015a).
Moreover, players also influence each other, an effect that should not be underestimated, as a game’s community can contribute greatly to its success or failure. In this sense, analytics can help to build stronger communities (see Section 1.2.4) and to improve community management (Williams, 2015b). These and related possibilities testify to the value and potential of data analytics for business intelligence in game development.
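To make the retention, conversion, and lifetime-value questions above more concrete, the following Python sketch computes three simple descriptive metrics from per-player summaries. The `Player` record and its fields are hypothetical. Day-N retention is taken here as the share of players active on day N after install, and the revenue figure covers only what has been earned so far; predicting customer lifetime value, as in the work cited above, additionally involves forecasting revenue that has not yet been earned.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Player:
    """Hypothetical per-player summary aggregated from telemetry."""
    player_id: str
    active_days: Set[int]  # days since install with at least one session (0 = install day)
    purchases: List[float] = field(default_factory=list)  # amounts spent, e.g. in USD


def day_n_retention(players: List[Player], day: int) -> float:
    """Share of players who were active on the given day after install."""
    if not players:
        return 0.0
    return sum(day in p.active_days for p in players) / len(players)


def conversion_rate(players: List[Player]) -> float:
    """Share of players who made at least one purchase."""
    if not players:
        return 0.0
    return sum(bool(p.purchases) for p in players) / len(players)


def mean_revenue_to_date(players: List[Player]) -> float:
    """Average revenue observed so far per player (descriptive, not a forecast)."""
    if not players:
        return 0.0
    return sum(sum(p.purchases) for p in players) / len(players)


# A small hypothetical install cohort.
cohort = [
    Player("a", {0, 1, 7}, [4.99]),
    Player("b", {0, 1}),
    Player("c", {0}),
    Player("d", {0, 1, 7}, [0.99, 9.99]),
]
print(day_n_retention(cohort, 1))  # 0.75
print(day_n_retention(cohort, 7))  # 0.5
print(conversion_rate(cohort))     # 0.5
print(mean_revenue_to_date(cohort))
```

Computing such metrics per install cohort, rather than over all players at once, is one common way to check whether a design or pricing change affected newer players differently.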
1.2.3 Data Analytics to Innovate and Optimi...