In light of the developments towards a datafication of society, there is a need to reinvent and adapt our research approaches in order to make them more relevant and useful. This demands a creative and somewhat anarchistic approach to existing theories and methods.
Sociologist John Law argues, while acknowledging that conventional research methods are indeed useful in some cases, that there is an urgent need to "remake social science in ways better equipped to deal with mess, confusion and relative disorder" (Law, 2004, p. 11). The need to go beyond methods as we know them is underpinned by the fact that social science is not very good at understanding "things that are complex, diffuse and messy". This is because the simple and clear descriptions that most conventional research methods aim for "don't work if what they are describing is not itself very coherent" (Law, 2004, p. 2). Especially in light of the high level of complexity of twenty-first-century networked society, it is imperative that we develop more ambivalent methodologies to account for our increasingly ambivalent object of study.
Paul Feyerabend, the refreshingly provocative enfant terrible of the philosophy of science, wrote on the "complexity of human change" and on the "unpredictable character of the ultimate consequences" of people's actions. "Are we really to believe", he asked, "that the naive and simple-minded rules which methodologists take as their guide are capable of accounting for such a maze of interactions?" (Feyerabend, 1975, p. 9). This question is today more relevant than ever, as datafication affects not only research practices, but also society as such, and thereby the very object of study of our research. Social expressions in the age of the internet are fragmented and entangled, in a system of platforms and relations enabled through digital technologies.
As argued by Nick Couldry and Andreas Hepp (2017), we now live in an age of deep mediatisation, where media can no longer be seen as specific channels of centralised content. Rather, media are now better understood as platforms for enacting social life (Dijck, Poell, and Waal, 2018). This is symptomatic of a transition from a mass media system to a social media ecology. The transformation has been described in terms of a rise of "mass self-communication" (Castells, 2009), "networked individualism" (Rainie and Wellman, 2012), and "connective action" (Bennett and Segerberg, 2012). In sum, such perspectives argue that politics, opinions, and ideas, as well as social life in general, now function in accordance with a much more decentralised and democratic logic (Ito, 2008), but also in more volatile and "viral" ways (Sampson, 2012). This represents something much more than a mere technological transition. Following ongoing processes of digitalisation and datafication, our social world is suffused with technological media of communication that bring about a refiguring of the world in, and on, which we act. As argued by Couldry and Hepp (2017), social relations today are actualised through a system of variously connected digital platforms that bring about a much more intense embedding of media in social processes than was ever the case before. There is therefore a need to adapt social science theories and methods in hybrid ways to better account for this situation.
The digital society has been characterised as a "wicked system" (Törnberg, 2017, p. 52), the analysis of which demands a critical methodological pluralism. In fact, most social systems have this emergent property of wickedness to some degree: a "combination of complexity and complicatedness that entails plasticity and deep ontological uncertainty" (Törnberg, 2017, p. 25). In the specific case of social media and politics, internet researcher Helen Margetts and her colleagues argue that "social media are a source of instability and turbulence in political life", which creates an uncertain environment (Margetts et al., 2017, p. 74). They suggest that:
As these authors argue, there is indeed a complexity (and complicatedness) of factors, levels, forces, and influences involved, at all levels of the social, especially in the digital society. And this book, in essence, is about approaching this complexity analytically, with a theoretical and methodological openness that can account for this turbulent, wicked, anarchistic, and ambivalent nature.
Datafication
The ongoing development of the internet and social media increasingly transforms our lives into data. Vast amounts of information about individuals and their interactions are being generated and recorded: directly and indirectly, voluntarily and involuntarily, for free and for profit. These volumes of data offer unforeseen and exciting opportunities for social research. It is because of this that we have witnessed in recent years the rise of the much-hyped phenomenon of big data. Alongside this development, computational methods have also become increasingly popular in scholarly areas where they have not been commonly used before.
"Big data" refers broadly to the handling and analysis of massively large datasets. According to a popular definition, big data conforms with three Vs. It has volume (enormous quantities of data), velocity (is generated in real-time), and variety (can be structured, semi-structured, or unstructured). Various writers and researchers have suggested a number of other criteria be added to this, such as exhaustivity, relationality, veracity, and value. Big data has indeed been a mantra in the fields of commercial marketing and political campaigning throughout the last decade. High hopes and strong beliefs have been connected with how these new types of data, enabled by people's use of the internet, social media, and technological devices, might be collected and analysed to generate knowledge about how to get people to click on adverts, or to buy things or ideas. Similar methods are also becoming more and more used in fields such as healthcare and urban planning.
All of this is a consequence of what can be called the datafication of social life. This is what happens when "we have massive amounts of data about many aspects of our lives, and, simultaneously, an abundance of inexpensive computing power" (Schutt and O'Neil, 2013, p. 4). Also beyond the internet and social media, data have had an increased influence on most industries and sectors. There has been huge interest in, and many efforts made towards, extracting new forms of insight and generating new kinds of value in a variety of settings. As explained on Wikipedia (2018), lately "the term 'big data' tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set". As underlined by internet researchers Kate Crawford and danah boyd, "big data" is in fact a poorly chosen term. This is because its alleged power is not mainly about its size, but about its capacity to compare, connect, aggregate, and cross-reference many different types of datasets (that also often happen to be big). They define big data as:
From a critical sociological perspective, Lupton (2014, p. 101) argues that the hype that surrounds the new technological possibilities afforded by big data analytics contributes to the belief that such data are "raw materials" for information, that they contain the untarnished truth about society and sociality. In reality, each step of the process in the generation of big data relies on a number of human decisions relating to selection, judgement, interpretation, and action. Therefore, the data that we will have at hand are always configured via beliefs, values, and choices that "'cook' the data from the very beginning so that they are never in a 'raw' state". So, there is no such thing as raw data, even though the orderliness of neatly harvested and stored big datasets can create an illusion to the contrary.
Sociologist David Beer (2016, p. 149) argues that we now live in "a culture that is shaped and populated with numbers", where trust and interest in anything that cannot be quantified diminishes. Furthermore, in the age of big data, there is an obsession with causation. As boyd and Crawford (2012, p. 665) argue, the mirage and mythology of big data demand that a number of critical questions are raised with regard to "what all this data means, who gets access to what data, how data analysis is employed, and to what ends". There is a risk that the lure of big data will sideline other forms of analysis, and that other alternative methods with which to analyse the beliefs, choices, expressions, and strategies of people are pushed aside by the sheer volume of numbers. "Bigger data are not always better data", they write, and the analysis of them will not necessarily lead to insights about society that are more true than what can be achieved through other data and methods.
Many popular examples exist for illustrating how datafication is intensifying exponentially, the most famous one being Moore's Law, according to which the number of components that fit on an integrated circuit, and with it the power of computers and their memory and storage, doubles at regular intervals (Moore, 1965). Another telling comparison is this one: The Great Library of Alexandria, which was established in the third century BCE, was regarded as the centre of knowledge in the ancient world. It was believed to hold within it the sum total of all human knowledge. Today, however, the amount of data in the world has been estimated at 1,200 million terabytes, which is enough to give each person alive more than 300 times as much data as historians believe the library's entire collection contained (Cukier and Mayer-Schoenberger, 2013).
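The doubling logic behind Moore's Law can be made concrete with a small calculation. The following is only a minimal sketch: the function name `growth_factor` and the two-year doubling period are assumptions chosen for illustration and are not taken from the sources cited above.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Factor by which a quantity grows after `years`, if it doubles
    once every `doubling_period_years` (an assumed, illustrative value)."""
    return 2 ** (years / doubling_period_years)


# Print how much larger the quantity is after a few time spans.
for years in (2, 10, 20, 40):
    print(years, "years ->", growth_factor(years), "times the starting value")
```

The point of the sketch is simply that steady doubling compounds: under these assumptions, each additional decade multiplies the total by the same factor rather than adding a fixed amount.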
We are no doubt in the midst of an ongoing data explosion, and along with it the development of "data science". Data science is an interdisciplinary specialisation at the intersection of statistics and computer science, focusing on machine learning and other forms of algorithmic processing of large datasets to "liberate and create meaning from raw data" rather than on hypothesis testing (Efron and Hastie, 2016, p. 451). Data science is a successor to the form of "data analysis" proposed by the statistician John W. Tukey, whose analytical framework focused on "looking at data to see what it seems to say", making partial descriptions and trying "to look beneath them for new insights". In his exploratory vein, Tukey (1977, p. v) also emphasised that this type of analysis was concerned "with appearance, not with confirmation". This focus on mathematical structure and algorithmic thinking, rather than on inferential statistical justification, is a precursor to the flourishing of data science in the wake of datafication.
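What such an exploratory "partial description" might look like in practice can be hinted at with a five-number summary, a device associated with Tukey's exploratory approach. The sketch below is an illustration under stated assumptions: the data are invented, the function name `five_number_summary` is hypothetical, and the quartiles are computed with Python's standard `statistics.quantiles`, which is not identical to Tukey's own hinges.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    # "inclusive" treats the data as the whole population of interest;
    # this is a convenience choice, not Tukey's original hinge rule.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Invented example data: posts per day by one account over two weeks.
posts_per_day = [3, 5, 4, 0, 12, 7, 6, 2, 5, 41, 4, 3, 6, 5]
print(five_number_summary(posts_per_day))
```

A summary of this kind describes how the data appear without testing any hypothesis about them; deciding what an unusually high value means is left to interpretation.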
All the things that people do online in the context of social media generate vast volumes of sociologically interesting data. Such data have been approached in highly data-driven ways within the field of data science, where the aim is often to get a general picture of some particular social pattern or process. Being data-driven is not a bad thing, but there must always be a balance between data and theory, between information and its interpretation. This is where sociology and social theory come into the picture, as they offer a wide range of conceptual frameworks and theories that can aid in the analysis and understanding of the large amounts and many forms of social data that proliferate in today's world.
But in those cases where we see big data being analysed, there is far too often a disconnect between the data and the theory. One explanation for this may be that the popularity and impact of data science make its data-driven ethos spill over into the academic fields that try to learn from it. This means that we risk forgetting about theoretical analysis, which may fade in the light of sparkling infographics.
It is my argument that the social research that relies heavily on the computational amassing and processing of data must also have a theoretical sensitivity to it. While purely computational methods are extremely helpful when wrangling the units of information, the meanings behind the messy social data which are generated in this age of datafication can be better untangled if we also make use of the rich interpretive toolkit provided by sociological theories and theorising. The data do not speak for themselves, even though some big data evangelists have claimed that to be the case (Anderson, 2008).
Big data and data science are partly technological phenomena, which are about using computing power and algorithms to collect and analyse comparatively large datasets of, often, unstructured information. But they are also most prominently cultural and political phenomena that come along with the idea that huge unstructured datasets, often based on social media interactions and other digital traces left by people, when paired with methods like machine learning and natural language processing, can offer a higher form of truth which can be computationally distilled rather than interpretively achieved.
Such mythological beliefs are not new, however, as there has long been, if not a hierarchy, at least a strict division of research methods within the cultural and social sciences, where some methods (those that have come to be labelled "quantitative", and that analyse data tables with statistical tools) have been vested with an "aura of tru...