To analyse social and behavioural phenomena in our digitalized world, it is necessary to understand the main research opportunities and challenges specific to online and digital data. This book presents an overview of the many techniques that are part of the fundamental toolbox of the digital social scientist.
Placing online methods within the wider tradition of social research, Giuseppe Veltri discusses the principles and frameworks that underlie each technique of digital research. This practical guide covers methodological issues such as dealing with different types of digital data, construct validity, representativeness and big data sampling. It looks at different forms of unobtrusive data collection methods (such as web scraping and social media mining) as well as obtrusive methods (including qualitative methods, web surveys and experiments). Special extended attention is given to computational approaches to statistical analysis, text mining and network analysis.
Digital Social Research will be a welcome resource for students and researchers across the social sciences and humanities carrying out digital research (or interested in the future of social research).
All social research methods have underlying assumptions about human nature and, in particular, about the way people make decisions, form their opinions and behave. In fact, this is one of the aspects that differentiates the disciplines within the social sciences. Psychology, economics and sociology all have different models of how people behave. Besides the theoretical implications of such differences, the consequences for the type of methodology employed are substantial. Depending on which underlying model is selected, a particular research method is considered appropriate to study human behaviour.
For a long time, in economics and beyond, a model of how people make decisions known as ‘rational choice theory’ has been considered the baseline. According to this model, people’s preferences have a well-defined structure and the choice between courses of action is an almost automatic mechanism in which the individual applies his or her system of preferences to a limited set of options (for example, the set of products that fall within the budget available). In other social sciences, the most common underlying model of human behaviour was unbalanced by an ‘oversocialized view’, ‘a conception of people as so overwhelmingly sensitive to the opinions of others, and hence obedient to the dictates of consensually developed norms and values, internalized through socialization, that obedience is not burdensome but unthinking and automatic’ (Granovetter, 2017: 11; see also Di Maggio, 1997).
Both models focus primarily on people’s conscious thought processes as the determinants of what they think and believe and of how they act. Deviations from either economic rationality or forms of ‘sociological rationality’ were labelled as ‘irrational’. In such a context, the way to elicit data and study people’s behaviour relies on what people themselves report about their opinions, social norms, attitudes and beliefs. These are often defined as ‘self-reported’ data, meaning that researchers rely on the participants in a study to report on something they have done or on what they think or believe. Surveys and interviews of all sorts are typical sources of self-reported data. In contrast to this approach, there are observational or behavioural data, that is, data about the actual actions and behaviour carried out by someone. To better appreciate the difference, let’s take the example of asking someone how many times she or he goes to the gym every month, and compare the response to the actual tracking of their movements, for example, by a GPS-enabled phone or watch. The two pieces of information can differ dramatically. Researchers in the social sciences have learned to live with the limitations of self-reported data, such as social desirability bias (Kreuter et al., 2009). The concept of social desirability rests on the notion that there are social norms governing some behaviours and attitudes and that people may misrepresent themselves in order to appear to comply with these norms. This is why participants might provide inaccurate information about their behaviour to researchers. At the same time, people have difficulty in verbalizing accurately what they have done, felt and thought. Recalling events from memory is not easy either (Gaskell et al., 2000). In other words, self-reported measures have their limitations, but they have been the most common way of conducting social research related to human behaviour.
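The gym example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the participant IDs, the survey answers and the tracked visits below are invented, and the helper names `monthly_visit_counts` and `reporting_gap` are hypothetical rather than part of any established tool. The code counts distinct tracked visit days per person in one month and subtracts them from what each person reported.

```python
from collections import Counter
from datetime import date

# Hypothetical survey answers: claimed gym visits per month, by participant ID.
self_reported = {"p01": 12, "p02": 4, "p03": 2}

# Hypothetical behavioural trace: (participant ID, day a gym visit was detected).
tracked_visits = [
    ("p01", date(2024, 3, 2)),
    ("p01", date(2024, 3, 2)),   # second detection on the same day
    ("p01", date(2024, 3, 9)),
    ("p01", date(2024, 3, 23)),
    ("p02", date(2024, 3, 1)),
    ("p02", date(2024, 3, 8)),
    ("p02", date(2024, 3, 15)),
    ("p02", date(2024, 3, 22)),
    ("p02", date(2024, 4, 5)),   # falls outside the month under study
]


def monthly_visit_counts(visits, year, month):
    """Count distinct visit days per participant within one calendar month."""
    distinct_days = {
        (pid, day)
        for pid, day in visits
        if day.year == year and day.month == month
    }
    return Counter(pid for pid, _ in distinct_days)


def reporting_gap(claimed, observed):
    """Self-reported count minus observed count (positive means over-reporting)."""
    return {pid: n - observed.get(pid, 0) for pid, n in claimed.items()}


observed = monthly_visit_counts(tracked_visits, 2024, 3)
gap = reporting_gap(self_reported, observed)
print(gap)  # {'p01': 9, 'p02': 0, 'p03': 2}
```

A positive gap does not by itself identify its cause: it may reflect social desirability or faulty recall, but also gaps in the tracking, for instance a phone left at home.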
However, the biggest challenge to self-reported data has come from a shift in the model of human behaviour. Since the late 1990s, psychologists have distinguished between two systems of thought with different capacities and processes (Kahneman, 2011; Kahneman and Frederick, 2002; Metcalfe and Mischel, 1999; Sloman, 1996; Smith and DeCoster, 2000; Lichtenstein and Slovic, 2006), which have been referred to as System 1 and System 2 (Evans and Stanovich, 2013). System 1 (S1) consists of high-capacity intuitive thinking, is based on associations acquired through experience, and processes information quickly and automatically. System 2 (S2), on the other hand, involves low-capacity reflective thinking, is based on rules acquired through culture or formal learning, and processes information in a relatively slow and controlled manner. The processes associated with these systems have been defined as Type 1 (fast, automatic, unconscious) and Type 2 (slow, conscious, controlled) respectively (see Table 1.1). The dual-system perspective became increasingly popular, even outside academia, after the publication of Daniel Kahneman’s book Thinking, Fast and Slow (2011). Kahneman was awarded the Nobel Memorial Prize in 2002 for his contribution to the explanation of individual economic behaviour through the elaboration of the ‘prospect theory’ (see Kahneman and Tversky, 2008).
Table 1.1 A schematic overview of the two ‘systems of thinking’ underlying human behaviour

| | System 1 | System 2 |
| --- | --- | --- |
| | Quick, automatic, no effort, no sense of voluntary control. Continuous construal of what is going on at any instant | Slow, effortful, attention to mental activities requiring it. Good at cost/benefit analysis, but lazy and saddled by decision paralysis (cognitive overload) |
| Characteristics | Quick (reflexive); heuristic-based; use shortcuts | Deliberate (reflective); conscious; rule-based |
| When it plays | When speed is critical; avoid decision paralysis; when System 2 is lazy or not activated (not worth, no energy, lack of awareness) | May take over when System 1 cannot process data; may correct/override System 1 if effort shows that intuition or impulse is wrong |
The so-called ‘dual model’ of the mind is now the most widely supported way of understanding human behaviour at the individual level. The model has also been applied outside psychology, for example in sociology (Moore, 2017; Lizardo et al., 2016) and in political science (Achen and Bartels, 2017), and the implications of Kahneman and Tversky’s work have led to the research programme known as behavioural economics, which has had a great impact on traditional microeconomic theory.
Moving away from the initial underlying model of human behaviour based on the ‘theory of rational decision-making’, or rational choice theory, the current model portrays human beings as characterized by ‘bounded rationality’. In other words, they are rational within limits, and the ‘irrational’ is not some mysterious and almost metaphysical force, but the outcome of systematic errors and biases that originate in how our cognition and emotions work (and interact).
A more precise model of human behaviour and decision-making has implications for social science research methodology and in particular for the aforementioned distinction between self-reported and observational/behavioural data. The dual model of thinking brings back the importance of unconscious thought processes, but also of contextual and environmental influences on human behaviour, something that is highly problematic in studies using only self-reported measures and instruments. Traditionally, collecting behavioural data has been very difficult and expensive for social scientists. Tracking people’s actual behaviour was feasible only for small groups of people and for a very limited amount of time. The availability of digital data has brought us a large increase in behavioural data; we now have digital traces of people’s actual behaviour that were quite simply never available before.
The combined effect of a relatively new and powerful foundational model of human behaviour and decision-making, offered by the dual model, and the availability of behavioural data, thanks to the digital traces recorded by a multitude of services and tools, is very promising for social scientists. Before continuing this line of argument, let’s clarify one point that might be the object of criticism. Considering human behaviour as the outcome of mutual influences of conscious acting and unconscious heuristics, of biases and environmental influences, does not mean a return to a form of reductionism in which people’s opinions count for nothing. Self-reported data will remain an important source of information for social scientists, but, at the same time, behavioural data will complement them in understanding complex social phenomena. If we cross-tabulate the typology of data with the modality of human behaviour and decision-making – as shown in Table 1.2 – the complementarity becomes clearer.
Table 1.2 Cross-tabulation between typology of data and modes of behaviour

| Typology of data | Type of human behaviour |
| --- | --- |
| Self-reported | System 2, rational deliberation, attitudes conscious description |
| Behavioural/observational | System 1, heuristics use and context/environment influence |
The distinction between self-reported and behavioural data is no longer mainly theoretical because the new opportunities for collecting the latter are unprecedented. Such opportunities open up new research directions, as well as the possibility of reviewing current theories and existing models. Table 1.2 reports a distinction that is particularly useful for those who are interested in studying human behaviour at the micro and meso levels, that is to say at the individual and the group levels of analysis, but it is less pertinent to the macro level.
According to Granovetter (2017: 13), both overrational and oversocialized models of human behaviour are atomistic in nature: ‘both share a conception of action by atomized actors. In the under-socialized account, atomization results from narrow pursuit of self-interest; in the oversocialized one, from behavioural patterns having been internalized and thus little affected by ongoing social relations.’ Behavioural digital data can have, among other features, a great deal of information about social relations and people’s embeddedness; they can help overcome such an atomized view (we will return to this issue later in the book).
However, the increased opportunity to collect data about people’s behaviour does not free us from biases generated by the design and aims of digital platforms. People’s behaviour is constrained by the platform they use; for example, it is not possible to write an essay on Twitter unless we decide to write it using a large number of individual tweets. There are, therefore, several potential confounding factors, as we will further elaborate in the section below on construct validity.
Returning to the issue of the different levels of analysis, at all levels another distinction is relevant: the one between static and dynamic data. The large majority of data collected in the social sciences have been ‘static’ – that is, data collection has been carried out at a given time. The reason for this is that longitudinal data collection, that is, data collected over a period of time, was very difficult and expensive. The only exceptions were analyses carried out on documents and data that were archived and therefore accessible – for example, newspapers but also administrative data collected by governments or other institutions. Relying on static data has produced an involuntary emphasis on theories that focus more on ‘structures’ than on processes (Abbott, 2001). In other words, it has been historically difficult for social scientists to observe the dynamic unfolding of social events, especially at the macro level, because collecting data for this purpose was extremely complex and demanding in terms of resources. Most surveys are cross-sectional, meaning that they are carried out once or twice, and the same applies to interviews and other forms of data collection. Digital data introduce a much-increased capacity for recording and using longitudinal data for social scientific purposes. Obviously, digital data have not been around for many decades, but future researchers might have at their disposal longitudinal datasets that were absent in the past. The dynamic nature of digital data might be more enriching than their raw size; ‘big data’ concerns not only size b...