Understanding China through Big Data
eBook - ePub

Applications of Theory-oriented Quantitative Approaches

  1. 304 pages
  2. English
  3. ePUB (mobile friendly)

About this book

Chen, He and Yan present a range of applications of multiple-source big data to core areas of contemporary sociology, demonstrating how a theory-guided approach to macrosociology can help to understand social change in China, especially where traditional approaches are limited by constrained and biased data.

In each chapter of the book, the authors highlight an application of theory-guided macrosociology that has the potential to reinvigorate an ambitious, open-minded and bold approach to sociological research. These applications cover social stratification, social networks, medical care, and online behaviours, among many other areas. This research approach focuses on macro-level social processes and phenomena, using quantitative models to statistically test for the associations and causal relationships suggested by a clearly hypothesised social theory. By deploying theory-oriented macrosociology where it can best assure macro-level robustness and reliability, big data applications can become more relevant to, and better guided by, social theory.

An essential read for sociologists with an interest in quantitative and macro-scale research methods, the book also provides fascinating insights into Chinese society as a demonstration of the utility of its methodology.

Information

Publisher
Routledge
Year
2021
Print ISBN
9780367758257
eBook ISBN
9781000412352

Part I

Introduction

1 Bringing big data to quantitative macrosociology

Introduction

Big data burst onto the social science scene nearly a decade ago. Coined by Manovich (2011) to describe datasets too large to be stored and analyzed by conventional software and personal computers, the term has spread as a meme across fields as varied as business, sports, journalism, science, and public health, entailing a near-universal pivot toward data-driven research, business, and governance (Edelmann et al., 2020; Langlois, Redden & Elmer, 2015; Mayer-Schönberger & Cukier, 2013; Veltri, 2017). The unprecedented scope and scale of big data, together with the qualities it acquires in the process of digitally recording the traces of social transactions (variety, velocity, volume, and value), make it a compelling subject for research into the “social world” (Kitchin & McArdle, 2016; Savage & Burrows, 2007).
In the field of sociology, big data brings with it both high expectations and heated debate. On the one hand, it represents an enormous new source of “digital footprints” comprising the individual actions and social transactions of billions of people in real and historical time, along with a battery of new approaches to collect, describe, and analyze them (Halford & Savage, 2017; McFarland, Lewis & Goldberg, 2016; Watts, 2012). This unprecedented wealth of information greatly raised expectations for its application to social science research and scholarship, suggesting that the very foundations of empirical social science would be rebuilt (King, 2014).
Many scholars have pointed to the significance of big data in arming sociologists with new research resources and opportunities. For example, Lazer and Radford (2017) summarized five opportunities that big data can offer sociologists, namely, accessing meaningful social behavior, monitoring social phenomena, analyzing data on social systems, providing data for experiments, and supporting data heterogeneity. Evans and Aceves (2016) surveyed computational approaches for large-scale analyses of textual data, highlighting the use of machine learning to theorize the nature of the collective attention, social relationships, and communication lurking in enormous volumes of archives. Many robust big data analyses have emerged in recent years, applying multiple-source big data to diverse topics in core areas of contemporary sociology. Overall, as Burrows and Savage (2014, p. 5) pointed out, “sociologists need to be prepared to intervene in the world of Big Data in order to ensure we command a voice in this new terrain.”
On the other hand, despite its promise, big data analytics in sociology has two key limitations. The first is that without the theoretically informed and context-driven research that comes from domain expertise, purely computational approaches can cause research to devolve into speculative data mining. For sociology, big data applications that rely on black-box tools conflict with the hermeneutic tradition at the core of the discipline (Kitchin, 2014a; Pasquale, 2015).
The second limitation is that despite its size, big data can still be biased: the agents, applications, and devices producing and collecting the data may themselves be selective or manipulated. This points to a paradox. Despite its name, big data is likely to be either “small,” representing only a subset of social transactions among particular demographics and thereby capturing partial and/or fragmented information (McFarland & McFarland, 2015; O’Brien, 2016; Park & Macy, 2015; Shaw, 2015); or “artifactual,” whereby social forces, including censorship, political bots, and system errors, manipulate the process of information production, leading to a proliferation of artifacts, errors, and anomalies (see Lazer & Radford, 2017).
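The point that size does not cure bias can be made concrete with simple arithmetic. The sketch below is a hypothetical illustration, not an analysis from the book: it assumes a population split into two groups with different rates of some trait, and a platform that records the two groups at very different rates. The `Group` class and all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Group:
    size: int        # number of people in this group of the population
    rate: float      # share of the group that has the trait of interest
    coverage: float  # share of the group whose activity the platform records


def population_rate(groups):
    """True prevalence of the trait in the whole population."""
    total = sum(g.size for g in groups)
    return sum(g.size * g.rate for g in groups) / total


def platform_rate(groups):
    """Prevalence estimated from platform records only, plus the record count."""
    observed = [(g.size * g.coverage, g.rate) for g in groups]
    n = sum(count for count, _ in observed)
    return sum(count * rate for count, rate in observed) / n, n


# Hypothetical population: a heavily covered group and a sparsely covered one.
groups = [Group(size=30_000, rate=0.6, coverage=0.9),
          Group(size=70_000, rate=0.2, coverage=0.1)]

true_rate = population_rate(groups)
estimate, n_records = platform_rate(groups)
print(f"true rate {true_rate:.3f}; platform estimate {estimate:.3f} "
      f"from {n_records:,.0f} records")
```

With these invented figures the platform yields 34,000 records, yet it puts the rate at about 0.52 against a true population rate of 0.32. Gathering more records through the same skewed channel would not close that gap.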
Sociology is now at a crossroads. Although pressured by burgeoning intellectual forces, in particular those harnessing computational approaches and engaging with big data, sociologists still lack a clear road map for effectively integrating big data analytics into contemporary sociology. Their resistance has much to do with skepticism born of the deficiencies of existing approaches to big data. More importantly, sociologists need a mode of study that leads to something more than fancy analytical tools and exciting results; we need tools that lead to clear solutions, and we need research templates that formally link data, theory, and methodology in more robust, scientific, and sociological ways. Put simply, we need to choose precisely where to insert big data into the key facets of empirical sociology: whether it is best used to portray big pictures, unveil hidden structures, test null hypotheses, or infer causality.
The answer is first to turn back to the data themselves and to ask not what makes big data exciting, but which dimensions of sociology big data is most aligned with. More precisely, can big data serve as a kind of macro-data? What advantage does big data have over other solutions in sociological inquiry, such as assembling survey data? In this chapter, we address these questions and show that the empirical strength of big data can be expected to give rise to a type of research that has so far been largely neglected in empirical sociology: theory-guided quantitative macrosociology.
For sociology, despite an initial surge of interest and a powerful residual skepticism, big data is expected to offer insights into every subfield of the discipline. This is not only because every facet of our daily lives has been penetrated, in real time and over time, by sophisticated big data apparatuses, but also because the recorded social environment (the entirety of human behavior, interaction, and thought) constitutes a panoramic data repertory that offers a rare opportunity to inspect society in an entirely new way. It is important to note that big data is a composite of myriad transactions by myriad individuals. This reminds us that despite early claims that the sheer size of big data can attenuate many of its shortcomings and biases (Mayer-Schönberger & Cukier, 2013), ultimately it is not the size of big data that matters but the ontological level of the information we can extract from it. That is, we should critically interrogate available big data to harness its strength at the macro-level and from a macro-perspective.
Theory-guided quantitative macrosociology has made notable inroads in integrating big data into macro-level analysis. This novel approach can contribute to sociological studies by using distant reading to obtain a big picture of the sizable unread portions of a corpus, something that neither traditional qualitative approaches (close reading of selected archives) nor conventional quantitative methods (regression models fitted to limited survey samples) can achieve. The rich spatial and temporal dynamics available through this line of research are extremely promising.

Data assemblage versus big data

Sociologists today are still daunted by the same big questions that consumed sociologists in the mid-twentieth century, including the relation between economy and culture, the factors that lead to social inequality, and whether and why social behaviors can be contagious. These questions persist because, when society is examined from an ecological or systemic perspective, no single source of information is informative enough to capture the big picture over large temporal and spatial scales. Consequently, to explore the configuration and regulation of sociocultural environments, macrosociologists tend to bypass quantitative methodology and resort to abstract theoretical constructs, which in turn often invite criticism for tautology and ambiguity. There are certainly some exceptional macro-analyses using quantitative approaches, particularly transnational analyses in traditional fields of sociology such as social stratification and inequality. Nevertheless, macro-analyses remain relatively rare compared with individual- or micro-level regressions, which predominate in quantitative sociology owing to the abundance of well-designed social surveys and the scarcity of data on macro-social indicators. This has cast a shadow across the entire realm of macrosociology, despite the claim of self-sufficiency that macrosociology shares with philosophy and the humanities.
There are two ways to tackle this problem. The first, proposed by Halford and Savage (2017), is “symphonic social science,” a label for a new methodology that uses data assemblages to test big theories. The second is big data itself, some inspiring empirical applications of which have already been introduced in sociology.
Because accessing and deploying various sources of sample survey data is easier than harnessing big data, assembling survey data has a distinct advantage; indeed, it can even be seen as a type of comparative analysis. Halford and Savage (2017) argued that the symphonic research paradigm in effect combines micro- and macro-level research and integrates information from conventional surveys, regression statistics, and ethnographic and interview data within the same framework. By exploring the contradictions and complementarities of findings from diverse datasets, sociologists can pursue an understanding of major social questions in a symphonic way.
Specifically, Halford and Savage (2017) used three well-known books to illustrate symphonic social science research: Thomas Piketty’s Capital in the Twenty-First Century (2013), Robert Putnam’s Bowling Alone (2000), and Richard Wilkinson and Kate Pickett’s The Spirit Level (2011). All three works deployed large-scale heterogeneous data assemblages and repurposed findings from multiple data sources instead of relying on representative samples or ethnographic case studies. The three books thus “relied on the deployment of repeated ‘refrains,’ just as classical music symphonies introduce and return to recurring themes, with subtle modifications, so that the symphony as a whole is more than its specific themes” (Halford & Savage, 2017, p. 4). Compared with conventional sociology, which uses formal models and champions parsimony, symphonic social science draws on a more aesthetic repertoire and sets more store by prolixity.
Still, Halford and Savage (2017) conceded that symphonic projects are time-consuming and require a significant workload and resources. The scope of such projects also demands long-form presentation, such as books rather than shorter works like articles, to allow arguments to be derived from empirical and theoretical resources. More importantly, assembling conventional survey data can only construct a data repertory containing information from surveyed samples. This suggests that data assemblage improves merely the scale of the data, not their informativeness. In this regard, key factors in a macro-analysis, which often spans large temporal and spatial scales, are very likely to be unavailable in conventional survey datasets. Big data therefore matters more for macrosociology.

Putting big data at the heart of macrosociology

Sociologists have long recognized the enormous potential of big data for dissecting social processes and phenomena. In the last decade, and especially over the past five years, pioneering sociologists have endeavored to link theory, data, and computational algorithms as a composite whole to gain sociological insight (Berman & Hirschman, 2018). In this section, we review works that empirically explore two aspects of big data applications: how to operationalize core theoretical constructs and map a big picture of sociocultural structures and trends; and how to quantify variables that are hard to measure using survey data, so that theories can be tested with conventional regression models. Although both tasks are big-data-driven and theory-guided, the respective studies are organized and presented in different ways. This divergence has largely been ignored in current debates about big data’s application in social science.

Charting the sociocultural milieu for theorizing

For scholars and researchers determined to examine the sociocultural milieu systematically as a composite whole, big data is an uncontested resource. Almost all core constructs of macrosociology, such as social system, collective action, discourse, field, expression, and contagion, lurk in colossal volumes of digital archives, and many scholars have advocated mobilizing big data to help uncover and measure sociocultural meaning in digitized and semantic archives (Bail, 2014; DiMaggio, 2015; Frade, 2016; Halford, Pope & Weal, 2012; Halford & Savage, 2017; Lee & Martin, 2015; Mützel, 2015). For example, a special issue of the journal Poetics was devoted to applying an array of topic models in cultural sociology, tracing the ontological tradition back to the content analyses pioneered in the 1950s (Mohr & Bogdanov, 2013). The essence and strength of large-scale textual analysis lie in its synthesis of conventional qualitative methods and novel computational techniques for big data analytics (Bail, 2014; Nelson, 2019), which can be counted on to advance our understanding of sociocultural processes.
As a result, cultural sociology is among the first subfields of sociology to engage with big data, and it has made substantial progress in harnessing several computational approaches: mining huge volumes of unstructured data to measure sociological meaning, enhancing the methodological capacity to empirically develop, derive, refine, and test sophisticated theories of the social origins of meaning, and exploring important theoretical constructs. Some have used a range of topic models to reveal how social position and structure (e.g., gender, organizations, and identities) shape cognitive frames, discourse, and social logics in cultural archives, including organizational publications, governmental documents, academic journals, newspapers, and literature (Bail, 2012; DiMaggio, Nag & Blei, 2013; Jockers & Mimno, 2013; Mohr et al., 2013). Some have used large book corpora to map temporal trends in tangible and intangible sociocultural phenomena and entities over hundreds of years for distant reading and comparison (Chen & Yan, 2016a, 2016b, 2018; Chen, Yan & Zhang, 2017; Chen, Yan et al., 2020; Chen, He et al., 2020; Guggenheim, 2014; Kozlowski, Taddy & Evans, 2019; Michel et al., 2011). Others have uncovered hidden links among cultural products, such as published academic articles or music videos on YouTube or Twitter, to explore the evolution of networks as a whole and to extend relevant theories (Airoldi, Beraldo & Gandini, 2016; Foster, Rzhetsky & Evans, 2015; Goldenstein & Poschmann, 2019; Rzhetsky et al., 2015; Tangherlini & Leonard, 2013; Tinati et al., 2014).
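Distant reading of book corpora often starts from something as simple as a term's relative frequency per year, in the spirit of the culturomics work by Michel et al. (2011). The sketch below is a hypothetical minimal version: it assumes a corpus of `(year, text)` pairs and a single-word term, and it ignores the normalization and n-gram issues that real book-corpus studies must handle. The function name and the miniature corpus are invented for illustration.

```python
import re
from collections import Counter


def yearly_relative_frequency(corpus, term):
    """Occurrences of `term` per word token, by year.

    corpus: iterable of (year, text) pairs. Only single-word terms are matched.
    """
    hits, totals = Counter(), Counter()
    for year, text in corpus:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term.lower())
    # Skip years whose texts contain no tokens, to avoid dividing by zero.
    return {year: hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}


# Hypothetical miniature corpus.
corpus = [
    (1900, "The nation and the people"),
    (1900, "Trade and industry"),
    (1950, "The people and the state"),
    (1950, "People first"),
]
print(yearly_relative_frequency(corpus, "people"))
```

Dividing by the total token count per year, rather than reporting raw counts, keeps a year with more surviving books from looking as if it cared more about the term.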
These studies tend to provide an overview of the social processes of interest, in which operationalizing theoretical constructs serves to chart the milieu for theorizing. Sociological theory can be broadly divided into two components: concepts that trace social entities, and relationships that link and structure social entities. Although theory testing, especially testing the relationship between two social entities, remains central to quantitative research, big data can augment this analytical focus and clarify social concepts and structures by also “figuring out how to structure a mountain of data into meaningful categories of knowledge” (Goldberg, 2015, p. 3). In this mode of sociological investigation, sociologists with methodological expertise employ theorized concepts and structures to direct the exploitation of big data’s richness. In turn, the data direct further investigation and the process of interpretation and theoretical derivation, just as Kitchin (2014b, p. 6) proposed: “Many supposed relationships within data sets can be quickly dismissed as trivial or absurd by domain experts, with others flagged as deserving more attention.”

Quantifying elusive indicators for theory testing

Two studies using textual analysis tools merit close inspection to show how big data analysis can help theory testing. One is Jockers and Mimno’s (2013) study of themes in more than 3,000 eighteenth- and nineteenth-century works of fiction from the United Kingdom and the United States, which used a topic model to reveal the topics of historical literature. The other is Bail’s (2012) investigation of how fringe anti-Muslim organizations influenced media discourse and entered the mainstream media, which used discourse frames in news coverage to quantify variables for further theory testing after a distant reading of large volumes of text. In both studies, textual analysis served merely as an instrument to quantify variables essential to the regression models that constituted the primary analysis.
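To see how text can act as an instrument for a regression variable, consider a deliberately simplified version of the idea. The sketch below scores each document by the share of its words found in a small keyword dictionary, then regresses that score on a covariate by ordinary least squares. It is a hypothetical illustration only: the dictionary `FRAME_TERMS`, the documents, and the choice of elapsed years as the covariate are invented, and dictionary counting is far cruder than the measurement strategies used in the studies above.

```python
import re

# Hypothetical keyword dictionary standing in for one discourse frame.
FRAME_TERMS = {"threat", "danger", "attack"}


def frame_score(text, terms=FRAME_TERMS):
    """Share of a document's word tokens that belong to the frame dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in terms for token in tokens) / len(tokens)


def ols(x, y):
    """Intercept and slope of y = a + b*x fitted by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical documents paired with years elapsed since the first observation.
docs = [
    (0, "A calm report on trade"),
    (1, "A threat and a danger"),
    (2, "Attack threat danger now"),
]
years = [t for t, _ in docs]
scores = [frame_score(text) for _, text in docs]  # the text-derived variable
intercept, slope = ols(years, scores)             # the regression step
print(f"frame score changes by {slope:.3f} per year")
```

The division of labor mirrors the point in the text: the text-processing step only produces a number per document, and the substantive test happens in the regression.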
Jockers and Mimno (2013) investigated the relationship between literary themes and sociodemographic attributes, such as authors’ gender, using an assembled corpus of 3,279 works of fiction from the United States and Great Britain (including Ireland, Scotland, and Wales) published between 1750 and 1899. They found that once themes had been identified through topic modeling and assigned to each work, some themes were dominated by authors of one gender, suggesting that men and women might have chosen different themes when writing fiction. For example, the authors of works categorized under the theme “female fashion” were mostly women, while the authors of works categorized under “enemies” were mainly men (the gender ratio of a given theme can be computed by comparing the proportions of words written by female and male authors that are assigned to that theme).
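The parenthetical above admits more than one reading. One reading, sketched below as an assumption rather than as Jockers and Mimno’s exact procedure, compares, for each theme, the share of all female-authored words assigned to it with the share of all male-authored words assigned to it. Normalizing within each gender keeps an unequal number of male and female words in the corpus from masquerading as a theme preference. The input format (one `(gender, theme)` pair per word token, as a topic model’s word assignments would provide), the function name, and the toy counts are hypothetical.

```python
from collections import Counter


def theme_gender_profile(assignments):
    """For each theme, return (share of female-authored words, share of male-authored words).

    assignments: iterable of (gender, theme) pairs, one per word token,
    with gender coded "F" or "M".
    """
    assignments = list(assignments)  # materialize, since we iterate more than once
    words_by_gender = Counter(gender for gender, _ in assignments)
    counts = Counter(assignments)
    themes = {theme for _, theme in assignments}
    return {
        theme: (counts[("F", theme)] / words_by_gender["F"],
                counts[("M", theme)] / words_by_gender["M"])
        for theme in sorted(themes)
    }


# Hypothetical token assignments: 10 words by women, 20 by men.
tokens = ([("F", "female fashion")] * 6 + [("F", "enemies")] * 2 + [("F", "other")] * 2
          + [("M", "female fashion")] * 1 + [("M", "enemies")] * 12 + [("M", "other")] * 7)

for theme, (f_share, m_share) in theme_gender_profile(tokens).items():
    print(f"{theme:15s} women {f_share:.2f}  men {m_share:.2f}  "
          f"ratio {f_share / m_share:.2f}")
```

The ratio in the last line is undefined if one gender never uses a theme, which does not happen in this toy data but would need handling in a real corpus.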
However, to assert that a theme has a skewed gender ratio, one needs to know the range of male and female proportions that could arise for this theme by chance, because even if there were no underlying gender difference in topic use, an exactly even (50:50) split would still be unlikely. In statistical terms, one needs to test the null hypothesis of no gender difference by estimating the probability of observing a gender difference at least as large as the one found if gender were unrelated to theme. Therefore, having identified a range of topics as themes of the works of fiction on the c...
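The excerpt breaks off before describing how the test is carried out. One standard way to estimate such a probability, offered here only as an illustration and not as the procedure the book or Jockers and Mimno necessarily used, is a permutation test: keep each work’s theme proportion fixed, repeatedly shuffle the gender labels across works, and count how often the shuffled gender gap is at least as large as the observed one. The function name, the per-work input format, and the example values are hypothetical.

```python
import random


def permutation_test(labels, values, n_perm=10_000, seed=0):
    """Two-sided permutation test for a gender gap in one theme.

    labels: author gender per work, "F" or "M".
    values: share of each work's words assigned to the theme.
    Returns (observed gap in means, F minus M; estimated p-value).
    """
    def gap(lab):
        f = [v for g, v in zip(lab, values) if g == "F"]
        m = [v for g, v in zip(lab, values) if g == "M"]
        return sum(f) / len(f) - sum(m) / len(m)

    observed = gap(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between gender and theme use
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(gap(shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical theme shares for five works by women and five by men.
labels = ["F"] * 5 + ["M"] * 5
values = [0.30, 0.28, 0.35, 0.32, 0.31, 0.02, 0.03, 0.01, 0.04, 0.02]
gap_obs, p = permutation_test(labels, values)
print(f"observed gap {gap_obs:.3f}, permutation p-value {p:.4f}")
```

With only ten works there are 252 ways to split the labels into two groups of five, and only two of them (the observed split and its mirror image) produce a gap this large, so the exact two-sided p-value is 2/252, about 0.008; the simulated estimate should land near that value. If every work had the same theme share, every shuffle would tie with the observed gap and the p-value would be 1.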

Table of contents

  1. Cover
  2. Half-Title
  3. Series
  4. Title
  5. Copyright
  6. Contents
  7. List of figures
  8. List of tables
  9. Preface
  10. PART I Introduction
  11. PART II Mapping public discourse and social stratification
  12. PART III Portraying social transformations and cultural practice
  13. PART IV Revealing public health and community wellness
  14. References
  15. Index

Authors
Yunsong Chen, Guangye He, Fei Yan