Introduction
Corpus linguistics is a research approach that facilitates holistic investigations of language variation and use, producing a wealth of reliable and generalizable linguistic distributions that can be extensively analyzed and interpreted (Biber, Conrad, & Reppen, 1998; Friginal & Hardy, 2014). The corpus approach to discourse analysis (DA) utilizes methodological and computational innovations that allow scholars to ask novel, frequency-based research questions on existing linguistic phenomena across many speaking and writing contexts. The answers to these research questions produce information and perspectives that may either complement or reject the assumptions made in traditional discourse-analytic investigations. In addition, corpora can provide a stronger argument for the view that variation in discourse is systematic, yet fluid, and can be described using empirical, quantitative methods (Biber, 1988, 1993; Friginal & Bristow, 2017). This argument is critical to DA studies that entail extensive technical, multifaceted data in order to broadly explain the interface between linguistic parameters existing within social, cultural, and contextual realms.
As presented in the original, invited chapters meticulously written for this handbook, the many characteristic features of spoken and written discourse have been described and interpreted using corpora, both on the macro and micro levels, producing a wide range of diverse and overlapping, often fascinating results. With corpus-based approaches, measurable and interpretable linguistic distributions and frequency data lead to the clearer identification of patterns and tendencies within general and specific discourses. By examining the role of these patterns in how discourses are formed or used, researchers are able to further illustrate, comprehend, and also deeply experience the reality that everyday discourse is amazingly varied, dynamic, and influenced by numerous contextual factors. The fact is that no one speaks or writes the same way all the time, and individuals constantly exploit the nuances of the languages they speak and write for a wide variety of purposes (Friginal & Bristow, 2017; Friginal & Hardy, 2014; Meyerhoff, 2004). Everyday discourse is pragmatic, practical, evolving, and unique to individuals or groups of connected individuals. Logical and sound conclusions and generalizations, therefore, can be formalized from corpus-based DA studies, along with their practical and pragmatic implications.
Frequency-based DA studies that are systematically accomplished using corpora may adhere to the following methodological parameters, adapted from Biber, Conrad, and Reppen (1998) and Friginal (2018). These studies:
1. are empirical, analyzing the actual patterns of use in natural texts;
2. utilize a principled collection of naturally occurring speech or writing (i.e., the corpus) as the basis for analysis and interpretation;
3. make extensive use of computers and corpus tools for analysis, employing both automatic and interactive techniques;
4. rely on the combination of quantitative and qualitative analytical procedures, making use of evidence directly obtained from the corpus.
It is important to remember that, although corpora offer measurable descriptions of texts and registers, the researcher and subsequent consumers of these studies must still interpret these corpus-based findings (and, especially, answer the question: So what?) as accurately and consistently as possible (Biber et al., 1998; Friginal & Hardy, 2014). Extensive knowledge of the literature and related analytic approaches, along with awareness of the clear limitations of computational tools, must always be foregrounded. Interpretive techniques honed by discourse analysts over the years are certainly invaluable to the discipline. To summarize, corpus approaches can often be used in tandem with qualitative and discourse-analytic methods, and corpora and frequency data can be statistically tested to determine whether a consistent, significant pattern exists. These concepts are successfully illustrated in the chapters collected for this handbook.
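To make the idea of statistically testing frequency data concrete, the following is a minimal sketch in Python, not a procedure taken from any chapter in this handbook. It assumes the two-term form of the log-likelihood (G2) statistic that is commonly used to compare one item's frequency across two corpora. The function name and all counts are hypothetical and serve only as illustration.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-term log-likelihood (G2) for one item's frequency in two corpora.

    freq_a, freq_b: raw counts of the item in corpus A and corpus B.
    size_a, size_b: total number of tokens in corpus A and corpus B.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the item were equally likely in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Chi-square critical value for 1 degree of freedom at p < 0.05.
CRITICAL_P05 = 3.84

# Hypothetical counts: an item occurs 120 times in a 100,000-word corpus
# and 60 times in a second corpus of the same size.
g2 = log_likelihood(120, 100_000, 60, 100_000)
is_significant = g2 > CRITICAL_P05
print(f"G2 = {g2:.2f}, significant at p < 0.05: {is_significant}")
```

A G2 value above the critical threshold suggests only that the difference in frequency is unlikely to be due to chance. Explaining why the difference exists remains an interpretive task for the analyst.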
Key sections of the handbook and chapter overviews
One of the key elements in Sinclair’s (2005) definition of a corpus is that the collection of texts is used to represent a language or language variety. In other words, corpora are created for the purpose of better understanding a particular type of discourse. Thus, specific texts that together can serve as a characteristic example of the target variety or target domain are needed. Corpora are generally not created without particular research questions in mind. Corpora are planned, collected, organized, and analyzed in ways that discourse analysts have considered from the inception of the idea to create them (Friginal & Hardy, 2014). These texts are important data, leading the analyst to make sense of structures and combinations of patterns across domains of discourse. Multiple analytical options open up during the process, and more are increasingly made available to discourse analysts by computer programmers engaged in the automatic processing of naturally occurring language. Vocabulary and syntactic distributions are now easily obtained and extracted from corpora, and linguistic paradigms are further enhanced by text samples or extracts, a documentation of speaker role or demographic information, and opportunities to develop sound assumptions that may lead to significant decisions by individuals in broader social and policy-based research (Pickering, Friginal, & Staples, 2016).
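As a minimal sketch of how such vocabulary distributions might be obtained, word frequencies can be counted and then normalized so that corpora of different sizes become comparable. The sketch assumes a plain-text corpus and a deliberately simple tokenizer. The function name, the sample sentence, and the per-1,000-word normalization base are illustrative choices, not drawn from the handbook.

```python
import re
from collections import Counter

def normalized_frequencies(text, per=1000):
    """Return each word's frequency normalized to a rate per `per` tokens."""
    # Simplistic tokenizer (an assumption): lowercase runs of letters and
    # apostrophes. Real corpus tools apply far more careful tokenization.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}

# Invented sample text standing in for a corpus file.
sample = "The pilot said the runway is clear and the tower agreed"
freqs = normalized_frequencies(sample)

# List the three most frequent words with their normalized rates.
for word, rate in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word}\t{rate:.1f} per 1,000 words")
```

Normalized rates of this kind, rather than raw counts, are what allow distributions to be compared across texts or registers of unequal length.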
The chapters collected for The Routledge Handbook of Corpus Approaches to Discourse Analysis underscore the diversity, breadth, and depth of corpus approaches to discourse analysis, compiling new and original research studies from notable scholars at major universities across the globe. This volume showcases recent developments influenced by the exponential growth in linguistic computing in the past several years, advances in corpus design and compilation, and the applications of reliable and accurate qualitative and interpretive techniques in analyzing patterns of spoken and written discourse, the two modes of language explored here. There are 34 empirical chapters organized into five primary sections. These are studies of (1) naturally occurring spoken, professional, and academic discourse; (2) (scripted) spoken discourse; (3) academic written discourse; (4) professional written discourse; and (5) media discourse. These sections are examined in depth by expert applied linguists and discourse analysts specializing in the study of business meetings, nurse–patient interactions, pilot–air traffic controller interactions, AAC in the workplace interactions, academic spoken language, academic interactions, press briefings, congressional hearings in the U.S., television news, diachronic film scripts and English dialogues from 1560–1760, student writing, research articles, literature and literary style in Portuguese, Twitter and social media, internet chatter (Reddit, Twitter), digital advertisements, historical books, legal statutes, U.S. language policy, British law reports, various newspaper texts, online humor (The Onion headlines), and entertainment news (in India). 
All these discourse domains are adequately represented by specialized corpora collected by the authors and explored using innovative approaches such as corpus-based multidimensional analysis, keyword and collocational analysis, informational and canonical discriminant function analysis, and related natural language processing techniques. Although English is the primary language in the volume (including English in L2 classroom and professional settings and world Englishes), non-English discourses in Spanish and Brazilian Portuguese are also included.
Several chapters make use of critical approaches to discourse analysis (CDA), and their various discussions highlight power structures and control or lead to careful recommendations for policy changes and improvements in pedagogy (e.g., classroom teaching and training in professional or workplace settings). For example, Kandil (Chapter 33) conceptualized a CDA study of the representation of the State of Qatar in Okaz, one of the most widely circulated newspapers from Saudi Arabia, before and after a 2017 political crisis between the two countries. Keyword lists, collocational patterns, and visual representations of data are presented and discussed, indicating a polarized representation of Qatar, with coverage after the blockade focusing mainly on negative aspects in support of targeted political claims. Related to language policy, corpus-based approaches have focused on the discourse level by examining how the language of the policy itself reproduces certain ideologies and inequalities through its discourse (Fitzsimmons-Doolan, 2019; Gales, 2009; Subtirelu, 2013), as noted by Lian in Chapter 9. The application of corpus-based approaches in this domain, therefore, provides a glimpse into the ideologies of policymakers, politicians, and individuals who create legislation for people they may or may not fully represent. Wilkinson (Chapter 32) documented the important contributions of corpus-based approaches to the analysis of LGBT identities in the past 20 years (especially the works of Baker, 2005, 2014, 2017), which strongly enhanced DA studies that relate discursive and semantic variation to questions of representation concerning identity.
The Routledge Handbook of Corpus Approaches to Discourse Analysis is key reading for both experienced and novice researchers working at the intersection of corpus linguistics and discourse analysis, as well as anyone interested in related fields and adjacent research approaches.
Bibliography
Baker, P. (2005). Public discourses of gay men. Abingdon: Routledge.
Baker, P. (2014). “Bad wigs and screaming mimis”: Using corpus-assisted techniques to carry out critical discourse analysis of the representation of trans people in the British Press. In C. Hart & P. Cap (Eds.), Contemporary critical discourse studies (pp. 211–235). London: Bloomsbury.
Baker, P. (2017). Sexuality. In E. Friginal (Ed.), Studies in corpus-based sociolinguistics (pp. 159–177). New York: Routledge.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4), 243–257.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
Biber, D., Reppen, R., & Friginal, E. (2010). Research in corpus linguistics. In R.B. Kaplan (Ed.), The Oxford handbook of applied linguistics (2nd ed., pp. 548–570). Oxford: Oxford University Press.
Fitzsimmons-Doolan, S. (2019). Language ideologies of institutional language policy: Exploring variability by language policy register. Language Policy. https://doi.org/10.1007/s10993-018-9479-1
Friginal, E. (2018). Corpus linguistics for English teachers: New tools, online resources, and classroom activities. New York: Routledge.
Friginal, E., & Bristow, M. (2017). Corpus approaches to sociolinguistics: Introduction and chapter overviews. In E. Friginal (Ed.), Studies in corpus-based sociolinguistics. London: Routledge.
Friginal, E., & Hardy, J.A. (2014). Corpus-based sociolinguistics: A guide for students. New York: Routledge.
Gales, T. (2009). “Diversity” as enacted in US immigration politics and law: A corpus-based approach. Discourse and Society, 20(2), 223–240. https://doi.org/10.1177/0957926508099003
Meyerhoff, M. (2004). Introducing sociolinguistics. New York: Routledge.
Pickering, L., Friginal, E., & Staples, S. (Eds.) (2016). Talking at work: Corpus-based explorations of workplace discourse. London: Palgrave Macmillan.
Sinclair, J. (2005). Corpus and text – basic principles. In M. Wynne (Ed.), Developing linguistic corpora: A guide to good practice (pp. 1–16). Oxford: Oxbow Books.
Subtirelu, N.C. (2013). “English… it’s part of our blood”: Ideologies of language and nation in United States Congressional discourse. Journal of Sociolinguistics, 17(1), 37–65.