
Corpora and Discourse Studies
Integrating Discourse and Corpora
About this book
This edited collection brings together contemporary research that uses corpus linguistics to carry out discourse analysis. The book takes an inclusive view of the meaning of discourse, covering different text-types or modes of language, including discourse as both social practice and as ideology or representation.
1
Introduction
Paul Baker and Tony McEnery
This book houses a collection of 13 independent studies which use corpus linguistics methodology to carry out discourse analysis. In this introductory chapter we first introduce the two main concepts of the book, corpus linguistics and discourse analysis, and cover the advantages of combining the two approaches. After discussing the existing key research and debates in this relatively new field, we then outline the remainder of the book's three-part structure with a brief description of each chapter.
Corpus linguistics
Corpus linguistics is a powerful methodology – a way of using computers to assist the analysis of language so that regularities among many millions of words can be quickly and accurately identified. Coming from Latin, a corpus is a body, so we may say that corpus linguistics is simply the study of a body of language – in many cases a very large body indeed. Such a body may consist of hundreds or thousands of texts (or excerpts of texts) that have been carefully sampled and balanced in order to be representative of a specific variety of language (e.g. nineteenth-century women's fiction, British newspaper articles about poverty, political speeches, teenagers' text messages, Indian English, essays by Chinese students learning English). In order to facilitate more complex forms of analysis, many corpora are 'tagged', i.e. have explicit linguistic analyses introduced into them, usually in the form of mnemonic codes. This is often done automatically via computer software (for example, Amanda Potts in Chapter 14 uses a corpus of news articles tagged by a computer program called the USAS English tagger), although we note that in this volume Dan McIntyre and Brian Walker (Chapter 9) hand-tagged their corpus for different categories of discourse presentation, as software was not able to make the distinctions they required. Automatic grammatical or semantic tagging performs well, although not at 100% accuracy. For example, all of the words in a corpus may be automatically assigned codes which indicate their grammatical part of speech (noun, verb, adjective etc.) or which semantic group they belong to (living things, conflict, economics etc.). Tagging can also occur at the level of the text itself: for example, all texts may be tagged according to the gender of the author, allowing us to easily separate out and compare language according to this variable.
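As a rough illustration of what word-level tagging produces, the sketch below attaches a mnemonic code to each token via a simple lexicon lookup. The lexicon and tag codes here are invented for illustration and bear no relation to the USAS or CLAWS tagsets; real taggers use very large lexicons plus contextual disambiguation to resolve ambiguous words.

```python
# Toy lexicon-based part-of-speech tagger (illustrative only; real taggers
# such as the USAS tagger mentioned above are far more sophisticated).
POS_LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB", "loudly": "ADV"}

def tag(tokens):
    """Attach a mnemonic part-of-speech code to each token."""
    return [(w, POS_LEXICON.get(w.lower(), "UNK")) for w in tokens]

print(tag("The dog barks loudly".split()))
# [('The', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB'), ('loudly', 'ADV')]
```

Text-level tagging works the same way in principle, except that the code (e.g. author gender) is attached to a whole file rather than to individual words.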
Using specially designed software in conjunction with a corpus, analysts are given a unique view of language within which frequency information becomes highly salient. Hence it is no surprise that the concept of frequency drives many of the techniques associated with corpus linguistics, giving the field a quantitative flavour. Many of the chapters in this book employ two frequency-based techniques in particular – keywords and collocates. Keywords are words which are more frequent than expected in one corpus when compared against a second corpus, which often stands as a 'reference', usually being representative of a notional 'standard language'. Keywords reveal words which may not be hugely frequent but are statistically salient in some way. Collocation involves the identification of words which tend to occur near or next to each other a great deal – much more than would be expected if all the words in a corpus were ordered in a random jumble. Native speakers of a language have thousands of collocates stored in their memories, and hearing or reading one word may often prime another, due to all of our previous experiences of hearing that word in a particular context. From an ideological point of view, collocates are extremely interesting: if two words are repeatedly associated with each other, then their relationship can become reified and unquestioned (Stubbs, 1996: 195).
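Both techniques can be sketched in a few lines of Python. The keyness score below uses Dunning's log-likelihood, one statistic commonly used for keywords, and the collocation score uses pointwise mutual information within a fixed word window. The toy corpora and function names are illustrative; real corpus tools offer a wider choice of statistics and handle much larger data.

```python
import math
from collections import Counter

def log_likelihood(freq_a, freq_b, total_a, total_b):
    """Log-likelihood (G2) keyness score for one word in corpus A vs. corpus B."""
    expected_a = total_a * (freq_a + freq_b) / (total_a + total_b)
    expected_b = total_b * (freq_a + freq_b) / (total_a + total_b)
    g2 = 0.0
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2

def keywords(study_tokens, reference_tokens):
    """Rank words by keyness in the study corpus against the reference corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    n_study, n_ref = len(study_tokens), len(reference_tokens)
    scores = {w: log_likelihood(study[w], ref.get(w, 0), n_study, n_ref)
              for w in study}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def collocates(tokens, node, window=3):
    """Collocates of `node` ranked by pointwise mutual information."""
    n, freq, co = len(tokens), Counter(tokens), Counter()
    for i, w in enumerate(tokens):
        if w == node:
            for j in range(max(0, i - window), min(n, i + window + 1)):
                if j != i:
                    co[tokens[j]] += 1
    pmi = {w: math.log((c * n) / (freq[node] * freq[w])) for w, c in co.items()}
    return sorted(pmi.items(), key=lambda kv: kv[1], reverse=True)

study = "poverty benefits poverty street poverty cuts".split()
reference = "weather street garden weather holiday cuts".split()
top = keywords(study, reference)
print(top[0][0])  # 'poverty' is the most key word in the toy study corpus
```

Note that words shared equally by both corpora (e.g. 'street', 'cuts') score zero: keyness highlights what is distinctive about the study corpus, not merely what is frequent in it.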
While the earliest stages of a corpus analysis tend to be quantitative, relying on techniques like keywords and collocates in order to give the research a focus, as a research project progresses, the analysis gradually becomes more qualitative and context-led, relying less on computer software. Once quantitative patterns have been identified, they need to be interpreted and this usually involves a second stage of analysis where the software acts as an aid to the researcher by allowing the linguistic data to be quickly surveyed.
For example, we may be interested in how many texts a word or feature occurs in, or whether it tends to occur at the beginning, middle or end of a text. Corpus tools often allow measures of dispersion to be taken into account, sometimes using a visual representation of a file which can resemble a bar code, with each line indicating an occurrence of a particular word. Knowing whether a word or feature is well distributed across a corpus, or frequent only because it occurs many times in a few texts, can be one way of understanding the context in which it is used. As well as position, it is essential to ascertain the way that the feature is used in the context of every utterance, sentence or paragraph it occurs in. A concordance table is simply a table of all of the occurrences of a word, phrase or other linguistic feature (e.g. grammatical or semantic tag) in a corpus, along with a few words of context on either side. Concordance tables can be sorted, for example by ordering the table alphabetically according to the word immediately to the right or left of the word we are analysing. This helps to group together instances of a word that occur in similar contexts so that interpretations can be more easily made. In cases where a word may occur thousands of times, we may only want to examine a smaller sample of concordance lines, so again the software can randomly reduce or 'thin' the number of lines to a more manageable amount.
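A minimal key word in context (KWIC) concordancer along these lines might look as follows; the sort on the right-hand context mirrors the alphabetical sorting described above, and the names and formatting are illustrative rather than taken from any particular tool.

```python
def concordance(tokens, node, context=4):
    """KWIC lines for every occurrence of `node`, sorted by right context."""
    lines = []
    for i, w in enumerate(tokens):
        if w == node:
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            lines.append((left, w, right))
    # Sort alphabetically by the context to the right of the node word,
    # grouping similar usage patterns together for easier interpretation.
    return sorted(lines, key=lambda line: line[2])

tokens = "the rot set in before the cold set in".split()
for left, node, right in concordance(tokens, "set"):
    print(f"{left:>25} | {node} | {right}")
```

Thinning would simply be a `random.sample` over the list of lines before display.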
Such tools enable more qualitative forms of analysis to be carried out on corpora, although we argue that a third stage of analysis – explanation – involves positioning our descriptive and interpretative findings within a wider social context. This can mean engaging with many other forms of information. For example, analyses of twentieth-century English writing from many genres might show that over time people appear to be using second person pronouns more often.1 Such a finding could be shown via analysis of frequency and keyword lists. Dispersion analyses may indicate that such pronouns are reasonably well dispersed over different registers of writing, although they seem to have become especially frequent over time in informational and official texts. Further analysis of context via reading concordance lines may indicate that they seem to be used to indicate a personal relationship between author and reader. However, such findings would need to be positioned in relation to social context – what do we know about social developments in the twentieth century? Can phenomena like a move towards relaxed and more informal social conventions, a tendency towards a less hierarchical style of address, a desire to make language more accessible, or even increased use of persuasive language due to the capitalist imperative to position everyone as a consumer, help to explain our finding about pronouns? If the aim of our research is to be critical or to inspire social change, then a fourth stage may be more evaluative, pointing out the consequences of such uses of language (asking 'who benefits?' or 'who is potentially disempowered?'), perhaps making recommendations for good practice.
Corpus analysis does not need to critically evaluate its findings, and we argue that 'curiosity'-based (as opposed to 'action'-based) research has an important role to play in linguistics. While all of the chapters in this collection of corpus studies are positioned as research on discourse, and all of them engage with the description and interpretation stages, only some move into explanation and critical evaluation too. This is because there is more than one way of doing discourse analysis, as the following section will show.
Discourse analysis
'Bid me discourse, I will enchant thine ear'
(Shakespeare, Venus and Adonis)
Somewhere between Shakespeare's uplifting use of the word and today, the word discourse has suffered something of an identity crisis. While the term language is largely understood by non-linguists, discourse can be an excluding shibboleth which does little to make academic research accessible or relevant to people who do not work or study in the social sciences. Part of the problem is that even among social scientists the term has a wide set of overlapping meanings. Compare the claim by Stubbs (1983: 1) that discourse is 'language above the sentence or above the clause' with Fairclough's (1992: 8): 'Discourse constitutes the social … Discourse is shaped by relations of power, and invested with ideologies.'
And within this edited collection, an examination of some of the collocational patterns of discourse is revealing of its multiplicity of meanings. Sally Hunt (Chapter 13) refers to gendered discourses and discourse prosody, Jack Hardy (Chapter 8) uses discourse community (as do we in Chapter 12), Karin Aijmer (Chapter 5) analyses discourse markers, Dan McIntyre and Brian Walker (Chapter 9) refer to discourse presentation, while Daniel Hunt and Kevin Harvey (Chapter 7) mention medicalising discourse. As many of the chapters utilise somewhat different understandings of discourse, it is pertinent to ask what they have in common. One answer is that they broadly undertake to examine 'language in use' (Brown and Yule, 1983), a concept which is ideally suited to the corpus linguistic undertaking to base analysis on large collections of naturally occurring language. In its broadest sense, then, all of corpus linguistics is discourse analysis.
The chapters in this book were therefore chosen because they demonstrate the range of different conceptualisations of discourse that corpus linguists have utilised. Indeed, Daryl Hocking (Chapter 10) works with two definitions of discourse: one, following Candlin (1997), relating to the semiotic resources used by people to carry out practices that shape their professional, institutional and social worlds; the other based on resources used to represent practices or objects.
In Chapter 3, Svenja Adolphs, Dawn Knight and Ronald Carter view discourse in the sense of being all forms of 'language in use', while others more closely associate discourse with genres or registers of language use – so the term could refer to spoken discourse (Karin Aijmer in Chapter 5) or digital discourse (Dawn Knight in Chapter 2). Linked to this notion of discourse are more specific subdivisions, such as American presidential discourse, which Cinzia Bevitori (Chapter 6) characterises as a sub-category of political discourse. American presidential discourse would cover language used by American presidents, presumably in public settings (e.g. speeches, press releases, interviews). Bevitori also refers to environmental discourse, which could be viewed as language around the topic of the environment – a topic that could potentially occur across a range of different genres or registers of language. However, other chapters, particularly those towards the end of this collection, conceptualise discourse from a more Foucauldian perspective, where discourses are seen as ways of looking at the world, of constructing objects and concepts in certain ways – in other words, of representing reality – with attendant consequences for power relations, e.g. involving gender (Sally Hunt in Chapter 13), ethnicity (Alan Partington in Chapter 11, Amanda Potts in Chapter 14) or social class (Paul Baker and Tony McEnery in Chapter 12). Three of these four chapters follow a critical discourse analysis framework, in that the research has been carried out in order to highlight inequalities in the ways that certain groups are represented.
An issue with traditional methods of critical discourse analysis relates to the ways that texts and features are chosen for analysis, with Widdowson (2004) warning that 'cherry-picking' could be used to prove a preconceived point while swathes of inconvenient data are overlooked. The principles of representativeness, sampling and balance which underpin corpus building help to guard against cherry-picking, while corpus-driven techniques like keywords help us to avoid over-focussing on atypical aspects of our texts. Corpus techniques can thus reassure readers that the analyst is actually presenting a systematic analysis, rather than writing a covert polemic.
However, an advantage of corpus-driven approaches is that techniques intended to objectively uncover the existence of bias or manipulation in language can also be applied from a discourse analysis perspective where the aim is not necessarily to highlight such problems. Alan Partington's chapter, for example, examines representations of Arabs in press articles, but the investigation is not based on an expectation that problematic representations are necessarily 'out there' to be uncovered. Partington instead takes a more prospecting approach, bearing in mind that in terms of news values negative reporting is to be expected, so a distinction needs to be made between negative and prejudiced representation. Corpus techniques can help us to distinguish between the two, particularly if we make comparisons between different groups or different press outlets. While Potts, Hunt, and Baker and McEnery all position their research as coming from a critical discourse analysis perspective, Partington defines his research as CADS (Corpus-Assisted Discourse Studies) – note the absence of the word critical.
Kevin Harvey and Daniel Hunt (Chapter 7) also offer an interesting perspective on corpus approaches to critical discourse analysis. Their chapter examines the online language of people who suffer from eating disorders – but this is not a traditional CDA study that aims to highlight how a powerful text producer unfairly treats a less powerful group. Instead the analysis shows that some people personify their disorder as 'talking' to them. Harvey and Hunt discuss how such a representation can both help to mitigate the stigma around the illness and provide support to others, but may also constrain understandings that afford more control to the person with the illness. However, in positioning their research as critical, they cite Toolan (2002), who argues that a critically motivated analysis can focus on discourses that are simultaneously enabling and disempowering. The point we wish to make here is that corpus linguistics is extremely well placed to enable discourse analytical research to be carried out from a range of different 'starting positions', depending on the meaning(s) of discourse we wish to work with.
The development of a synergy
The relationship between corpus linguistics and discourse analysis has been in development for a quarter of a century, focussed on different groupings over time. The paragraphs below give a broadly chronological summary of some of the main proponents of what has more recently been referred to as a 'synergy', although it is admittedly brief and thus incomplete; apologies are made in advance to anyone who is missed.
The early work in the field tended to use untagged corpora and was often highly reliant on concordance analyses. Pioneering work was connected to the University of Birmingham in the early 1990s, coming out of early research in corpus linguistics by John Sinclair and taken up by Michael Stubbs, Susan Hunston, Bill Louw, Ramesh Krishnamurthy, Wolfgang Teubert and Carmen Caldas-Coulthard, among others. While corpus research at Birmingham had initially been focussed at the lexical and grammatical levels, an early theoretical concept was that of prosodies. Sinclair (1991) showed how the verb phrase set in had a negative prosody, tending to co-occur or collocate with negative associations like rot. While set in has no intrinsically negative meaning in itself, it is hypothesised that people unconsciously remember the contexts that they have heard it in the pa...
Table of contents
- Cover
- Title
- Copyright
- Contents
- List of Figures and Tables
- Series Editor's Preface
- Notes on Contributors
- 1 Introduction
- 2 e-Language: Communication in the Digital Age
- 3 Beyond Modal Spoken Corpora: A Dynamic Approach to Tracking Language in Context
- 4 Corpus-Assisted Multimodal Discourse Analysis of Television and Film Narratives
- 5 Analysing Discourse Markers in Spoken Corpora: Actually as a Case Study
- 6 Discursive Constructions of the Environment in American Presidential Speeches 1960–2013: A Diachronic Corpus-Assisted Study
- 7 Health Communication and Corpus Linguistics: Using Corpus Tools to Analyse Eating Disorder Discourse Online
- 8 Multi-Dimensional Analysis of Academic Discourse
- 9 Thinking about the News: Thought Presentation in Early Modern English News Writing
- 10 The Use of Corpus Analysis in a Multi-Perspectival Study of Creative Practice
- 11 Corpus-Assisted Comparative Case Studies of Representations of the Arab World
- 12 Who Benefits When Discourse Gets Democratised? Analysing a Twitter Corpus around the British Benefits Street Debate
- 13 Representations of Gender and Agency in the Harry Potter Series
- 14 Filtering the Flood: Semantic Tagging as a Method of Identifying Salient Discourse Topics in a Large Corpus of Hurricane Katrina Reportage
- Index