Using Corpora to Analyze Gender

Paul Baker

  1. 224 pages
  2. English

About This Book

Corpus linguistics uses specialist software to identify linguistic patterns in large computerised collections of text, patterns which must then be interpreted and explained by human researchers. This book critically explores how corpus linguistics techniques can aid the analysis of language and gender through a number of case studies on topics including directives in spoken conversations, changes in sexist and non-sexist language use over time, personal adverts, press representation of gay men, and the ways that boys and girls are constructed through language. The book thus covers both gendered usage (e.g. whether, and how, males and females use language differently from each other) and gendered representations (e.g. the ways in which males and females are written or spoken about). Additionally, the book shows how readers can either explore their own hypotheses or approach the corpus from a "naïve" position, letting the data drive their analysis from the outset. It covers a range of techniques and measures, including frequencies, keywords, collocations, dispersion, word sketches, downsizing and triangulation, all in an accessible style.

Information

Year
2014
ISBN
9781472527073
CHAPTER ONE
Introduction
How many times did you say ‘I love you’ today?
In 2012, four days before Valentine’s Day, I was contacted by a journalist who worked for an international newspaper. He was writing an article ‘on linguistic differences between men and women’ and was primarily interested in ‘outlining the main linguistic differences, along with sentences/paragraphs as spoken by men/women to illustrate such differences’. He wanted me to give him a list of some gendered differences, and he cited a study by Harrison and Shortall (2011), which had surveyed 171 college students and found that men report falling in love, and saying ‘I love you’, earlier than women.
I replied to the journalist, saying that it is difficult to create such a list because the amount of data needed would be enormous. We would require millions of words of spoken language data, taken from large numbers of people from a wide range of backgrounds and locations. It would also be a good idea to sample data at various points in time to ensure that any differences we found were stable rather than due to something specific about a particular period of a particular society. With regard to the paper that the journalist had cited, I suggested that perhaps we cannot generalize too widely from a study which used a relatively small number of participants (99 women and 72 men) who were of similar age and current circumstances (students) and who were asked to remember and then report on their own linguistic behaviour (Harrison and Shortall also refer to some of these issues in the discussion section of their article¹). To illustrate, I sent the journalist some information about the phrase ‘I love you’ in the British National Corpus (BNC), a large reference corpus consisting of 100 million words of British English, of which 10 million words are transcriptions of recorded conversations. For about 71 per cent of this conversational data we know whether the speaker was male or female. Although the BNC can only directly tell us about language use in British society at the time the data was collected (the early 1990s), it remains, at the time of writing, one of the biggest sources of naturally occurring spoken language data and one of the best resources that corpus linguists have access to.
I found that ‘I love you’ occurred only 64 times in the spoken part of the BNC, and while the female speakers in the corpus said it about three times as much as the male speakers, the overall picture was that most speakers did not say ‘I love you’ (at least while they were being recorded). I suggested, humorously, that people should perhaps be encouraged to say the phrase more often.
Unsurprisingly, the journalist didn’t reply. I had not produced a list of words and phrases which would either confirm stereotypes about gendered language use or refute them. Either way, such a finding could have constituted ‘news’. Instead, my response could be summarized as ‘there is not enough evidence to draw much of a conclusion’. February 14th was approaching and my response was unlikely to cohere with any narrative that the journalist wanted to create.
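A comparison such as ‘about three times as much’ is normally made on normalized rather than raw frequencies, because male and female speakers need not contribute the same number of words to a corpus. The following is a minimal Python sketch of that normalization, expressed as occurrences per million words; the function names are hypothetical and the figures in it are purely illustrative, not the BNC’s actual counts for male and female speakers.

```python
def per_million(hits: int, corpus_size: int) -> float:
    """Occurrences of an item per million words of running text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive word count")
    # Multiply before dividing so that round figures stay exact in floating point.
    return hits * 1_000_000 / corpus_size


def relative_rate(hits_a: int, size_a: int, hits_b: int, size_b: int) -> float:
    """How many times more often group A uses an item than group B, after normalization."""
    rate_b = per_million(hits_b, size_b)
    if rate_b == 0:
        raise ZeroDivisionError("group B never uses the item, so the ratio is undefined")
    return per_million(hits_a, size_a) / rate_b


if __name__ == "__main__":
    # Hypothetical figures for illustration only; NOT taken from the BNC.
    female_hits, female_words = 30, 3_000_000
    male_hits, male_words = 8, 2_000_000
    print(per_million(female_hits, female_words))  # 10.0
    print(per_million(male_hits, male_words))      # 4.0
    print(relative_rate(female_hits, female_words, male_hits, male_words))  # 2.5
```

With these made-up numbers the raw counts (30 against 8) would suggest a ratio of 3.75, whereas the normalized rates give 2.5, which is why the size of each speaker group's subcorpus matters before any ‘times as much’ claim is made.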
From gender difference to gendered discourses
The above anecdote illustrates a dissonance between academic research in the field of Gender and Language over the last 20 years or so and public/media perceptions of gender and language. But this was not always the case. The ‘gender differences paradigm’² was actually an early academic approach, linked to Lakoff’s (1975) ‘male dominance’ theory of language use (the view that men used language to dominate women). Fishman (1977) expanded on this theory by proposing that women engaged in what she memorably called ‘interactional shitwork’, which among other things involved using questions and hedges to force responses from men in order to facilitate conversation.
While Lakoff and Fishman focused more on the notion of male dominance than on gender difference per se, there was an underlying assumption that for men to be dominant and women to be dominated, the sexes must also use language differently. Towards the end of the 1980s, another approach, popularized by Tannen (1990), emphasized gender differences rather than male dominance. This perspective was influenced by interactional sociolinguistics and broadly based around the view that males and females had distinct and separate ‘genderlects’ which resulted in ‘cross-cultural miscommunications’. Tannen argued that men viewed conversation as a contest, whereas women used conversation to exchange confirmation and support. On the surface, this ‘difference’ paradigm could be seen as a more politically neutral and thus uncontroversial³ way of thinking about gender and language. In eschewing second-wave⁴ feminist claims of patriarchal dominance, ‘gender difference’ does not characterize men as oppressors and women as victims, nor does it position anybody’s language use as ‘superior’ to anybody else’s. The difference paradigm instead views males and females as growing up in largely separate speech communities and learning different ways of socializing and using language. Linguistic gender differences are therefore used to ‘explain’ interpersonal conflict within (heterosexual) couples. Such conflicts are said to arise from misunderstandings, as males and females attach different meanings to the same utterances as well as having different needs. Some proponents of the paradigm claim that the sexes need to be educated in order to understand each other’s language. ‘Difference’ is thus a ‘grand’ theory: simple to grasp, blame-free, and offering a widely applicable explanation of, and solution to, couples’ conflict.
It is easy to see why it has become so popular, particularly in the media, spawning numerous relationship ‘self-help’ books and newspaper articles about amusing linguistic gender differences that seem to confirm what we already knew or suspected about men and women.
But while the gender differences paradigm is popular in the media, within academia there has been considerable disagreement over whether men and women actually use language differently to a significant degree, with some researchers arguing that differences do exist (e.g. Locke 2011) and others indicating that linguistic gender difference is a myth (e.g. Cameron 2008). Among those who argue for difference, there is a range of views about where such differences come from. Perhaps they can be attributed to essential biological differences relating to chemicals in the brain, different reproductive systems, or body musculature and size, all of which can affect how people come to see themselves and how they are viewed by others. Alternatively, the differences may be related to the ways that society socializes males and females differently, with different expectations regarding appropriate language behaviour for boys and girls. In the 1990s, taking a post-structuralist perspective, Judith Butler proposed that gender is performative, a form of doing rather than a form of being: rather than speaking a certain way because they are male or female, people use language (among other aspects of behaviour) in order to perform a male or female identity, according to current social conventions about how the sexes should behave. Butler pointed to female impersonators, showing that gender performances can be subverted and are therefore not intrinsically linked to a single sex. People learn what the correct gender performance for their sex should be by observing and copying the people around them. As Butler (1990: 31) notes, ‘The parodic repetition of “the original”. . . reveals the original to be nothing other than a parody of the idea of the natural and the original.’ Butler also links gender performance to sexuality, referring to a ‘heterosexual matrix’ (ibid.: 5). She argues that ‘. . . for bodies to cohere and make sense there must be a stable sex expressed through a stable gender (masculine expresses male, feminine expresses female) that is oppositionally and hierarchically defined through the compulsory practice of heterosexuality’ (ibid.: 151).
Since the 1990s, then, there has been a move in the field of Gender and Language away from studies which shoehorn all males and all females into separate categories for comparison, and a shift instead towards research which explores differences among women or differences among men, for example by focusing on the ways that gender interacts with other identity categories (Eckert and McConnell-Ginet 1992). Such an approach has also helped to formulate an alternative set of research questions, which focus on the ways that language use helps to create, reflect and challenge the societal conventions that Butler pointed to as influencing the ways that men and women talk. Terms like social convention and expectations can be related to the concept of discourse, which Foucault (1972: 49) defines as ‘practices which systematically form the objects of which they speak’, while Burr (1995: 48) suggests that discourse is the production of ‘meanings, metaphors, representations, images, stories, statements and so on that in some way together produce a particular version of events’. Gill (1993: 166) has noted that language has become increasingly important across the social sciences, due to the ‘influence of post-structuralist ideas which stressed the thoroughly discursive, textual nature of social life’, and Cameron (1998: 947) points out that this ‘linguistic’ turn was in fact mainly a turn to discourse analysis. Livia and Hall (1997: 12) argue that ‘. . . it is discourse that produces the speaker, and not the other way round, because the performance will be intelligible, only if it “emerges in the context of binding conventions”’.
Within the field of Gender and Language, then, a number of key approaches have utilized the turn to discourse, including work in discursive psychology, which combines elements from conversation analysis, ethnomethodology and rhetorical social psychology. Some researchers have introduced elements of post-structuralist theory or critical discourse analysis into discursive psychology, such as Edley and Wetherell (1999), who examined young men’s talk about fatherhood. Others have shown how techniques used in Conversation Analysis can be adopted for feminist research; for example, Kitzinger (2008: 136) has shown how ‘gender – or sexuality, or power, or oppression – is produced and reproduced in interaction’. A different approach combines critical discourse analysis and feminist linguistics to form FCDA (Feminist Critical Discourse Analysis). FCDA involves the critique of ‘discourses which sustain a patriarchal social order: that is, relations of power that systematically privilege men as a social group and disadvantage, exclude and disempower women as a social group’ (Lazar 2005: 5). FCDA is thus concerned with outlining how language use sustains unequal gender relations, with its main aims being emancipation and transformation. While FCDA also has the remit of showing how taken-for-granted assumptions around gender can be negotiated and contested as well as (re)produced, a third approach, offered by Baxter, is more firmly focused on negotiation. Baxter’s FPDA (Feminist Post-Structuralist Discourse Analysis) ‘suggests that females always adopt multiple subject positions, and that it is far too reductive to constitute women in general, or indeed any individual woman, simply as victims of male oppression’ (Baxter 2003: 10).
Instead, FPDA involves close qualitative analyses of texts (often detailed transcripts of conversations) to show how participants (particularly those who may be conceived of as relatively powerless) can experience ‘moments of power’, whereas powerful people can be positioned as momentarily powerless.
A useful concept in the field of Gender and Language is the idea of gendered discourses, which Sunderland (2004) suggests can be identified through the analysis of traces in language use:
People do not . . . recognise a discourse . . . in any straightforward way . . . Not only is it not identified or named, and is not self-evident or visible as a discrete chunk of a given text, it can never be ‘there’ in its entirety. What is there are certain linguistic features: ‘marks on a page’, words spoken or even people’s memories of previous conversations . . . which – if sufficient and coherent – may suggest that they are ‘traces’ of a particular discourse. (ibid.: 28)
Sunderland acknowledges that the identification and naming of a gendered discourse is a highly subjective process, and her approach involves categorizing discourses in terms of their function (e.g. conservative, resistant, subversive or damaging) and relationship to each other (e.g. two discourses may be competing or mutually supportive or one may be dominant and the other subordinate). This relational aspect of discourse helps to explain why people can appear to be inconsistent in their positions as they may be drawing on conflicting discourses.
The above feminist approaches to discourse analysis also all place emphasis on intertextuality (relations between texts), interdiscursivity (relations between discourses) and self-reflexivity, advocating that researchers acknowledge their own theoretical positions and reflect on their research practices ‘lest these inadvertently contribute towards the perpetuation, rather than the subversion, of hierarchically differential treatment of women’ (Lazar 2005: 15).
Another aspect that many of the discourse-based types of Gender and Language research have in common is that they often involve a ‘close’ or qualitative analysis of a small number of short texts (as well as taking into account the practices surrounding the creation, distribution and reception of those texts). There are good reasons for this, one being that the identification and critique of discourse is a complicated and often slow process, requiring attention to detail and consideration of many types of context (see Flowerdew forthcoming). As Mills (1998: 247–8) points out, the successes of feminism have helped to curb some of the more obviously harmful forms of sexist language use, although it could be argued that rather than being fully eradicated, sexist discourse has become increasingly complex, sophisticated and ambiguous, and thus more difficult to identify. Mills (ibid.) argues ‘What is necessary now is a form of feminist analysis which can analyse the complexity of sexism . . . now that feminism has made sexism more problematic.’
So while discourse analysis has become popular within Gender and Language, it has tended to be based on detailed qualitative studies of smaller excerpts of text rather than on techniques from Corpus Linguistics (described in more detail in the following section), which work well on large amounts of data, sometimes comprising millions or even billions of words. As an illustration of the extent of the impact of Corpus Linguistics on the field of Gender and Language, I examined frequencies of the word ‘corpus’ and its plural ‘corpora’ in 63 articles published in issues 1–6 (2007–12) of the journal Gender and Language. Twenty-five articles contained at least one mention of ‘corpus’ or ‘corpora’, although this does not necessarily indicate that these articles used Corpus Linguistics methods. Indeed, most of these authors used the term simply to refer to their data set, carrying out their analysis using purely qualitative methods. I would classify only four papers (6.3% of the total) as taking a Corpus Linguistics approach (Johnson and Ensslin (2007), Charteris-Black and Seale (2009), Baker (2010) and King (2011)). Additionally, Holmgreen (2009) used a corpus to verify some of her findings, but her main method was qualitative. There is evidence that some researchers in Gender and Language are using corpus approaches, although they seem to be in the minority.
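The survey of journal articles above rests on a simple document-frequency count: how many texts contain at least one instance of a word, rather than how many times the word occurs overall. A minimal Python sketch of that kind of count follows, assuming the articles are available as plain-text strings; the function names and sample texts are hypothetical, and this is not a description of how the original count was carried out. As the paragraph itself notes, a match only shows that a text mentions the word, not that it uses corpus methods.

```python
from __future__ import annotations

import re

# Matches "corpus" or "corpora" as whole words, in any capitalization
# (so "Corpus Linguistics" and "corpus-based" count, but "corporal" does not).
CORPUS_TERM = re.compile(r"\bcorp(?:us|ora)\b", re.IGNORECASE)


def count_mentioning(texts: list[str], pattern: re.Pattern[str] = CORPUS_TERM) -> int:
    """Number of texts with at least one match (document frequency, not raw frequency)."""
    return sum(1 for text in texts if pattern.search(text))


def percentage(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, rounded to one decimal place."""
    return round(part / whole * 100, 1)


if __name__ == "__main__":
    # Hypothetical article texts, for illustration only.
    articles = [
        "We compiled a corpus of parliamentary debates.",
        "Interview transcripts were coded thematically.",
        "Two Corpora of teen magazines were compared.",
    ]
    print(count_mentioning(articles))  # 2
    # The proportion reported in the text: 4 of 63 articles.
    print(percentage(4, 63))           # 6.3
```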
A chief motivation for writing this book, then, is to consider and demonstrate some of the ways that Corpus Linguistics can be of value to people working in the field of Gender and Language. I would not encourage researchers to abandon their existing methods, but rather offer Corpus Linguistics as a supplementary approach. This book is therefore mainly written for two types of people: first, those who are interested in Gender and Language and would like to know more about how Corpus Linguistics can help them in their research; and second, people already working in Corpus Linguistics who do not consider themselves familiar with the field of Gender and Language but would like to examine gender in their corpus research. I assume that readers have a basic level of computer literacy (e.g. they know how to create, alter and find files and folders on a computer, can send emails and can access information on the World Wide Web through a browser like Internet Explorer) but are not computer programmers or statisticians and do not necessarily want to be. Indeed, an aim of this book is to demonstrate how much can be achieved within a Corpus Linguistics paradigm without needing to be a computer or maths wizard, although such people have been, and continue to be, central to the development of the field. I hope that this book will empower ‘non-techy’ Gender and Language researchers to feel confident in building and exploiting corpora, while encouraging corpus linguists to incorporate some of the more recent thinking about Gender and Language into their own studies. Therefore, each of the analysis chapters in this book (Chapters Two to Seven) combines the analysis of different types of corpora, with a variety of aims in mind, using a range of techniques, and addresses issues and problems as they arise. I have tried to provide as much coverage as possible; an overview of the book is presented at the end of this chapter.
However, in the follo...
