eBook - ePub

Corpus Linguistics for ELT

Research and Practice

Ivor Timmis

214 pages
English

About this book

Corpus Linguistics for ELT provides a practical guide to undertaking ELT-related corpus research. Aimed at researchers, advanced undergraduate and postgraduate students of ELT and TESOL, and English language teachers, this volume:

covers corpus research in the main areas of language study relevant to ELT: grammar, lexis, ESP, spoken grammar and discourse;
presents a review of relevant corpus research in these areas, and discusses the implications of this research for ELT;
suggests potential ELT-focused corpus research projects, and equips the reader with all the required tools and techniques to carry them out;
deals with the growing area of learner corpora and direct classroom application of corpus material.

Corpus Linguistics for ELT empowers and inspires readers to carry out their own ELT corpus research, and will allow them in turn to make a significant contribution to corpus-informed ELT pedagogy.

Information

Publisher: Routledge
Year: 2015
ISBN: 9781317504283
Topic: Philology
Category: Linguistics

Chapter 1 Introduction

DOI: 10.4324/9781315715537-1

Aims

The challenge of fostering a fruitful relationship between corpus linguistics and ELT was clearly set out by Conrad (2000: 556):

Corpus grammarians must strive to reach more audiences that include teachers and must emphasize concrete pedagogical applications … In fact, the strongest force for change could be a new generation of ESL teachers who were introduced to corpus-based research in their training programs [and] have practiced conducting their own corpus investigations and designing materials based on corpus research.

Indeed, this comment by Conrad encapsulates the main aim of this book: to help move corpus linguistics from what Römer (2012) terms its ‘minority sport’ status in language teaching to a point where the ability to carry out and interpret corpus research is seen as a normal part of an English language teacher’s repertoire. Familiarity with corpus research and practice should be a standard part of an English language teacher’s toolkit, I would argue, because most people in ELT will at some time have had thoughts like these:

How many words do my learners need to learn?
Why is everyone talking about lexical chunks and collocations?
Do my students really need this grammar point?
Which words should I use to exemplify this structure?
Am I teaching my learners language they will need to use when they speak the language?
Does the grammar explanation in the coursebook really reflect how we use this structure?
What vocabulary do my English for dentistry students need to get their teeth into?

If you have had questions like these, this book is designed to help you to answer them by consulting corpora and corpus-informed literature. It is also designed to help you to generate and investigate similar questions. It is, however, important to keep corpora in perspective throughout this book. The argument presented here is that corpora are a resource and a reference source and, as is the case with all resources, pedagogic judgement is vitally important in determining how and when they are deployed to best effect.

The book does not assume prior knowledge or experience of corpus research; nor does it assume any technical expertise. Technophobes can relax: contemporary corpus interfaces and corpus software are user-friendly and often include tutorial packages. The tasks in this book will help to familiarise readers with publicly available user-friendly corpora such as the British National Corpus hosted at http://corpus.byu.edu/bnc/

And if you know how to save a document, you are, as we shall see in the next chapter, well on the way to being able to compile your own corpus for teaching purposes; and then things get really interesting.

What is a corpus?

Defining a corpus

If you are reading this book, you probably know what a corpus is, but it is useful to draw out some key points from definitions in the literature to be sure that we have a shared understanding. Brazil (1995: 24) defines a corpus as ‘a collection of used language’, explaining that ‘used language’ is ‘language which has occurred under circumstances in which the speaker was known to be doing something more than demonstrate the way the system works’. This definition is useful in that it focuses on the fact that language in a corpus is naturally occurring. We need to note, however, that a corpus is not just a collection of naturally occurring language in the form of isolated words or sentences randomly collected; it consists of spoken and/or written texts (the word ‘text’ in corpus linguistics is used to refer to both spoken and written language). And the collection of texts also has to be purposeful: ‘A corpus is not simply a collection of texts. Rather a corpus seeks to represent a language or some part of a language’ (Biber, Conrad and Reppen 1998: 246). In practice, as McEnery and Wilson (1996) note, in contemporary usage a corpus almost always refers to texts collected in machine-readable form, i.e. electronic texts which can be automatically analysed with software packages. For our purposes, it is important to note that while ‘big-name’ corpora such as the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) consist of hundreds of millions of words, size is not an absolute criterion for corpus design: size is a question of fitness for purpose. O’Keeffe, McCarthy and Carter (2007: 4) stress that the design of the corpus is more important than the size:

For corpora of spoken language, anything over a million words is considered to be large; for written corpora, anything below five million is considered quite small. In terms of suitability, however, it is often the design of a corpus as opposed to its size which is the determining factor.

It is the design of a corpus which will ensure that it represents what it seeks to represent. Design issues include demographic factors such as gender, age and social class, as well as questions of the genres and contexts of the language included in the corpus. Even a very large corpus such as the BNC self-evidently does not tell us how English is used in the USA, in India, or as a lingua franca between non-native speakers.

Types of corpus

It is important to be aware of the range of corpora available (see Appendix 2 for a fuller list). While large general corpora such as BNC and COCA have both written and spoken components, many corpora are either written or spoken. The five-million-word CANCODE (Cambridge and Nottingham Corpus of Discourse in English) is a well-known spoken corpus often cited in ELT studies. There are also English for Specific Purposes corpora, e.g. MICASE (Michigan Corpus of Academic Spoken English), CANBEC (Cambridge and Nottingham Business English Corpus) and the Hong Kong Engineering corpus. For ELT purposes, corpora of non-native English are important, e.g. VOICE¹ (Vienna–Oxford International Corpus of English), a spoken corpus of English used as a Lingua Franca (ELF). Learner corpora are a specific type of non-native corpus, self-evidently containing data produced by learners of English, e.g. ICLE (International Corpus of Learner English), which ‘contains argumentative essays written by higher intermediate to advanced learners of English from several mother tongue backgrounds’ (http://www.uclouvain.be/en-cecl-icle.html).

We need to consider one further type of corpus: a pedagogic corpus or, to use Leech’s (1997) term, a teaching-oriented corpus. A pedagogic corpus is one that has been compiled specifically for language teaching purposes. An interesting suggestion for ‘pedagogic corpora’ has been made by Willis (2003), who proposes that a pedagogic corpus be made up of the texts already used by the learners in class and then exploited for the study of particular language features. The advantage of such corpora, Willis (2003) argues, is that learners will already be familiar with the co-text, i.e. the text immediately surrounding the target feature, as they will previously have studied the whole text in class. Similarly, Römer (2006) has suggested that coursebooks themselves can be made into corpora so that ‘coursebook English’ can be compared with ‘real English’.

The SACODEYL (System Aided Compilation and Distribution of European Youth Language) corpus could also be seen as a pedagogic corpus. Though it was not compiled from learning materials, it was deliberately constructed for language learning purposes, as described on the SACODEYL website: ‘The [SACODEYL] corpora are based on structured video interviews with pupils between 13 and 18 years of age. The interviews have been annotated and enriched for language learning purposes’ (http://sacodeyl.inf.um.es/sacodeyl-search2/).

While SACODEYL might not be the most transparent project title, it has the significant benefit of being free to access and providing online guidance on how to use it.

Corpus Search

Visit the four websites below and consider which you might find most useful for your teaching, research or studies:

http://corpus.byu.edu/bnc/

http://sacodeyl.inf.um.es/sacodeyl-search2/

http://www.uclouvain.be/en-cecl-icle.html

http://www.univie.ac.at/voice/

What can we do with a corpus?

Questions corpora can answer – quantitative analysis

Though corpus linguistics has come to be seen as a domain of applied linguistics in its own right, it will be useful for our purposes to view it also as a methodology through which various domains of applied linguistics can be investigated, e.g. grammar, lexis, discourse, pragmatics, SLA (second language acquisition). Corpora are most often associated with quantitative research as frequency information can be generated with striking ease. The most basic kinds of frequency question we can ask are:

What are the most frequent words in our corpus, i.e. rank order?
How many instances of a given word are there in the corpus, i.e. raw frequency?
What percentage of the total number of tokens in the corpus does the raw frequency represent, i.e. relative frequency?
What are the most frequent collocations of a given word in our corpus?
What are the most frequent phrases of a given length (e.g. 2-word phrases, 3-word phrases, 4-word phrases and so on)?
What are the most frequent grammatical structures in our corpus?
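
For readers curious about what lies behind such frequency counts, the questions about rank order, raw frequency, relative frequency, collocations and phrases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any particular corpus interface works: the three-sentence `texts` list is an invented toy corpus, the tokenizer is deliberately crude, and the function names (`tokenize`, `ngrams`, `collocates`) are hypothetical.

```python
import re
from collections import Counter

# Invented toy corpus (an assumption for illustration, not real corpus data).
texts = [
    "I mean you know it was sort of late",
    "You know what I mean",
    "It was late and you know I was tired",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def collocates(tokens, node, span=2):
    """Count the words found within `span` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts

# Tokenize each text separately so phrases do not run across text boundaries.
tokenized = [tokenize(t) for t in texts]
all_tokens = [tok for toks in tokenized for tok in toks]

word_counts = Counter(all_tokens)
total_tokens = len(all_tokens)

# Rank order: the most frequent words first.
rank_order = word_counts.most_common(5)

# Raw frequency of a given word.
raw_know = word_counts["know"]

# Relative frequency: raw frequency as a percentage of all tokens.
relative_know = 100 * raw_know / total_tokens

# Most frequent 2-word phrases.
bigram_counts = Counter(bg for toks in tokenized for bg in ngrams(toks, 2))

# Words co-occurring with "know", counted text by text.
know_collocates = Counter()
for toks in tokenized:
    know_collocates.update(collocates(toks, "know", span=2))

print(rank_order)
print(raw_know, round(relative_know, 1))
print(bigram_counts.most_common(3))
print(know_collocates.most_common(3))
```

In this toy corpus 'know' occurs three times, each time preceded by 'you'. Note that the collocate count here is simple co-occurrence within a window; corpus tools often rank collocates with statistical association measures instead, which this sketch does not attempt.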

Each of these questions may be applied with a more specific focus, but we will take word frequency as an example:

What are the most frequent words used in a given component of the corpus, e.g. academic or business or technical English?
What are the most frequent words used by a particular demographic group of people, e.g. women, people under 30, people of a given social class or from a given region?
What are the most frequent words used in a particular kind of text, e.g. scientific articles?
What are the most frequent words in a given genre, e.g. self-descriptions on internet dating sites?
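
Questions with a more specific focus like these can only be asked if each text is stored with information about its source. A rough sketch, assuming each text carries hypothetical metadata fields `genre` and `speaker_age` (invented for this example, not the annotation scheme of any real corpus):

```python
from collections import Counter

# Invented records: the field names and values are assumptions for illustration.
corpus = [
    {"genre": "academic", "speaker_age": 45,
     "text": "the data suggest that the results are robust"},
    {"genre": "business", "speaker_age": 28,
     "text": "we need the figures by the end of the week"},
    {"genre": "academic", "speaker_age": 27,
     "text": "the results suggest a clear pattern"},
]

def word_frequencies(records, keep):
    """Count word tokens in the records for which keep(record) is true."""
    counts = Counter()
    for record in records:
        if keep(record):
            counts.update(record["text"].lower().split())
    return counts

# Word frequency within one component of the corpus.
academic = word_frequencies(corpus, lambda r: r["genre"] == "academic")

# Word frequency for one demographic group.
under_30 = word_frequencies(corpus, lambda r: r["speaker_age"] < 30)

print(academic.most_common(3))
print(under_30.most_common(3))
```

The same counting function serves every question in the list; only the selection condition changes, and it can only refer to information that was recorded when the corpus was compiled.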

These questions do not exhaust the possibilities, but give some idea of the range of questions which can be asked of a corpus. It is crucial to note, however, that the kind of question which can be investigated depends on the composition of the corpus and the information wh...