eBook - ePub

Corpus Linguistics for ELT

Research and Practice

Ivor Timmis

214 pages
English

About the book

Corpus Linguistics for ELT provides a practical guide to undertaking ELT-related corpus research. Aimed at researchers, advanced undergraduate and postgraduate students of ELT and TESOL, and English language teachers, this volume:

covers corpus research in the main areas of language study relevant to ELT: grammar, lexis, ESP, spoken grammar and discourse;
presents a review of relevant corpus research in these areas, and discusses the implications of this research for ELT;
suggests potential ELT-focused corpus research projects, and equips the reader with all the required tools and techniques to carry them out;
deals with the growing area of learner corpora and direct classroom application of corpus material.

Corpus Linguistics for ELT empowers and inspires readers to carry out their own ELT corpus research, and will allow them in turn to make a significant contribution to corpus-informed ELT pedagogy.

Information

Publisher: Routledge
Year: 2015
ISBN: 9781317504283
Categories: Philology, Linguistics

Chapter 1 Introduction

DOI: 10.4324/9781315715537-1

Aims

The challenge of fostering a fruitful relationship between corpus linguistics and ELT was clearly set out by Conrad (2000: 556):

Corpus grammarians must strive to reach more audiences that include teachers and must emphasize concrete pedagogical applications … In fact, the strongest force for change could be a new generation of ESL teachers who were introduced to corpus-based research in their training programs [and] have practiced conducting their own corpus investigations and designing materials based on corpus research.

Indeed, this comment by Conrad encapsulates the main aim of this book: to help move corpus linguistics from what Römer (2012) terms its ‘minority sport’ status in language teaching to a point where the ability to carry out and interpret corpus research is seen as a normal part of an English language teacher’s repertoire. Familiarity with corpus research and practice should be a standard part of an English language teacher’s toolkit, I would argue, because most people in ELT will at some time have had thoughts like these:

How many words do my learners need to learn?
Why is everyone talking about lexical chunks and collocations?
Do my students really need this grammar point?
Which words should I use to exemplify this structure?
Am I teaching my learners language they will need to use when they speak the language?
Does the grammar explanation in the coursebook really reflect how we use this structure?
What vocabulary do my English for dentistry students need to get their teeth into?

If you have had questions like these, this book is designed to help you to answer them by consulting corpora and corpus-informed literature. It is also designed to help you to generate and investigate similar questions. It is, however, important to keep corpora in perspective throughout this book. The argument presented here is that corpora are a resource and a reference source and, as is the case with all resources, pedagogic judgement is vitally important in determining how and when they are deployed to best effect.

The book does not assume prior knowledge or experience of corpus research; nor does it assume any technical expertise. Technophobes can relax: contemporary corpus interfaces and corpus software are user-friendly and often include tutorial packages. The tasks in this book will help to familiarise readers with publicly available user-friendly corpora such as the British National Corpus (hosted at http://corpus.byu.edu/bnc/).

And if you know how to save a document, you are, as we shall see in the next chapter, well on the way to being able to compile your own corpus for teaching purposes; and then things get really interesting.

What is a corpus?

Defining a corpus

If you are reading this book, you probably know what a corpus is, but it is useful to draw out some key points from definitions in the literature to be sure that we have a shared understanding. Brazil (1995: 24) defines a corpus as ‘a collection of used language’, explaining that ‘used language’ is ‘language which has occurred under circumstances in which the speaker was known to be doing something more than demonstrate the way the system works’. This definition is useful in that it focuses on the fact that language in a corpus is naturally occurring. We need to note, however, that a corpus is not just a collection of naturally occurring language in the form of isolated words or sentences randomly collected; it consists of spoken and/or written texts (the word ‘text’ in corpus linguistics is used to refer to both spoken and written language). And the collection of texts also has to be purposeful: ‘A corpus is not simply a collection of texts. Rather a corpus seeks to represent a language or some part of a language’ (Biber, Conrad and Reppen 1998: 246). In practice, as McEnery and Wilson (1996) note, in contemporary usage a corpus almost always refers to texts collected in machine-readable form, i.e. electronic texts which can be automatically analysed with software packages. For our purposes, it is important to note that while ‘big-name’ corpora such as the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) consist of hundreds of millions of words, size is not an absolute criterion for corpus design: size is a question of fitness for purpose. O’Keeffe, McCarthy and Carter (2007: 4) stress that the design of the corpus is more important than the size:

For corpora of spoken language, anything over a million words is considered to be large; for written corpora, anything below five million is considered quite small. In terms of suitability, however, it is often the design of a corpus as opposed to its size which is the determining factor.

It is the design of a corpus which will ensure that it represents what it seeks to represent. Design issues include demographic factors such as gender, age and social class, as well as questions of the genres and contexts of the language included in the corpus. Even a very large corpus such as the BNC self-evidently does not tell us how English is used in the USA, in India, or as a lingua franca between non-native speakers.

Types of corpus

It is important to be aware of the range of corpora available (see Appendix 2 for a fuller list). While large general corpora such as the BNC and COCA have both written and spoken components, many corpora are either written or spoken. The five million word CANCODE (Cambridge and Nottingham Corpus of Discourse English) is a well-known spoken corpus often cited in ELT studies. There are also English for Specific Purposes corpora, e.g. MICASE (Michigan Corpus of Academic Spoken English), CANBEC (Cambridge and Nottingham Business English Corpus) and the Hong Kong Engineering corpus. For ELT purposes, corpora of non-native English are important, e.g. VOICE ¹ (Vienna–Oxford International Corpus of English), a spoken corpus of English used as a Lingua Franca (ELF). Learner corpora are a specific type of non-native corpus, self-evidently containing data produced by learners of English, e.g. ICLE (International Corpus of Learner English), which ‘contains argumentative essays written by higher intermediate to advanced learners of English from several mother tongue backgrounds’ (http://www.uclouvain.be/en-cecl-icle.html).

We need to consider one further type of corpus: a pedagogic corpus or, to use Leech’s (1997) term, a teaching-oriented corpus. A pedagogic corpus is one that has been compiled specifically for language teaching purposes. An interesting suggestion for ‘pedagogic corpora’ has been made by Willis (2003), who proposes a pedagogic corpus made up of the texts already used by the learners in class, which is then exploited for the study of particular language features. The advantage of such corpora, Willis (2003) argues, is that learners will already be familiar with the co-text, i.e. the text immediately surrounding the target feature, as they will previously have studied the whole text in class. Similarly, Römer (2006) has suggested that coursebooks themselves can be made into corpora so that ‘coursebook English’ can be compared with ‘real English’. The SACODEYL (System Aided Compilation and Distribution of European Youth Language) corpus could also be seen as a pedagogic corpus: though it was not compiled from learning materials, it was deliberately constructed for language learning purposes, as described on the SACODEYL website: ‘The [SACODEYL] corpora are based on structured video interviews with pupils between 13 and 18 years of age. The interviews have been annotated and enriched for language learning purposes.’ http://sacodeyl.inf.um.es/sacodeyl-search2/

While SACODEYL might not be the most transparent project title, it has the significant benefit of being free to access and providing online guidance on how to use it.

Corpus Search

Visit the four websites below and consider which you might find most useful for your teaching, research or studies:

http://corpus.byu.edu/bnc/

http://sacodeyl.inf.um.es/sacodeyl-search2/

http://www.uclouvain.be/en-cecl-icle.html

http://www.univie.ac.at/voice/

What can we do with a corpus?

Questions corpora can answer – quantitative analysis

Though corpus linguistics has come to be seen as a domain of applied linguistics in its own right, it will be useful for our purposes to view it also as a methodology through which various domains of applied linguistics can be investigated, e.g. grammar, lexis, discourse, pragmatics, SLA (second language acquisition). Corpora are most often associated with quantitative research as frequency information can be generated with striking ease. The most basic kinds of frequency question we can ask are:

What are the most frequent words in our corpus, i.e. rank order?
How many instances of a given word are there in the corpus, i.e. raw frequency?
What percentage of the total number of tokens in the corpus does the raw frequency represent, i.e. relative frequency?
What are the most frequent collocations of a given word in our corpus?
What are the most frequent phrases of a given length (e.g. 2-word phrases, 3-word phrases, 4-word phrases and so on)?
What are the most frequent grammatical structures in our corpus?
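
For readers curious about what lies behind such frequency counts, the first five questions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample text is invented, the tokenizer is deliberately crude, and the function names (tokenize, ngrams, collocates) are hypothetical helpers rather than part of any corpus tool mentioned in this book. The final question, on grammatical structures, would require a grammatically tagged corpus and is not attempted here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def collocates(tokens, node, span=2):
    """Count the words occurring within `span` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


# Invented sample text standing in for a real corpus (an assumption for illustration).
sample = (
    "I don't know what you mean. You know what I mean, don't you? "
    "I know you know what I mean."
)

tokens = tokenize(sample)
word_counts = Counter(tokens)

rank_order = word_counts.most_common(3)                  # most frequent words
raw_know = word_counts["know"]                           # raw frequency of 'know'
relative_know = 100 * raw_know / len(tokens)             # percentage of all tokens
top_trigrams = Counter(ngrams(tokens, 3)).most_common(1) # most frequent 3-word phrase
know_collocates = collocates(tokens, "know", span=1)     # immediate neighbours of 'know'

print(rank_order, raw_know, relative_know, top_trigrams, know_collocates.most_common(2))
```

Counting neighbours within a fixed window is the simplest possible view of collocation; corpus interfaces typically also offer statistical association measures, which this sketch does not attempt.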

Each of these questions may be applied with a more specific focus, but we will take word frequency as an example:

What are the most frequent words used in a given component of the corpus, e.g. academic or business or technical English?
What are the most frequent words used by a particular demographic group of people, e.g. women, people under 30, people of a given social class or from a given region?
What are the most frequent words used in a particular kind of text, e.g. scientific articles?
What are the most frequent words in a given genre, e.g. self-descriptions on internet dating sites?
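
The more specific questions above all amount to counting words in a subset of the corpus selected by information stored about each text. The sketch below assumes a hypothetical mini-corpus in which every text carries two invented metadata fields, register and speaker_age; real corpora record such information in their own ways, so the field names and the frequencies helper are illustrative assumptions only.

```python
from collections import Counter

# Hypothetical mini-corpus: each text is stored with metadata alongside its tokens.
corpus = [
    {"register": "academic", "speaker_age": 45,
     "tokens": ["the", "data", "suggest", "that", "the", "effect", "is", "small"]},
    {"register": "business", "speaker_age": 28,
     "tokens": ["the", "meeting", "is", "at", "ten"]},
    {"register": "academic", "speaker_age": 26,
     "tokens": ["the", "data", "are", "limited"]},
]


def frequencies(texts, keep):
    """Count words across only those texts for which keep(text) is true."""
    counts = Counter()
    for text in texts:
        if keep(text):
            counts.update(text["tokens"])
    return counts


# Word frequencies in one component of the corpus.
academic = frequencies(corpus, lambda t: t["register"] == "academic")

# Word frequencies for one demographic group.
under_30 = frequencies(corpus, lambda t: t["speaker_age"] < 30)

print(academic.most_common(2))
print(under_30.most_common(2))
```

A query of this kind is only possible when the relevant metadata was recorded as the corpus was compiled: a corpus that stores no speaker ages cannot answer a question about people under 30.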

These questions do not exhaust the possibilities, but give some idea of the range of questions which can be asked of a corpus. It is crucial to note, however, that the kind of question which can be investigated depends on the composition of the corpus and the information wh...