Learner English on Computer
eBook - ePub · 250 pages · English
About this book

The first book of its kind, Learner English on Computer is intended to provide linguists, students of linguistics and modern languages, and ELT professionals with a highly accessible and comprehensive introduction to the new and rapidly-expanding field of corpus-based research into learner language. Edited by the founder and co-ordinator of the International Corpus of Learner English (ICLE), the book contains articles on all aspects of corpus compilation, design and analysis.

The book is divided into three main parts. In Part I, the first chapter provides the reader with an overview of the field, explaining links with corpus and applied linguistics, second language acquisition and ELT. The second chapter reviews the software tools currently available for analysing learner language and contains useful examples of how they can be used. Part II contains eight case studies in which computer learner corpora are analysed for various lexical, discourse and grammatical features; the articles employ a wide range of methodologies with broad general application. The chapters in Part III look at how studies based on Computer Learner Corpora (CLCs) can help improve pedagogical tools: EFL grammars, dictionaries, writing textbooks and electronic tools. Implications for classroom methodology are also discussed.

The comprehensive scope of this volume should be invaluable to applied linguists and corpus linguists as well as to would-be learner corpus builders and analysts who wish to discover more about a new, exciting and fast-growing field of research.


Part I
Learner Corpus Design and Analysis

Chapter One
The computer learner corpus: a versatile new source of data for SLA research

Sylviane Granger

1 Corpus linguistics and English studies

Since making its first appearance in the 1960s, the computer corpus has infiltrated all fields of language-related research, from lexicography to literary criticism through artificial intelligence and language teaching. This widespread use of the computer corpus has led to the development of a new discipline which has come to be called 'corpus linguistics', a term which refers not just to a new computer-based methodology, but as Leech (1992: 106) puts it, to a 'new research enterprise', a new way of thinking about language, which is challenging some of our most deeply-rooted ideas about language. With its focus on performance (rather than competence), description (rather than universals) and quantitative as well as qualitative analysis, it can be seen as contrasting sharply with the Chomskyan approach and indeed is presented as such by Leech (1992: 107). The two approaches are not mutually exclusive however. Comparing the respective merits of corpus linguistics and what he ironically calls 'armchair linguistics', Fillmore (1992: 35) comes to the conclusion that 'the two kinds of linguists need each other. Or better, that the two kinds of linguists, wherever possible, should exist in the same body.'
The computer plays a central role in corpus linguistics. A first major advantage of computerization is that it liberates language analysts 'from drudgery and empowers [them] to focus their creative energies on doing what machines cannot do' (Rundell and Stock 1992: 14). More fundamental, however, is the heuristic power of automated linguistic analysis, i.e. its power to uncover totally new facts about language. It is this aspect, rather than 'the mirroring of intuitive categories of description' (Sinclair 1986: 202), that is the most novel and exciting contribution of corpus linguistics.
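The kind of automated analysis described above typically starts with a concordancer, which displays every occurrence of a word in its context. As a minimal sketch of the idea (not one of the tools reviewed in this book), a keyword-in-context (KWIC) display can be produced in a few lines of Python; the mini-corpus and search word below are invented for illustration:

```python
import re

def kwic(text, keyword, width=30):
    """Return a keyword-in-context line for every match of `keyword`."""
    lines = []
    for m in re.finditer(r'\b%s\b' % re.escape(keyword), text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Align the keyword in a fixed column, corpus-tool style.
        lines.append('%s [%s] %s' % (left.rjust(width), m.group(), right))
    return lines

# Invented mini-corpus for illustration only.
corpus = ("Learners often make errors with articles. "
          "The learner corpus makes such errors visible, "
          "and each learner can be compared with native speakers.")

for line in kwic(corpus, "learner"):
    print(line)
```

Note that the word-boundary pattern deliberately excludes inflected forms such as "Learners"; real concordancers offer wildcard or lemma-based searches to capture those as well.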
English is undoubtedly the language which has been analysed most from a corpus linguistics perspective. Indeed the first computer corpus to be compiled was the Brown corpus, a corpus of American English. Since then English corpora have grown and diversified. At the time, the 1 million words contained in the Brown and the LOB were considered to be perfectly ample for research purposes, but they now appear microscopic in comparison to the 100 million words of the British National Corpus or the 200 million words of the Bank of English. This growth in corpus size over the years has been accompanied by a huge diversification of corpus types to cover a wide range of varieties: diachronic, stylistic (spoken vs. written; general vs. technical) and regional (British, American, Australian, Indian, etc.) (for a recent survey of English corpora, see McEnery and Wilson 1996).
Until very recently, however, no attempt had been made to collect corpora of learner English, a strange omission given the number of people who speak English as a foreign language throughout the world. It was not until the early 1990s that academics, EFL specialists and publishing houses alike began to recognize the theoretical and practical potential of computer learner corpora, and several projects were launched, among which the following three figure prominently: the International Corpus of Learner English (ICLE), a corpus of learner English from several mother tongue backgrounds and the result of international academic collaboration; the Longman Learners' Corpus (LLC), which also contains learner English from several mother tongue backgrounds; and the Hong Kong University of Science and Technology (HKUST) Learner Corpus, which is made up of the English of Chinese learners.

2 Learner corpus data and SLA research

2.1 Empirical data in SLA research

The main goal of Second Language Acquisition (SLA) research is to uncover the principles that govern the process of learning a foreign / second language. As this process is mental and therefore not directly observable, it has to be accessed via the product, i.e. learner performance data. Ellis (1994: 670) distinguishes three main data types: (1) language use data, which 'reflect learners' attempts to use the L2 in either comprehension or production'; (2) metalingual judgements, which tap learners' intuitions about the L2, for instance by asking them to judge the grammaticality of sentences; and (3) self-report data, which explore learners' strategies via questionnaires or think-aloud tasks. Language use data is said to be 'natural' if no control is exerted on the learners' performance and 'elicited' if it results from a controlled experiment.
Current SLA research is mainly based on introspective data (i.e. Ellis's types 2 and 3) and language use data of the elicited type. Researchers have tended to avoid natural language use data for a variety of reasons. One has to do with the infrequency of some language features, i.e. the fact that 'certain properties happen to occur very rarely or not at all unless specifically elicited' (Yip 1995: 9). Secondly, as variables affecting language use are not controlled, the effect of these variables cannot be investigated systematically. Finally, natural language use data fails to reveal the entire linguistic repertoire of learners because 'they [learners] will use only those aspects in which they have the most confidence. They will avoid the troublesome aspects through circumlocution or some other device' (Larsen-Freeman and Long 1991: 26).
Introspective and elicited data also have their limitations, however, and their validity, particularly that of elicited data, has been put into question. The artificiality of an experimental language situation may lead learners to produce language which differs widely from the type of language they would use naturally. Also, because of the constraints of experimental elicitation, SLA specialists regularly rely on a very narrow empirical base, often no more than a handful of informants, something which severely restricts the generalizability of the results. There is clearly a need for more, and better quality, data and this is particularly acute in the case of natural language data. In this context, learner corpora which, as will be shown in the following section, answer most of the criticisms levelled at natural language use data, are a valuable addition to current SLA data sources. Undeniably however, all types of SLA data have their strengths and weaknesses and one can but agree with Ellis (1994: 676) that 'Good research is research that makes use of multiple sources of data.'

2.2 Contribution of learner corpora to SLA research

The ancestor of the learner corpus can be traced back to the Error Analysis (EA) era. However, learner corpora in those days bore little resemblance to current ones. First, they were usually very small, sometimes no more than 2,000 words from a dozen or so learners. Some corpora, such as the one used in the Danish PIF (Project in Foreign Language Pedagogy) project (see Faerch et al. 1984) were much bigger, though how much bigger is difficult to know as the exact size of the early learner corpora was generally not mentioned. This was quite simply because the compilers usually had no idea themselves. As the corpora were not computerized, counting the number of words had to be done manually, an impossible task if the corpus was relatively big. At best, it would sometimes have been possible to make a rough estimate of the size on the basis of the number of informants used and the average length of their assignments.
A further limitation is the heterogeneity of the learner data. In this connection, Ellis (1994: 49) comments that, in collecting samples of learner language, EA researchers have not paid enough attention to the variety of factors that can influence learner output, with the result that 'EA studies are difficult to interpret and almost impossible to replicate'. Results of EA studies and in fact a number of SLA studies have been inconclusive, and on occasion contradictory, because these factors have not been attended to. In his book on transfer, Odlin (1989: 151) notes 'considerable variation in the number of subjects, in the backgrounds of the subjects, and in the empirical data, which come from tape-recorded samples of speech, from student writing, from various types of tests, and from other sources' and concludes that 'improvements in data gathering would be highly desirable'.
Yet another weakness of many early learner corpora is that they were not really exploited as corpora in their own right, but merely served as depositories of errors, only to be discarded after the relevant errors had been extracted from them. EA researchers focused on decontextualized errors and disregarded the rest of the learner's performance. As a result, they 'were denied access to the whole picture' (Larsen-Freeman and Long 1991: 61) and failed to capture phenomena such as avoidance, which does not lead to errors, but to under-representation of words or structures in L2 use (Van Els et al. 1984: 63).
Current learner corpora stand in sharp contrast to what are in effect proto-corpora. For one thing, they are much bigger and therefore lend themselves to the analysis of most language features, including infrequent ones, thereby answering one of the criticisms levelled at natural language use data (see section 2.1). Secondly, there is a tendency for compilers of the current computer learner corpora (CLCs), learning from mistakes made in the past, to adopt much stricter design criteria, thus allowing for investigations of the different variables affecting learner output. Last but not least, they are computerized. As a consequence, large amounts of data can be submitted to a whole range of linguistic software tools, thus providing a quantitative approach to learner language, a hitherto largely unexplored area. Comparing the frequency of words/structures in learner and native corpora makes it possible to study phenomena such as avoidance which were never addressed in the era of EA. Unlike previous error corpora, CLCs give us access not only to errors but to learners' total interlanguage.
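The frequency comparison described above rests on a simple idea: normalize raw counts by corpus size (e.g. per million words) and compare the learner figure with the native figure, a ratio well below 1 pointing to possible underuse (avoidance) and a ratio well above 1 to overuse. A minimal sketch in Python, with all counts invented for illustration:

```python
def per_million(count, corpus_size):
    """Normalized frequency: occurrences per million words."""
    return count * 1_000_000 / corpus_size

def usage_ratio(learner_count, learner_size, native_count, native_size):
    """Ratio of learner to native relative frequency.

    A value well above 1 suggests overuse by learners; a value
    well below 1 suggests underuse (possible avoidance).
    """
    return (per_million(learner_count, learner_size)
            / per_million(native_count, native_size))

# Invented example: 450 hits of 'very' in a 200,000-word learner
# corpus vs. 1,100 hits in a 1,000,000-word native corpus.
ratio = usage_ratio(450, 200_000, 1_100, 1_000_000)
print(round(ratio, 2))  # prints 2.05: apparent overuse
```

In practice such raw ratios would be backed by a significance test (e.g. chi-square or log-likelihood), since small absolute counts can produce misleadingly large ratios.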

2.3 Learner corpus data and ELT

The fact that CLCs are a fairly recent development does not mean that there was no previous link between corpus linguistics and the ELT world. Over the last few years, native English corpora have increasingly been used in ELT materials design. It was Collins Cobuild who set this trend and their pioneering dictionary project gave rise to a whole range of EFL tools based on authentic data. Underlying the approach was the firm belief that better descriptions of authentic native English would lead to better EFL tools and indeed, studies which have compared materials based on authentic data with traditional intuition-based materials have found this to be true. In the field of vocabulary, for example, Ljung (1991) has found that traditional textbooks tend to over-represent concrete words to the detriment of abstract and societal terms and therefore fail to prepare students for a variety of tasks, such as reading quality newspapers and report-writing. The conclusion is clear: textbooks are more useful when they are based on authentic native English.
However much of an advance they were, native corpora cannot ensure fully effective EFL learning and teaching, mainly because they contain no indication of the degree of difficulty of words and structures for learners. It is paradoxical that although it is claimed that ELT materials should be based on solid, corpus-based descriptions of native English, materials designers are content with a very fuzzy, intuitive, non-corpus based view of the needs of an archetypal learner. There is no doubt that the efficiency of EFL tools could be improved if materials designers had access not only to authentic native data but also to authentic learner data, with the NS (native speaker) data giving information about what is typical in English, and the NNS (non-native speaker) data highlighting what is difficult for learners in general and for specific groups of learners. As a result, a new generation of CLC-informed EFL tools is beginning to emerge. Milton's (Chapter 14, this volume) Electronic Language Learning and Production Environment is an electronic pedagogical tool which specifically addresses errors and patterns of over- and underuse typical of Cantonese learners of English, as attested by the HKUST Learner Corpus. In the lexicographical field, the Longman Essential Activator is the first...

Table of contents

  1. Cover
  2. Half Title
  3. Series Page
  4. Title
  5. Copyright
  6. Contents
  7. List of contributors
  8. Editor's acknowledgements
  9. Publisher's acknowledgements
  10. List of abbreviations
  11. Preface
  12. Introduction
  13. Part I Learner Corpus Design and Analysis
  14. Part II Studies of Learner Grammar, Lexis and Discourse
  15. Part III Pedagogical Applications of Learner Corpora
  16. List of linguistic software mentioned in the book
  17. Bibliography
  18. Index