Spoken English on Computer
Edited by Geoffrey Leech, Greg Myers and Jenny Thomas (Department of Linguistics and Modern English, Lancaster University)
272 pages · English · ePub

About this book

This book has evolved from a Workshop on Computerized Speech Corpora held at Lancaster University in 1993. It brings together the findings presented there in a clear and coherent manner, focusing on the advantages and disadvantages of particular transcription and mark-up practices.


Part A

Issues and practices

Introduction

The first four chapters in this section deal with theoretical and practical issues relating to the transcription and coding of spoken language in machine-readable form. Transcription is the process of representing spoken language in written form: how broad/narrow should that representation be? How can transcription be made useful to/usable by a wide range of users? How can we overcome the limitations of the written medium? Coding (also known as ‘tagging’ or ‘annotation’) relates to more abstract attributes of the text: for example, you might want to label grammatical, semantic, pragmatic or discoursal categories (to indicate, for example, that a word is a proper noun, that its use is restricted in some way, that a particular utterance was said in a sarcastic manner, or that it was used to bring an interaction to a close).

Chapters 5 and 6 focus on issues of mark-up – the process of making texts machine-readable in ways which facilitate the interchange of data between users. The final chapter is rather different in nature – it is an edited transcript of an unscripted talk delivered interactively at the Lancaster Workshop on Computerized Spoken Discourse, held in September 1993. In this chapter, John Sinclair responds to the issues raised in the previous chapters. If we were constructing corpora in an ideal world, the issues raised in the first six chapters regarding delicacy of transcription and coding and detailed mark-up might all be taken on board. However, Sinclair, speaking from his experience of many years working with large corpora of spoken language, discusses how in practice issues of cost and usability affect the transcription, coding and mark-up of very large corpora.
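To make these distinctions concrete, the following is a hypothetical sketch of a single transcribed, coded and marked-up utterance, loosely modelled on the TEI conventions for spoken texts discussed in Chapters 5 and 6 (the element and attribute names here are illustrative assumptions, not examples taken from the book):

    <u who="#spkA">
      <w pos="PNP">she</w> <w pos="VVD">said</w> it
      <pause dur="PT0.7S"/>
      <seg type="sarcastic">oh that's just wonderful</seg>
      <vocal><desc>laughs</desc></vocal>
    </u>

Here the orthographic words are the transcription; the pos and type attributes are coding (grammatical and pragmatic categories respectively); and the angle-bracketed elements are the mark-up which makes the whole machine-readable and interchangeable.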
Three of the contributors to this section draw attention to Ochs’s (1979) paper, in which she observes that transcription (and to transcription we can now add coding) becomes data. The use of the computer, in spite of the many advantages and new possibilities which it opens up, does not resolve the problems of the relationship between the original speech event and the transcription, nor does it obviate the problem of representing spoken language in written form (indeed, in some ways it exacerbates these problems). Decisions made at the time of transcription and coding will affect the entire process of data analysis. Carefully thought out transcription can greatly aid analysis; poor decision-making early on can make the subsequent analysis difficult or worthless. Chapters 1 to 4 raise many issues which need to be taken into account when transcribing and coding a corpus.
In the first chapter, Jane Edwards focuses in particular on issues of coding. In a discussion which will prove invaluable to corpus researchers for years to come, she examines the principles underlying the design and implementation of transcription and coding: the design of coding categories, the implementation of coding (applying the design to the data), and ways of optimizing readability for human users while at the same time creating a system which is computationally tractable.
In Chapter 2, Guy Cook argues that while the use of the computer offers new possibilities for the researcher (particularly in terms of data retrieval and statistical analysis), it does not solve the initial problem of representing spoken language in written form. Underlying everything must be a sound theory and practice of transcription. He warns against treating speech as if it were writing. In particular, he notes the tremendous importance of including contextual and other information when dealing with spoken data and the danger of focusing on purely linguistic features, at the expense of discourse phenomena, simply because the former are easier to handle. In discourse analysis and pragmatics we are dealing not with words alone, but with utterance-context pairings – how something is said, and the context in which it is said, may be as important as the words themselves. In relation to this, Cook discusses the problems of how to represent paralinguistic and other non-linguistic features as well as background knowledge, while at the same time being aware of the problems of producing transcriptions which are so elaborate that the user becomes lost in a welter of detail. He argues (cf. Burnard and Johansson in Chapters 5 and 6) that it would be a mistake to assume that elaborate coding systems mean that we now have everything under control – many issues still remain to be resolved.
In Chapter 3, Wallace Chafe picks up many of the issues raised in the first two contributions. He is concerned with the representation of spoken language in a written format which optimizes its usefulness to the human reader. Transcription of spoken language is done for the specific purposes of the original transcriber, but ideally should be usable by a broad range of other users. Like Edwards, Chafe stresses the importance of building on the interpretive skills readers already have, and to this end discusses ways in which transcriptions can exploit such features of written language as different fonts and iconic symbols. He discusses in detail features of intonation and how to represent them, and the importance of distinguishing between what can be measured (e.g. the precise length of a pause) and what is actually significant to participants in the original interaction and to the analysts of that interaction. Finally, in a discussion which looks forward to issues raised in Parts B and C, Chafe suggests that many of the problems raised so far can be mitigated by issuing corpora on CD-ROM, which can include the original recording alongside the transcription and other information (such as digitized waveforms).
In Chapter 4, James Monaghan focuses on the importance of considering the end-user of the corpus, and of designing corpora in such a way that it is possible to access whole text structures as well as lower-level phenomena.
Chapters 5 and 6 deal with issues of transcription, coding and mark-up as they relate specifically to electronic storage and data interchange. Lou Burnard discusses in detail the requirements for encoding all types of text in order to conform to the requirements of the Text Encoding Initiative (TEI), regardless of the domain of application, or of the hardware and software the individual may be using. Johansson, in Chapter 6, deals specifically with the distinctions necessary for representing spoken discourse so that it conforms with TEI requirements. He argues that there is no necessary conflict between what are often seen as the very demanding requirements of TEI-conformant mark-up and the limited resources of the individual corpus-builder, nor between TEI and a reader-friendly/transcriber-friendly system. Provided the necessary software is developed, the underlying TEI representation can be transferred into any form convenient for an individual project.
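Johansson’s closing point, that the underlying TEI representation can be mechanically rendered in whatever form suits an individual project, is easy to sketch in code. The following minimal Python illustration is our own hypothetical example (the element names follow the sketch given earlier in this introduction; the plain-text conventions, ‘(.)’ for a pause and double parentheses for vocal events, are assumptions rather than any published standard):

    # A minimal sketch: render a TEI-style utterance as a plain,
    # reader-friendly transcript line. The element names (u, w, seg,
    # pause, vocal, desc) follow the illustrative example above; the
    # output conventions are assumptions, not a standard.
    import xml.etree.ElementTree as ET

    SAMPLE = """<u who="#spkA">
      <w pos="PNP">she</w> <w pos="VVD">said</w> it
      <pause dur="PT0.7S"/>
      <seg type="sarcastic">oh that's just wonderful</seg>
      <vocal><desc>laughs</desc></vocal>
    </u>"""

    def render(elem):
        parts = []
        if elem.text and elem.text.strip():
            parts.append(elem.text.strip())
        for child in elem:
            if child.tag == "pause":
                parts.append("(.)")              # collapse the timed pause
            elif child.tag == "vocal":
                desc = child.find("desc")
                parts.append("((%s))" % (desc.text if desc is not None else "vocal"))
            else:
                parts.append(render(child))      # w, seg, etc.: keep words, drop codes
            if child.tail and child.tail.strip():
                parts.append(child.tail.strip())
        return " ".join(p for p in parts if p)

    utterance = ET.fromstring(SAMPLE)
    speaker = utterance.get("who", "?").lstrip("#")
    print("%s: %s" % (speaker, render(utterance)))
    # -> spkA: she said it (.) oh that's just wonderful ((laughs))

The reverse direction, parsing an ‘ordinary’ transcript into TEI elements, is the kind of software interpreter Sinclair calls for in Chapter 7.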
In Chapter 7, the final chapter of this section, John Sinclair voices the worries of people involved in constructing large corpora who are alarmed by the demands of making their transcriptions TEI-conformant. These worries can be grouped under three main headings:
1. Picking up the final point made in Johansson’s chapter, Sinclair raises the question of transcriptions for humans, versus those suitable for machines. Like Cook, Sinclair is concerned that end-users will become lost in a welter of detail. Several contributors to the conference raised the possibility of associating transcription with either waveforms or sound recordings by means of hypertext or CD-ROM (see Johansson, Roach, Chafe and Cook, this volume) thereby offering access to greater detail as an option, although the relevant software and hardware are not yet widely available.
2. If TEI-conformant transcriptions are difficult for most users to read, their production also makes totally unrealistic demands on most transcribers. For those involved in the production of very large corpora (and it must be remembered that the size of the corpus is not a trivial matter, but crucially affects the types of linguistic generalizations and claims which can be made) the cost-effectiveness of TEI must be challenged. Although it is clearly of great importance that the basic data be available for other researchers to use, is it really the case that others will want to use your corpus annotations?
3. Sinclair challenges the way in which the requirements of TEI will operate in practice. His worry is that instead of individuals being able to operate within the inherent (indeed, almost unlimited) flexibility of TEI, as outlined in Chapters 5 and 6, we shall in practice be forced to operate within a very limited subset. We shall end up distorting our data in order to fit it into a straitjacket designed by computer buffs. Sinclair argues strongly in favour of the much weaker notion of compatibility (rather than conformity) with TEI.
Like many contributors to this book and to the conference, John Sinclair underlines the need for software interpreters to be produced – not just an interpreter which will render your TEI marked-up text readable to ordinary users, but one which will translate ‘ordinary’ transcripts into TEI format. Mark-up must remain user-friendly, without costing too much, and this is undoubtedly the direction in which things will develop in the future. These issues are not only of interest to designers o...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Table of Contents
  6. List of contributors
  7. List of abbreviations and acronyms
  8. Editors’ general introduction
  9. Part A: Issues and practices
  10. Part B: Applications and more specialized uses
  11. Part C: Samples and systems of transcription
  12. Bibliographical references
  13. Author index
  14. Subject index