Spoken English on Computer

Transcription, Mark-Up and Application

Geoffrey Leech, Greg Myers, Jenny Thomas

272 pages
English
About This Book

This book has evolved from a Workshop on Computerized Speech Corpora, held at Lancaster University in 1993. It brings together the findings presented there in a clear and coherent manner, focusing on the advantages and disadvantages of particular transcription and mark-up practices.


Information

Publisher
Routledge
Year
2014
ISBN
9781317891048
Edition
1
Category
Linguistics

Part A

Issues and practices

Introduction

The first four chapters in this section deal with theoretical and practical issues relating to the transcription and coding of spoken language in machine-readable form. Transcription is the process of representing spoken language in written form: how broad or narrow should that representation be? How can transcription be made useful to, and usable by, a wide range of users? How can we overcome the limitations of the written medium? Coding (also known as ‘tagging’ or ‘annotation’) relates to more abstract attributes of the text: for example, you might want to label grammatical, semantic, pragmatic or discoursal categories (to indicate, for example, that a word is a proper noun, that its use is restricted in some way, that a particular utterance was said in a sarcastic manner, or that it was used to bring an interaction to a close). Chapters 5 and 6 focus on issues of mark-up – the process of making texts machine-readable in ways which facilitate the interchange of data between users. The final chapter is rather different in nature – it is an edited transcript of an unscripted talk delivered interactively at the Lancaster Workshop on Computerized Spoken Discourse, held in September 1993. In this chapter, John Sinclair responds to the issues raised in the previous chapters. If we were constructing corpora in an ideal world, the issues raised in the first six chapters regarding delicacy of transcription and coding and detailed mark-up might all be taken on board. However, Sinclair, speaking from his experience of many years working with large corpora of spoken language, discusses how in practice issues of cost and usability affect the transcription, coding and mark-up of very large corpora.
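As an illustration of the kind of coding described above, the sketch below attaches grammatical, pragmatic and discoursal labels to a short invented utterance. The element and attribute names here are hypothetical, chosen only to make the idea concrete; they do not come from any particular annotation scheme.

```xml
<!-- Hypothetical coding sketch: element and attribute names are invented
     for illustration and do not follow any particular annotation scheme -->
<utterance speaker="A" manner="sarcastic" function="closing">
  <w pos="interjection">well</w>
  <w pos="pronoun">that</w>
  <w pos="verb">was</w>
  <w pos="adjective">fun</w>
</utterance>
```

The point is that such labels encode analysts' interpretations (sarcasm, a closing move), not just the words themselves, and so become part of the data.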
Three of the contributors to this section draw attention to Ochs’s (1979) paper, in which she observes that transcription (and to transcription we can now add coding) becomes data. The use of the computer, in spite of the many advantages and new possibilities which it opens up, does not resolve the problems of the relationship between the original speech event and the transcription, nor does it obviate the problem of representing spoken language in written form (indeed, in some ways it exacerbates these problems). Decisions made at the time of transcription and coding will affect the entire process of data analysis. Carefully thought out transcription can greatly aid analysis; poor decision-making early on can make the subsequent analysis difficult or worthless. Chapters 1 to 4 raise many issues which need to be taken into account when transcribing and coding a corpus.
In the first chapter, Jane Edwards focuses in particular on issues of coding. In a discussion which will prove invaluable to corpus researchers for years to come, she examines the principles underlying the design and implementation of transcription and coding: the principles of designing coding categories, the implementation of coding (applying the design to the data), and ways of optimizing readability for human users while at the same time creating a system which is computationally tractable.
In Chapter 2, Guy Cook argues that while the use of the computer offers new possibilities for the researcher (particularly in terms of data retrieval and statistical analysis) it does not solve the initial problem of representing spoken language in written form. Underlying everything must be a sound theory and practice of transcription. He warns against treating speech as if it were writing. In particular he notes the tremendous importance of including contextual and other information when dealing with spoken data and the danger of focusing on purely linguistic features, at the expense of discourse phenomena, simply because the former are easier to handle. In discourse analysis and pragmatics we are dealing not with words alone, but with utterance-context pairings – how something is said, and the context in which it is said, may be as important as the words themselves. In relation to this Cook discusses the problems of how to represent paralinguistic and other non-linguistic features as well as background knowledge, while at the same time being aware of the problems of producing transcriptions which are so elaborate that the user becomes lost in a welter of detail. He argues (cf. Burnard and Johansson in Chapters 6 and 7) that it would be a mistake to assume that elaborate coding systems mean that we now have everything under control – many issues still remain to be resolved.
In Chapter 3, Wallace Chafe picks up many of the issues raised in the first two contributions. He is concerned with the representation of spoken language in a written format which optimizes its usefulness to the human reader. Transcription of spoken language is done for the specific purposes of the original transcriber, but ideally should be usable by a broad range of other users. Like Edwards, Chafe stresses the importance of building on the interpretive skills readers already have, and to this end discusses ways in which transcriptions can exploit such features of written language as different fonts and iconic symbols. He discusses in detail features of intonation and how to represent them and the importance of distinguishing between what can be measured (e.g. the precise length of a pause) and what is actually significant to participants in the original interaction and to the analysts of that interaction. Finally, in a discussion which looks forward to issues raised in Parts B and C, Chafe suggests that many of the problems raised so far can be mitigated by issuing corpora on CD-ROM which can also include the original recording in addition to the transcription and other information (such as digitized waveforms).
In Chapter 4, James Monaghan focuses on the importance of considering the end-user of the corpus and the importance of designing corpora in such a way that it is possible to access whole text structures, as well as lower level phenomena.
Chapters 5 and 6 deal with issues of transcription, coding and mark-up as they relate specifically to electronic storage and data interchange. Lou Burnard discusses in detail the requirements for encoding all types of text in order to conform to the requirements of the Text Encoding Initiative (TEI), regardless of the domain of application, or of the hardware and software the individual may be using. Johansson, in Chapter 6, deals specifically with the distinctions necessary for representing spoken discourse so that it conforms with TEI requirements. He argues that there is no necessary conflict between what are often seen as the very demanding requirements of TEI-conformant mark-up and the limited resources of the individual corpus-builder, nor between TEI and a reader-friendly/transcriber-friendly system. Provided the necessary software is developed, the underlying TEI representation can be transferred into any form convenient for an individual project.
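For readers unfamiliar with TEI-style mark-up, the fragment below sketches how a short stretch of conversation might be represented, in the spirit of the TEI's elements for transcribed speech (`<u>` for utterances, `<pause>`, `<vocal>`, `<unclear>`); exact element and attribute usage should be checked against the TEI Guidelines themselves.

```xml
<!-- A sketch in the spirit of TEI mark-up for transcribed speech;
     verify element and attribute usage against the TEI Guidelines -->
<u who="#spk1">so shall we <pause/> call it a day</u>
<vocal who="#spk2"><desc>laughs</desc></vocal>
<u who="#spk2">yes I think <unclear>we should</unclear></u>
```

A representation of this kind is what allows the same transcript to be exchanged between projects regardless of the hardware and software each one uses.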
In Chapter 7, the final chapter of this section, John Sinclair voices the worries of people involved in constructing large corpora who are alarmed by the demands of making their transcriptions TEI-conformant. These worries can be grouped under three main headings:
1. Picking up the final point made in Johansson’s chapter, Sinclair raises the question of transcriptions for humans, versus those suitable for machines. Like Cook, Sinclair is concerned that end-users will become lost in a welter of detail. Several contributors to the conference raised the possibility of associating transcription with either waveforms or sound recordings by means of hypertext or CD-ROM (see Johansson, Roach, Chafe and Cook, this volume) thereby offering access to greater detail as an option, although the relevant software and hardware are not yet widely available.
2. If TEI-conformant transcriptions are difficult for most users to read, their production also makes totally unrealistic demands on most transcribers. For those involved in the production of very large corpora (and it must be remembered that the size of the corpus is not a trivial matter, but crucially affects the types of linguistic generalizations and claims which can be made) the cost-effectiveness of TEI must be challenged. Although it is clearly of great importance that the basic data be available for other researchers to use, is it really the case that others will want to use your corpus annotations?
3. Sinclair challenges the way in which the requirements of TEI will operate in practice. His worry is that instead of individuals being able to operate within the inherent (indeed, almost unlimited) flexibility of TEI, as outlined in Chapters 5 and 6, we shall in practice be forced to operate within a very limited subset. We shall end up distorting our data in order to fit it into a straitjacket designed by computer buffs. Sinclair argues strongly in favour of the much weaker notion of compatibility (rather than conformity) with TEI.
Like many contributors to this book and to the conference, John Sinclair underlines the need for software interpreters to be produced – not just an interpreter which will render your TEI marked-up text readable to ordinary users, but one which will translate ‘ordinary’ transcripts into TEI format. Mark-up must remain user-friendly, without costing too much, and this is undoubtedly the direction in which things will develop in the future. These issues are not only of interest to designers o...
