Spoken English on Computer

Transcription, Mark-Up and Application

Geoffrey Leech, Greg Myers, Jenny Thomas
Book information

This book has evolved from a Workshop on Computerized Speech Corpora, held at Lancaster University in 1993. It brings together the findings presented there in a clear and coherent manner, focusing on the advantages and disadvantages of particular transcription and mark-up practices.


Information

Publisher: Routledge
Year: 2014
ISBN: 9781317891048
Edition: 1
Pages: 272
Language: English
Category: Linguistics

Part A

Issues and practices

Introduction

The first four chapters in this section deal with theoretical and practical issues relating to the transcription and coding of spoken language in machine-readable form. Transcription is the process of representing spoken language in written form: how broad or narrow should that representation be? How can transcription be made useful to, and usable by, a wide range of users? How can we overcome the limitations of the written medium? Coding (also known as ‘tagging’ or ‘annotation’) relates to more abstract attributes of the text: for example, you might want to label grammatical, semantic, pragmatic or discoursal categories (to indicate, for example, that a word is a proper noun, that its use is restricted in some way, that a particular utterance was said in a sarcastic manner, or that it was used to bring an interaction to a close). Chapters 5 and 6 focus on issues of mark-up – the process of making texts machine-readable in ways which facilitate the interchange of data between users. The final chapter is rather different in nature – it is an edited transcript of an unscripted talk delivered interactively at the Lancaster Workshop on Computerized Spoken Discourse, held in September 1993. In this chapter, John Sinclair responds to the issues raised in the previous chapters. If we were constructing corpora in an ideal world, the issues raised in the first six chapters regarding delicacy of transcription and coding and detailed mark-up might all be taken on board. However, Sinclair, speaking from his many years of experience working with large corpora of spoken language, discusses how in practice issues of cost and usability affect the transcription, coding and mark-up of very large corpora.
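To make the distinction between transcription and coding concrete, the following minimal Python sketch pairs a transcribed utterance with one possible layer of coding. The tag names and the two-level structure are invented for this illustration and do not reproduce any contributor's actual scheme.

# A hypothetical illustration of the transcription/coding distinction:
# the transcription records the words; the coding layers more abstract
# attributes (word class, pragmatic function) on top of them.

# Transcription: the spoken words rendered in written form.
transcription = "could you close the door"

# Coding: abstract attributes of the same utterance.
coding = {
    "tokens": [
        {"word": "could", "pos": "modal"},
        {"word": "you",   "pos": "pronoun"},
        {"word": "close", "pos": "verb"},
        {"word": "the",   "pos": "determiner"},
        {"word": "door",  "pos": "noun"},
    ],
    # Utterance-level pragmatic annotation: the form is interrogative,
    # but the function is a request.
    "utterance": {"form": "interrogative", "function": "request"},
}

for token in coding["tokens"]:
    print(f"{token['word']}/{token['pos']}")
print("utterance function:", coding["utterance"]["function"])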
Three of the contributors to this section draw attention to Ochs’s (1979) paper, in which she observes that transcription (and to transcription we can now add coding) becomes data. The use of the computer, in spite of the many advantages and new possibilities which it opens up, does not resolve the problems of the relationship between the original speech event and the transcription, nor does it obviate the problem of representing spoken language in written form (indeed, in some ways it exacerbates these problems). Decisions made at the time of transcription and coding will affect the entire process of data analysis. Carefully thought-out transcription can greatly aid analysis; poor decision-making early on can make the subsequent analysis difficult or worthless. Chapters 1 to 4 raise many issues which need to be taken into account when transcribing and coding a corpus.
In the first chapter, Jane Edwards focuses in particular on issues of coding. In a discussion which will prove invaluable to corpus researchers for years to come, she examines the principles underlying the design and implementation of transcription and coding: the principles of designing coding categories, the implementation of coding (applying the design to the data), and ways of optimizing readability for human users while at the same time creating a system which is computationally tractable.
In Chapter 2, Guy Cook argues that while the use of the computer offers new possibilities for the researcher (particularly in terms of data retrieval and statistical analysis), it does not solve the initial problem of representing spoken language in written form. Underlying everything must be a sound theory and practice of transcription. He warns against treating speech as if it were writing. In particular, he notes the tremendous importance of including contextual and other information when dealing with spoken data, and the danger of focusing on purely linguistic features, at the expense of discourse phenomena, simply because the former are easier to handle. In discourse analysis and pragmatics we are dealing not with words alone, but with utterance-context pairings – how something is said, and the context in which it is said, may be as important as the words themselves. In relation to this, Cook discusses the problems of how to represent paralinguistic and other non-linguistic features as well as background knowledge, while at the same time being aware of the problems of producing transcriptions which are so elaborate that the user becomes lost in a welter of detail. He argues (cf. Burnard and Johansson in Chapters 5 and 6) that it would be a mistake to assume that elaborate coding systems mean that we now have everything under control – many issues still remain to be resolved.
In Chapter 3, Wallace Chafe picks up many of the issues raised in the first two contributions. He is concerned with the representation of spoken language in a written format which optimizes its usefulness to the human reader. Transcription of spoken language is done for the specific purposes of the original transcriber, but ideally should be usable by a broad range of other users. Like Edwards, Chafe stresses the importance of building on the interpretive skills readers already have, and to this end discusses ways in which transcriptions can exploit such features of written language as different fonts and iconic symbols. He discusses in detail features of intonation and how to represent them, and the importance of distinguishing between what can be measured (e.g. the precise length of a pause) and what is actually significant to participants in the original interaction and to the analysts of that interaction. Finally, in a discussion which looks forward to issues raised in Parts B and C, Chafe suggests that many of the problems raised so far can be mitigated by issuing corpora on CD-ROM, which can also include the original recording in addition to the transcription and other information (such as digitized waveforms).
In Chapter 4, James Monaghan focuses on the importance of considering the end-user of the corpus and of designing corpora in such a way that it is possible to access whole text structures, as well as lower-level phenomena.
Chapters 5 and 6 deal with issues of transcription, coding and mark-up as they relate specifically to electronic storage and data interchange. Lou Burnard discusses in detail the requirements for encoding all types of text in order to conform to the requirements of the Text Encoding Initiative (TEI), regardless of the domain of application or of the hardware and software the individual may be using. Johansson, in Chapter 6, deals specifically with the distinctions necessary for representing spoken discourse so that it conforms to TEI requirements. He argues that there is no necessary conflict between what are often seen as the very demanding requirements of TEI-conformant mark-up and the limited resources of the individual corpus-builder, nor between TEI and a reader-friendly/transcriber-friendly system. Provided the necessary software is developed, the underlying TEI representation can be converted into any form convenient for an individual project.
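As a rough illustration of the kind of conversion this implies (and which Sinclair, in the final chapter, asks to see as working software), the short Python sketch below translates an 'ordinary' transcript into TEI-style utterance mark-up. The input conventions used here (a speaker initial before a colon, a literal '(pause)' marker) are invented for this example; <u> and <pause/> are genuine TEI elements for transcribed speech, but a real TEI-conformant document would also need a <teiHeader> and declared encoding practices.

# A minimal sketch, not a real TEI tool: it wraps each speaker turn
# in a TEI <u> (utterance) element and converts an ad hoc "(pause)"
# marker into the TEI <pause/> element.

ordinary = """\
A: well I think we should stop there
B: (pause) yes I agree"""

def to_tei(transcript: str) -> str:
    """Translate a plain speaker-prefixed transcript into TEI-style
    utterance elements."""
    elements = []
    for line in transcript.splitlines():
        speaker, text = line.split(":", 1)
        text = text.strip().replace("(pause)", "<pause/>")
        elements.append(f'<u who="#{speaker.strip()}">{text}</u>')
    return "\n".join(elements)

print(to_tei(ordinary))
# Expected output:
# <u who="#A">well I think we should stop there</u>
# <u who="#B"><pause/> yes I agree</u>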
In Chapter 7, the final chapter of this section, John Sinclair voices the worries of people involved in constructing large corpora who are alarmed by the demands of making their transcriptions TEI-conformant. These worries can be grouped under three main headings:
1. Picking up the final point made in Johansson’s chapter, Sinclair raises the question of transcriptions for humans versus those suitable for machines. Like Cook, Sinclair is concerned that end-users will become lost in a welter of detail. Several contributors to the conference raised the possibility of associating transcription with either waveforms or sound recordings by means of hypertext or CD-ROM (see Johansson, Roach, Chafe and Cook, this volume), thereby offering access to greater detail as an option, although the relevant software and hardware are not yet widely available.
2. If TEI-conformant transcriptions are difficult for most users to read, their production also makes totally unrealistic demands on most transcribers. For those involved in the production of very large corpora (and it must be remembered that the size of the corpus is not a trivial matter, but crucially affects the types of linguistic generalizations and claims which can be made), the cost-effectiveness of TEI must be challenged. Although it is clearly of great importance that the basic data be available for other researchers to use, is it really the case that others will want to use your corpus annotations?
3. Sinclair challenges the way in which the requirements of TEI will operate in practice. His worry is that instead of individuals being able to operate within the inherent (indeed, almost unlimited) flexibility of TEI, as outlined in Chapters 5 and 6, we shall in practice be forced to operate within a very limited subset. We shall end up distorting our data in order to fit it into a straitjacket designed by computer buffs. Sinclair argues strongly in favour of the much weaker notion of compatibility (rather than conformity) with TEI.
Like many contributors to this book and to the conference, John Sinclair underlines the need for software interpreters to be produced – not just an interpreter which will render your TEI marked-up text readable to ordinary users, but one which will translate ‘ordinary’ transcripts into TEI format. Mark-up must remain user-friendly, without costing too much, and this is undoubtedly the direction in which things will develop in the future. These issues are not only of interest to designers o...
