Introducing Electronic Text Analysis

A Practical Guide for Language and Literary Studies

  1. 176 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
About this book

Introducing Electronic Text Analysis is a practical and much needed introduction to corpora – bodies of linguistic data. Written specifically for students studying this topic for the first time, the book begins with a discussion of the underlying principles of electronic text analysis. It then examines how these corpora enhance our understanding of literary and non-literary works.

In the first section the author introduces the concepts of concordance and lexical frequency, concepts which are then applied to a range of areas of language study. Key areas examined are the use of on-line corpora to complement traditional stylistic analysis, and the ways in which methods such as concordance and frequency counts can reveal a particular ideology within a text.

Presenting an accessible and thorough understanding of the underlying principles of electronic text analysis, the book contains abundant illustrative examples and a glossary with definitions of main concepts. It will also be supported by a companion website with links to on-line corpora so that students can apply their knowledge to further study.

The accompanying website to this book can be found at http://www.routledge.com/textbooks/0415320216

Introducing Electronic Text Analysis by Svenja Adolphs is available in PDF and ePUB format.

1
Introduction

The field of electronic text analysis has been expanding rapidly over the past decades. This is partly due to advances in information technology and software development, but also as a result of the growing interest in using electronic resources to complement more traditional approaches to the analysis of language and literature. The improved accessibility of computers has added to the increasing popularity of electronic text analysis, especially in the higher education context. The development of principled collections of electronic texts, also called corpora, has allowed a systematic exploration of recurring patterns in language in use, and this has become one of the main areas of enquiry in the emerging field referred to as corpus linguistics.
With courses and modules in corpus linguistics and computer-aided language analysis currently being offered in many university departments across the country, there is also a growing emphasis on integrating electronic tools and resources in analyses of literary works. At the same time, electronic text analysis is increasingly being utilised as a tool in a range of applied contexts, for example in the area of language teaching or the study of language and ideology. These areas of investigation make use of a range of methodologies that have originally been developed in the area of corpus linguistics with the aim of enhancing language description.
This book combines the description of a range of approaches and methodologies in this field with a discussion of a number of areas of language study in which electronic text analysis is being used, often by way of complementing more traditional, analytical approaches. The main aim throughout the book is to introduce key ideas and methodologies and to illustrate these, where appropriate, through attested examples of language data. The book is primarily intended for the non-expert user who wishes to draw on some of the methodologies developed in the field of corpus linguistics for the purpose of analysing electronic texts.

Electronic text analysis: corpus linguistics by another name?

There are a number of terms that describe traditions and methodologies of computer-aided language research. They include, amongst others, corpus linguistics, Natural Language Processing (NLP) and Humanities Computing. The differences between these approaches lie in their overall research goals, the types of texts they draw on, and the way in which those texts are analysed. While the methodologies described in this book are derived mainly from the corpus linguistic tradition, they are also applied to problems and texts that are not normally at the heart of that tradition. The term electronic text analysis has therefore been adopted for its broad, inclusive meaning: it covers the analysis of any digitised text or text collection, and it reflects priorities in data sources and research processes that differ from those of corpus linguistics as traditionally practised.

Research goals

To illustrate some of the different orientations found in the diverse range of areas that use electronic text analysis, we will consider the examples of Natural Language Processing (NLP) and Humanities Computing in more detail. NLP is often geared towards developing models for particular applications, such as machine translation software. Sinclair (2004b) makes a useful distinction between description and application in this context. Language description here refers to the process of exploring corpus data with the aim of developing a better understanding of language in use, while an application refers to the deployment of language analysis tools with the aim of producing an output that has relevance outside of linguistics. Sinclair (2004b: 55) notes that the end users of language description are predominantly other linguists who are interested in empirical explorations of the way in which language is used. The end users of linguistic applications, on the other hand, are not necessarily linguists. They may simply be users of the developed application, such as a spell checker or a machine translation system that has been built on the basis of a textual resource. The research goal in this case is the successful development of an application rather than the comprehensive description of language in use. This distinction marks one of the differences in orientation between corpus linguistics and NLP.
Humanities Computing tends to be concerned with enhancing and documenting textual interpretations, often within a hermeneutic tradition. A number of specialist journals have emerged in this area, including Computers and the Humanities, and a substantial amount of research is devoted to making processes of textual interpretation more explicit to the research community by way of various types of documentation. Burnard (1999) highlights the need for this process:
[…] because the digital world so greatly increases access to original unmediated source material (or at least a simulation thereof), the esoteric techniques developed over the centuries in order to contextualise and thus comprehend such materials will need to be made accessible to far more people. We urgently need to develop new methods of doing textual editing and textual exposition, appropriate to the coming digital textual deluge.
All of the fields above analyse electronic, i.e. digitised, text(s) and use, where appropriate, software tools to do so.

Textual resources

One of the main differences between the various traditions in electronic text analysis lies in the nature of the textual resources and in the way in which they have been assembled to become an object of study. A corpus tends to be defined as a collection of texts which has been put together for linguistic research with the aim of making statements about a particular language variety. Biber et al. (1998: 4) point out in this context that a corpus-based approach 'utilises a large and principled collection of natural texts, known as a "corpus", as the basis for analysis'.
A single text might not be able to provide a balanced sample of any one language variety. The same applies to other texts that may exist in electronic format but have not been assembled to represent a principled sample of a language variety, such as an e-mail message, for example, or the world wide web. These can, of course, be assembled in a principled way and turned into a corpus for linguistic study. We will return to a discussion of the world wide web as a corpus in chapter two.
As far as the nature of the textual resource is concerned, there are core differences between naturally occurring discourse and discourse produced under experimental conditions, and between large-scale and small-scale texts and text collections. Since people who work in the discipline of corpus linguistics are often interested in the exploration of social phenomena, such as the relationship between patterns of usage and social context, naturally occurring discourse is required as the basis of any study. In order to be able to extract patterns from this type of discourse, the textual resources need to be substantial in size for the corpus linguist. This point takes us to the next issue.

Types of analysis

The way in which the corpus linguist approaches a text is through secondary analysis of concordance lines and frequency information (see Sinclair 2004a: 189). The close reading and interpretation of a single text is not the primary concern of the corpus linguist; instead the core research activity is the extraction of language patterns through the analysis of suitably sorted instances of particular lexical items and phrases (see Sinclair 2004a). This is not necessarily the approach taken by the Natural Language Processing (NLP) researcher, nor the humanities researcher, who will, respectively, analyse texts in a way that facilitates the development of specific software applications or process textual information as part of an often multi-faceted framework for textual interpretation. As such, the humanities researcher might be very familiar with a particular novel that they study but still make use of frequency counts to gather further quantitative information about the text.
The term electronic text analysis has been chosen as a broad title because the types of analyses discussed in this book draw on elements of various different approaches, albeit with a strong bias towards corpus linguistics techniques. These include the analysis of single texts to facilitate literary interpretation (chapter five), the investigation of lexical items within a corpus to better understand how ideology is encoded in language (chapter six), the exploration of corpus data for English language teaching applications (chapter seven) and the close reading of extended stretches of naturally occurring discourse (chapter eight). However, the main focus of the book is on the way in which different methods in electronic text analysis can facilitate the study of language in a range of different contexts.

A brief background to techniques in electronic text analysis

Electronic text analysis can be used to organise textual data in a variety of ways, for example through the generation of frequency information or through the representation of individual words or phrases in a concordance format. Both of these techniques are discussed in more detail in chapters three and four respectively; the sections below merely aim to provide a brief background.
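The concordance format mentioned above can be illustrated with a short sketch. The following is a minimal keyword-in-context (KWIC) concordancer in Python; the function name and sample sentences are illustrative only, and dedicated concordancers offer far richer sorting and display options than this sketch.

```python
import re

def concordance(text, node, width=30):
    """Return each occurrence of `node` with `width` characters
    of left and right context, displayed KWIC-style."""
    lines = []
    for m in re.finditer(rf"\b{re.escape(node)}\b", text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad so the node word lines up in a central column
        lines.append(f"{left:>{width}} [{m.group()}] {right:<{width}}")
    return lines

sample = ("The corpus provides evidence of language in use. "
          "A corpus is a principled collection of texts.")
for line in concordance(sample, "corpus", width=20):
    print(line)
```

Aligning the node word in a central column, as here, is what makes it possible to scan large numbers of instances quickly and to sort them by their left or right context.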

Frequency lists

Many of the techniques used in the electronic analysis of texts originate from manual procedures of text analysis, which were in use long before the more recent advent of computer technology. Thorndike (1921), for example, gathered frequency information about individual words in a set of texts by manually counting each word form. His frequency list was based on a corpus of 4.5 million words from over 40 different sources and informed The Teacher's Word Book (Thorndike 1921), later superseded by The Teacher's Word Book of 30,000 Words (Thorndike and Lorge 1944), which was based on a corpus of over 18 million words in total.
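The counting that Thorndike carried out by hand is now a routine computation. As a minimal sketch (the function name and sample text are illustrative, not from the book), a word-frequency list can be produced in a few lines of Python:

```python
import re
from collections import Counter

def frequency_list(text):
    """Return (word, count) pairs sorted by descending frequency."""
    # A deliberately simple tokeniser: lower-case alphabetic strings,
    # keeping internal apostrophes (e.g. "don't")
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common()

sample = "the cat sat on the mat and the dog sat by the door"
for word, count in frequency_list(sample)[:3]:
    print(word, count)
```

Even on a toy sample the characteristic shape of such lists emerges: a handful of grammatical words ("the" here) dominate the top of the list, a point that becomes important when frequency data are interpreted.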
This work, and other similar projects that were carried out during the early part of the 20th century, had a pedagogic purpose in that the results were used to inform language instruction. Thorndike’s work la...

Table of contents

  1. Contents
  2. List of tables and illustrations
  3. Preface
  4. Acknowledgements
  5. 1 Introduction
  6. 2 Electronic text resources
  7. 3 Exploring frequencies in texts: basic techniques
  8. 4 Exploring words and phrases in use: basic techniques
  9. 5 The electronic analysis of literary texts
  10. 6 Electronic text analysis, language and ideology
  11. 7 Language teaching applications
  12. 8 Further fields of application
  13. Appendix 1 Transcription conventions and codes in the CANCODE data
  14. Glossary
  15. Bibliography of websites
  16. Bibliography
  17. Index