1 Introduction: A corpus-based approach to the study of discourse presentation in written narratives
1.1 Introduction
We hope that this book will be of interest to at least two different kinds of linguists: (i) textlinguists (e.g. stylisticians and critical discourse analysts) who are involved in the analysis of discourse presentation in written and spoken language, and (ii) corpus linguists or other linguists who are interested in developing dedicated electronic corpora to elucidate textual phenomena. As we try to take both of these main readerships into account, we may, to some degree, tell one readership what it already knows. We apologize in advance if we sometimes do this, and we will try to keep such descriptions to a minimum. Nonetheless, we think it helpful to try to draw the textlinguistic and corpus traditions closer together through this specific study.
Our book describes the research on discourse presentation in written narratives we have been involved in since 1994, and which is still ongoing.1 This work has involved the systematic and detailed annotation of a corpus of written fictional and non-fictional narratives for speech, writing and thought presentation categories, in order to throw light on discourse presentation theory and on how patterns of discourse presentation vary in three different written narrative genres (fiction, news reports and (auto)biographies).
Since 1996 we have published seven articles and book chapters on our work.2 However, because these articles are spread through different books and journals, it is difficult for scholars to access the reports of the work we have undertaken. This volume, which draws from parts of these articles but also contains new material, is a summation of our work to date, work which aims to offer insights in relation to the study of discourse presentation in texts and to what is a relatively innovative methodology for textlinguists. We will also use this book to consolidate what has been for us a constantly developing method of textual annotation and theory building. Because our research project has evolved over time, our articles to date have some descriptive and annotational inconsistencies among them. We have gradually changed some of the terms and annotations we have used as we have come to grips with new discourse presentation phenomena in our data. These inconsistencies may well have been confusing for those who have read more than one of our articles, and this volume provides an opportunity to explain the changes we have made and our reasons for making them, and to arrive at a reasonably stable set of descriptive terms and annotations for further research. We do not, of course, assume that our work to date is the end of the story in descriptive, annotational, analytical or theoretical terms.3 We hope that others might be interested in applying the analytical methods we have developed to yet other spoken and written genres/text types,4 to see how well our approach works for these other genres and how the patterns of discourse presentation in these genres compare with those we have analysed.
Before we proceed further, it will be helpful if we make some points about our use of terminology in this book. We have used the term ‘discourse’ in the discussion above for two reasons. First, we sometimes need a general, and briefer, term to refer to what we otherwise call ‘speech, writing and thought presentation’ (SW&TP).5 We will strive to use the term ‘discourse presentation’ only in this general, overarching sense. Our second reason for using the term was that we wanted to connect our work to that of other scholars who have written about the way in which the discourse of others is presented, and who often use the term ‘discourse presentation’ for this enterprise. However, we are conscious of the fact that the term ‘discourse’ is often used vaguely and/or with somewhat different meanings by different scholars. We have pointed out before (Short et al. 2002) that one of the dangers of the term ‘discourse presentation’ is that, if it is used as an elegant variant of the more specific terms ‘speech presentation’, ‘writing presentation’ and ‘thought presentation’, it is possible to move seamlessly from the discussion of one mode of presentation to another without making the change clear to oneself, or to others. This in turn can lead to mis-analyses and a less accurate understanding of the phenomena under investigation. We believe that, although there are commonalities among speech presentation, writing presentation and/or thought presentation, there are also important differences which are unhelpfully hidden if the general term ‘discourse presentation’ is used as an alternative for these more specific, mode-related terms and concepts. Hence, when discussing specific discourse presentation phenomena, we will strive to use the more specific terms and not to use the general term as a substitute for them.
The other term which we have already made considerable use of is ‘presentation’. We use this term as a default, rather than ‘report’ or ‘representation’ (which are often used as default terms by other linguists), because we are specifically interested in how the discourse of others (or the speaker/writer on some previous occasion) is presented. This is what textual annotation and analysis can most sensibly be used for (and explains why stylisticians tend to use this term). We prefer not to use the term ‘report’, which is often used as a default by grammarians (e.g. Huddleston and Pullum 2002: 1023–30; Quirk et al. 1985: 1020–33) and other linguists who are part of a tradition where examples are invented when discussing discourse presentation. This is because the term ‘report’ suggests an unproblematic relationship between the discourse presentation and the anterior discourse which is being presented. Tannen (1989), among others, has shown that an assumption of faithful report for direct speech presentation in casual conversation is unrealistic (yet interestingly she uses the term ‘report’ even when undermining this assumption). However, we do not want to use the term ‘representation’ as a default either, as this tends to be used by linguists (e.g. critical discourse analysts like Caldas-Coulthard 1994 and Fairclough 1988) who want to concentrate mainly on distortions and misrepresentations in the reporting of anterior discourses. ‘Presentation’ is thus helpfully neutral for the discussion of speech, writing and thought presentation in a corpus of written texts where, for the most part, we do not, in any case, have easy access to the anterior speech, writing or thought being presented. We discuss this issue of terminology in more detail in Short et al. (2002).6
Many studies have proposed models of the forms and functions of discourse presentation in a range of text-types (e.g. Bally 1912a, 1912b; Banfield 1982; Collins 2001; Fairclough 1988; Fludernik 1993; Fowler 1986; McHale 1978; Pascal 1977; Tannen 1989; Thompson 1994, 1996; Volosinov 1973; Waugh 1995; see also papers in Coulmas 1986 and Lucy 1993). The original motivation for our corpus-based study of discourse presentation, however, was to test how well the particular model of speech and thought presentation outlined in Leech and Short (1981: Ch. 10) worked on written text types other than the novel. The Leech and Short model was developed specifically to account for the range of speech and thought presentation forms and their effects in novels written in English. We wanted to test this model, not only because one of us has a rather obvious personal interest in it, but also because (i) it is still the most analytically specific account of speech and thought presentation to date, and (ii) it has been influential and widely used by other textlinguists.
Many analysts of prose fiction, including Fludernik (1993: 283–316, passim) and Simpson (1993: 21–30), have discussed the Leech and Short approach. Person (1999: 28–37) and Toolan (2001: 136–40) also include discussions of some of our more recent work referred to above. A number of studies have also applied the Leech and Short approach to non-literary texts. McKenzie (1987) uses Leech and Short to analyse how free indirect speech was used to circumvent a ban on direct quotation of the ANC in a booklet by South African students, and Roeh and Nir (1990) use it in the analysis of Israeli radio broadcasts. Thompson’s (1996) account of the dimensions of choice available to speakers or writers when reporting the language of others also draws on the Leech and Short model, which he describes as ‘comprehensive in its coverage’ and ‘[t]he most fully developed’ of the various approaches to speech and thought presentation (Thompson 1996: 504).
1.2 Why a corpus-based approach?
The Leech and Short model, like all theoretical models in stylistics up to that point, was developed through the use of scholarly intuition, based on extensive personal reading experience, which was in turn exemplified and tested through the analysis of examples chosen from previous reading. The model was also designed to account specifically for speech and thought presentation in fictional texts (indeed, most of the discourse presentation work by stylisticians and narratologists has concentrated on fiction). Hence it was difficult to know how generalizable the model was to other text-types, or how descriptively adequate it was when ‘tested to destruction’ on texts (including fictional texts) in a way that could not avoid inconvenient or borderline cases. It was for this reason that we decided to develop and annotate a dedicated corpus to test out the model.
We should also point out that some of the non-corpus work on discourse presentation which has already been completed has been based on the accumulation of very large numbers of examples accrued from previous reading. Specific mention should be made here of the monumental work of the narratologists Cohn (1978) and Fludernik (1993). We have benefited considerably from these two very insightful works. Cohn grounded her analysis of what we would call thought presentation through the accumulation of a manually collected corpus of examples:
Equipped with these basic abstractions [of narrative theory] I could then travel around in narrative literature, selecting works and passages in works that would best display the entire spectrum of possibilities, while in turn allowing these works themselves to reveal unforeseen hues.
(Cohn 1978: v)
Cohn’s motivation is not unlike ours, except that we want to compare discourse presentation across text types, including narrative fiction, and want to be much more explicit about our criteria for text selection, as well as being more explicit and systematic in our analysis of the texts in our corpus. Cohn was writing before computers could be used to store and interrogate large corpora of texts, of course, and we could well imagine that if she were beginning her work now, she might also want to make use of an electronic corpus, as we have.
Fludernik’s (1993) study of what she calls free indirect discourse is even more impressive in terms of the wide range of textual examples she uses to illustrate the points she wants to make. We have learned much from her work but, as with Cohn’s study, we were concerned that her relatively informal analytical approach might mean that important factors in the study of discourse presentation would be missed. In her research, Fludernik specifically considered the possibility of a corpus-based approach, and the quantification that comes with it, but rejected this option (i) because she did not want to restrict herself to the literature of just one language, nation, period, etc., which she thought a corpus-based approach would prevent, and (ii) because she believed that a corpus and its associated annotation would have created serious methodological problems, in the sense that she thinks it would have been necessary to ‘institute arbitrary definitions of the relevant categories’ (Fludernik 1993: 9):
Such arbitrariness would necessarily have resulted in an erosion of the actual usefulness of the statistical data, since one would have had either to decide on larger categories that include marginal and ambiguous phenomena, or to indulge in a proliferation of subcategories and intermediary categories which would have rendered the statistics next to useless for interpretation. From previous experience with statistical research (Fludernik 1982) I have also acquired a profound distrust of the methodological relevance of statistical data. Statistics typically take individual occurrences of certain phenomena out of context. Since the present study attempts to document the crucial importance of context for the purpose of the even preliminary establishment of basic categories, a statistical approach would from the outset have vitiated one of the major aims of the project. These remarks are, however, not meant to discredit statistical research in itself. On the contrary, I would welcome a series of statistical analyses that might help to corroborate, modify or refute some of the theses I am here proposing.
(Fludernik 1993: 9)
We have quoted from Fludernik at length because we have effectively tried to do what she decided to avoid, namely to use a set of categories and subcategories to analyse the textual extracts in our corpus comprehensively and systematically. Consequently, we certainly recognize some of the problems she points to, though we think that the annotation difficulties have not been as damaging as she thought they would be. Indeed, we would claim that forcing ourselves to be as clear and precise as possible about our annotations has helped us to isolate, and come to terms with, phenomena we may not otherwise even have noticed. Similarly, we believe that forcing ourselves to account for ambiguity and marginal phenomena in our annotations has helped us to understand more exactly how the speech, writing and thought presentation scales operate, and what factors are at work in producing ambiguity on those scales. Because we take this explicit analytical approach, we are able to provide some of the statistical information which, at the end of the above quotation, Fludernik says that she would welcome.
We very much agree with Fludernik that statistical analysis has limitations as well as advantages, and this is why we present both quantitative and qualitative analysis in this book. We do not think that the one precludes the other (though doing both does increase the workload still further, as, from experience, we are very well aware). Indeed, we would want to argue that both forms of analysis are needed, and work best when used interdependently. Although Fludernik decided not to adopt a corpus-based and quantitative approach (the experience of the dissertation she refers to as Fludernik 1982 was clearly salutary!), she makes a point of saying that she is not antipathetic to such work. She is very open to the fact that all approaches have advantages and disadvantages, and that we can all learn from different approaches to the same phenomenon. This tolerant and inclusive attitude is in contrast to the attacks on corpus linguistics by some other linguists, which we allude to briefly below.
It was natural for us to move to a corpus-based approach as we work in a department which has members who have been involved in corpus construction and annotation for some years, and who could easily be called upon for advice and help. The Lancaster–Oslo/Bergen (LOB) corpus was one of the early modern linguistic corpora to be developed; Lancaster is the ‘home’ of the British National Corpus (BNC), for which Lancaster did much of the work, and our colleagues are involved in the building and exploitation of other corpora too. However, not all linguists are sympathetic to a corpus-based approach, and so we will take a little space here to explore some of the pros and cons in the use of electronic corpora, to help explain our decision to develop our corpus and to use ‘corpus stylistics’ as the main title of this book.
The first point that we would like to make is that although this book, and much of our current work, involves the use of a corpus-based approach in stylistics, we do not think that this approach should supplant other work within our field. Rather, our decision to use a corpus-based approach was because it was the best tool we could find to carry o...