All Roads Lead to Rome?
Around 20 BC, Emperor Caesar Augustus erected the Golden Milestone, a monument in the central forum of Ancient Rome. Such was the power of the Empire that all roads were considered to begin from it and distances were measured from that point, resulting in the still-used phrase (or variations upon it) “All roads lead to Rome”. Today the phrase is not used literally, but more colloquially it means “it doesn’t matter how you do it, you’ll get the same result”.
In this book, we set out to test the applicability of the phrase to corpus linguistics, a method (or collection of principles and procedures) which uses large collections of naturally occurring language texts (written, spoken, or computer mediated) that are sampled and balanced in order to represent a particular variety (e.g. nineteenth century fiction, twentieth century news articles, twenty-first century tweets). In dealing with real, often unpredictable and linguistically “messy” texts, corpus linguistics differs from introspective methods of analysis which can rely on neater but somewhat artificial-looking cases of language such as “the duke gave my aunt this teapot” (Halliday & Matthiessen 2004: 53).1 It also differs from more traditionally human-based qualitative research in that extremely large numbers of texts are analysed with the help of computers which process data and carry out statistical tests in order to identify unexpectedly frequent or salient language patterns. However, it would be unfair to cast corpus linguistics as a merely quantitative form of analysis: the patterns need to be interpreted and explained by human researchers, and this involves close reading of the texts in a corpus, often abetted again by corpus tools which can present the texts or sections of them in ways that make it easier for human eyes to process. As Biber (1998: 4) points out, “Association patterns represent quantitative relations, measuring the extent to which features and variants are associated with contextual factors. However functional (qualitative) interpretation is also an essential step in any corpus-based analysis”.
Corpus linguists are thus able to make fairly confident generalisations about the varieties of language they are examining based on the combination of automated and human elements to the analysis. The automated side helps to direct the human researcher to aspects of the corpus that he or she may not have thought interesting to look at (a form of analysis which Tognini-Bonelli (2001) called corpus driven), but it can also help to confirm or refute existing researcher hypotheses (referred to as corpus-based analysis). Partington, Duguid, and Taylor (2013: 9) refer to the serendipity of corpus research as
the chancing upon hitherto unforeseen phenomena or connections ... Evidence-driven research is highly likely to take the researcher into uncharted waters because the observations arising from the data will inevitably dictate to a considerable degree which next steps are taken.
Does an uncharted corpus contain a set of discoverable “findings”, possibly a finite number, or at least a smallish subset which most people would concur are particularly notable or even unexpected (the opposite of so-called “so what” findings (Baker & McEnery 2015: 9))? And if so, is it the case that, assuming the analyst has a reasonably high degree of experience and expertise, the procedures of corpus linguistics will direct everybody to the same set of salient findings, the same serendipities? In other words, if we vary the procedure and analyst, but keep the research question and the corpus the same, are we likely to obtain similar outcomes? For corpus linguists, do all roads really lead to Rome?
Why would it matter if they don’t? A key advantage of corpus linguistics over other forms of analysis is that the computational procedures are thought to remove human cognitive, social, or political biases which may skew analysis in certain directions or even lead to faulty conclusions. Unlike humans, computers do not care about what they study, so there is no chance that their findings are misguided by conviction (“I know it’s true; it must be true!”). Nor do computers make errors due to fatigue or boredom. It is tempting to view corpus linguistics in a similar light to “hard” sciences such as biology or chemistry, where phenomena can be objectively measured. Potassium placed in water will always result in potassium hydroxide and hydrogen gas. There is a sense of reassurance about the replicability of that kind of research: facts are facts. And with its reliance on scientific, empirical notions of sampling, balance and representativeness in corpus construction, along with the certainty that our tools and procedures of analysis will not mislead us, corpus linguists might not be blamed if they experience a robust feeling of confidence about the validity of their findings.
But what if this is not the case? What if, say, ten people, all with their own favourite way of doing corpus linguistics, all good at what they do, were given the same corpus and research question and asked to produce an analysis? What if they all found different things? Even worse, what if their findings disagreed? Would that render the method unworkable? Or would it tell us something interesting in itself? These are the issues which inspired this edited collection, and we explore them by carrying out a comparison of ten analyses of the same corpus in order to see the extent to which all roads actually do lead to Rome. Such a meta-analysis hopefully provides a clearer picture around questions of analytical objectivity and also should give insight into what individual techniques can achieve and how they may complement one another. A different way of looking at this collection of chapters, though, is that they also work as analyses in themselves of a corpus of an “emerging” register of language. They tell many interesting things about how people who have learnt different varieties of English use this form of language in ways that reflect aspects of their identity and culture.
Triangulation
Triangulation is a term taken from land surveying which uses distance from and direction to two landmarks in order to elicit bearings on the location of a third point (hence completing one triangle). According to Layder (1993: 128), methodological triangulation facilitates validity checks of hypotheses, anchors findings in more robust interpretations and explanations, and allows the researcher to respond flexibly to unforeseen problems and aspects of the research. Such triangulation can involve using multiple methods, analysts, or datasets, and it has been used for decades by social scientists as a means of explaining behaviour by studying it from two or more perspectives (Webb, Campbell, Schwartz & Sechrest 1966; Glaser & Strauss 1967; Newby 1977; Layder 1993; Cohen & Manion 2000). Most contemporary corpus linguists employ triangulation to an extent in their own research by, for example, using different techniques on their corpora. However, the potential benefits of triangulating the results of two or more corpus-linguistic methods have been largely unexplored.
This book is not the first study of this kind, although it is the largest study of triangulation in corpus linguistics that we are aware of. Prior to our study, there existed a collection by van Dijk and Petofi (1977) which contained multiple analyses of a short story called “The Lover and His Lass” by James Thurber. Grimshaw (1994) contains a collection of analyses of a transcript of a thesis defence, while another collection by van den Berg, Wetherell, and Houtkoop-Steenstra (2004) involves 12 chapters which each analyse the same transcript of interview data about race. Grimshaw’s book is the only one which attempts to synthesize the analyses in a concluding chapter called “What Have We Learnt?”, but all three collections consist of qualitative analyses of relatively short texts and do not involve corpus-based methods.
It is worth describing two related pieces of research in more detail before moving further on, as they function as unintended pilot studies to this one, both involving corpus-based critical discourse analysis of newspapers. First, Marchi and Taylor (2009) separately carried out critical analyses of a newspaper corpus, asking the question “how do journalists talk about themselves/each other and their profession in a corpus of British media texts?” In comparing their results, they noted a range of convergent (broadly similar), complementary (different but not contradictory), and dissonant (contradictory) findings. An example of a dissonant finding was that one analyst concluded that journalists tend to talk about themselves, while the other noted that they do not talk about themselves but instead refer to other newsmakers. Both analysts had convergent findings relating to notions of good and bad journalism, while complementary findings were cases where analysts focused on related but different aspects of the corpus data. For example, one analyst noted a number of metaphors which constructed journalists as beasts, e.g. press pack, feeding frenzy, while the other pointed out that journalism is a highly reflexive activity, talking about and to itself. These two findings function together as pieces of a larger picture. Marchi and Taylor (2009: 18) conclude that “the implementation of triangulation within a research study in no way guarantees greater validity, nor can it be used to make claims for ‘scientific’ neutrality”.
Second, one of the authors of this chapter carried out a slightly larger pilot study (Baker 2015), giving five analysts a newspaper corpus about foreign doctors and asking “How are foreign doctors represented in this corpus?” While Marchi and Taylor’s meta-analysis was more qualitative and reflective, this study attempted to impose an element of quantification onto the analysis by counting and comparing findings. Of the 26 separate findings identified across the reports, all five analysts agreed on only one: the finding that foreign doctors were criticized (and thus seen as unwanted) for having poor language skills. A further five findings were s...