The frequency-driven approach to phraseology: lexical bundles
The growing interest in how language is patterned has been stimulated by corpus linguistics since the 1990s. Corpora have shown that language use is highly patterned and that patterns are cognitively motivated (Stubbs 2004: 111). Thanks to its tools and methods, which facilitate the study of recurrent patterns of language use, corpus linguistics has shifted attention from the word to the pattern – “phrase-like units, which are the basic unit of meaning” (Stubbs 2004: 118).
Corpus linguistics has not only rekindled interest in patterns and, hence, in phraseology but has also changed our understanding of phraseology.1 The category of phraseology has been redefined and extended to include new types of word combinations, while the hitherto central non-compositional members, such as proverbs, sayings and idioms, have been pushed to the periphery because they are rarely used, in particular in specialized genres. The traditional approach has been dethroned by the frequency-based approach, where phrasemes are identified empirically through corpus-driven methods on the basis not only of their co-occurrence but, above all, of their recurrence (high frequency) (cf. Granger and Paquot 2008: 28–32). The new centre is occupied by collocations and various types of frequent multi-word units, both continuous and discontinuous, such as lexical bundles, phrase frames, skipgrams and phrasal constructions (cf. Nesselhauf 2005: 12; Greaves and Warren 2010: 213). In contrast to the traditional categories, which tend to have an ornamental and stylistic function (cf. Grabowski 2015: 82), multi-word units are systematically employed to perform important discourse functions, which will be discussed below.
The most commonly researched multi-word units in the frequency-based corpus-driven approach are lexical bundles, also referred to as clusters, n-grams, chunks or lexical phrases. Lexical bundles are identified solely on the basis of the frequency criterion (Biber and Barbieri 2007: 264; Hyland 2008: 6). Even though lexical bundles are very frequent, they are not “perceptually salient” (Biber 2009: 13). They are word sequences that co-occur “irrespective of their idiomaticity”: they are not always meaningful or structurally complete grammatical units (Biber et al. 1999: 58–59). Examples of lexical bundles in EU law include: referred to in Article, in accordance with the, of regulation EU No., having regard to the, for the purposes of, the European Parliament and, Member States shall ensure that. Lexical bundles are often “semantically transparent” (Cortes 2004: 400; Hyland 2008: 6). They are also indicators of genre variation, as they have been found to vary across genres (cf. Biber et al. 1999; Hyland 2008: 7).
Lexical bundles may be categorized according to formal criteria (length, structure) or functional criteria. The length-based categorization takes into account the number of constituents in a bundle: if it contains three words, it is referred to as a 3-gram; if four words, a 4-gram; if eight words, an 8-gram. The structural categorization is based on the grammatical structure of lexical bundles, depending on whether they contain noun, verb or prepositional phrases or clause fragments. As for the functional criterion, three main categories of lexical bundles have been identified with reference to academic discourse, the most frequently studied variety: stance bundles, which communicate attitudes; discourse organizers; and referential bundles, which indicate entities and participants (Biber and Barbieri 2007: 265). In general, lexical bundles are “building blocks in discourse” – they provide familiar frames retrieved from memory which are filled in with new information: they are “a kind of pragmatic ‘head’ for larger phrases and clauses, where they function as discourse frames for the expression of new information” (Biber and Barbieri 2007: 270).
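The frequency and length criteria described above can be illustrated with a minimal Python sketch. It is not the procedure used in any of the studies cited here: the function name extract_bundles, the toy sentences and the cut-off values (min_freq, min_texts) are assumptions made purely for illustration. Published studies typically work with frequencies normalized per million words and with a dispersion requirement across texts, and the thresholds vary from study to study.

```python
import re
from collections import Counter


def extract_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return the n-grams that recur at least `min_freq` times overall
    and occur in at least `min_texts` different texts.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    freq = Counter()    # total occurrences of each n-gram
    spread = Counter()  # number of distinct texts containing each n-gram
    for text in texts:
        # Simplified tokenization: lower-cased alphanumeric words.
        # Punctuation is dropped, so n-grams may span clause boundaries.
        tokens = re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(ngrams)
        spread.update(set(ngrams))
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and spread[gram] >= min_texts
    }


# Invented toy "corpus" of three one-sentence texts (hypothetical data).
texts = [
    "Member States shall ensure that the measures are applied in accordance with the rules.",
    "The measures shall be adopted in accordance with the procedure referred to in Article 5.",
    "In accordance with the Treaty, Member States shall ensure that the list is published.",
]

bundles = extract_bundles(texts, n=4, min_freq=2, min_texts=2)
for bundle, count in sorted(bundles.items(), key=lambda item: -item[1]):
    print(count, bundle)
```

Changing n yields 3-grams, 8-grams and so on. Because the only selection criteria are recurrence and spread, the output includes structurally incomplete sequences such as "shall ensure that the", which is consistent with the observation that bundles are not always complete grammatical units.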
Lexical bundles in legal language
Lexical bundles have been extensively researched in academic genres (see e.g. Biber and Barbieri 2007 for an overview), but few studies have addressed other specialized discourses, including legal language. In general, despite the high formulaicity of legal discourse, legal phraseology has not been a popular topic in legal language studies. This has started to change recently, triggered by the surge of interest in phraseology within corpus linguistics, which has found its parallel in the legal domain, as attested, inter alia, by this volume. Trends in corpus research into legal phraseology have been classified by Goźdź-Roszkowski and Pontrandolfo into: (1) research into collocations, (2) research into routine formulae, (3) terminographically-oriented studies, (4) cross-linguistic studies of phraseology, including translation, and (5) semantics of legal patterns (2015: 133–134). Research into lexical bundles is subsumed under trend (2).
Lexical bundles do not fit the existing categorizations of legal phraseology. One traditional classification, for example, groups legal phrasemes into: (1) multi-word terms, (2) collocations with a term and (3) formulaic expressions and standard phrases (Kjær 2007: 509–510). Another classification, proposed specifically for the genre of legislation, ranges from the global textual level to the local microlevel: text-organizing, grammatical and term-forming patterns as well as term-embedding and lexical collocations (Biel 2014: 36–48). Neither of these classifications embraces lexical bundles, which typically cut across all these categories, both structurally and functionally. Lexical bundles should be viewed as a distinct class of legal patterns in its own right, identified on the basis of frequency-based criteria (and thus incompatible with classifications based on other criteria).
As for lexical bundles in legal language, there are three noteworthy contributions which apply this method: papers by Jablonkai (2010) and Breeze (2013) and a book by Goźdź-Roszkowski (2011). The publications focus on how lexical bundles vary across English-language legal genres in three legal systems: the EU, England and Wales,2 and the US, respectively.
Starting chronologically, Jablonkai’s (2010) study into English-language EU discourse is based on a mixed-genre corpus3 compiled for ESP purposes. It analyzes the corpus of EU genres as a whole against the British National Corpus (BNC) (Sampler, Academic, News, Fiction sections) rather than against a reference corpus of a comparable genre, i.e. a UK legal/administrative corpus. For this reason, Jablonkai’s findings cannot be related separately to individual genres, e.g. EU law only, but concern EU administrative discourse in general. The study shows the high formulaicity of EU discourse relative to the reference corpora, as attested by the excessively high number of lexical bundles (2010: 258). The EU corpus contains twice as many bundle types and six times as many tokens as the Academic prose section of the BNC; these rates are even higher compared to the fiction, news and general sections of the BNC (2010: 258). As for structural properties of EU bundles, bundles with noun phrases and prepositional phrases dominate the list (80%), but there is also an untypically high number of verb phrase bundles compared to the reference corpora (2010: 260). As for functional properties of EU bundles, drawing on Cortes (2004) and Hyland (2008), Jablonkai extends Biber’s classification to include subject-specific bundles (i.e. context-dependent bundles, topic bundles) and refines the category of referential bundles by adding quality specification and intertextual bundles (2010: 260–261). Jablonkai finds that ...