1.1 The universe of discourse
Repetition of textual elements is more than a superficial phenomenon. Repetition may even be considered as constitutive for units and relations in a text: on a primary level when no other way exists to establish a unit – as in a musical composition (a motif can be recognised as such only after at least one repetition) – and on a secondary, artistic level, where repetition is a consequence of the transfer of the equivalence principle from the paradigmatic axis to the syntagmatic one (cf. e.g., Jakobson 1971).
For our purposes, we do not presuppose any specific definition of the term text. Any meaningful spoken or written sequence of linguistic units in a natural language can be analysed by the methods presented in this volume. We do not even presuppose fundamental properties such as cohesion and coherence; on the contrary, there are objective methods which can be used to detect such properties. In our examples, we will analyse only written texts, which makes the procedures easy for the reader to follow and reproduce.
We will use the term textual unit to denote any phenomenon in a text which can be defined in an operational way, i.e. a phenomenon which can be identified unambiguously on the basis of a set of criteria and whose properties can be measured. Here is a list of some commonly considered textual units:
| Character | Phrase | Metrical foot |
| Grapheme | Bar | Lemma |
| Phoneme | Clause | Lexeme |
| Syllable | Sentence | Syntactic construction |
| Morph, Morpheme | Paragraph | Hreb |
| Word-form | Sememe | Motif |
Numerous textual units can be found in works on text-linguistics such as Koch (1969, 1971), Daneš / Viehweger (1977), Gottman / Parkhurst (1980), Dressler / de Beaugrande (1981), Hřebíček (1997, 2000). All such units possess (or better: can be ascribed) an unlimited number of properties, with respect to which the researcher can classify the units. Some properties of the word HOUSE, for example, are the following:
| Property | Value |
| Length in characters | 5 |
| Length in syllables | 1 |
| Length in morph(eme)s | 1 |
| Part-of-speech | Noun |
| Number of meanings (polysemy) | n |
| Number of metaphorical meanings | k |
| Polytextuality (no. of texts in a corpus in which the unit occurs) | t |
| Number of compounds (productivity) | m |
| Number of derivations (productivity) | d |
| Inflectional paradigm | regular |
| Relative frequency | p |
| Origin | Germanic |
| Number of contexts (polytextuality) | r |
| Number of synonyms | x |
| Number of variants in a dialect atlas | y |
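Some of the properties in the table above can be measured directly once the unit and the counting procedure are fixed. The following is a minimal sketch, not a procedure taken from this volume: the function names `tokenize` and `word_properties`, the three-sentence toy corpus, and the simple letter-based tokenisation are all assumptions made for illustration. It measures length in characters, relative frequency, and polytextuality in the sense of the number of texts in which the word-form occurs.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word-forms (runs of the letters a-z).

    This is a deliberately crude, hypothetical tokeniser for English examples.
    """
    return re.findall(r"[a-z]+", text.lower())


def word_properties(word, corpus):
    """Measure a few operational properties of `word` in `corpus`.

    `corpus` is a list of strings, each string being one text.
    """
    tokens_per_text = [tokenize(t) for t in corpus]
    all_tokens = [tok for toks in tokens_per_text for tok in toks]
    counts = Counter(all_tokens)
    w = word.lower()
    return {
        "length_in_characters": len(w),
        "absolute_frequency": counts[w],
        "relative_frequency": counts[w] / len(all_tokens) if all_tokens else 0.0,
        # Number of texts in the corpus in which the word-form occurs.
        "polytextuality": sum(1 for toks in tokens_per_text if w in toks),
    }


# Toy corpus, invented for this illustration only.
corpus = [
    "The house stood on a hill.",
    "She left the house early and returned to the house late.",
    "No one spoke.",
]
props = word_properties("HOUSE", corpus)
print(props)
```

Properties such as polysemy, origin, or the number of synonyms cannot be read off the text itself; they require external resources such as dictionaries, which is why they appear only as symbols in the table.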
Every unit can also be considered from the point of view of its function in the discourse and with respect to other relations. Some of these functions and relations are listed here:
| Grammatical function | Reference | Co-reference |
| Anaphora | Cataphora | Poetic figure |
| Speech Act | Metaphor | Argumentation |
Each of them can be subdivided into individual kinds or categories: argumentation relations, for instance, into {background, circumstance, non-volitional cause, volitional cause, contrast, condition, concession, evaluation, elaboration, …}, or speech acts (after Searle 1969) into {assertives, directives, commissives, expressives, declarations}. Thus, not only the units themselves can be studied but also their properties (i.e. classes of units sharing one or more properties) and their functions. Generally, the term repetition will be used in the sense of occurring several times in a text. There are various reasons why a textual unit may occur repeatedly in a text:
- Limitation of inventory. If an inventory is small, the corresponding units must occur with a frequency greater than one. In an English sentence with, say, 50 characters, some characters will occur repeatedly since the character inventory is smaller than 50. On the other hand, sentences need not be repeated for reasons of inventory size because there is no limited number of sentences in a natural language. The repetition of units from a finite but not too small inventory such as the morpheme inventory of a language may be used as an indicator of creativeness.
- Grammar. Certain kinds of repetitions are due to the rules of the grammar of a language. Function words such as determiners, prepositions and conjunctions serve grammatical functions and occur much more often than content words. Specific repetitive patterns and distributions concerning function words are commonly used as characteristics of authors for the purposes of stylometrics and forensic linguistics.
- Thematic bond. The words which belong to the semantic field of the topic presented in a text occur more often than others. Repetitions of this kind can be used to differentiate text types. Technical texts will contain, as a rule, more thematically caused repetitions than novels.
- Discourse functions, such as emphasis.
- Stylistic, aesthetic factors. Some of the repetitions which can be observed in texts may be due to special textual functions. The author may want to underline some elements or to produce stylistic or poetic effects such as rhythm and euphony.
- Perseveration. The repetition of textual units can also have very individual causes. It has been observed that repetitions can be due to mental disorders and self-stimulation. This is why repetition patterns can be used for psychiatric diagnoses, cf. Mittenecker (1953), Breidt (1973), Möller, Laux, Deister (2009).
- Information flow. While writing, the author of a text wants to make sure that the addressee will be able to understand it. Among other factors, the amount of information conveyed per text passage must not exceed a threshold, which varies with the type of reader. One consequence for the text vocabulary is that a constant flow of new words would overtax the reader. Words must be repeated to avoid this strain and also to make it possible to discuss difficult concepts explicitly.
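The inventory argument in the first item of the list above can be checked by simple counting. The sketch below is an illustration under stated assumptions, not a method from this volume: the helper `repetition_profile` is a hypothetical name, the inventory is taken to be the 26 letters of the English alphabet, and the example sentence is adapted from the opening of this section. It counts how many distinct units (types) occur among all occurrences (tokens) and how many types occur more than once, first for letters and then for word-forms.

```python
from collections import Counter


def repetition_profile(units):
    """Summarise repetition in a sequence of textual units."""
    counts = Counter(units)
    n_tokens = len(units)
    n_types = len(counts)
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Types that occur with a frequency greater than one.
        "repeated_types": sum(1 for c in counts.values() if c > 1),
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
    }


sentence = "Repetition of textual elements is more than superficial."

# Units drawn from a small inventory: letters (at most 26 types in English).
letters = [ch for ch in sentence.lower() if ch.isalpha()]
letter_profile = repetition_profile(letters)

# Units drawn from a very large inventory: word-forms.
words = sentence.lower().rstrip(".").split()
word_profile = repetition_profile(words)

print(letter_profile)
print(word_profile)
```

Because the sentence contains more letter tokens than the alphabet has letters, at least one letter must be repeated, whereas none of its word-forms needs to recur.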
It is not always clear whether a repetition was employed intentionally or unintentionally. The only person who might be able to answer this question is the author himself (cf. Jakobson 1971). However, we are less interested in the intentionality of an observed property than in the question of whether a given phenomenon such as a repetition pattern can be considered significant, which means that it is a potential indicator of some meaningful text or author characteristic. Therefore, we will primarily show ways (1) to construct and apply useful objective measures of text phenomena and (2) to test whether the results of measurements are significant or random. Randomness in this sense means that a result is not guaranteed to be an indicator of a text characteristic. We conduct investigations of repetitions for the following reasons:
- Characterisation of texts by means of parameters (measures, indicators) as taken from established mathematical statistics or specifically constructed ones in individual cases.
- Comparison of texts on the basis of their quantitative characteristics and classification of the texts by the results.
- Search for the laws of text, which control the mechanisms connected to text generation. As a remote aim, the construction of a theory of text consisting of a system of text laws. The ultimate aim of every quantitative text analysis is the construction of a text theory; we must, however, admit that we are still far from reaching this aim. We have been successful in finding and formulating a number of text laws, and this success contributes to our confidence that methods such as those presented in this volume will help to achieve the aims of our demanding program (cf. Altmann 2009).
The first two reasons serve philological purposes; they can be applied in forensic linguistics for author attribution and various other aims, and they are involved in applied psychology and psychiatry for diagnostic purposes and for the documentation of changes in mental states. The third one is, of course, a matter of pure science but, as experience in other fields shows, is very likely to form the theoretical background for advanced applications.