Chapter 1
Re-thinking Variable Properties in Language: An Introduction
DAVID W. LIGHTFOOT AND JONATHAN HAVENHILL
Georgetown University
Invariant Principles, Their Successes
The Biolinguistic Program reflects the work of many people from many countries analyzing many very different grammars, discovering a huge range of interesting, abstract properties. A "grammar" is what we used to call the formal, generative system that characterizes a person's mature language faculty, which is represented in an individual's mind/brain; this is now often referred to as an internal, individual or "I-language." Grammars, I-languages, are subject to general, restrictive principles that appear to be common to the species and have been discovered over several decades.
Rich, invariant principles have emerged, often in response to arguments from the poverty of the stimulus; such principles, defined universally, bridge the gap between information conveyed by a child's typically very limited experience and the rich information that characterizes mature grammars. Other methods have been used, but invariant principles, it was postulated, are available to children through their biology, attributes of their genetic material. The principles explain how simple experiences can trigger rich structures in the biological grammars of some form of Japanese or of Javanese. Understanding the successful invariant principles and the thinking behind them illuminates how we might gain new approaches to variable properties, where linguists have been conspicuously less successful.
Over the past two decades, under the Minimalist Program, linguists have sought to simplify the principles, "minimizing" the information they embody. Part of the motivation is the legacy of William of Occam's simplicity in theorizing, always seeking simpler and more beautiful analyses, and part is to provide a plausible biological account whereby we might attribute the evolution of the language faculty in the species to a single mutation. Invariant computational operations of Project and (internal and external) Merge build hierarchical structures from the bottom up, combining heads and complements, phrasal categories and adjuncts; this holds for all languages. A repeatable operation assembles two syntactic elements a and b into a unit, which may, in turn, be merged with another element to form another phrase, and so on. That universal, invariant property raises the prospect that the option of Merge was the mutation that made language and thought possible for Homo sapiens. Berwick and Chomsky (2016) showed why such a view might be productive and elicited a judicious and informed review from paleoanthropologist Ian Tattersall (2016).
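The bottom-up, binary-branching character of repeated Merge can be sketched in a few lines of code. This is purely our illustrative sketch, under simplifying assumptions (the class names are invented, and labels are projected from the first element merged), not an implementation drawn from the literature:

```python
# Illustrative sketch of binary Merge building hierarchical structure
# bottom-up. Class names and the labeling rule are simplifying assumptions.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Head:
    label: str   # syntactic category, e.g. "V", "D", "T"
    form: str    # the lexical item itself

@dataclass(frozen=True)
class Phrase:
    label: str       # projected from the head (Project)
    left: "Node"
    right: "Node"

Node = Union[Head, Phrase]

def merge(a: Node, b: Node) -> Phrase:
    """Assemble two syntactic objects into one binary-branching unit,
    projecting the label of the first element (a simplifying assumption)."""
    return Phrase(label=a.label, left=a, right=b)

# "visited" + "Washington" -> a verbal projection; merging "has" with that
# projection yields a larger structure, and so on, without bound.
v = Head("V", "visited")
d = Head("D", "Washington")
vp = merge(v, d)        # binary-branching verbal phrase
aux = Head("T", "has")
tp = merge(aux, vp)     # structure grows by reapplying the same operation
```

Each application of `merge` takes exactly two syntactic objects and returns a new unit that can itself be merged again, so hierarchical structure of arbitrary depth falls out of reapplying a single operation.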
Thinking in terms of hierarchical structures resulting from Minimalist computational operations has also informed remarkable neuroscientific work linking brain activity to the structural units underlying language and thought in novel ways. Ding et al. (2016) showed that when people listen to connected speech, cortical activity of different timescales tracks the time course of abstract structures at different hierarchical levels, such as words, phrases, and sentences. Results indicate that "a hierarchy of neural processing timescales underlies grammar-based internal construction of hierarchical linguistic structure"; David Poeppel and colleagues have found neural activity that reflects directly the abstract structures that linguists have postulated for the infrastructure of language, needed to account for the way that expressions are understood and used. See also Nelson et al. (2017). We always knew that the brain would have to have a mechanism for encoding the abstract structures of different levels, and now we know what the mechanism is, a huge achievement.
Minimalist ideas about hierarchical structures being formed by multiple applications of Merge not only help us think differently about the evolution of the language faculty and of thought in the species and stimulate new neuroscientific work, but they have also facilitated new approaches to the acquisition of language by children. The hierarchical structures formed by multiple applications of Merge constitute the means by which people, including very young children, begin to analyze and "parse" what they hear, the key component of the acquisition process. We may now be at the point where we can dispense with independent parsing principles or procedures, and, given the way in which hierarchical structures are built, we might argue that assigning structures to expressions is simply a matter of using the binary branching structures that Universal Grammar (UG) makes available and the structures that children discover/invent as they parse the ambient external language they hear (E-language); under this view, parsing is not a function of nonlinguistic conditions on structures permitted in "working memory" and so forth. There is an interplay between E-language, which is parsed, and I-languages, which result from parsing.
Work has shown repeatedly that children rely on the tools provided by their biology and learn much from very little experience. Research has examined language acquisition by children exposed only to unusually restricted data, much of the work focusing on the acquisition of signed systems. A striking fact is that 90 percent of deaf children are born to hearing parents, who are normally not experienced in using signed systems and often learn a primitive kind of pidgin to permit rudimentary communication. In such contexts, children surpass their models readily and dramatically and develop effectively normal mature capacities, despite the limitations of their parents' signing (Newport 1998; Hudson Kam and Newport 2005; Singleton and Newport 2004).
This is not surprising in light of studies of Creoles more generally (Aboh 2017), and of new languages beyond Creoles, which show that children exposed to very limited experiences go well beyond their models in quickly developing the first instances of rich, new I-languages (Lightfoot 2005, 2006). Not much is needed for a rich capacity to emerge, as demonstrated by many contributors to Piattelli-Palmarini and Berwick (2013) and now by Belletti (2017). Belletti offers a new kind of poverty-of-stimulus argument, showing that children sometimes overextend certain constructions, using them much more freely than their adult models, hence creatively.
Extraordinary events have cast new light on these matters: the birth of new languages in Nicaragua and in the Bedouin community in Israel. In Nicaragua the Somoza dictatorship treated the deaf as subhuman and barred them from congregating. Consequently, deaf children were raised mostly at home, had no exposure to fluent signers or to a language community, were isolated from each other, and had access only to home-signs and gestures. The Sandinistas took over the government in 1979 and provided a school where the deaf could mingle, soon to have four hundred deaf children enrolled. Initially the goal was to have them learn spoken Spanish through lip reading and finger spelling, but this was not successful. Instead, the schoolyard, streets, and school buses provided good vehicles for communication, and the students combined gestures and home-signs to create first a pidgin-like system, then a kind of productive Creole, and eventually their own language, Nicaraguan Sign Language. The creation of a new language community took place over only a few decades. This may be the first time that linguists have witnessed the birth of a new language ex nihilo, and they were able to analyze it and its development in detail. Kegl, Senghas, and Coppola (1998) provide a good general account, and Senghas, Kita, and Özyürek (2004) examine one striking development, whereby certain signs proved to be not acquirable by children and were eliminated from the emerging language.
Sandler, Meir, Padden, and Aronoff (2005) discuss the birth of another sign language among the Bedouin community, which has arisen in ways similar to Nicaraguan Sign Language and was discovered at about the same time. These two discoveries have provided natural laboratories to study the capacity of children exposed to unusually limited linguistic experience to go far beyond their models and to attain more or less normal mature I-languages.
If successful language acquisition may be triggered by exposure only to very restricted data, then perhaps children learn only from simple expressions. They only need to hear simple expressions, because there is nothing new to be learned from complex ones. This is "degree-0 learnability," which hypothesizes that children need access only to unembedded material (Lightfoot 1989). Such a restriction would explain why many languages manifest computational operations in simple, unembedded clauses, which do not appear in embedded clauses (e.g., English subject-inversion sentences like Has Kim visited Washington? but not comparable embedded clauses *I wonder whether has Kim visited Washington), but no language manifests the reverse, operations that appear only in embedded clauses and not in matrix clauses. One explanation for this striking asymmetry is that children do not learn from embedded domains. Therefore, much that children hear has no consequences for the developing I-language; nothing complex triggers any aspect of I-languages.
Parameters, Their Problems
Postulating hierarchical linguistic structures formed by a simple Merge operation has yielded new understanding of the invariant properties of language and has generated an immensely fruitful research program, bringing explanatory depth to a wide range of phenomena, indeed discovering a huge range of properties (den Dikken 2012). However, a hallmark of human language, alongside its invariant properties, is its VARIATION. The environmentally induced variation that one finds in language is biologically unusual, not what one sees in other species or in other areas of human cognition, and requires a biologically coherent treatment. Children attain significantly different internal languages, depending on whether they are raised in contexts using some form of Swedish or a kind of Vietnamese. English-speakers in seventeenth-century London typically acquired different grammars from those acquired three generations earlier. Furthermore, people speak differently depending on their class background, their geography, their interlocutors, their mood, their alcohol consumption, and other factors.
For a good biological understanding, variation in grammars, I-languages, needs to take its place among other types of variation. This is an area where we have made much less progress than with invariant properties, and it is now clear that new thinking is needed. Chomsky (1981) initiated the Principles-and-Parameters approach, seeking a Universal Grammar with both invariant principles and a set of formal parameters that children were thought to set on exposure to Primary Linguistic Data. For four decades, linguists have been postulating parameters, ideally binary parameters (either structure a or structure b), but no real, general theory has emerged and genuinely binary parameters are scarce. Minimalists, set on reducing the complexities of the invariant principles that had emerged by the mid-1990s, have not devoted equivalent effort to minimizing the complexities of UG parameters, nor to giving an account of how parameter settings might be acquired by young children.
Linguists study variation in silos: syntacticians studying parameters have little to do with sociolinguists studying variable rules, and proponents of variable rules do not interact much with Optimality theorists studying constraint reranking. Indeed, Minimalists have devoted little attention to variation and acquisition (the two go together: variable properties must be acquired by children during development, whereas invariant properties may be provided in advance by UG and need not be acquired); UG parameters (macro- and micro-) grossly violate Minimalist aspirations to minimize information at UG. Moreover, Chomsky (2001) invoked a Uniformity hypothesis, abstracting away from variation: "In the absence of compelling evidence to the contrary, assume languages to be uniform, with variety restricted to easily detected properties of utterances" (our emphasis, DWL-JH). Hornstein's (2009) effort to transform the Minimalist Program into a Minimalist Theory has essentially no discussion of variation or acquisition, apart from a four-page discussion of the history of parameters (164–68). Boeckx (2015) went further and sought to eliminate lexically determined features on the remarkable grounds that "they are obstacles in any interdisciplinary investigation concerning the nature of language," as if linguists should deal only with analytical machinery invoked by biologists.
Difficulties with parameters are aggravated by the absence of an adequate account of which Primary Linguistic Data set which parameters. It is often supposed that children evaluate candidate grammars by checking their generative capacity against a global corpus of data experienced, converging on the grammar that best generates all the data stored in the memory bank. But that entails elaborate calculations by children and huge feasibility problems (Lightfoot 2006, section 4.1).
Even considering a small number of parameters, the problems become clear. If parameters are independent of each other, forty binary parameters entail over a trillion possible grammars, each generating an infinite number of structures. Parameters, of course, are not always independent of each other, and therefore the number of grammars to be evaluated might be somewhat smaller. On the other hand, Longobardi et al. (2013) postulate fifty-six binary parameters just for the analysis of noun phrases in Indo-European languages, which would suggest much larger numbers. On any account, the relevant numbers are astronomical.
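The arithmetic behind these feasibility worries is easy to verify: n independent binary parameters define 2^n candidate grammars. A quick sketch (the function name is ours, for illustration only):

```python
# Back-of-the-envelope arithmetic for the parameter-setting feasibility
# problem: n independent binary parameters yield 2**n candidate grammars.
def grammar_space(n_parameters: int) -> int:
    """Number of grammars defined by n independent binary parameters."""
    return 2 ** n_parameters

print(grammar_space(40))   # 2**40 = 1,099,511,627,776: over a trillion
print(grammar_space(56))   # Longobardi et al.'s 56 NP parameters alone
```

Even with dependencies among parameters shrinking the space somewhat, a learner evaluating candidate grammars against a stored corpus would face numbers of this astronomical order.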
There are other conceptual problems with viewing children as evaluating the generative capacity of numerous grammars, calculating how best to match the input experienced and setting binary parameters accordingly (see also Boeckx 2014; Lightfoot 2006; Newmeyer 2017). Certainly, given the abstractness of grammars, it will not do to suppose that triggers for elements of I-languages are sentences that those elements serve to generate. It may be opportune to consider other approaches.
We have proposed that rather than evaluating the generative capacity of grammars and setting formal parameters, children are born to parse, endowed with the tools provided by a restrictive and Minimalist Universal Grammar and no specific parsing principles (Lightfoot 2017). They parse what they hear and use the structures necessary to do so, thereby discovering or inventing them, in principle one by one. Those structures are part of the childās emerging I-language and, in aggregate, they constitute the mature I-language. Such an approach enables us to understand how children develop their internal system and how those systems may change from one generation to another, as revealed by work on historical change in syntactic systems. After all, all syntactic variation must originate in change (Lightfoot 2018).
An Alternative
In short, a common approach is to think of I-languages as consisting of invariant properties and a set of formal parameter settings; children evaluate numerous grammars against a corpus of data experienced and flick on/off switches on binary parameters, rating the generative capacity of I-languages in the fashion of Clark (1992). A better alternative might be to think of internal languages as consisting of invariant properties over a certain domain plus supplementary structures that are not invariant but required in order to parse what children hear, consistent with the invariant properties. Put differently, children parse the external language they hear, assigning to expressions structures provided by Merge, and discover/invent specific I-language elements required for certain aspects of the parse. To do so, they make use of what UG makes available, notably the bottom-up procedures of Project and Merge. The aggregation of those parsed elements constitutes the ...