1.1 Introduction
Berwick and Chomsky (2015) raise the question of why it is only us, Homo sapiens, who can speak language, think in language, and make use of language for all sorts of purposes. Their answer is essentially that we are the only species that has Merge. Merge is assumed to be a recursive generative procedure of the human faculty of language that takes n (typically two) objects, Σ1, …, Σn, and forms an unordered set, {Σ1, …, Σn}.
- (1) Merge(Σ1, …, Σn) = {Σ1, …, Σn}
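As an informal illustration of (1), Merge can be sketched as plain set formation. The sketch below is only an expository model, not a claim about how the faculty of language implements the operation: the function name `merge`, the use of Python's `frozenset`, and the modeling of lexical items as strings are all assumptions made here for concreteness.

```python
def merge(*objects):
    """Return the unordered set {Σ1, …, Σn} of the given syntactic objects."""
    # frozenset is hashable, so an output of merge can itself be an input.
    return frozenset(objects)


# Lexical items are modeled as plain strings (an assumption of this sketch).
the_book = merge("the", "book")   # {the, book}
vp = merge("read", the_book)      # {read, {the, book}}

# Sets are unordered: the order of the arguments makes no difference.
print(merge("the", "book") == merge("book", "the"))  # True

# The earlier output is a member of the later one, which yields hierarchy.
print(the_book in vp)  # True
```

Because the output of one application can serve as input to the next, nothing in the definition limits how many times the operation applies.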
The contemporary theory of bare phrase structure (Chomsky 1994 et seq.) hypothesizes that virtually every phrase structure of human language is uniformly generated by Merge, applied in an unbounded fashion. Structural representations generated by Merge are then variously utilized by human performance systems, notably by the Conceptual-Intentional (CI) system and the Sensorimotor (SM) system, thereby leading to various mental capacities of human cognition. According to Berwick and Chomsky, Homo sapiens is a unique organism, essentially by virtue of Merge and its various cognitive applications.
Now, we would like to pose the following question: Why only Merge?
The question is ambiguous, interpretable in at least three different ways.1 More specifically, we would like to ask:
| Question 1: | Why only Merge? Why do linguists, Berwick and Chomsky among others, want to claim that it is Merge that instantiates the only distinctive property of human language? What is supposed to be the empirical gain from this proposal? |
| Question 2: | Why only Merge? Why do they want to claim that Merge is the only species-specific property, disregarding other cognitive capacities of human beings, which may or may not be as innate, innovative, and/or specific? |
| Question 3: | Why only Merge? Why is it only Merge, and not any other device, that has the properties it does and that was allowed to emerge by virtue of the evolution and development of human beings? That is, why is Merge as it is rather than any other way? |
1.2 Only Merge, because it is minimally necessary to capture the Basic Property of human language
We will first address Question 1. The answer to this question is related to a longstanding observation about the ordinary use of human language.
Thousands of years of linguistic investigation share the belief, commonly ascribed to Aristotle, that the grammar of natural language is in essence a system of pairing “sound” (or “signs”) with “meaning.” The enterprise of generative grammar initiated by Chomsky (1955/1975, 1957) is just a recent addition to this long tradition, but it provided a couple of important insights into the nature of human language that have revolutionized the perspective from which we study human language. At the core of this Chomskyan revolution lies an old observation, essentially due to Descartes and other rationalists, that the capacity to pair sounds and meanings in human language exhibits unbounded creativity: humans can produce and understand an infinitude of expressions, many of which are previously unheard of or are too long and/or senseless to be produced. This Cartesian observation can be called the “Basic Property” of human language, to adopt Chomsky’s (2015a:3, 2015b:4, etc.) formulation.
To illustrate, consider a simple declarative sentence in English, The boy read the book. A speaker of English can understand that this sentence has an internal structure consisting of two major constituents, the noun phrase (NP) the boy and the verb phrase (VP) read the book, and that these constituents are further decomposed into smaller units (as in: VP = [V read] [NP the book], etc.). Based on the hierarchical structure, s/he can also assign phonological and semantic interpretations to the expression: thus, s/he understands that the noun boy is the agent performing the action denoted by the verb read, that the definite article the induces a presupposition that there exists a contextually salient unique boy in the discourse, and so on. These interpretive properties are somehow encoded at the respective SM and CI interfaces, PHON and SEM. Moreover, such an expression can be infinitely expanded, for example, by adding optional adjuncts of various types (2), coordinating its constituents (3), or embedding it into another expression (4), showing various sorts of unbounded generation.
- (2)
  - a. the boy (often) (eagerly) read the book (carefully) (quickly) (at the station) (at 2pm) (last week) …
  - b. the (smart) (young) (handsome) … boy (who was twelve years old) (who Mary liked) (whose mother was sick) … read the book.
- (3)
  - a. the boy read the book (and/or/but) the girl drank coffee (and/or/but) …
  - b. [the boy (and/or/but not) the girl (and/or) …] read the book.
- (4)
  - a. I know that [the girl believes that [it is certain that … [the boy read the book] …]]
  - b. The boy [(that/who) the girl [(that/who) the cat […] bit] liked] read the book.
In this manner, human language yields an infinite number of SEM- and PHON-interpretable hierarchical structures. Let us call this observation the Basic Property of human language (Chomsky 2015a:3, 2015b:4, etc.).
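The unboundedness of clausal embedding in (4) can be made concrete with a small string-level sketch. It is purely expository and makes no claim about the generative mechanism itself: the function name `embed` and the particular embedding frames are assumptions chosen for illustration, taken from the wording of the first example in (4).

```python
def embed(clause, frames):
    """Wrap `clause` in each frame in turn, innermost frame first."""
    for frame in frames:
        clause = f"{frame} [{clause}]"
    return clause


core = "the boy read the book"
# Frames listed from innermost to outermost (illustrative choices).
frames = ["it is certain that", "the girl believes that", "I know that"]

print(embed(core, frames))
# I know that [the girl believes that [it is certain that [the boy read the book]]]
```

The list of frames can be of any length, so there is no longest sentence of this form: each output can be embedded once more.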
It is hypothesized by Chomsky (2004 et seq.) that the Basic Property essentially follows from the recursive application of Merge, understood as a simple set-formation operation that takes n (typically two) objects Σ1, …, Σn and forms a set, {Σ1, …, Σn}. If applied to two elements, say, the and book, Merge generates the set {the, book} in (5a). For expository purposes, we will occasionally represent the output sets of Merge with familiar tree diagrams. Thus, the set in (5a) can be represented as (5b), though it should be borne in mind that, strictly speaking, tree representations may not be equivalent to the set notation. In particular, the left-to-right order of branches in such diagrams is irrelevant.2