Part One
Behavioural Experiments beyond the Questionnaire
1
Experimental Philosophy and Statistical Learning
Shaun Nichols
Introduction
Much experimental philosophy aims to uncover the processes and representations that guide judgements about philosophically relevant issues. This is the agenda in discussions about whether people are incompatibilists about free will (e.g. Murray and Nahmias 2014), whether moral judgement is driven by distorting emotions (e.g. Greene 2008) and whether judgements about knowledge are sensitive to irrelevant details (e.g. Swain et al. 2008). Much less attention has been paid to historical questions about how we ended up with the processes and representations implicated in philosophical thought. There are different kinds of answers to these historical questions. One might offer distal answers that appeal to the more remote history of the concept. For instance, an evolutionary psychologist might argue that some of our concepts are there because they are adaptations. Or a cultural theorist might argue that some of our concepts are there because they played an important role in facilitating social cohesion. On the proximal end of things, we can attempt to say how the concepts might have been acquired by a learner. Those proximal issues regarding acquisition will be the focus in this paper.1 Recent cognitive science has seen the ascendance of accounts that draw on statistical learning to explain how we end up with the representations we have (Perfors et al. 2011). In this chapter, I’ll describe how this approach might be extended to explain the acquisition of philosophically relevant concepts and distinctions.
1. Background on statistical learning
Accounts of acquisition in terms of statistical learning include not only very sophisticated statistical techniques from machine learning, but also humble and familiar forms of statistical inference. Imagine you’re on a road trip with a friend and you have been sleeping while he drives. You wake up wondering what state you’re in. You notice that most of the licence plates are Kansas plates. You will likely use this information to conclude that you are in Kansas. This is a simple form of statistical learning. You are consulting samples of licence plates (the ones you see) and using a principle on which samples (in this case, of licence plates) reflect populations. This, together with the belief that Kansas is the only state with a preponderance of Kansas plates, warrants your new belief that you are in Kansas.
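The licence plate inference can be made concrete with a small sketch. The version below uses Bayesian updating, which is one standard way to formalize the idea that samples reflect populations; it is not claimed to be the procedure a traveller actually runs. The candidate states, the uniform prior, the plate proportions and the ten observed plates are all illustrative assumptions, not data from the text.

```python
def posterior(prior, likelihood, observations):
    """Bayesian update over hypotheses, treating observations as independent."""
    unnormalized = {}
    for hypothesis, p in prior.items():
        for obs in observations:
            p *= likelihood[hypothesis][obs]
        unnormalized[hypothesis] = p
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical candidate states with a uniform prior (assumption).
prior = {"Kansas": 1 / 3, "Missouri": 1 / 3, "Nebraska": 1 / 3}

# Assumed proportion of Kansas ("KS") plates on the road in each state.
# Only Kansas has a preponderance of Kansas plates.
likelihood = {
    "Kansas": {"KS": 0.8, "other": 0.2},
    "Missouri": {"KS": 0.1, "other": 0.9},
    "Nebraska": {"KS": 0.1, "other": 0.9},
}

# The sample: most of the plates seen on waking are Kansas plates.
plates = ["KS"] * 7 + ["other"] * 3

post = posterior(prior, likelihood, plates)
best = max(post, key=post.get)
print(best, round(post[best], 4))
```

Under these assumed numbers, a sample dominated by Kansas plates leaves almost all of the posterior probability on Kansas, which mirrors the informal inference described above.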
Early work on statistical reasoning in adults indicated that people are generally bad at statistical inference (e.g. Kahneman and Tversky 1973). But over the last decade, work in developmental and cognitive psychology suggests that children actually have an impressive early facility with statistical reasoning (Girotto and Gonzalez 2008; Xu and Garcia 2008; Xu and Denison 2009; Fontanari et al. 2014). In tandem with this, computational psychologists have offered statistical learning accounts for a wide range of representations and processes including categorization (Smith et al. 2002; Kemp et al. 2007), word learning (Xu and Tenenbaum 2007) and causal judgement (Bramley et al. 2017). I will now review some key characteristics of a statistical learning account of acquisition, and then present philosophical applications that draw on several statistical learning principles from this growing body of literature.
2. Theories of acquisition
A theory of acquisition involves many components, and it’s helpful to pull them apart. We will be considering accounts of acquisition that are, broadly speaking, empiricist. In this section, I’ll say a little bit about the empiricist/nativist debate and give a sketch of the structure of statistical learning accounts.
2.1. Empiricism vs nativism
In the contemporary idiom of cognitive science, the debate between empiricists and nativists is all about learning (see e.g. Laurence and Margolis 2001; Cowie 2008). Take some capacity like the knowledge of grammar. How is that knowledge acquired? Empiricists about language learning typically maintain that grammatical knowledge is acquired from general purpose learning mechanisms (e.g. statistical learning) operating over the available evidence.2 Nativists about language learning maintain instead that there is some domain-specific learning mechanism (e.g. a mechanism specialized for learning grammar) that plays an essential role in the acquisition of language. In the case of grammatical knowledge, debate rages on; see e.g. Perfors et al. (2011) against Berwick et al. (2011). But it’s critical to appreciate that there is some consensus that for certain capacities, an empiricist account is most plausible, while for other capacities, a nativist account is most plausible.
On the empiricist end, research shows that infants can use statistical evidence to segment sequences of sounds into words. The speech stream is largely continuous, as is apparent when you hear a foreign language as spoken by native speakers. So how can a continuous stream of sounds be broken up into the relevant units? In theory, one way that this might be done is by detecting ‘transitional probabilities’: how likely it is for one sound (e.g. a syllable) to follow another. In general, the transitional probabilities between words will be lower than the transitional probabilities within words. Take a sequence like this:
happy robin
As an English speaker, you will have heard ‘PEE’ following ‘HAP’ more frequently than ‘ROB’ following ‘PEE’. This is because ‘HAPPEE’ is a word in English but ‘PEEROB’ isn’t. This sort of frequency information is ubiquitous in language. And it could, in principle, be used to help segment a stream into words. When the transitional probability between one sound and the next is very low, this is evidence that there is a word boundary between the sounds. In a ground-breaking study, Jenny Saffran and colleagues (1996) used an artificial language experiment to see whether babies could use transitional probabilities to segment a stream. They created four nonsense ‘words’:
pabiku tibudo golatu daropi
These artificial words were strung together into a single sound stream, varying the order between the words (the three orders are depicted on separate lines but are seamlessly strung together in the audio):
pabikutibudogolatudaropi
golatutibudodaropipabiku
daropigolatupabikutibudo
By varying the order of the words, the transitional probabilities are also varied. Transitional probabilities between syllable pairs within a word (e.g. bi-ku) were higher than between words (e.g. pi-go) (p=1.0 vs p=.33). After hearing two continuous minutes of this sound stream, infants were played either a word (e.g. pabiku) or a part word (e.g. pigola). Infants listened longer (i.e. showed more interest) when hearing the part word, which indicates that they were tracking the transitional probabilities. This ability to use statistical learning to segment sequences isn’t specific to the linguistic domain. It extends to segmenting non-linguistic tones (Saffran et al. 1999) and to the visual domain (Kirkham et al. 2002). Perhaps humans have additional ways to segment words, but at a minimum, there is a compelling empiricist account of one way that we can segment streams of continuous information into parts using statistical learning.
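The arithmetic behind the p=1.0 and p=.33 figures can be illustrated with a short sketch. It is a toy reconstruction under stated assumptions, not the original experimental materials or a model of infant cognition: the word order in ORDER is a hypothetical sequence chosen so that each word is followed by each of the other three words exactly once, the two-letter syllabification is assumed from the spelling of the four nonsense words, and the 0.5 boundary threshold in segment is an arbitrary illustrative choice.

```python
from collections import Counter

WORDS = ["pabiku", "tibudo", "golatu", "daropi"]

# Hypothetical word order (assumption): every ordered pair of distinct
# words occurs exactly once, and no word immediately repeats.
ORDER = [0, 1, 0, 2, 0, 3, 1, 2, 1, 3, 2, 3, 0]


def syllabify(word, size=2):
    """Split a nonsense word into two-letter syllables (assumed structure)."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def transitional_probabilities(syllables):
    """Estimate P(next = b | current = a) from adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(syllables, tp, threshold=0.5):
    """Posit a word boundary wherever the transitional probability is low."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


stream = [s for i in ORDER for s in syllabify(WORDS[i])]
tp = transitional_probabilities(stream)

print(tp[("bi", "ku")])           # within-word pair
print(round(tp[("pi", "go")], 2))  # between-word pair
print(segment(stream, tp)[:4])
```

In this toy stream the within-word pair bi-ku always occurs together, while pi is followed by go only one time in three, so cutting the stream at low-probability transitions recovers the four nonsense words.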
Nativists can claim victories too, however. Birdsong provides a compelling case. For many song birds, like the song sparrow and the swamp sparrow, the song they sing is species-specific. It’s not that the bird is born with the exact song it will produce as an adult, but birds are born with a ‘template song’ which has important elements of what will emerge as the adult song. One line of evidence for this comes from studies in which birds are reared in isolation from other birds. When the song sparrow is raised in isolation, it produces a song that shares elements with the song of the normally reared adult song sparrow; similarly, an isolated swamp sparrow produces a song that shares elements with that of normally reared swamp sparrows. Critically, the song produced by the isolated song sparrow differs from the song produced by the isolated swamp sparrow. This provides a nice illustration of a nativist capacity. It’s not that experience plays no role whatsoever: the specific song that the bird produces does depend on experience. But there is also an innate contribution that is revealed by the song produced by isolate birds. The template gives the bird a head start in learning the appropriate song (see, e.g. Marler 2004). We can also cast the point in terms of a poverty-of-the-stimulus argument. The evidence in the stimulus is not adequate to explain the isolate songs of the two kinds of sparrows. If the evidence in the stimulus were adequate, then we should find that song and swamp sparrows produce the same song if they are provided with the same evidence. The fact that they produce different songs shows that there is something contributed by the organism that is not present in the stimulus.
The examples of bird song and segmentation of acoustic strings show that it’s misguided to think that there is a general answer to the nativist/empiricist debate. The debate needs always to be focused on particular capacities. The examples I’ll give in Sections 3 and 4 are all empiricist learning stories for specific pieces of knowledge, based on principles of statistical inference. Let’s now turn to the characteristics of such learning accounts.
2.2. A schema for statistical learning accounts of acquisition
Suppose we want to argue that some concept was acquired via statistical inference over the available evidence. Several things are needed.
1. The first task is to describe the concept (belief, distinction, etc.) the acquisition of which is to be explained. We can call this target the acquirendum (Pullum and Scholz 2002). Part of the work here will be to argue that we do in fact have the concept or distinction or belief that is proposed as the acquirendum (A).
2. Insofar as statistical learning is a form of hypothesis selection, one needs to specify the set of hypotheses (S) that the learner considers in acquiring A. This set of hypotheses will presumably include A as well as competing hypotheses.
3. One will also need an empirical assessment of the evidence (E) that is available to the learner. For this, one might consult, inter alia, corpora on child-directed speech.
4. One needs to articulate the statistical principles (P) that are supposed to be implicated in acquiring the concept. These principles should make it appropriate for a learner with the evidence E and the set of hypotheses S to infer A. This step will thus provide an analysis of how the principle would yield the concept given certain evidence.
5. Finally, a complete theory of acquisition would tie this all together by showing that the learner in fact does use the postulated statistical principle P and the evidence E to select A among hypotheses S.
Few theories of acquisition manage to provide convincing evidence for all of these components. Furthermore, the last item is well beyond what most learning theorists hope to achieve. In place of this daunting demand, a learning theorist might aim for a weaker goal: to show that learners are capable of using the relevant kind of evidence to make the inferences that would be appropriate given the postulated statistical principles. That is, instead of trying to capture the learner’s actual acquisition of the concept, one might settle for something a good deal weaker:
5*. Show that the learner is appropriately sensitive to the evidence; that is, given evidence like E, she makes inferences that would be appropriate if she were deploying the postulated statistical principles P.
3. A first application: ‘Knowledge’
The foregoing is a recipe in a sense, but the critical ingredient is coming up with a specific idea for how principles of statistical learning might explain some aspect of our philosophical outlook. In this section, I’ll explain how the method has been applied to a core issue in epistemology and in the next section, I’ll review four different aspects of moral cognition that we have attempted to explain using statistical principles. The reviews will be rather brief, but the idea is to give a sense of the range of possibilities, not a detailed exposition of any one.
Acquirendum
It’s an old idea in epistemology, stretching back to Plato, that knowledge demands infallibility. Infallibilism is characterized in different ways, but one intuitive characterization is modal: S knows that p on the basis of his evidence only if, given the evidence, S’s belief that p could not have been wrong.3 In contemporary discussions, several philosophers have suggested that infallibilism is supported by the fact that it’s odd to say things like ‘He knows that it’s raining, but he could be wrong about that’ (see e.g. Rysiew 2001; Dodd 2011). This forms an acquirendum: how do we acquire an infallibilist notion of knowledge? Ángel Pinillos and I have recently offered a statistical learning explanation for the acquisition of an infallibilist notion of knowledge (Nichols and Pinillos forthcoming).
Hypotheses
We suggest that when a learner is trying to acquire the concept knowledge, the competing hypotheses will include the following: (1) the possibility that knowledge is true belief, (2) the possibility that knowledge is true belief with justification of some sort but...