Background
Successful human interaction often requires monitoring another person’s moods and feelings. In everyday life this is facilitated by the fact that the muscles of our faces allow a wide range of facial expressions, many of which have clear communicative functions. Yet understanding how we recognise facial expressions has proved surprisingly difficult, with longstanding debates and controversies around quite basic questions.
This book provides an overview of the field of facial expression recognition from the perspective that my colleagues and I have developed across many years of research. In this introductory chapter, I set out the general background and some of the main issues we have approached. A complementary overview that places this line of work into a broader context of studies of face and person perception can be found in Bruce and Young (2012).
In part, the lack of rapid progress in understanding facial expression recognition can be seen to reflect the complexity of facial expressions and our skill at decoding them. Quite subtle expressions can be meaningful, and the meaning attributed to a particular expression can be different in different contexts (Russell & Fehr, 1987). Sometimes there are cultural differences too, especially in the meanings of non-emotional expressions (such as nodding or shaking your head to indicate agreement or disagreement) and in the circumstances in which it is considered appropriate to show emotional expressions, which Ekman (1972) called cultural ‘display rules’. Linked to this are differences between spontaneous and posed expressions, such as smiling when you are happy and smiling when you are trying to ingratiate yourself, with corresponding differences in the neurological mechanisms that underlie spontaneous and posed expressions (Rinn, 1984). Moreover, the timings of the muscle movements can themselves be informative; a sudden smile will often mean something different from one that develops more gradually.
Such observations are typical of nonverbal communication. Consider eye gaze direction, which also conveys a great deal about a person’s interests and feelings. Much the same points apply. For example, the cue of establishing mutual gaze by looking into someone’s eyes can be interpreted as a question, a threat, a romantic interest or a conversational turn-taking signal, depending on the context in which it occurs (Bruce & Young, 2012). In itself, then, eye contact doesn’t have a fixed meaning (Kleinke, 1986); a number of factors combine to influence how it is interpreted.
So how can we make progress in understanding such complicated and flexible abilities? One strategy is to identify circumstances in which our perceptions and interpretations are more stable and relatively invariant across individual perceivers, using these as a starting point from which we can build up our knowledge. This is the strategy that my colleagues and I have used for around 30 years to understand facial expression recognition. Our approach has been grounded in observations proposed by Darwin (1872) and elaborated in work by Ekman (1972, 1992).
In his book The expression of the emotions in man and animals, Darwin (1872) distinguished conventional from emotional expressions and put forward the idea that some emotional expressions are common to all humans. He also suggested that unlike conventional expressions these emotional expressions are not arbitrary, but instead reflect evolved signals that often seem to be truncated actions. For example, the facial expression of disgust usually involves wrinkling the nose and pursing the lips or even opening the mouth and protruding the tongue. Darwin pointed out that the origins of disgust can be seen as a response to offensive tastes and smells. From this perspective closing the nostrils would be a way to block out a bad smell, and the lip and tongue movements are linked to the action of spitting out something that tastes bad. Systematic studies investigating the components of disgusted expressions bear out these points (Rozin, Lowery, & Ebert, 1994).
According to Darwin and others since, some facial expressions are thus explicable as remnants of behavioural responses to emotionally arousing events. From this biological standpoint, Darwin argued that these facial expressions will be recognised in all cultures across the world, and later studies by Ekman (1972) and many others have strongly supported this opinion. These universally recognised expressions correspond to what Ekman (1992) calls basic emotions, and he suggested a set of characteristics that distinguish basic emotions.
The number of basic emotions that meet Ekman’s criteria is not large, and there is still discussion about exactly which emotions to include, but most lists of basic emotions include happiness, anger, fear, disgust and sadness. Surprise is often included too (it was in Ekman’s original set), though it has become clear that its status is more equivocal because one can be pleasantly or unpleasantly surprised, raising questions about whether surprise corresponds to a unique emotional state (Oatley & Johnson-Laird, 1987) and perhaps explaining why facial expressions of surprise are sometimes confused with fear.
A particular strength of Ekman’s work has been that it was grounded in careful analysis of the muscle movements involved in different expressions, and he developed a Facial Action Coding System (FACS) for this purpose (Ekman & Friesen, 1978). A set of photographs of facial expressions of basic emotions (happiness, sadness, fear, anger, disgust, surprise, and also neutral expressions) posed by a number of different models (the Pictures of facial affect) was published by Ekman and Friesen (1976) and has been used in many subsequent studies. Advantages of the Ekman and Friesen images include the care taken to establish that appropriate muscles were moved in posing each expression (by using instructions like ‘narrow your eyes’ rather than ‘look sad’) and the validation of good recognition rates for each image as the intended emotion.
Through this careful manipulation of appropriate muscle movements, Ekman and Friesen were able to avoid problems inherent in simply asking people to pose expressions. Posed expressions can be quite variable, and they can reflect self-consciousness and acting ability or may even involve unnatural types of ‘theatrical’ expression.
Of course, by using photographs Ekman and Friesen lost any information that might derive from the timing of the movements themselves. From a purely intuitive standpoint it is initially surprising that photographs of facial expressions should be so well recognised, since it is natural to think that the pattern of movement of the facial muscles will itself convey important information. We need to keep in mind that the fact that expression photographs are recognisable does not mean that movement is unimportant. On the contrary, there is evidence that the timing of facial movements is carefully balanced between the needs of the sender and the intended recipient, even for a facial signal as apparently simple as raising the corners of the mouth in a smile (Leonard, Voeller, & Kuldau, 1991). A role for patterns of movement has also been found for expressions that are too subtle to be easily seen in static displays; movement can draw attention to small but critical changes (Ambadar, Schooler, & Cohn, 2005). However, the good recognition of photographs of normal intensity basic emotions shows, for these emotions at least, either that the apex of the set of muscle contractions forms a recognisable configuration of the facial features or that we are very skilled at estimating the implied motion (Martinez, 2003). Likewise, studies I have been involved with have not found much in the way of differences between moving and static expressions of basic emotions (Johnston, Mayes, Hughes, & Young, 2013; Harris, Young, & Andrews, 2014a). So despite its intuitive appeal, we need to be careful not to overstate the role of movement in facial expression recognition.
Ekman’s emphasis on the facial muscles was also important because different functions of the human face all require muscular movements. For example, muscles are needed to control the lips, tongue and jaws during speech, to chew food, to sneeze, to blink, to move our eyes or to turn our heads toward a source of sound. Darwin argued that emotional expressions of the face build upon these other kinds of activity rather than involving specific muscles which have developed solely for expression. Consistent with this idea, there are clear similarities in the anatomy of the facial muscles between humans and chimpanzees (Parr, Waller, & Vick, 2007). However, an important study by Waller, Cray and Burrows (2008) may slightly qualify Darwin’s conclusion. Waller et al. (2008) noted that although textbooks of human anatomy describe a precise anatomical arrangement of facial muscles, there are actually individual differences between people in the muscles themselves. Some facial muscles are not present in some individuals and some can be asymmetric (larger or absent on one side of the face). Through a careful anatomical study of the arrangement of facial muscles in 18 cadavers, Waller et al. (2008) showed that those muscles involved in producing the facial expressions of basic emotions were present in all 18 cadavers and only showed minimal asymmetries. This raises the possibility that, although these muscles may well have originally developed for other purposes as Darwin thought, they are now subject to a selection pressure that facilitates communication.
There are good reasons, then, for thinking that facial expressions of basic emotions have an evolutionary background that leads to their being recognised in all cultures across the world. The most famous test of the idea that facial expressions of basic emotions are universal was reported by Ekman (1972), who visited the preliterate culture of the Fore in New Guinea whose inhabitants had not seen photographs, magazines, cinema or television, and had been visited by few outsiders. He asked members of the Fore to demonstrate what their faces would look like in response to story vignettes about emotional scenarios. Despite the fact that they would have had little opportunity to learn about the facial expressions of people from other cultures, and notwithstanding caveats about the validity of posed expressions, Ekman’s photographs of the Fore show expressions that are easily recognised by a Western viewer.
As well as eliciting posed expressions from members of the Fore people, Ekman (1972) also tested their recognition of facial expressions. This is a tricky thing to do because there are marked cultural differences in the words and concepts people use to describe emotions. Moreover, even when members of the same culture are simply asked to describe or label facial expressions, they can give very variable responses which can themselves require considerable interpretation; how do we decide whether or not two people mean the same thing if one calls an expression ‘shock’ and the other calls it ‘fear’?
To get round these difficulties, Ekman adopted a forced-choice procedure in which people were asked to assign photographs to one of a fixed number of basic emotion categories, and he again used short story vignettes to make clear what each category entailed. With such precautions, the work of Ekman and many others has shown reasonably good recognition of facial expressions of basic emotions in nearly all cultures (Izard, 1971; Biehl et al., 1997). Nonetheless, findings of universal recognition of facial expressions of emotion have not gone unchallenged, largely on two main grounds. First, universal recognition seems mainly to be found with the particular methods Ekman used, including forced-choice responses, stories to back up the response categories and careful selection of target photos (Russell, 1994). Second, even when such methods are used, there are cultural differences in overall recognition rates. In particular, participants are more accurate at recognising emotions expressed by members of their own cultural group, and the size of this ‘own-culture advantage’ reflects relative geographical distance and cultural contact (Elfenbein & Ambady, 2002, 2003). These challenges seem, though, to miss the main point, which is that with appropriately careful testing there is evidence of an impressive degree of commonality across cultures in the interpretation of certain emotions. In making this point, Ekman has never sought to deny the richly diverse contributions from culture and upbringing.
Consider, for example, the own-culture advantage in recognising facial expressions. This might arise because universality has been overestimated and people from different cultures use different facial cues to represent expressions (Jack, Blais, Scheepers, Schyns, & Caldara, 2009; Jack, Caldara, & Schyns, 2012), or it might simply be the case that cultural stylistic differences can be superimposed on a common underlying pattern in much the same way that speakers of the same language may use different accents. We recently investigated this possibility by examining similarities and differences in the perception and categorisation of facial expressions between Chinese and white British participants (Yan, Andrews, & Young, in press). The perceptual task involved rating the degree of similarity in expression between pictures of facial expressions of same or different emotions. This task was used to generate a matrix of perceived similarities between exemplars of facial expressions of the five basic emotions for Chinese and for Western participants, and is equivalent to the kind of analysis used to create well-known perceptual models such as Russell’s circumplex (Russell, 1980). The categorisation task involved forced-choice recognition of basic emotions from the same images as were used in the perceptual similarity task. Our results showed no cultural difference in the patterns of perceptual similarity of expressions, indicating that participants from these very different cultural backgrounds see the expressions in t...