Part I
Introduction
1
Doing Psychology in an AI Context: A Personal Perspective and Introduction to This Volume
Robert R. Hoffman
The reason this book exists is that trees are hot in winter infrared photography. Over my Christmas holiday in 1979 I visited a close friend, physicist Walter Carnahan of Indiana State University. I knew him to be a specialist in optics and particle physics, which is why I was perplexed when I found him pondering a computer graphic display that merely showed a crazy patchwork of colors. The display seemed to have nothing to do with particle physics, so I figured he had a new software toy or something. But no, this was Science.
"This is a thermogram," he said, "taken over Terre Haute, Indiana, from about a thousand feet. In winter. It's part of an energy conservation project."
Well, I knew what a thermogram was, and although the false-color coding scheme was new to me, I tried my hand at interpretation. "Is this green blob a tree?" I asked.
"No, it's a house. Actually, one with pretty good insulation."
"Oh," was the only word I could muster. But pointing to another region I asked, "So is this yellow blob an uninsulated house?"
"No, that's a tree. Trees are about the hottest things around in winter photography."
The color code, displayed off to one side of the image, seemed to be something like a rainbow, with warmer temperatures being coded as white to yellow to red, cooler temperatures as greens, and cold as blue to black. But it wasn't a perfect rainbow. The hottest white appeared slightly pinkish. The yellows were just awful, and interspersed among the darker blues was a sort of olive-gray color.
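For readers unfamiliar with false-color thermography, a coding scheme of this sort is, at bottom, just a lookup table that assigns each temperature band an arbitrary display color. The little sketch below (in Python) is purely illustrative; the band boundaries, temperatures, and colors are my own inventions, not those of the Terre Haute survey or its processing software.

    # Illustrative only: a made-up false-color lookup of the general kind described
    # above (warm = white/yellow/red, cooler = green, cold = blue/black).
    # The temperature bands and RGB values here are hypothetical.
    FALSE_COLOR_BANDS = [
        (-20.0, (0, 0, 0)),      # below -20 C: black (coldest)
        (-10.0, (0, 0, 180)),    # -20 to -10 C: blue
        (0.0,   (0, 140, 0)),    # -10 to 0 C: green
        (5.0,   (210, 210, 0)),  # 0 to 5 C: yellow
        (10.0,  (220, 0, 0)),    # 5 to 10 C: red
    ]
    HOTTEST = (255, 240, 245)    # above 10 C: a slightly pinkish white

    def false_color(temp_c):
        """Map a temperature in degrees Celsius to a display color (R, G, B)."""
        for upper_bound, rgb in FALSE_COLOR_BANDS:
            if temp_c < upper_bound:
                return rgb
        return HOTTEST

The point of the sketch is simply that nothing in the physics dictates the mapping; someone has to choose it, which is what prompted the question I eventually asked.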
"There are a number of things called 'interpretation anomalies' in thermography," Walter explained. "If a tall building has lots of windows that leak heat, then looking down on it, all the whites and yellows make it look as if the sides are ballooning out at you." Later at his home, his family and I had a chance to play with a portable infrared camera, one that enabled us to fiddle with the color coding. (I use the word play advisedly. Not only are such cameras expensive, but they have to be primed with liquid nitrogen.) We performed all manner of experiments, but simply watching a person through the camera was like nothing I'd ever seen before. The closest I can come to describing the dynamic panoply is that it's like looking at a world of ghosts painted in a Peter Max psychedelic of flowing colors. Ghostlike residues of human activity appear everywhere, such as handprints left glowing on doors. Especially odd were the infrared reflections. For example, a candle could be seen reflected, not in a mirror or window, but in an ordinary wall! Infrared affords a very unusual way of looking at the world.
Fortunately, the experimental psychologist in me is not concerned about making a fool of himself by saying something stupid in the presence of a person who knows tensor calculus. So I asked, "In your aerial thermograms, who decided how to code the different temperature ranges in the different colors?"
Walter had to think about that for a moment ... and he then said, "I don't know. Probably some engineer up in Michigan where they have the plane and the IR camera and the computers that process the data."
At that moment I had an intuition that the field called remote sensing might be fertile ground for human factors research. That possibility intrigued me because remote sensing displays are often beautiful, and I love beautiful things. Furthermore, I'd long wanted to somehow mix my work as an experimental psychologist with my love for things related to space travel (I'm a science fiction nut).
The first step was to learn about remote sensing. The parent discipline of remote sensing is called aerial photo interpretation, in which aerial black-and-white photographs are interpreted in an elaborate process called terrain analysis. In 1983 I applied for a summer grant to learn from the masters of this craft, the experts at the U.S. Army Engineer Topographic Laboratories (ETL) at Fort Belvoir, Virginia. Their response to my application was basically: "Glad you applied. It turns out, we need an experimental psychologist to help us figure out ways of extracting the knowledge of our expert image interpreters ... because we want to build expert systems."
I thought to myself, "Huh?" At that time the hype about expert systems was just building up steam, and I knew next to nothing about them. Well, at the ETL I managed to learn about both remote sensing and expert systems. Eventually, my intuitions about applying experimental psychology to remote sensing bore some fruit (Hoffman, 1990, 1991; Hoffman & Conway, 1989). With regard to expert systems, I discovered that no one was doing anything about the "knowledge acquisition bottleneck" problem.
Expert system developers had learned that building a prototype expert system (with, say, 100 or so rules and perhaps some dozens of core concepts), and getting the prototype to actually do something, can take anywhere from weeks to perhaps a few months. But specifying the knowledge that is to be captured in the system's knowledge base and inference engine can take week after week of exhausting interviews. This knowledge elicitation effort must be conducted before any system is built, and therefore it takes the system developers away from what they really like to do: program computers.
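For readers who have not seen one, a "rule" in such a prototype is typically just a condition-conclusion pair, and the inference engine is the loop that applies the rules to whatever facts are currently asserted. The sketch below (in Python) is a minimal, hypothetical illustration of that idea; the rules and facts are invented for this example and are not drawn from any system discussed in this chapter.

    # A minimal, purely illustrative forward-chaining rule interpreter of the kind
    # a small expert-system prototype might contain. Rules and facts are invented.
    RULES = [
        ({"surface is warm", "season is winter"}, "surface leaks heat"),
        ({"surface leaks heat", "surface is a roof"}, "insulation is poor"),
        ({"region is vegetated"}, "warm signature is expected"),
    ]

    def infer(initial_facts):
        """Apply rules whose conditions are satisfied until nothing new is concluded."""
        facts = set(initial_facts)
        changed = True
        while changed:
            changed = False
            for conditions, conclusion in RULES:
                if conditions <= facts and conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
        return facts

    # Example: a warm rooftop in winter yields the conclusion "insulation is poor".
    print(infer({"surface is warm", "season is winter", "surface is a roof"}))

Writing such an interpreter is the easy, enjoyable part; the hard part is getting a domain expert to articulate the hundreds of condition-conclusion pairs worth encoding.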
At the time, summer of 1983, there was a large literature of case study reports (conference papers, journal articles, edited volumes) on various people's experiences while developing expert systems. A number of review articles had also appeared. Looking across the reports, many (most?) developers of expert systems had taken it for granted that knowledge is to be elicited through unstructured interviews (see Cullen & Bryman's 1988 survey). Many system developers apparently did not even conceive of the idea that an interview can be "structured" beforehand, or that adding structure could make the interviewing more efficient overall. In retrospect, it is not surprising that knowledge acquisition had become a bottleneck.
A second general feature of the literature is that it was (and still is) chock-full of inconsistent advice about knowledge elicitation. Some researchers recommend interviews with small groups; others say that such interviews can be disastrous, because disagreements with experts can cause serious problems for the system developer. Some recommend using "test cases" to probe for knowledge; others recommend the use of questionnaire rating scales. Some say that experts are often pompous and defensive; some say they are usually not. Some say that experts are often unable to verbalize their knowledge; some say this is rarely a significant problem. And so on, each researcher generalizing from, and swearing by, his or her particular experience, with no hard data as backup.
I read that some computer scientists even recommended creating "automated knowledge acquisition" or interviewing tools. At the time, this struck me as being somewhat self-serving, if not mere overkill. These people were saying, basically, "We can't just jump in and start programming the expert system that we started out to build, so let's take one step back and build yet another program, one that can do the interview for us!" I felt that there must be simple, flexible solutions: human ones. Experimental psychologists have been studying learning for over 100 years. They are old hands at bringing people into the lab; teaching them some stuff (like lists of words or sentences); and then testing their memorial, perceptual, and cognitive processes. "Surely," I thought, "one could tinker with an expert's usual task, turning it into what psychologists call a 'transfer design.' With that, one could pull out an expert's knowledge, and in the long run do so more easily than by stepping back and writing a whole additional program just to conduct interviews!"
Perhaps. But at the time, all of the pronouncements in the literature on expert systems had to be taken on faith. This included the views of philosophers, both pro-AI and anti-AI, who seem rarely to get their hands dirty in either programming or empirical research. Most unfortunate of all was the fact that people in the AI community, both academics and nonacademics, rarely regarded the issues they encountered as problems that begged for an empirical solution (Mitchell & Welty, 1988).
The most frustrating manifestation of the nonempirical outlook was that everyone talked about the knowledge elicitation bottleneck, but nobody ever did anything about it. To my amazement, no one reported any informative details about the methods that they actually used in their knowledge elicitation sessions. Indeed, something like a Methods section never appears in reports on expert system development projects, unless method is equated with system architecture. Typically, the first paragraph of a report on an expert system development project mentions that knowledge elicitation was a problem, but then the second paragraph launches into an omnivorous discussion of the system architecture and its remarkably clever qualities and memory search algorithms.
Enter the experimental psychologist. Experimental psychologists love to get their hands dirty in the nits and picks of behavioral data. Such dirt is shunned by the hands of programmers. Not only that, experimental psychologists seem to think differently from programmers. For example, the idea of various kinds of "structured interview" comes so naturally to experimentalists that it is perhaps one of the most reinvented of wheels. There is a copious social science literature on interviewing techniques, the classic work being decades old. In general, the idea of holding off on programming or modeling in order to design and conduct the necessary background psychological research is something that comes naturally to the experimental psychologist who finds himself or herself in a computational context.
These words come easily now, but at the time I was conducting my original research this was all a bit scary. I'd been spiritually prepared to dive into applied contexts by one of my mentors, James Jenkins, when I was a postdoc at the Center for Research on Human Learning at the University of Minnesota. But to suddenly be a fish out of water....
The library available to me at the ETL was full of information about remote sensing, of course. But, as one might also expect, it was a bit slim in its psychology selection. So I had to begin at the beginning and trust some of my intuitions. Fortunately, not all of the wheels I ended up with were reinventions.
First, I generated a functional (i.e., cognitively based) scheme for classifying various alternative knowledge elicitation methods, by incorporating ideas from the psychologist's learning and problem-solving laboratory as well as methods that had been used by developers of expert systems. I also generated some measures that would permit comparisons of knowledge elicitation methods. (To do so, I had to rethink the usual definition of data "validity," and frankly, that wasn't terribly easy.) The next step was to apply these preliminary ideas in a number of studies of experts: the ETL expert interpreters of remote sensing images and U.S. Air Force experts at airlift planning and scheduling. This work was reported in The AI Magazine in the summer of 1987.
It was about then that I learned of research being conducted at the University of Nottingham by Michael Burton and his colleagues (Burton, Shadbolt, Hedgecock, & Rugg, 1987). Burton et al.'s work not only knocked my socks off, it also reknitted them. For one thing, I learned that I was no longer a lone voice in the wilderness; they, too, were empirically comparing alternative knowledge elicitation methods. Although Burton et al. were not as concerned as I was with figuring out ways to calculate the relative efficiency of various knowledge elicitation methods, their work was ambitiously experimental in that a number of variables were systematically manipulated. For example, they looked at the effect of personality variables (such as introversion-extraversion) on performance at various knowledge elicitation tasks. Like me, they looked at interviews and various kinds of "contrived" tasks, and with regard to the major conclusions, we were both pretty much on the same track: Methods other than unstructured interviews should be used in knowledge elicitation. Special contrived tasks, which somehow alter the expert's habitual procedures, can be surprisingly efficient at eliciting refined knowledge. On a broad scale, we both knew that progress could be made on addressing the knowledge elicitation bottleneck problem.
At that point, early 1988, it became clear that there was a need for a general overview of the relevant psychological literature on expertise. Hypotheses about expertise were being discussed in diverse literatures (e.g., judgment and decision-making research, expert systems development research, psychological research on learning and memory, etc.), with no attempt being made to cross disciplines in order to integrate and compare hypotheses. As it turns out, the relevant literature is huge (see Hoffman & Deffenbacher, in press; Hoffman, Shanteau, Burton, & Shadbolt, 1991).
The psychological study of expertise dates at least as far back as the World-War-I-era research on highly skilled machine operators, railway motormen, airplane pilots, and other professionals. Methods commonly used today in human factors studies of task performance (including human-computer interaction) can be dated at least as far back as the late 1800s, when Hugo Münsterberg studied the efficiency of movements in the use of a pocket watch and Frederick W. Taylor researched the problem of designing efficient shovels. The "think-aloud" method that is now commonly used in the study of the problem-solving process (and the "protocol analysis" method that is used to analyze the data) can be traced back to educational research on the acquisition and development of knowledge by Edouard Claparède (1917) and subsequent research by Karl Duncker in the 1930s (Duncker, 1945). In fact, it was pioneer psychologists such as Duncker and Alfred Binet who deserve credit for generating nearly all of the core concepts of modern theories of reasoning (e.g., concepts that are now called means-end analysis, reasoning by analogy, goal decomposition, concept-driven versus data-driven processing, and a host of other core ideas). (For the historical details, see Hoffman & Deffenbacher, in press; Hoffman et al., 1991.)
A considerable body of modern research represents the efforts of experimental psychologists to understand, in some detail, the knowledge, reasoning, and skills of experts in such domains as medicine, physics, and computer programming. Indeed, Kenneth Hammond (e.g., Hammond, 1966) and James Shanteau (e.g., Shanteau & Phelps, 1977) had been studying the cognition of experts long before the advent of expert systems. (For a review of the research see Chi, Glaser, & Farr, 1988; or Hoffman et al., 1991.)
In their expert system projects, David Prerau and his colleagues at GTE Laboratories, Inc., and Allison Kidd and her colleagues (she's now at Hewlett-Packard) had paid great attention to the fine details of knowledge elicitation: everything from dealing with management to the problems encountered in scheduling meetings with experts (Kidd, 1987; Kidd & Cooper, 1985; Kidd & Welbank, 1984; Prerau, 1985, 1989). Anna Hart had published one of the few books on "how to do" expert systems in which attention was truly devoted to how to do knowledge elicitation (Hart, 1986). Karen McGraw, whose research focused on the problem of multiple experts, was working on a similar book on knowledge acquisition (McGraw & Harbison-Briggs, 1989).
A number of experimental psychologists had been developing an interest in expert systems from a human factors perspective, among them Dianne Berry (Berry & Broadbent, 1986); Nancy Cooke (Cooke & McDonald, 1986); Sallie Gordon (1988); Gary Klein (1987); and of course Michael Burton, who ha...