An Introduction to Situational Judgment Testing
Jeff A. Weekley
Kenexa
Robert E. Ployhart
University of South Carolina
In selection, testing, and assessment contexts, the fundamental task is to make accurate predictions about a person's current and future job performance based on limited information obtained during the selection/testing process. The more accurate and relevant the information obtained in the selection/testing process, the better this prediction is going to be. The magnitude and consequences of testing in today's world are staggering. From assessments in elementary and high school, to college and graduate admissions, to employment testing and certification, to placement in military occupations, millions of people are affected by testing each year. When done correctly, such testing programs improve the effectiveness of organizations and possibly entire nations. Therefore, the continued search for better and more efficient predictors of performance is critical.
The situational judgment test (SJT) represents one such “new” predictor. Over the past 15 years, SJTs have increased in popularity as a predictor of performance. In the typical SJT, an applicant is presented with a variety of situations he or she would be likely to encounter on the job; these situations are usually gleaned from critical incidents or other job-analytic methods. Accompanying each situation are multiple possible ways to handle or respond to the hypothetical situation. The test taker is then asked to make judgments about the possible courses of action, in either a forced-choice format (e.g., “select the course of action you would be most and least likely to perform”) or a Likert-style format (e.g., “rate the effectiveness of each option on a five-point scale”). Scoring is done by comparing the applicant's choices to a key, which itself can be determined rationally or empirically. Although most SJTs are of the paper-and-pencil variety, a few have been adapted to video (e.g., Dalessio, 1994; Weekley & Jones, 1997) and more recently to the personal computer (Olson-Buchanan et al., 1998), including Web-based administration (e.g., Ployhart, Weekley, Holtz, & Kemp, 2003). Samples of typical SJT items appear in Table 1.1 (see also chap. 9 for other examples).
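The idea of scoring against a key can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the option labels A through E, the keyed values, the +1/-1 point scheme for the forced-choice format, and the absolute-distance rule for the Likert-style format are hypothetical and are not taken from any published SJT or scoring procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForcedChoiceKey:
    """Keyed answers for one item (hypothetical structure)."""
    best: str   # option the key treats as most effective
    worst: str  # option the key treats as least effective


def score_forced_choice(key, picked_best, picked_worst):
    """Return +1 for each pick that matches the key and -1 for each pick that reverses it."""
    score = 0
    if picked_best == key.best:
        score += 1
    elif picked_best == key.worst:
        score -= 1
    if picked_worst == key.worst:
        score += 1
    elif picked_worst == key.best:
        score -= 1
    return score


def score_likert(keyed_ratings, ratings):
    """Return the negative total absolute distance from the keyed ratings (0 is a perfect match)."""
    return -sum(abs(ratings[option] - keyed) for option, keyed in keyed_ratings.items())


# Hypothetical key for a five-option item; the values are illustrative only.
key = ForcedChoiceKey(best="D", worst="C")
print(score_forced_choice(key, picked_best="D", picked_worst="C"))  # 2
print(score_forced_choice(key, picked_best="C", picked_worst="D"))  # -2

# Hypothetical keyed effectiveness ratings on a six-point scale.
keyed = {"A": 3, "B": 2, "C": 1, "D": 6, "E": 4}
print(score_likert(keyed, {"A": 3, "B": 2, "C": 2, "D": 5, "E": 4}))  # -2
```

Whether the key comes from expert judgment (rational) or from the relation of each option to a criterion (empirical) affects only how the keyed values are obtained; the comparison step would look the same in either case.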
Although SJTs have been around for quite some time, research on the subject has, until recently, been very sporadic. Publications by Sternberg and colleagues (Sternberg, Wagner, & Okagaki, 1993; Wagner, 1987; Wagner & Sternberg, 1985) on “tacit knowledge” and by Motowidlo, Dunnette, and Carter (1990) on the “low fidelity simulation” stimulated renewed interest in SJTs. Since Motowidlo et al.'s (1990) reintroduction of the subject to industrial/organizational psychology, there has been a surge in research directed at understanding SJTs. Consider submissions to the annual conference of the Society for Industrial and Organizational Psychology. Over the past decade, the number of papers on the topic presented at the conference has risen dramatically. For example, the number of SJT-related papers and presentations more than doubled from 1999 to 2004. Further growth in research on the topic is expected.
This increased popularity of SJTs is undoubtedly due to research showing these tests to have a number of very positive features. First, research indicates that SJTs can have validity approaching that of cognitive ability tests. McDaniel, Morgeson, Finnegan, Campion, and Braverman (2001), for example, accumulated 102 validity coefficients and estimated the mean corrected validity of SJTs to be 0.34. Furthermore, there have been several studies showing SJTs to have incremental validity above and beyond traditional predictors such as cognitive ability and personality (e.g., Clevenger, Pereira, Wiechmann, Schmitt, & Schmidt-Harvey, 2001; Weekley & Ployhart, 2005). These studies suggest that SJTs are capturing something unique, something related to performance that is not captured by other traditional constructs.
TABLE 1.1
Sample Situational Judgment Items
One of the people who reports to you doesn't think he or she has anywhere near the resources (such as budget, equipment, and so on) required to complete a special task you've assigned. You are this person's manager.
- Tell him/her how he/she might go about it.
- Give the assignment to another employee who doesn't have the same objections.
- Tell the person to “just go do it.”
- Ask the person to think of some alternatives and review them with you.
- Provide the employee with more resources.
Which response above do you think is best?
Which response above do you think is worst?
You have been trying to get an appointment with a very important prospect for several months, but you can't seem to get past her secretary. The secretary screens all of her boss' calls and mail. Of the following options:
- Try just dropping in when you are nearby, and say you will wait to meet with her.
- Diplomatically tell the secretary that her boss, rather than she, should make the decision whether to see you.
- Write a confidential/private letter to the prospect, explaining the situation.
- Try to reach the prospect early in the morning or in the evening when the secretary is not there.
- Try to get the prospect's attention by doing something unusual, such as sending flowers, tickets to something special, or a singing telegram.
Rate each option above using the following scale:
6 = highly effective
5 = moderately effective
4 = slightly effective
3 = slightly ineffective
2 = moderately ineffective
1 = highly ineffective
Second, mean subgroup differences are typically small to moderate. Importantly, SJTs show smaller racial subgroup differences than those observed for cognitive ability tests (e.g., Motowidlo & Tippins, 1993; Pulakos & Schmitt, 1996; Weekley & Jones, 1999; see Hough, Oswald, & Ployhart, 2001, for a review). Although there is wide variation in the effect sizes found for race, in almost all cases they are lower than the standardized mean difference of d = 1.0 typically reported for cognitive ability tests (Sackett & Wilk, 1994). This is important because the passage of the 1991 Civil Rights Act outlawed the practice of “within-group norming.” By making the practice illegal, this legislation ensured that measures of cognitive ability, one of our most predictive constructs, will generate adverse impact when used with even modest cut scores (Bobko, Roth, & Potosky, 1999; Schmitt, Rogers, Chan, Sheppard, & Jennings, 1997). Consequently, researchers and practitioners began to look for high-validity tests that would produce less adverse impact against minority candidates. SJTs fit these requirements nicely.
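For readers unfamiliar with the statistic, the standardized mean difference d cited above is the difference between two group means divided by the pooled standard deviation. The sketch below computes it in that standard form; the function name and the two score lists are hypothetical and serve only to make the arithmetic concrete. They are not data from any study cited here.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Return d: the mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Illustrative scores only: means of 16 and 14, each with a sample variance of 10.
scores_a = [12, 14, 16, 18, 20]
scores_b = [10, 12, 14, 16, 18]
print(round(standardized_mean_difference(scores_a, scores_b), 2))  # 0.63
```

On this scale, d = 1.0 means the two group means sit one full pooled standard deviation apart, which is why a smaller d for SJTs translates into less adverse impact at a given cut score.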
Finally, the face validity inherent in the typical SJT can be an important benefit to selection procedures. Although research has yet to fully examine this question, it seems reasonable to expect that SJTs will be readily accepted by applicants and easily explained to them, and that they may even offer the benefit of providing a realistic preview of the job. Thus, SJTs appear to provide validity nearing that of cognitive ability tests, yet produce smaller subgroup differences and possibly more favorable applicant reactions (Clevenger et al., 2001).
Perhaps because the early interest in SJTs was based on addressing these practical issues, research has predominantly focused on showing the relevance of SJTs in predicting job performance with less adverse impact. This work has been valuable in that SJTs have moved from obscurity to an increasingly common predictor method (e.g., in their comprehensive review of predictor validity, Schmidt and Hunter, 1998, do not even list SJTs as a predictor). However, it has proceeded largely without benefit of a theoretical framework and has not addressed the many kinds of practical issues that contribute to effective selection. As a result, it is an appropriate time for researchers and practitioners to take stock of where SJT research has been and to set an agenda for future research and practice. In this book, leading experts have been asked to comment on important issues related to theory of situational judgment, SJT design, and SJT implementation. They address a number of current challenges, offer solutions for better understanding SJTs theoretically and using them practically, and establish a future research agenda. Before considering these chapters, let us first place SJTs in context, both in terms of other similar predictors and the historical development of SJTs.
SITUATIONAL JUDGMENT TESTS COMPARED TO OTHER ASSESSMENTS
It is instructive to distinguish SJTs from other situation-based assessment methods. The situational interview (Latham & Saari, 1984; Latham, Saari, Pursell, & Campion, 1980), wherein applicants are presented with likely job-related situations and the interviewer rates the effectiveness of responses (often using behaviorally anchored rating scales), is a close cousin of the SJT both in form (e.g., Weekley & Gier, 1987) and validity (e.g., McDaniel, Whetzel, Schmidt, & Maurer, 1994). The primary differences between the situational interview and most SJTs are in how they are presented to examinees (verbally vs. in writing), how examinee responses are given (verbally vs. selecting from among a closed-ended set of options), and how responses are scored (interviewer judgment vs. comparison to some scoring key).
SJTs also share some similarity with other situation-based methods such as work samples (Asher & Sciarrino, 1974) and many assessment center exercises such as in-baskets, role plays, and the like (Thornton & Byham, 1982). Work samples and assessment centers, however, go well beyond the SJT format in that they actually put the examinee “in the situation” as opposed to merely presenting a description of it to the examinee. Ployhart, Schneider, and Schmitt (2005) noted that these simulation methods vary on a continuum of physical fidelity, such that SJTs are low fidelity, assessment centers are higher fidelity, and work samples are the highest fidelity. Thus, work samples and assessment centers measure the ability to do rather than to know (Cascio & Phillips, 1979), and behavior is assessed directly rather than through self-reports of what one would or should do. Finally, as with the situational interview, these methods require “assessors” to assign scores to examinees, whereas SJTs can be scored mechanically. In short, SJTs share some features of interviews, work samples, and assessment centers, but differ from them in a number of important ways. These differences are such that SJTs may be easier to score and implement in large-scale testing programs, making them attractive options for early stages of recruitment and selection. Because of these important differences, the historical review that follows is restricted to SJTs as described earlier.
THE EARLY DAYS OF SITUATIONAL JUDGMENT TESTING
Although research interest in SJTs has grown quickly in recent years, the notion of measuring human judgment has been around for a very long time. The earliest example of SJT depends, in part, on how an SJT is defined. As reported by DuBois (1970), the first civil service examinations in the United States contained some items of a distinctly situational nature. For example, one such 1873 test for the Examiner of Trade-Marks, Patent Office, contained the following: “A banking company asks protection for a certain device, as a trade-mark, which they propose to put upon their notes. What action would you take on t...