1 | Selection and Assessment
Preface
Selection and assessment are seminal topics within occupational psychology. Early interest was driven by the need to select good military personnel. By the mid-1950s, however, the field was criticized for being overly technical and lacking in theoretical insight. Cynicism was compounded by Equal Opportunity arguments that many of the favoured measures discriminated against minority groups. More recently, recognition of the strategic potential of selection to enhance organizational performance has reinvigorated the field. Even so, some critics note that many of the most sophisticated selection techniques still await construct validation, and progress may be held back by practitioners seduced by complex selection procedures regardless of their conceptual integrity or predictive power. Beliefs about validity are not the same as actual validity and, unfortunately, technical rather than psychological considerations still prevail. Moreover, the criterion (or ‘to be predicted’) domain is relatively impoverished, making it difficult to be clear about exactly how different selection methods work in performance terms. This is especially problematic because the criterion domain commonly shifts as job and organizational demands change, which has prompted some to suggest that ‘job-specific criteria’ are now less appropriate to selection than more general and transferable skills, qualities and competences relevant to economic survival.
There is also a ‘coming of age’ of research on the applicant perspective on selection processes and, in particular, on the dynamics of selection as a vehicle for understanding the early stages of the employment relationship. This interest assumes a model of the applicant as a subjectively active participant in the selection process rather than as a passive and quantifiable resource to be measured and ‘slotted’ into a job.
The first part of this chapter describes the conventional ‘organizational perspective’ on the selection system, explains the selection validation paradigm, asks questions about fairness, and explores the relatively ill-studied criterion domain. The second part examines the reliability and validity of different selection techniques or predictors (interviews, tests and so on). The third part looks at the ‘applicant perspective’ on selection.
Learning Objectives
Studying the contents of this chapter will enable you to:
- describe the personnel selection system and its component parts;
- describe and explain the selection validation paradigm with particular reference to the concepts of reliability and validity;
- provide a rationale for validity generalization and generalizability theory in the context of selection;
- describe the problem of the criterion in selection research;
- discuss fairness issues arising from selection practices drawing on equal opportunities policy;
- compare and contrast the reliability and validity of commonly used selection techniques and methods;
- describe and debate the features of each different technique or method, including problems of implementation and issues of fairness;
- critically consider the active role played by the applicant in the selection process, and in particular the selection process as a connecting mechanism at the point of entry into an employment relationship;
- compare and contrast the organizational perspective with the applicant perspective on selection;
- identify future theoretical and practical challenges in the field of selection and assessment research.
Case Study 1.1: Daewoo Cars
Daewoo Cars began recruiting in 1994, starting with its management team. Job profiles were drawn up by a consulting company (involving job analysis and person specification) and then advertised locally. Each advert generated 800 enquiries. Information packs contained details about Daewoo Cars, its place within the Daewoo Group, its philosophy for selling cars, its pay and reward system, opportunities for training and development, details about job purpose and accountabilities, and instructions for completing the application questionnaire.
Application involved self-assessment of breadth and depth of experience in seven major categories: motor trade, finance and insurance, retail trade, budget and staff management, information technology, dealing with the public and working patterns. From these data, Daewoo’s computer system was able to screen applicants. Daewoo’s Human Resources department then examined all ‘above the line’ applications, selecting a proportion of them for interview, typically four or five for each vacancy. Applicants also answered questions on issues such as ‘customer service’, success through team effort and sales success in previous jobs. In the first of two rounds of interviews, agency personnel trained in competency-based interviewing conducted rigorous 45-minute interviews, scoring each competence on a five-point scale. Around half of the applicants from this stage went on to a second interview with members of Daewoo’s staff, either Human Resources or line management (or, for more senior roles, regional managers). Second interviews were more traditional, looking at applicants’ personalities, presentation and interpersonal skills in interviews lasting up to an hour. About half of the applicants interviewed at this stage received an offer. References were checked on job acceptance as a means of factual verification.
IDS Study 581, July 1995.
Part 1: The Organizational Perspective on the Selection System
The Daewoo case study illustrates the conventional practice of selection from an ‘organizational perspective’ (Anderson, 2004; Guion, 1998). The selection process assumes that the ability or potential of a person to meet a particular set of job criteria can be assessed systematically and precisely. This should enable a forecast or prediction to be made about the suitability of a person for a particular job. This process has a number of advantages relative to a purely intuitive approach. First, it describes a step-by-step procedure for ensuring that selection practice is ‘objective’ and thus also ‘fair’ and open to empirical verification. Second, it offers both practitioners and researchers a model of best practice against which to evaluate the success of a selection intervention.
The more applicants there are for each vacancy (that is, the smaller the selection ratio of hires to applicants), the more discriminating the selection strategy will need to be. A selection device known to be reliable and valid with respect to a particular outcome criterion can be used to select from even a small applicant pool (for example, a cognitive ability test for a senior management position). The cost of a selection error will depend on the job: if poor performance could irreparably damage the company (for example, finance director), then the cost of a mistake will be high.
The Selection Validation Paradigm
The above selection process is framed by the ‘selection validation’ approach, the prevailing model in recruitment science. It assumes that the key to predicting performance is to identify a job-relevant outcome criterion (or criteria) and a predictor that will generate evidence with good discriminating power against that criterion. In this instance, predictors are the methods of assessment. Measures can be either psychometric (standardized and reliant on a statistical metric to produce a score) or non-psychometric (for example, interviews, references). It is generally agreed that the more psychometric the measurement method, the more precise the forecast. In practice, both forms of selection device are commonly used to obtain applicant evidence.
Adequate description and explanation of the selection validation paradigm requires the introduction of two key psychometric concepts: reliability and validity.
Reliability
Reliability is the extent to which a measurement tool yields a consistent score or set of scores. A highly reliable measure produces the same or similar readings over time for the same thing, like a tape measurement of the length and width of a table. Measurement of physical objects or phenomena using precisely calibrated measures (rulers, thermometers, scales) is less open to error than the measurement of abstract constructs. Scope for variation across different instances of psychological measurement is enormous, arising from test factors (for example, item difficulty level), person factors (for example, motivational state), and the circumstances surrounding the measurement process (for example, test conditions).
A reliability index is typically the correlation (r) between two or more sets of readings taken from the same source with the same tool, either immediately or after a short delay (test–retest reliability). Parallel readings may also be derived by splitting the test on an odd–even basis (the split-half technique) or by administering alternate versions of the test (alternate forms). High reliability does not imply correctness: a measure of intelligence may produce a consistent score for the same person over time, but does it really measure intelligence?
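To make the split-half idea concrete, the short Python sketch below (using invented item scores rather than data from any real test) sums the odd- and even-numbered items for each test-taker, correlates the two half scores, and applies the Spearman–Brown correction to estimate the reliability of the full-length test.

```python
import numpy as np

def split_half_reliability(item_scores: np.ndarray) -> float:
    """Correlate odd- and even-item half scores, then apply the
    Spearman-Brown correction to estimate full-test reliability."""
    odd_half = item_scores[:, 0::2].sum(axis=1)    # total score on odd-numbered items
    even_half = item_scores[:, 1::2].sum(axis=1)   # total score on even-numbered items
    r_halves = np.corrcoef(odd_half, even_half)[0, 1]
    return 2 * r_halves / (1 + r_halves)           # Spearman-Brown prophecy formula

# Invented data: 50 test-takers, 20 items scored 0-5, driven by a common latent trait
rng = np.random.default_rng(seed=1)
trait = rng.normal(size=(50, 1))
items = np.clip(np.round(2.5 + trait + rng.normal(scale=1.0, size=(50, 20))), 0, 5)

print(f"Estimated split-half reliability: {split_half_reliability(items):.2f}")
```

The Spearman–Brown step is needed because each half contains only half the items; without it the correlation would understate the reliability of the full test.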
Validity
Validity indicates whether a measure is right for its purpose. High reliability is no guarantee of high validity, although low reliability is a sure indicator of low validity. There are four main ways of assessing validity (Study Box 1.1). Each type of validity provides an important perspective on a measure, and a measure with high construct validity may not necessarily have good criterion-related validity. For instance, a test of ‘extraversion’ may be construct-valid (various measures converge in their description of an extraverted person as sociable, outgoing and articulate), yet possess dubious criterion validity in the context of job performance. On the other hand, a test with high criterion validity (a test of clerical skills predicts the performance of clerical staff on the job) may nonetheless be low on construct validity (different components of the test do not yield internally consistent results). Indeed, a predictor may forecast performance well without it being clear why it does so. In short, validity of one kind does not necessarily imply validity of another.
Study Box 1.1
Types of validity
Construct validity
Constructs are inferences about psychological phenomena that cannot be directly measured (for example, conscientiousness), so they have to be translated into concrete attitudinal or behavioural (that is, operational) terms that can be measured (for example, attention to detail, organized, reliable, precise). The test of construct validity is to determine whether the indirect measure is indeed a true indicator of the supposed construct. Estimating this begins with a theory about how the construct operates or displays itself. Another way to assess construct validity is to compare test scores from different supposed measures of the same construct: the scores should converge. However, debates are inevitable surrounding which measures should converge and why. Recent controversies have arisen over the use of particular techniques like confirmatory factor analysis for ascertaining construct validity (see Lievens & Klimoski, 2001).
Content validity
This refers to how representative the content of a measure is of the domain it is intended to cover. A measure of cognitive ability, for example, should cover the entire range of attitudes and behaviours considered theoretically relevant to the construct. Judging this may be a matter of expert judgement. If the content in question pertains to a well-defined set of behaviours (for example, number manipulation, spatial configuration and manipulation, fluency of word use, speed of information processing), it is easier to describe the construct in empirical terms (for example, convergent or divergent thinking ability).
Criterion-related validity
This pertains to whether the measurement tool can account for significant variance in the ‘criterion’. A valid test of cognitive ability, for example, should yield a score that predicts job success. There are two ways of assessing criterion validity: predictive and concurrent. Predictive validity involves testing job applicants and then comparing their scores with supervisor ratings of on-the-job performance obtained at some later point. Concurrent validity, by contrast, involves testing existing job incumbents and correlating the results with supervisor ratings of on-the-job performance gathered at around the same time.
Face validity
This refers to whether a measure appears relevant to the test domain. Some tests are more obviously job relevant (for example, work samples involving concrete tasks) and others are not (for example, personality tests). The importance of face validity lies not so much in selection validation as in how applicants react to the assessments they face. Applicant perceptions about test relevance and meaning are critical to their opinions about fair assessment (see Study Box 1.14 later in this chapter), which makes it crucial to have a defensible criterion against which to gauge each predictor. Nonetheless, the issue is a contentious one, leading some to argue that face validity is an important public relations consideration, if only from the standpoint of what it implies for the development of a relationship of ‘trust’ between employee and employer (Herriot, 2001).
Selection validation
Selection validation is synonymous with criterion validity, which concerns the appropriateness of a particular selection device for making accurate forecasts about future job performance. A number of conditions need to be met for performance forecasting to work well. In the initial predictive validation exercise, applicants should not be hired on the basis of their test scores: if they are, it becomes difficult to obtain a true assessment of the test’s validity, because the criterion measure obtained from a pre-selected group is unlikely to yield performance data variable enough to make fine discriminations. This is called the restriction of range problem. However, in practice it may be burdensome to put applicants through a costly assessment process (for example, a work sample test) that is strictly not used for selection decisions. This, coupled with a tendency to rely on early performance information to make applicant judgements, accounts for why concurrent validation techniques are more popular (Study Box 1.1). Concurrent techniques have less forecasting potential, but are better than no validation at all.
The most commonly used index of criterion validity is the validity coefficient: the correlation (r) between a predictor measure (x) and a criterion measure (y). The higher the correlation (expressed on a scale from −1 to +1), the stronger the association. A correlation expresses an association, not a causal relationship: a high cognitive ability score predicts, but does not necessarily cause, high performance, since many other factors like effort and disposition could contribute to the association.
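A minimal simulation can illustrate both the validity coefficient and the restriction of range problem discussed above. The Python sketch below uses invented applicant data constructed so that the true predictor–criterion relationship is roughly r = 0.5; correlating predictor and criterion across the whole applicant pool recovers approximately that value, whereas computing the same correlation only for the ‘hired’ top 20 per cent of scorers yields a visibly attenuated coefficient.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n_applicants = 1_000

# Invented applicant pool: predictor scores (x) and later job performance (y),
# constructed so the true predictor-criterion correlation is roughly 0.5
x = rng.normal(loc=100, scale=15, size=n_applicants)
y = 0.5 * (x - 100) / 15 + rng.normal(scale=0.87, size=n_applicants)

validity_full = np.corrcoef(x, y)[0, 1]
print(f"Validity coefficient, whole applicant pool: r = {validity_full:.2f}")

# Restriction of range: criterion data exist only for the top 20% of scorers
# (the applicants who were actually hired)
cutoff = np.quantile(x, 0.80)
hired = x >= cutoff
validity_restricted = np.corrcoef(x[hired], y[hired])[0, 1]
print(f"Validity coefficient, hired group only:     r = {validity_restricted:.2f}")
```

The attenuation arises purely because the hired group’s predictor scores vary so little; nothing about the underlying predictor–criterion relationship has changed.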
Validity coefficients derived from the correlation of test scores with criterion measures are only one form of evidence. Guion (1998: 114) describes validity as an inference about how confident we can be in attributing meaning to scores, built from various sources of evidence. Other forms of evidence include judgements about test construction procedures. Was there a clear idea of the attribute to be measured? Are the mechanics of the measure consistent with the way the attribute has been defined? For instance, it would not be possible to obtain a valid measure of divergent thinking style from a problem-solving puzzle with only one correct solution. Similarly, the content of the test should have a logical association with the attribute being measured: if the attribute is multi-dimensional, does the content of the measure reflect this? Kline (1986) argued that if the original item pool is derived from theoretical and empirical insight, it may be reasonable to infer that the scores are a valid indicator of the attribute in question.
Hakel (1986) adds that the validity of the criterion is also crucial to interpretations of the validity coefficient.