Chapter 1
GAP: Assessment of Performance in Teams – A New Attempt to Increase Validity
Viktor Oubaid, Frank Zinn and Daniela Gundert
German Aerospace Center DLR, Germany
Introduction
Safe and effective performance in aviation, for example pilot proficiency, demands not only excellent technical knowledge but also pronounced interpersonal competence, which includes the selection and distribution of information, cooperative goal orientation and, of course, decision making. Moreover, skills in leadership and conflict management are required (Maschke and Rother, 2006).
Modern pilot training methods reflect these demands. "Human performance and limitations", "multi-crew coordination training" and "crew resource management training" are important subjects in the airline pilot education and licensing process.
In order to select skilled candidates for training, a psychological selection including group assessment methods is employed by many airlines and air traffic control organizations (Goeters, 2003). These methods typically comprise group discussion, group planning and prioritization tasks. The applicant's behavior is observed, recorded by the observers using paper and pencil, and finally rated within a given guideline framework.
With the advance of computer and information technology, there are two significant reasons why these methods require updating (Huelmann and Oubaid, 2004). First, typical tasks in the working environment of pilots are located at the human-machine interface. Second, in conventional assessment center exercises, the objectivity of behavior ratings is comparatively lower than in other psychological methods, owing to observer errors, complex interactions, and time-consuming behavior recording with paper and pencil.
A further problem of the aforementioned methods is the often low interrater reliability among different observers. This stems directly from observers differing in which exhibited behavior they deem significant for a given dimension. Additionally, noting observations by hand draws the observer's attention away from the ongoing interaction, so behavioral units are missed or misperceived, which further reduces reliability.
To overcome these problems, a computer-based group test system (GAP Assessment®; Group Assessment of Performance and Behaviour; Oubaid, 2007; Oubaid, Zinn and Klein, 2008) was developed, in which behavior observations made by four experts (training captains and aviation psychologists) and objective behavior measures are integrated into an overall evaluation. The basis of the multi-level observations is a set of taxonomically derived complex scenarios in which three or four applicants gradually receive different assignments and interact with each other face-to-face as well as through individual touchscreen monitors that are part of the GAP network (see Figure 1.1).
Figure 1.1 Overview of the GAP Assessment® network including four observers and four applicants
Method
As a first step, a behavioral observation model was developed that also serves as the backbone of the scenario construction. The model is based on three sources: (1) the set of basic competencies used in Lufthansa pilot training, which comprises the basic interpersonal, technical and procedural competencies for safe flight accomplishment (Lufthansa, 1999); (2) the VERDI Circumplex Behavioral model for DLR pilot selection (for example, Hoeft, 2003); and (3) a Fleishman job requirement analysis for airline pilots (Maschke, Goeters and Klamm, 2000), which was integrated to elaborate the areas of competence. Six areas of competence were identified: leadership, teamwork, communication, decision making, adherence to procedures, and workload management.
As a second step, individual behavioral units, the behavioral anchors, were derived to translate the areas of competence into observable practice. These anchors were presented to assessment-center experts (aviation psychologists, training captains), who rated their prototypicality for the areas of competence following the Act Frequency Approach (Buss and Craik, 1980, 1983). The behavioral anchors were then combined into behavioral subsets (GAP sets), which form the basis of behavioral observation in the GAP scenarios. In the final version, three categories of strain symptoms (hypomotor/hypermotor, vegetative, and paralinguistic) were also integrated.
During the four sequences, the observers use a touch screen to assign behavioral anchors, thereby registering the behavioral units exhibited by the applicants (see Figure 1.2).
The observer's screen also includes:
• feedback information about the tasks the applicants are currently working on;
• feedback about certain performance parameters like talk-time, matching-task performance, or errors made;
• notifications about individual and group messages sent by the instructors in order to intervene.
After each sequence, additional clinical ratings of the preceding performance are given by the observers (for all applicants) and by the applicants (self-rating and peer-rating) on leadership, teamwork and effectiveness. The observers rate two further dimensions: stress resistance and authenticity.
Figure 1.2 Overview of the GAP Assessment® observer screen
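Conceptually, the anchor-based registration amounts to frequency counting per competence area, in the spirit of the Act Frequency Approach. The following sketch illustrates this idea only; the anchor names and their mapping to areas are hypothetical, not the actual GAP sets:

```python
from collections import Counter

# Hypothetical mapping of behavioral anchors to competence areas
# (illustrative only; the real GAP sets differ).
ANCHOR_TO_AREA = {
    "summarizes group status": "leadership",
    "assigns subtasks": "leadership",
    "offers help to a peer": "teamwork",
    "confirms received information": "communication",
    "states a clear preference": "decision making",
}

def tally_anchors(observed_anchors):
    """Count how often each competence area is touched by the
    anchors an observer registered during one sequence."""
    areas = Counter()
    for anchor in observed_anchors:
        area = ANCHOR_TO_AREA.get(anchor)
        if area is not None:
            areas[area] += 1
    return areas

scores = tally_anchors([
    "assigns subtasks",
    "summarizes group status",
    "offers help to a peer",
])
print(scores["leadership"])  # frequency score for one area
```

Per-area frequencies of this kind are what the later correlation analyses treat as GAP anchor scores.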
Typically, candidates tend to overestimate their behavior in assessment centers (Sarges, 1990, p. 529). GAP Assessment® enhances the quality of self-evaluation by inducing objective self-awareness during the self-evaluation process. According to self-awareness theory, people who focus their attention on themselves evaluate and compare their behavior against their internal standards and values (Duval and Wicklund, 1972). In GAP Assessment®, the self-awareness condition is supported by presenting the candidate's photograph during self-rating. The photograph is taken during instruction time and is also used for the subsequent peer-rating.
To strengthen the connection between the set of required competencies, the behavioral anchors and the roles played by the applicants, the scenarios were created on the basis of these anchors. The resulting GAP Assessment® scenario consists of four sequences. Sequences one and three contain schedule planning; sequences two and four are conflict tasks in which the individual interests are not fully compatible with each other. In one example scenario, each candidate plays one of three or four flight attendants. The role involves planning tasks, such as rearranging passengers on a flight and adjusting rotation schedules, as well as conflict tasks, such as group decisions about an unattractive rotation and the nomination of a candidate for an executive position. The information presented on the applicant's screen contains individual and group role details, task instructions and a set of working rules and restrictions. To estimate the applicant's cognitive workload, a simple matching task is presented throughout the whole scenario: a pair of randomly drawn letters with a refresh rate of five seconds (see Figure 1.3).
Figure 1.3 Example of the GAP Assessment® applicant's screen
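The workload probe can be illustrated in a few lines of code. This is a minimal sketch under stated assumptions, not the actual GAP software: it assumes the applicant simply judges whether the two letters are identical, and the match probability is an invented parameter.

```python
import random
import string

REFRESH_SECONDS = 5  # a new letter pair is shown every five seconds

def draw_pair(rng, p_match=0.5):
    """Randomly draw a pair of uppercase letters; with probability
    p_match (an assumed value) the two letters are identical."""
    first = rng.choice(string.ascii_uppercase)
    if rng.random() < p_match:
        return first, first
    second = rng.choice([c for c in string.ascii_uppercase if c != first])
    return first, second

def is_match(pair):
    """The applicant's task: decide whether the letters match."""
    return pair[0] == pair[1]

rng = random.Random(42)  # seeded for reproducibility
pair = draw_pair(rng)
print(pair, is_match(pair))
```

Accuracy and response latency on such a secondary task then serve as a simple index of spare cognitive capacity during the primary group task.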
Results
The analysis focused on three aspects:
• statistics for the GAP anchor measures;
• the quality of the additional dimensional post-sequence ratings (self-rating versus peer-rating and observer rating);
• correlations between GAP variables and DLR assessment variables.
The sample consisted of N = 131 applicants, n = 115 males and n = 16 females; this ratio approximately reflects the typical composition of pilot applicant groups. The mean age was 20.9 years. The data were collected between May and July 2010.
The GAP competence areas did not correlate significantly with age or school grades (A-levels), with one exception: decision making correlated with age (see Table 1.1).
Table 1.1 Correlations of GAP anchor scores with biographical data
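The coefficients reported here and in the following tables are plain Pearson product-moment correlations. As a worked illustration, with made-up numbers rather than the study data, the correlation between an anchor score and age can be computed as:

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up example values: decision-making anchor scores and ages.
scores = [3, 5, 4, 6, 7, 5]
ages = [19, 22, 20, 21, 23, 21]
print(round(pearson(scores, ages), 3))  # → 0.9
```

In practice, significance tests against the N = 131 sample would accompany each coefficient; the sketch shows only the coefficient itself.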
Table 1.2 displays the correlations of the post-hoc observer ratings with the self-ratings and peer-ratings.
Table 1.2 Pearson correlations of observer ratings with self-ratings and peer-ratings
Self-ratings and peer-ratings show a high congruence for leadership.
Table 1.3 Pearson correlations of observerās GAP anchor scores with GAP dimensional ratings
The anchor-based observation yields a comparatively high agreement of judgements with the post-hoc dimensional ratings (Table 1.3). The high correlations of the decision making and awareness/workload anchor scores with all dimensional ratings are remarkable and can be attributed to the highly structured GAP tasks.
In the next analysis, GAP anchor scores were compared with the judgements of assessment center experts using the VERDI Circumplex Behavioral model (Hoeft, 2003) and the DCT-Scheme (Stelling, 1999) for DLR pilot selection. The statistical analysis includes comparisons of these judgements on different levels (see Tables 1.4 and 1.5).
Table 1.4 Correlations of GAP anchor scores with VERDI dimensional ratings
Table 1.5 shows the correlations of GAP anchor scores with DCT scores.
Table 1.5 Correlations of GAP anchor scores with DCT scores
Most correlations are plausible and convincing. As expected, the correlation between GAP leadership and VERDI coordination is highest, owing to their common definition. The significant correlation between the matching task and VERDI stress resistance is also easy to interpret, because VERDI stress resistance involves the ability to work under stress. The main unexpected result is the low correlation between VERDI cooperation and all GAP variables, in particular GAP teamwork. A possible explanation is that VERDI cooperation involves more non-verbal aspects.
The correlations with the DCT are mostly low. One exp...