In the biblical story of Gideon, warriors were selected for a special mission by two screenings. The first considered their motivation; those who did not wish to participate were allowed to withdraw. The second involved observations of the way they drank from a body of water; those who put their heads down to the water were eliminated and those who brought water up to their mouths in their hands were chosen. The presumed rationale was that the latter were more alert and watchful.
Over the centuries, men have often been selected for special roles on the basis of their performance during rigorous training. This method was used to identify the elite Spartans and, centuries later, to determine those qualified for knighthood. On the other side of the world, there were formal procedures for examining candidates for office at least by the beginning of the Chan dynasty in China, three thousand years ago (DuBois, 1970). The tests were job samples, demonstrating the candidates' proficiencies. Even today there is no better way of selecting promising applicants than the properly devised and administered work sample. Today, however, the careful personnel psychologist makes an empirical study to establish the validity of all selection procedures he recommends, both predictive tests and work samples.
Thus applied psychology began much earlier than psychology as a basic science, which came into being only during the last century. The measurement of concepts of personality for purposes of pure research is even younger, beginning only a few decades ago.
Development of Procedures
The history of modern personality measurement dates from about World War I, when procedures were devised to obtain observations under somewhat standardized conditions or to collect data economically. Hermann Rorschach (1942) published his Psychodiagnostik in 1921, a year before his untimely death. In this volume, subtitled "A Diagnostic Test Based on Perception," he reports his observations on the responses to his famous inkblots as given by the 405 subjects he tested, and he tabulates the frequencies of some scores for each of several diagnostic categories. This is one of the earliest landmarks in clinical psychological testing as we know it today. While literally thousands of papers have been published about his test, vastly extending the available data on it, it is noteworthy that a large part of what Rorschach wrote is still accepted by clinicians today.
The other early landmark is the development of the Woodworth Personal Data Sheet, first published in 1919. (Woodworth is better known for the integrative texts on experimental psychology which he wrote in later years.) Designed to take the place of an interview, this inventory asked questions to determine whether the subject was poorly fitted for military service because he possessed certain undesirable characteristics or symptoms. Also developed for use during World War I were rating methods and some group tests of intelligence. In this national emergency, as in World War II, considerable technological development occurred within psychology. During the subsequent decades, several approaches contributed innovations and advances to the measurement of personality.
The Clinical Approach
Although many new scoring categories and composite scores have been devised since Rorschach's book, inkblot testing remained essentially in its original form until Wayne Holtzman and his colleagues (1961) devised a new procedure with two series of forty-five blots each. The aim was to provide more systematic and comparable protocols. The subject gives just one response to each blot, so that differences in number of responses are no longer a problem. A comprehensive scoring manual enables scorers to obtain high levels of agreement with each other. The scoring categories are construed in dimensional form. Thus the Holtzman Inkblot Technique is a better psychometric instrument for basic research than the less structured and more judgmentally interpreted test on which it is based.
Another commonly used projective test is the Thematic Apperception Test (TAT), devised by Morgan and Murray (1935). In this test, the subject makes up stories based on a number of drawings. These stories can be interpreted or scored in innumerable ways, from identifying qualitative themes specific for the particular subject to rating the strengths of the subject's dispositions, as in Murray's "needs." Like the Rorschach, the TAT is in common clinical use; perhaps more than the Rorschach, it is also employed for empirical studies of normal persons.
The interview is the most widely used method for assessing personality. It is used in the clinic and in personnel selection for applied purposes; personology and social psychology also use it in basic research. Its popularity stems from its naturalness, simplicity, and flexibility. While it seems intuitively valid, many psychologists consider this face validity to be illusory, since much research has found no predictive validity for it. Although many guides for interviewers have been prepared, there is no standard form. Perhaps the most structured and objective type of interview is that developed for surveys of opinions. (In this context, "objective" means that the protocols themselves and the data derived from them are relatively free from influences associated with the particular interviewer.) Such interviews were used in a survey of views of testing, to be discussed later (Chapter 10). Standardized interviews have also been developed for epidemiological studies of mental disorders.
The Inventory Approach
Following the example of the Personal Data Sheet, many inventories were published during the 1920s and 1930s, and the proliferation still continues. The early ones were often intended to measure maladjustment or neurotic tendencies, but tests were gradually developed for other attributes: dominance, Jung's introversion, etc.
The most widely used inventory today is probably the Minnesota Multiphasic Personality Inventory (MMPI). It is used not only in clinical psychology but also in personnel selection and in research investigations. In its original form in 1943, it consisted of a series of cards on each of which was printed a statement, which the subject sorted into two piles, one for true and one for false, as he considered they applied to him. A booklet form came into vogue a few years later. The original clinical keys and many later ones were developed empirically, by identifying responses given more frequently by a particular clinical group than by normal people. Thus a key for Hypochondriasis was developed from the responses of patients diagnosed as relatively pure cases of this tendency, uncomplicated by other pathologies; keys for Hysteria, Depression, Paranoia, etc., were constructed in the same way.
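The empirical keying described above can be illustrated with a minimal sketch. The function names, the toy true/false responses, and the 0.25 threshold for a difference in endorsement rates are all assumptions chosen for illustration; they are not the procedure or the cutoffs actually used in constructing the MMPI keys.

```python
def build_empirical_key(criterion_group, normal_group, min_difference=0.25):
    """Select items that separate a criterion group from a normal group.

    Each group is a list of persons; each person is a list of True/False
    answers, one per item. Returns {item_index: keyed_answer}, where
    keyed_answer is the answer given more often by the criterion group.
    The threshold min_difference is a hypothetical value.
    """
    n_items = len(criterion_group[0])
    key = {}
    for item in range(n_items):
        # Proportion answering "true" in each group.
        p_criterion = sum(p[item] for p in criterion_group) / len(criterion_group)
        p_normal = sum(p[item] for p in normal_group) / len(normal_group)
        difference = p_criterion - p_normal
        if abs(difference) >= min_difference:
            key[item] = difference > 0
    return key


def score(responses, key):
    """Count how many keyed items the subject answers in the keyed direction."""
    return sum(1 for item, keyed in key.items() if responses[item] == keyed)


# Toy data: four patients and four normal respondents, four items each.
patients = [
    [True, False, True, True],
    [True, False, False, True],
    [True, True, True, False],
    [True, False, True, True],
]
normals = [
    [False, True, True, True],
    [False, True, False, True],
    [True, True, True, False],
    [False, True, True, True],
]

key = build_empirical_key(patients, normals)
new_subject = [True, False, False, False]
print(key, score(new_subject, key))
```

In this toy example only the first two items discriminate between the groups, so only they enter the key; the last two items are answered alike by both groups and are ignored, whatever their content. That indifference to item content is the defining feature of the empirical approach.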
Values and interests are also measured frequently by paper-and-pencil instruments. One of the first was the Allport-Vernon Study of Values, which yields scores for Theoretical, Economic, Aesthetic, Social, Political, and Religious values, a classification first proposed by Spranger. This test was created a priori and later subjected to psychometric analysis and refinement.
Another well-known interest test was developed empirically, like the MMPI. Dating back to 1927, the Strong Vocational Interest Blank consists of several hundred items (recreational and work activities, school subjects, etc.) for which the subject indicates his liking or dislike. His responses are then compared with keys for a number of vocations, each key developed by comparing the responses of those who have been in that occupation for a number of years to responses by people in general. The rationale is obvious: if a subject's interests resemble those of people who have entered and remained in a vocation, it is more likely that he will enjoy that vocation. The rationale has been supported empirically.
Many dozens of inventories for traits or interests have been devised. Texts on tests and measurements usually describe in some detail a number of these. For comprehensive listings, data on publishers and prices, and critical reviews, see the Mental Measurements Yearbooks, published every few years by Oscar Buros (1965). His Tests in Print (1961) is another valuable reference. (Examples of items from various kinds of tests are given in Figure 5.2.)
Systematic Observations
Another approach tries to capitalize on more naturalistic situations and realistic tasks. A favorite research method of developmental psychologists is observation of nursery-school children on the playground. Systematic techniques for standardized recording and rating yield scores for many characteristics. Less naturalistic but still compelling contexts have also been devised for measuring personality, often without the subject knowing that he is being measured, or at least without his knowing what attribute is being measured. During the late 1920s, Hartshorne, May, and Shuttleworth (1930) studied character in children. For example, they contrived procedures in which children would be tempted to cheat and would believe that such cheating would not be observed. (They report rather low relationships between the scores for the various procedures, a typical kind of finding that will be analyzed intensively in Chapter 11.)
World War II saw the introduction of new approaches to observation, this time in the service of selection for special purposes. These involved fairly realistic work samples (as in speed in completing an obstacle course), the use of informal observations as well as structured testing, and final judgments reached by first having each judge pool his ratings for various procedures into a composite rating and then having several such judges discuss and decide on a group judgment.
The earliest applications of these approaches were made in Germany in the years just before World War II. While not much is known about this work or its adequacy, it seems to have involved observations of performance on standard military tasks as well as effectiveness under stress. Rather similar intensive testing procedures were developed in England for the War Office Selection Boards and for Civil Service Selection Boards after the war.
Shortly after the United States entered the war, the Office of Strategic Services set up "assessment schools" under the direction of Murray and MacKinnon to screen candidates for such special work as espionage, sabotage, and propaganda. This work was reported in Assessment of Men (OSS Assessment Staff, 1948). The candidates were tested for ability to maintain security (e.g., by not giving away their true identity), practical intelligence (as in solving an engineering problem with crude materials), verbal intelligence (as shown in discussion problems), leadership (in leaderless groups and when assigned the role of leader), functioning under stress (such as cross-examination), and ability to get along with others. In addition to taking the situational tests mentioned above, the subjects were interviewed, given projective tests, intelligence tests, and tests for special abilities, and required to fill out a long life-history form. Other procedures included observations at meals and when socializing with liquor, and sociometric reports of the impressions they made on fellow candidates.
Practical necessity made it impossible to obtain adequate data to test the validity of the judgments made by the OSS Assessment Staff. To determine the validity of such an assessment approach and to try to develop procedures for selecting potential clinical psychologists, a research program was instituted after the war at the University of Michigan (Kelly and Fiske, 1951). The design allowed the determination of the incremental contribution of various procedures; e.g., how much did an interview add to information available from various basic application materials? The procedures included more standard paper-and-pencil tests and fewer situational tests than the OSS program.
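The idea of an incremental contribution can be made concrete with a small sketch. It asks how much the squared multiple correlation with a criterion rises when a second predictor is added to a first, using the standard two-predictor formula. The function names and the toy numbers are assumptions for illustration only; this is not a reconstruction of the analyses actually carried out in the Michigan study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def incremental_validity(baseline, added, criterion):
    """Gain in R squared when `added` joins `baseline` in predicting `criterion`.

    Uses the two-predictor formula
        R^2 = (r_yb^2 + r_ya^2 - 2 r_yb r_ya r_ba) / (1 - r_ba^2).
    If the two predictors are perfectly correlated, the added one is
    redundant and the gain is reported as 0.0.
    """
    r_yb = pearson(baseline, criterion)
    r_ya = pearson(added, criterion)
    r_ba = pearson(baseline, added)
    if 1 - r_ba ** 2 < 1e-12:
        return 0.0
    r2_both = (r_yb ** 2 + r_ya ** 2 - 2 * r_yb * r_ya * r_ba) / (1 - r_ba ** 2)
    return r2_both - r_yb ** 2


# Hypothetical scores for four candidates.
application_score = [1, 2, 3, 4]   # stands in for basic application materials
interview_rating = [1, -1, -1, 1]  # stands in for an interview judgment
later_performance = [2, 1, 2, 5]   # stands in for the criterion

print(round(incremental_validity(application_score, interview_rating,
                                 later_performance), 3))
```

A gain near zero would indicate that the added procedure tells the selector little beyond what the cheaper materials already provide, however valid that procedure may appear when considered by itself.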
This research encountered considerable difficulty in its subsequent efforts to assess clinical performance. The measurement of professional competence is always a hard problem. Given the assumption that the various criterion measures were adequate, this study demonstrated that paper-and-pencil measures could predict clinical competences about as well as more expensive and judgmental procedures, such as projective tests, interviews, and situational tests. But none of the predictive coefficients were very high. Furthermore, the ratings of surface traits made during assessment had only modest correlations with subsequent ratings of the same traits. As we shall see later, these findings are fairly consistent with those from many other investigations. Personality measurements tend to be rather specific, varying with the conditions under which they are obtained, with the particular task given the subject, and also with the content of the items or stimuli to which the subject reacts.
At about the same time, a group at the Menninger Foundation started research on the selection of psychiatrists (Holt and Luborsky, 1958). This investigation was restricted essentially to clinical methods: interviews, projective tests, an intelligence test, and several tests devised for the particular purpose.
Stemming more directly from the OSS assessment work, the Institute of Personality Assessment and Research was set up in Berkeley under the direction of MacKinnon. This institute has been more interested in theory than were early assessment programs. It has studied adequacy of functioning in various vocational fields, with particular attention to the optimal functioning found in creative persons (MacKinnon, 1965; Barron, 1968). The emphasis has been increasingly placed not on scores from particular procedures, but on human judgments based on diverse observations, judgments made by the assessors, by the subjects about themselves, even by the subjects' parents.
Other Approaches
Clinical psychologists have always used observations of the way a subject goes about a task as one basis for judgments about his personality. In taking an intelligence test, for example, does the subject show caution by making few errors, especially careless ones? More recently, psychologists have studied cognitive styles: the way a person categorizes his perceptions, his ability to perceive something independent of its field or context, his ability to ignore distraction, etc. A quite different orientation is the study of the person's social stimulus value, of how he is perceived by others. Is he socially visible, is he liked, is he rejected, is he seen as a good leader? Finally, personality variables have been approached by measuring psychophysiological reactions. Everyone has heard of lie-detector tests, which assume that a subject will react differently when suppressing the truth than when telling it. Similar techniques are employed to study reactions to stress. Quite recently, careful experimentation has been done on the size of the pupil: with the intensity of illumination held constant, the pupil enlarges with positive interest in a stimulus and may contract with some negative reactions (Hess, 1968). (For a longer history of personality and other psychological testing, see DuBois, 1970.)
Progress or Productivity?
Does the history of the development of personality measurement show much progress? Many psychologists would say that such progress has been quite limited. What it does show is the creation of many new approaches, the refinement of old methods, and the invention of many varieties of specific instruments. There is little convincing and definitive evidence that we are now measuring the concepts of personality appreciably better than we were two or three decades ago, in part because of the many obstacles to determining the adequacy with which any one construct is indexed by one or more procedures. There certainly have been major technical advances in methods for analyzing quantitative measurements. The problems in personality measurement are being more sharply delineated, but we do not have the consensus required for scientific progress toward solutions.
If we dip into the journals published several decades ago, we can see some evidence of progress. Earlier articles are often more literary and less technical than recent ones. While they may b...