The accelerating interest in postconventional and postautonomous stages of human development has given birth to a number of different forms of inquiry. This volume combines many of these, including longitudinal studies, qualitative inquiry, and theoretical explorations. To set this emerging field on a sound foundation, the accurate assessment of high stage development needs to be an ongoing concern. The first part of the present volume is devoted to this subject.
Validity and Reliability of the SCT
This part of the chapter presents the most pertinent and recent findings about the SCT. Technical details about the test can be found in Loevinger (1998b). The available literature that criticizes or supports the construct of ego development theory and its measurement is extensive. The face validity of the instrument is demonstrated by the sheer fact that it has been used in more than 300 research studies. These studies cover such diverse topics as parenting behaviors, managerial effectiveness, and the effects of meditation on recidivism rates. Compared with structured tests, the psychometric properties of projective instruments are hard to assess, a difficulty compounded by the fact that the SCT does not intend to predict behavior, measure social adjustment, or evaluate psychopathology. Instead, it is designed to assess a soft construct, the maturity of personality expressed as a developmental variable. Structural-developmental theories aim to describe an underlying concept that is unique and can be difficult to define. Not only Loevinger (1976) but also Kohlberg (1969) struggled with the fact that a developmental stage is not clearly expressed in any particular behavior. However, Loevinger (1998b) cogently argued that correlation with real-life data is important, because a test that does not correlate with anything but another test may be of limited value.
Loevinger (1998b) cited good evidence for the sequentiality of the stages as demonstrated through longitudinal data. Kroger (2008) used Rasch scaling to show sequentiality and also demonstrated correlations with Kegan's (1982) assessment tool, the Subject/Object Interview. SCT scores in several studies correlated, as predicted, with interview data and behavioral observations in regard to cognitive complexity and psychological mindedness. Manners and Durkin (2001) reviewed the evidence for the construct validity of the SCT and concluded that the studies they examined support it. Novy and Frances (1992) administered the SCT and a battery of structured personality tests that addressed positive, inner orientations. All correlations were low, usually around .2. Novy (personal communication, May 16, 2005) attributed the low correlations to the fact that ego development is a more abstract concept than any of the other measures used.
Novy and Frances (1992) completed an extensive reliability study of the current form of the SCT. The reliability of the test is good, and it exceeds that of other projective instruments. Internal consistency as evaluated by Cronbach's coefficient α, which establishes the lowest estimate of reliability, is .91. The interrater agreement of the total protocol rating (TPR) is .94. The SCT has 36 items. Administering the first and second halves of the test separately, these authors found coefficient α values of .84 and .81, respectively. The correlation between the two halves was .79. Shorter tests usually are less reliable than longer tests, meaning they contain more error. The extent to which this error lowers the correlation between two tests can be estimated and compensated for through a statistical formula called the correction for attenuation. After applying this procedure, the correlation between the two halves of the test rose to .96. Novy and Frances suggested that the two test halves are usable as equivalent forms, although Loevinger (1998b) emphasized that only the complete 36-item form allows for optimal results and should therefore be preferred.
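The arithmetic behind the corrected split-half correlation can be sketched as follows. The chapter does not reproduce the formula, so this sketch assumes the standard Spearman correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The function name and variable names are illustrative only; the three input values are the ones reported above.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation two measures would show if both were
    free of measurement error (Spearman's correction, assumed here)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Values reported in the text for the two 18-item halves of the SCT.
observed_r = 0.79         # correlation between the two halves
alpha_first_half = 0.84   # coefficient alpha, first half
alpha_second_half = 0.81  # coefficient alpha, second half

corrected_r = correct_for_attenuation(
    observed_r, alpha_first_half, alpha_second_half
)
print(f"{corrected_r:.2f}")  # prints 0.96
```

Under this assumption the reported figures are internally consistent: .79 divided by the square root of .84 × .81 is approximately .96, the value given for the corrected correlation.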
One of the recurring issues for projective tests is standardization of test administration. Loevinger (1998b) specified the number of sentence stems per page because the available space may signal the test taker how much is expected in terms of the completion. The SCT is usually administered as a paper-and-pencil measure to a group with the written instruction of "Complete the following sentences." Loevinger (1998b) pointed out that all tests show inconsistent results if the instructions change, which is not something specific to the SCT. She strongly urged researchers to use the standardized instructions because that would allow a comparison of the results across studies.
Several researchers have experimented with modified instructions (Blumentritt, Novy, Gaa, & Liberman, 1996; Drewes & Westenberg, 2001; Jurich & Holt, 1987). The consensus seems to be that modified instructions, such as "Be mature" or instructions about the concept of ego development, allow participants to achieve higher scores, but the increases are small and consistent across several studies, usually only one stage. The fact that significant changes are not achievable attests to the validity of the theory of epigenetic stage sequencing; individuals cannot understand and intentionally move to a higher stage. Jurich and Holt (1987) and Blumentritt et al. (1996) attributed the modest increases to a reduction in the ambiguity of the stimulus. Under conditions of more specific instructional sets, participants become more engaged and achieve their higher scores through better motivation when taking this test. However, Drewes and Westenberg (2001) argued that a person cannot be seen as being at a fixed stage. Instead, individuals express a developmental range, with a functional level that is evidenced under standardized instruction conditions and an optimal level that is evidenced by more specific instructions. In any given test protocol a person usually gives responses at a variety of levels. The SCT assumes that if a sufficiently high response is given often enough, that level is the modal level of functioning for that person.
A newer concern about test administration deals with computer-based administration. If the test is sent to a research participant as an e-mail attachment, the administration is no longer standardized. When the test is delivered as a Word document, the test taker can change the spaces provided for the answers, and we do not know for sure whether the test was actually taken by that person in one session. Only one unpublished study (W. Johnson, personal communication, December 1, 2004) has investigated the issue of computer administration. The results showed that in computer-based testing situations, subjects had a significantly higher word count but ego levels remained unchanged.
Issues concerning the discriminant and convergent validity of the SCT have probably received the most attention recently. Discriminant or divergent evidence is concerned with how distinct a test and its concept are from other psychological constructs. Convergent validity is evidenced by high correlations with other factors or test results. Convergent validity or lack of discriminant validity may present a threat to validity because we may not be measuring what we intend; we may just be measuring an established variable and giving it a new name. At the same time, a variable might be conceptually intertwined with another in a meaningful manner, in which case we want to see convergent validity. Loevinger (1998a, 1998b) convincingly argued that in personality testing, correlations are commonly seen, and it may indeed be hard to find out whether a given correlation presents a distortion or a meaningful relationship.
Loevinger (1998b) pointed out that the SCT correlates with verbosity, which is quantitative production, at about .31. This may not be spurious, because low ego levels are often indicated through short, bland responses, such as "Education: boring," whereas higher rated stem completions need more words to express complexity. An example would be "Education: is more than what you learn in school." The SCT correlates with education, socioeconomic status, and complexity of work, which has been shown to hold true across international samples. This is not surprising because education and social class relate to aspects of impulse control, goal orientation, and conscious preoccupations, which is exactly what the SCT is meant to assess.
The discriminant validity of the SCT regarding intelligence has been widely debated. Lubinski and Humphreys (1997) specifically argued that personality tests, such as the SCT, add very little to assessments of general aptitude and intelligence. Numerous studies have investigated this question. Cohn and Westenberg (2004) identified 42 such studies and performed a meta-analysis. These authors showed the correlation between the SCT and intelligence tests to be .31 across studies. Consequently, they argued that the discriminant validity is good. Loevinger (1998b) herself argued that almost all tests show some correlation with intelligence because it is indeed an aspect of personality functioning and influences professional aspirations and other aspects of development. Cohn and Westenberg also discussed incremental validity, which addresses the question of whether a test allows for useful inferences that we could not arrive at without it. They identified 16 studies that addressed this question while controlling for intelligence. They concluded that the incremental validity varied significantly among the different variables being assessed. Ninety-four percent of the studies reported significant relations between criterion variables and the SCT after intelligence was controlled for. Based on their research they rejected the claim by Lubinski and Humphreys that the SCT does not add anything significant to our understanding of personality.
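What "controlling for intelligence" means can be illustrated with a first-order partial correlation. This is only a sketch of one common approach; the chapter does not say which statistical method the individual studies used, and many would have relied on hierarchical regression instead. The function name is illustrative, and the criterion correlations below are hypothetical values chosen for the example. Only the .31 correlation between the SCT and intelligence comes from the text.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y with z held constant.

    Here x stands for the SCT score, y for a criterion variable, and z for
    an intelligence measure.
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("partial correlation is undefined when |r| equals 1")
    return (r_xy - r_xz * r_yz) / denominator


r_sct_iq = 0.31         # SCT with intelligence (meta-analytic value from the text)
r_sct_criterion = 0.40  # HYPOTHETICAL: SCT with some criterion variable
r_iq_criterion = 0.30   # HYPOTHETICAL: intelligence with the same criterion

r_partial = partial_correlation(r_sct_criterion, r_sct_iq, r_iq_criterion)
print(f"SCT-criterion correlation with intelligence held constant: {r_partial:.2f}")
```

With these made-up inputs, most of the SCT-criterion relation remains once intelligence is held constant, which is the kind of pattern an incremental validity argument relies on. Actual results depend entirely on the correlations observed in a given study.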
Loevinger (1998b) discussed other potential threats to validity as well. First, there is the size of the sample. Although the original sample was based on only a few hundred completed tests, a few years later Loevinger and her associates made an effort to get in touch with all researchers who had used the test and requested copies of the tests that they had scored. This led to a sample size of well over 1,000 tests. Second, there is the question of how representative the sample is. Loevinger emphasized repeatedly that the test is not based on a normative sample representing the whole population because her project team never had the resources to undertake a project that would allow for randomized sampling. Because many different researchers contributed, diverse social groups were represented. Although the original sample was strongly weighted toward women, later efforts compensated for this, and special efforts were made to review the test items and the scoring manual with that concern in mind. Third, there is the issue of whether the sample presents a limited range. Loevinger stated that she made a special effort to include the research of psychologists who had participants representing the extremes at either end of the developmental spectrum, because the general population does indeed fall into the middle range. She therefore included data from Harvard graduates at midlife as well as the prison population. In general, we can say that Loevinger, with the help of other researchers, accumulated impressive evidence for the validity of the SCT.