Introduction
The application of Rasch measurement in language assessment1 has expanded rapidly in the past two decades, as evidenced by the growing number of studies published in the leading journals in the field (e.g., Bachman, 2000; McNamara & Knoch, 2012). However, the increasing popularity and wide acceptance of the Rasch model among mainstream language assessment researchers have not been without controversy. McNamara and Knoch (2012) described the debates over the application of the Rasch model in language assessment research as “the Rasch wars”, which were fought on several fronts over a lengthy period, up to the 1990s (see also McNamara, 1996).
One of the most heated debates surrounding the application of the Rasch model in language assessment was whether Rasch theory was appropriate for the analysis of language test data. Critics argued that the concept of unidimensionality in Rasch theory could not hold for language test constructs (e.g., Hamp-Lyons, 1989; Skehan, 1989). From an applied linguistic perspective, they contended, any language test inevitably entails the assessment of multiple dimensions of language ability rather than a single one. For example, the critics of the Rasch model would reason that an academic English listening test taps several different aspects of a test candidate’s listening proficiency or academic listening skills (e.g., Buck, 2001), which makes the Rasch model, a priori, inappropriate for investigating the psychometric features of such tests. Notwithstanding these vigorous objections, McNamara (1996) argued that the Rasch model provides a powerful means of examining whether multiple dimensions actually exist in any dataset; accordingly, he ascribed the objections to and reservations about the use of the Rasch model primarily to a misunderstanding of the empirical notion of unidimensionality in Rasch measurement (McNamara, 1996).
Local independence of test items is another Rasch measurement requirement that should be addressed in the analysis. This principle requires that test candidates’ responses to one item should not be affected by, or dependent on, their responses to other items in the test. The Rasch model also provides effective means of ascertaining the extent to which this principle holds for any particular dataset.
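Formally, local independence means that, conditional on a candidate’s ability, the probability of a whole response pattern equals the product of the individual item probabilities. One widely used check is Yen’s Q3 statistic: the correlation, across candidates, between the residuals (observed response minus model-expected probability) of a pair of items. The Python sketch below is a minimal illustration under our own assumptions, not a reproduction of any software’s procedure. It simulates dichotomous Rasch data in which three hypothetical items share a common-prompt effect, and, as a simplification, it computes residuals from the generating abilities rather than from estimated measures. All function names and parameter values are illustrative.

```python
import math
import random


def rasch_p(theta, delta):
    """Probability of success on a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def simulate(n_persons, deltas, testlet, testlet_sd, seed=1):
    """Simulate 0/1 responses. Items whose indices are in `testlet` share a
    person-specific random effect (a hypothetical common-prompt effect),
    which violates local independence for those items."""
    rng = random.Random(seed)
    thetas, data = [], []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        prompt_effect = rng.gauss(0.0, testlet_sd)
        row = []
        for i, d in enumerate(deltas):
            shift = prompt_effect if i in testlet else 0.0
            row.append(1 if rng.random() < rasch_p(theta + shift, d) else 0)
        thetas.append(theta)
        data.append(row)
    return thetas, data


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def q3_matrix(thetas, deltas, data):
    """Q3-style matrix: correlations between item residuals x - P(theta, delta).
    Simplification: uses the generating thetas instead of estimated measures."""
    resid = [[row[i] - rasch_p(t, d) for t, row in zip(thetas, data)]
             for i, d in enumerate(deltas)]
    k = len(deltas)
    return [[pearson(resid[i], resid[j]) for j in range(k)] for i in range(k)]


deltas = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
testlet = {0, 1, 2}  # hypothetical: items 1-3 share one listening prompt
thetas, data = simulate(4000, deltas, testlet, testlet_sd=1.5)
q3 = q3_matrix(thetas, deltas, data)
print(round(q3[0][1], 2))  # pair within the shared-prompt set
print(round(q3[3][4], 2))  # pair of locally independent items
```

In this setup the residual correlation for items sharing the prompt should be clearly positive, whereas pairs of independent items should hover around zero. With estimated rather than generating abilities, Q3 values tend to be slightly negative even under local independence, so in practice they are usually judged relative to their average rather than to zero.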
In this chapter, we review the concepts of unidimensionality and local independence and show how to address them empirically. Then we review studies in the field of language assessment that have included examination of these Rasch measurement properties. Finally, we use a dataset from a local English listening test and the Rasch software Winsteps (Linacre, 2017a) to illustrate how unidimensionality and local independence should be investigated. Rasch measurement can be considered as comprising a family of models (McNamara, 1996). The focus of this chapter is Rasch analysis of dichotomous and polytomous data, including the basic Rasch model, the rating scale model, and the partial credit model.
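For reference, the three models named above can be written in a common notation, assumed here for illustration: θ_n is the ability of person n, δ_i the difficulty of item i, P_nik the probability that person n responds in category k of item i, τ_k the k-th threshold shared by all items in the rating scale model, and δ_ik the k-th threshold specific to item i in the partial credit model.

```latex
% Basic (dichotomous) Rasch model
P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

% Rating scale model: category k versus k - 1, thresholds \tau_k common to all items
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = \theta_n - \delta_i - \tau_k

% Partial credit model: item-specific thresholds \delta_{ik}
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = \theta_n - \delta_{ik}
```

In all three models a single person parameter θ_n drives every response, which is the formal counterpart of the unidimensionality requirement discussed below.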
Unidimensionality and local independence
Interpretations of the “dimension” concept seem to vary across research fields. In mathematics, for example, dimension refers to “measure in one direction” (Merriam-Webster Dictionary).2 Following this definition, a line has one dimension (i.e., length); a square has two dimensions (i.e., length and width); and a cube has three dimensions (i.e., length, width, and height). In Rasch measurement, a dimension refers to a single underlying attribute that is not directly observable (i.e., is latent). The term is often used interchangeably with “latent trait” or “construct”. Examples of such dimensions include listening ability and essay writing ability in language assessment, or extroversion, anxiety, and cognitive development in psychological research.
The principle of unidimensionality in Rasch measurement requires that human attributes be measured one at a time (Bond & Fox, 2015). An example from McNamara (1996) helps to elucidate the concept. Suppose a group of students, including both native and non-native speakers of English, take a mathematics test in which the questions are presented in English. Although the stated purpose of the test is to assess students’ mathematical ability, their English proficiency is likely to have a differential impact on their performance on items that are not presented exclusively as mathematical symbols. Some non-native speakers of English, for example, might struggle to understand a “word problem” presented in English because of their low English proficiency, even though they have the mathematical ability required to construct and solve the relevant computations. In this scenario, it is reasonable to argue that the test is not strictly unidimensional for this sample, because it taps both students’ ability to solve mathematical problems and their ability to understand English.
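The mathematics-test example can be made concrete with a small numerical sketch. It assumes a hypothetical conjunctive model (our illustration, not McNamara’s): a word problem is answered correctly only if the student both understands the English text and solves the mathematics, so its success probability is the product of two Rasch-type probabilities, whereas an item written purely in symbols depends on mathematical ability alone. The names theta_math, theta_eng, delta, and gamma are illustrative.

```python
import math


def logistic(x):
    """Logistic function used in Rasch-type response probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


def p_symbolic(theta_math, delta):
    # Item written only in mathematical symbols: depends on mathematics ability alone.
    return logistic(theta_math - delta)


def p_word_problem(theta_math, theta_eng, delta, gamma):
    # Hypothetical conjunctive model: understand the English text (difficulty gamma)
    # AND solve the mathematics (difficulty delta).
    return logistic(theta_eng - gamma) * logistic(theta_math - delta)


# Two students with identical mathematical ability but different English proficiency.
theta_math = 1.0
for label, theta_eng in [("native speaker", 3.0), ("non-native speaker", -1.0)]:
    sym = p_symbolic(theta_math, delta=0.0)
    word = p_word_problem(theta_math, theta_eng, delta=0.0, gamma=0.0)
    print(f"{label}: symbolic {sym:.3f}, word problem {word:.3f}")
# native speaker: symbolic 0.731, word problem 0.696
# non-native speaker: symbolic 0.731, word problem 0.197
```

Because the two students differ only in English proficiency, the gap on the word problem reflects a second attribute, which is what makes the test multidimensional for this sample.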
The controversy over the concept of unidimensionality has persisted for a long time in the field of language assessment. Issues concerning dimensionality largely underpinned the “Rasch wars” and, to a considerable extent, impeded the broader application of Rasch measurement to language assessment research before the 1990s (McNamara & Knoch, 2012). According to applied linguists, language ability was always a complex construct that could not be captured in a single dimension (see McNamara, 1996; McNamara & Knoch, 2012, for details of the debates). Take second language (L2) academic listening ability as an example. This construct entails both L2 listening proficiency and a repertoire of hypothesized enabling skills (e.g., decoding aural input, inferencing, understanding unfamiliar words from context, and constructing macro-propositions; see Buck, 2001). It was therefore regarded as self-evident that one’s L2 academic listening ability could not be explained by a single underlying dimension.
However, as McNamara (1996) argued, the notion of “unidimensionality” has two interpretations. In psychology, unidimensionality refers to a single underlying construct or trait; in measurement, it means “a single underlying measurement dimension” or “a single underlying pattern of scores in the data matrix” (McNamara, 1996, p. 271). The distinction between psychological (conceptual) and psychometric (empirical) unidimensionality is crucial both for understanding this notion in Rasch measurement and for articulating why Rasch measurement can be applied fruitfully to the analysis of language assessment data. Fulfillment of the unidimensionality requirement is a matter of degree, not just a matter of kind, as Bejar (1983, p. 31) pointed out:
Unidimensionality does not imply that performance on items is due to a single psychological process. In fact, a variety of psychological processes are involved in responding to a set of test items. However, as long as they function in unison – that is, performance on each item is affected by the same process and in the same form – unidimensionality will hold.
The Rasch model, as will be demonstrated in this chapter, provides a powerful means of analyzing psychometric dimensionality in the data.
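Bejar’s point can be illustrated with a principal components analysis of standardized residuals, similar in spirit to the residual-based dimensionality analysis reported by Winsteps. The Python sketch below is a simplified stand-in under our own assumptions: each simulated candidate has two latent skills; in one condition every item draws on the same equal-weighted composite of both skills (the skills “function in unison”), and in the other, half of the items tap one skill and half tap the other. Residuals are computed from the generating composite rather than from estimated Rasch measures, and the largest eigenvalue of the residual correlation matrix is found by simple power iteration. All names and values are illustrative.

```python
import math
import random


def rasch_p(theta, delta):
    """Probability of success on a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def simulate_two_skills(n, deltas, unison, seed=7):
    """Return (composite abilities, 0/1 response rows) for n simulated persons.

    unison=True:  every item is driven by the composite (a + b) / 2.
    unison=False: the first half of the items taps skill a, the rest skill b.
    """
    rng = random.Random(seed)
    half = len(deltas) // 2
    thetas, data = [], []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        composite = (a + b) / 2
        row = []
        for i, d in enumerate(deltas):
            driver = composite if unison else (a if i < half else b)
            row.append(1 if rng.random() < rasch_p(driver, d) else 0)
        thetas.append(composite)
        data.append(row)
    return thetas, data


def standardized_residuals(thetas, deltas, data):
    """One list per item of z = (x - p) / sqrt(p * (1 - p)), with p from the composite."""
    cols = []
    for i, d in enumerate(deltas):
        col = []
        for t, row in zip(thetas, data):
            p = rasch_p(t, d)
            col.append((row[i] - p) / math.sqrt(p * (1.0 - p)))
        cols.append(col)
    return cols


def correlation_matrix(cols):
    """Pearson correlations between all pairs of columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    k = len(cols)
    return [[sum(x * y for x, y in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def largest_eigenvalue(m, iters=500, seed=0):
    """Largest eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    rng = random.Random(seed)
    k = len(m)
    v = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(x * y for x, y in zip(v, w))  # Rayleigh quotient (v has unit length)


deltas = [-1.5, -1.0, -0.5, -0.25, 0.0, 0.0, 0.25, 0.5, 1.0, 1.5]
for unison in (True, False):
    thetas, data = simulate_two_skills(3000, deltas, unison)
    r = correlation_matrix(standardized_residuals(thetas, deltas, data))
    print("unison" if unison else "two skills", round(largest_eigenvalue(r), 2))
```

When the two skills function in unison, the residuals are essentially uncorrelated and the largest eigenvalue stays close to 1, the value expected for random noise; when the item groups tap different skills, a contrast between the two groups appears in the residuals and the eigenvalue rises clearly above 1, even though the data were generated from “psychologically” two skills in both conditions.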
In addition, local independence requires that test candidates’ responses to any item should not be affected by their responses (i.e., success or failure) to other items in that test. In language assessment, however, this principle may often be violated. For example, it is not unusual for language test developers to use several items that share the same prompt in listening or reading comprehension tests. Such practice can ...