Part I
Development of Polytomous IRT Models
CHAPTER 1
New Perspectives and Applications
Remo Ostini
Healthy Communities Research Centre, University of Queensland
Michael L. Nering
Measured Progress
Polytomous item response theory (IRT) models are mathematical models used to help us understand the interaction between examinees and test questions where the test questions have various response categories. These test questions are not scored in a simple dichotomous manner (i.e., correct/incorrect); rather, they are scored in a way that reflects the particular score category that an examinee has achieved, been classified into, or selected (e.g., a score point of 2 on an item that is scored from 0 to 4, or selecting "somewhat agree" on a survey).
Polytomous items have become omnipresent in the educational and psychological testing community because they offer a much richer testing experience for the examinee while also providing more psychometric information about the construct being measured. There are many terms used to describe polytomous items (e.g., constructed response items, survey items), and polytomous items can take on various forms (e.g., writing prompts, Likert-type items). Essentially, polytomous IRT models can be used for any test question where there are several response categories available.
The development of measurement models that are specifically designed around polytomous items is complex, spans several decades, and involves a variety of researchers and perspectives. In this book we intend to tell the story behind the development of polytomous IRT models, explain how model evaluation can be done, and provide some concrete examples of work that can be done with polytomous IRT models. Our goal in this text is to give the reader a broad understanding of these models and how they might be used for research and operational purposes.
Who Is This Book For?
This book is intended for anyone who wants to learn more about polytomous IRT models. Many of the concepts discussed in this book are technical in nature and will require an understanding of measurement theory and some familiarity with dichotomous IRT models. There are several excellent sources for learning more about measurement generally (Allen & Yen, 1979; Anastasi, 1988; Crocker & Algina, 1986; Cronbach, 1990) and dichotomous IRT models specifically (e.g., Embretson & Reise, 2000; Rogers, Swaminathan, & Hambleton, 1991). Throughout the book there are numerous references that are valuable resources for those interested in learning more about polytomous IRT.
The Approach of This Book and Its Goals
This handbook is designed to bring together the major polytomous IRT models in a way that helps both students and practitioners of social science measurement understand where these state-of-the-art models come from, how they work, and how they can be used. As Hambleton, van der Linden, and Wells (Chapter 2) point out, the handbook is not an exhaustive catalogue of all polytomous IRT models, but the most commonly used models are presented in a comprehensive manner.
It speaks to the maturation of this field that there are now models that appear to have fallen by the wayside despite what could be considered desirable functional properties. Rost's (1988) successive intervals model might be an example of such a model in that very little research has been focused on it. Polytomous IRT also has its share of obscure models that served their purpose as the field was finding its feet but which have been supplanted by more flexible models (e.g., Andrich's (1982) dispersion model has given way to the partial credit model) or by mathematically more tractable models (e.g., Samejima's (1969) normal ogive model is more difficult to use than her logistic model).
Perhaps the most prominent model to not receive separate treatment in this handbook is the generalized partial credit model (GPCM; Muraki, 1992). Fortunately, the structure and functioning of the model are well covered in a number of places in this book, including Hambleton and colleagues' survey of the major polytomous models (Chapter 2) and Kim, Harris, and Kolen's exposition of equating methods (Chapter 11).
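For readers who want a concrete picture of what such a model produces, the following is a minimal sketch of GPCM category probabilities for a single item, assuming the commonly cited form of the model in which the log-odds of adjacent categories are a(theta - b_v). The function name, the item parameters, and the trait value are hypothetical and chosen only for illustration; the chapters cited above remain the authoritative treatment.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities for one item under the GPCM (illustrative sketch).

    theta : examinee trait level
    a     : item slope (discrimination) parameter
    steps : step parameters b_1..b_m, so that scores run from 0 to m
    """
    # Cumulative sums of a * (theta - b_v); the term for score 0 is fixed at 0.
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    # Subtract the largest value before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item scored from 0 to 4 (all parameter values are made up).
probs = gpcm_probs(theta=0.5, a=1.2, steps=[-1.5, -0.5, 0.5, 1.5])
expected_score = sum(k * p for k, p in enumerate(probs))
print([round(p, 3) for p in probs], round(expected_score, 3))
```

Each entry of the returned list is the modeled probability of one score category at the given trait level, and the entries sum to 1. Plotting these probabilities across a range of theta values gives the category response curves that the later chapters discuss.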
Rather than focus on exhaustive coverage of available models, this handbook tries to make polytomous IRT more accessible to a wider range of potential users in two ways. First, it provides material on the origins and development of the most influential models, bringing together historical and conceptual context for those models that is not easily found elsewhere. The appendix to Thissen, Cai, and Bock's chapter (Chapter 3) is an example of previously unpublished material on the development context for the nominal model.
Second, this handbook addresses important issues around using the models, including the challenge of evaluating model functioning (Bock & Gibbons, Chapter 7; Glas, Chapter 8), applying the models in computerized adaptive testing (CAT; Boyd, Dodd, & Choi, Chapter 10), equating test scores derived from polytomous models (Kim et al., Chapter 11), and using a polytomous IRT model to investigate examinee test-taking strategies (Huang & Mislevy, Chapter 9).
Part 1: Development
In this book we attempt to bring together a collection of different polytomous IRT models with the story of the development of each model told by the people whose work is most closely associated with the models. We begin with a chapter by Hambleton, van der Linden, and Wells (Chapter 2), which broadly outlines various influential polytomous models, introducing their mathematical form and providing some of the common historical setting for the models. Introducing a range of models in this consistent way forms a solid basis for delving into the more complex development and measurement issues addressed in later chapters. Hambleton and colleagues also introduce models that are not addressed in later model development chapters (e.g., generalized partial credit model, nonparametric IRT models) and touch on parameter estimation issues and other challenges facing the field.
Thissen and Cai (Chapter 3) provide a succinct introduction to the nominal categories item response model (often known in other places as the nominal response model). They neatly describe derivations and alternative parameterizations of the model as well as showing various applications of the model. Saving the best for last, Thissen and Cai provide a completely new parameterization for the nominal model. This new parameterization builds on 30 years of experience to represent the model in a manner that facilitates extensions of the model and simplifies the implementation of estimation algorithms for the model. The chapter closes by coming full circle with a special contribution by R. Darrell Bock, which provides previously unpublished insight into the background to t...