Handbook of Item Response Theory Modeling

Applications to Typical Performance Assessment

Edited by Steven P. Reise and Dennis A. Revicki

About This Book

Item response theory (IRT) has moved beyond the confines of educational measurement into assessment domains such as personality, psychopathology, and patient-reported outcomes. This new volume reviews classic and emerging IRT methods and applications that are revolutionizing psychological measurement, particularly for the health assessments used to demonstrate treatment effectiveness. World-renowned contributors present the latest research and methodologies on these models, along with their applications and related challenges. Examples using real data, some from NIH-PROMIS, show how to apply these models in actual research situations. Chapters review fundamental issues of IRT, modern estimation methods, testing assumptions, evaluating fit, item banking, scoring in multidimensional models, and advanced IRT methods. New multidimensional models are presented, along with guidance for choosing among the family of available IRT models. Each chapter provides an introduction, describes state-of-the-art research methods, demonstrates an application, and offers a summary. The book addresses the most critical IRT conceptual and statistical issues confronting researchers and advanced students in psychology, education, and medicine today. Although the chapters highlight health outcomes data, the issues addressed are relevant to any content domain.

The book addresses:

IRT models applied to noneducational data, especially patient-reported outcomes

Differences between cognitive and noncognitive constructs and the modeling challenges these differences bring

The application of multidimensional IRT models designed to capture typical performance data

Cutting-edge methods for deriving a single latent dimension from multidimensional data

A new model designed for the measurement of constructs that are defined at one end of a continuum, such as substance abuse

Scoring individuals under different multidimensional IRT models and item banking for patient-reported health outcomes

How to evaluate measurement invariance, diagnose problems with response categories, and assess growth and change

Part 1 reviews fundamental topics such as assumption testing, parameter estimation, and the assessment of model and person fit. Part 2 examines new, emerging, and classic IRT models, including models for multidimensional data and the use of new IRT models in typical performance measurement contexts. Part 3 reviews the major applications of IRT models, such as scoring, item banking for patient-reported health outcomes, evaluating measurement invariance, linking scales to a common metric, and measuring growth and change. The book concludes with a look at future IRT applications in health outcomes measurement. Throughout, the book summarizes the latest advances and critiques foundational topics such as multidimensionality, assessment of fit, and handling non-normality, as well as applied topics such as differential item functioning and multidimensional linking.

Intended for researchers, advanced students, and practitioners in psychology, education, and medicine interested in applying IRT methods, this book also serves as a text in advanced graduate courses on IRT or measurement. Familiarity with factor analysis, latent variables, IRT, and basic measurement theory is assumed.

Information

Publisher
Routledge
Year
2014
ISBN
9781317565697
Pages
466
Part I
Fundamental Issues in Item Response Theory

1
Introduction

Age-Old Problems and Modern Solutions
Steven P. Reise and Dennis A. Revicki
The statistical foundation of item response theory (IRT) is often traced to the seminal work of Lord and Novick (1968), including the chapters contributed by Birnbaum. The subsequent development, research, and application of IRT models and related methods link directly to the need of large-scale testing companies, such as the Educational Testing Service, to solve statistical as well as practical problems in educational assessment (i.e., the measurement of aptitude, achievement, and ability constructs). Daunting problems in this domain include the challenge of administering different test items to demographically diverse individuals across multiple years, while maintaining scores that are comparable on the same scale. This test score comparability problem traditionally has been addressed with “test-score equating” methods, but now IRT-based “linking” strategies are more routinely used (see Chapter 19).
The application of IRT models and methods in educational assessment is now commonplace (e.g., see almost any recent issue of the Journal of Educational Measurement), especially among large-scale testing firms that employ dozens of world-class psychometricians, content experts, and item writers on their research staffs. In contrast, the application of IRT models and related statistical methods in the fields of personality, psychopathology, patient-reported outcomes (PRO), and health-related quality-of-life (HRQOL) measurement has only recently begun to proliferate in research journals. In these noneducational or “typical performance” domains, the application of IRT has gained popularity for much the same reasons as in large-scale educational assessment; that is, to solve practical and technical problems in measurement.
The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS®), for example, has developed multiple item banks for measuring various physical, mental, and social health domains (Cella et al., 2007; Cella et al., 2010). Similarly, the Quality of Life in Neurological Disorders (www.neuroqol.org) and NIH Toolbox (www.nihtoolbox.org) projects have also employed IRT methods for scale development and item analysis. One of the chief motivations underlying the application of IRT methods in these projects was to solve a long-standing and well-recognized problem in health outcomes research; namely, that for any important construct, there are typically half a dozen or so competing measures of unknown quality and questionable validity. This chaotic measurement situation, with dozens of researchers studying the same phenomena using different measurement tools, fails to promote good research and inhibits the cumulative aggregation of research results.
Large-scale IRT application projects, such as PROMIS®, have raised awareness not only of the technical and practical challenges of applying IRT models to psychological or PRO data in general, but have also uncovered the many and varied special problems and concerns that arise in applying IRT outside of educational assessment (see also Reise & Waller, 2009). We will highlight several of these critical challenges later in this chapter to set a context for the present volume. Before doing so, however, we note that thus far, standard IRT models and methods have been imported into noneducational measurement contexts essentially without modification. In other words, there has been little in the way of “new models” or “new statistical methods” uniquely appropriate for PRO or any other type of noneducational data (but see Chapter 13).
This egalitarian stance—that the same IRT models and methods should be used for all constructs, educational or PRO—was perhaps critical in the early stages of IRT exploration and application in new domains. Inevitably, we believe, further progress will require new IRT-based psychometric approaches tailored to meet the measurement challenges of noneducational assessment. We will expand on this in the final chapter. For now, prior to previewing the chapters in this edited volume, we briefly discuss in the following section some critical differences between educational and noneducational constructs, data, and assessment contexts as these relate to the application of IRT models. We argue that although there are fundamental technical issues in applying IRT to any domain (e.g., dimensionality issues, assessing model-to-data fit), unique challenges arise when applying IRT to noneducational data due to the nature of the constructs (e.g., limited conceptual breadth, questionable applicability across the entire population) and of the item response data (e.g., non-normal latent trait distributions).

Educational Versus Noneducational Measurement

It is well recognized that psychological constructs, both cognitive and noncognitive, can be conceptualized as hierarchically arranged, ranging from very general, to middle-level, to conceptually narrow, to specific behaviors (Clark & Watson, 1995).1 Since Loevinger (1957), it has also been well recognized (although not necessarily realized in practice by scale developers) that the position of a construct in this hierarchy has profound implications for all aspects of scale development, psychometric analysis, and, ultimately, the validation of test score inferences.
Almost by definition, measures of broad bandwidth constructs (intelligence, verbal ability, negative affectivity, general distress, overall life satisfaction, or QOL) must have heterogeneous item content to capture the diversity of trait manifestations.2 In turn, item intercorrelations, item-test correlations, and factor loadings/IRT slopes are expected to be modest in magnitude, with low item communalities. Moreover, the resulting factor structures may (must?) be multidimensional to some degree, perhaps with a strong general factor and several so-called group or specific factors corresponding to more content-homogeneous domains (see Chapter 2).
On the other hand, just the opposite psychometric properties would be expected for measures of conceptually narrow constructs (mathematics self-efficacy, primary narcissism, fatigue, pain interference, germ phobia). In this latter context, the content diversity of trait manifestation is very limited (by definition of the construct); as a consequence, item content is homogeneous, and the conceptual distance between the item content and the latent trait is slim. In turn, this can result in very high item intercorrelations, item-test correlations, and factor loadings/IRT slopes. In factor analyses, essential unidimensionality would be the expectation, as would high item communalities. Finally, in contrast to broadband measures, where local independence violations are typically caused by clusters of content-similar items, in narrowband measures local independence violations are typically caused by having the same item content repeated over and over with slight variation (e.g., “I have problems concentrating,” “I find it hard to concentrate,” “I lose my concentration while driving,” “It is sometimes hard for me to concentrate at work”).
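To make this concrete, here is a minimal simulation sketch in Python (numpy only; the item labels, loadings, and sample size are illustrative assumptions, not values from the chapter). Two near-duplicate “concentration” items share variance beyond the common trait, so their correlation exceeds what a single factor predicts, and the excess surfaces as a large correlated residual:

import numpy as np

rng = np.random.default_rng(1)
n = 2000
theta = rng.normal(size=n)            # common latent trait
doublet = rng.normal(size=n)          # nuisance factor shared by the two rewordings

# Five items load equally on the trait; items 0 and 1 ("I have problems
# concentrating" / "I find it hard to concentrate") also load on the doublet.
loadings = np.full(5, 0.7)
X = np.outer(theta, loadings) + 0.7 * rng.normal(size=(n, 5))
X[:, 0] += 0.5 * doublet
X[:, 1] += 0.5 * doublet

# Approximate a one-factor model via the first principal component and inspect
# residual correlations: under unidimensionality, r_ij ~ lambda_i * lambda_j.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
lam = eigvecs[:, -1] * np.sqrt(eigvals[-1])   # loading estimates
resid = R - np.outer(lam, lam)
np.fill_diagonal(resid, 0.0)
print(np.round(resid, 2))   # the (0, 1) entry stands out as a local dependence

In a full analysis one would fit a proper IRT or CFA model and examine modification indices (as in the Lagrange multiplier tests described below), but the residual pattern being diagnosed is the same.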
In our judgment, applications of IRT in educational measurement have tended toward the more broadband constructs, such as verbal and quantitative aptitude, or comprehensive licensure testing contexts (which also involve competencies across a heterogeneous skill domain). In contrast, we argue that with few exceptions, applications of IRT in noneducational measurement have primarily been with constructs that are relatively conceptually narrow. As a consequence, IRT applications in noneducational measurement contexts present some unique challenges, and the results of such applications can be markedly different from a typical IRT application in education.
For illustration, Embretson and Reise (in preparation) report an analysis of the PROMIS® anger item set (see Pilkonis et al., 2010), a set of 29 items rated on a 1-to-5 response scale. Anger is arguably conceptually narrow because there simply are not that many ways of being angry (especially when rated within the past seven days); that is, the pool of potential item content is very limited, unlike a construct such as spelling or reading comprehension, where the pool of items is virtually inexhaustible. Accordingly, coefficient alpha was 0.96, and the ratio of the first to the second eigenvalue was around 15 to 1, suggesting unidimensionality, or at least a strong common factor. Fitting a unidimensional confirmatory factor analysis model resulted in “acceptable” fit by conventional standards. However, univariate and multivariate Lagrange multiplier tests indicated that 407 and 157 correlated residuals, respectively, needed to be estimated (set free). This unambiguous evidence against the data meeting the unidimensionality/local independence assumption was not due to the anger data being “multidimensional” in any real sense of the term, with substantively interpretable distinct factors, but rather to the data having many sizeable correlated residuals (violations...
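For readers who want to run this style of screening on their own data, here is a minimal sketch of the two summary indices cited above—coefficient alpha and the first-to-second eigenvalue ratio—again in Python with numpy only; the simulated response matrix is a stand-in assumption, not the actual PROMIS® anger data:

import numpy as np

def coefficient_alpha(X):
    # Cronbach's alpha for an n_persons x n_items response matrix
    k = X.shape[1]
    item_var_sum = X.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = X.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def first_to_second_eigenvalue_ratio(X):
    # Ratio of the two largest eigenvalues of the inter-item correlation matrix
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    return eigvals[0] / eigvals[1]

# Hypothetical usage: 1,000 respondents, 29 items on a 1-5 scale, one strong factor
rng = np.random.default_rng(0)
theta = rng.normal(size=(1000, 1))
raw = 3.0 + 1.2 * theta + 0.6 * rng.normal(size=(1000, 29))
X = np.clip(np.round(raw), 1, 5)

print(coefficient_alpha(X))                  # high alpha, as with the anger items
print(first_to_second_eigenvalue_ratio(X))   # large ratio: one strong factor

Note that, as the anger example shows, a high alpha and a lopsided eigenvalue ratio do not by themselves rule out local independence violations; residual-level diagnostics such as Lagrange multiplier tests are still needed.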
