I
A Foundation for Developing and Validating Test Items
Part I covers four important, interrelated concerns in item development and validation.
This first chapter provides definitions of basic terms and distinctions useful in identifying what is to be measured. It also discusses validity and the validation process as they apply to item development. The second chapter presents the essential steps in item development and validation. The third chapter presents information on the role of content and cognitive demand in item development and validation. The fourth chapter presents a taxonomy of selected-response (SR) and constructed-response (CR) test item formats for certain types of content and cognitive demands.
1
The Role of Validity in Item Development
Overview
This chapter provides a conceptual basis for understanding the important role of validity in item development. First, basic terms are defined. Then the content of tests is differentiated. An argument-based approach to validity is presented that is consistent with current validity theory. The item development process and item validation are two related steps that are integral to item validity. The concept of item validity is applied throughout all chapters of this book.
Defining the Test Item
A test item is a device for obtaining information about a test taker's domain of knowledge and skills or a domain of tasks that define a construct. Familiar constructs in education are reading, writing, speaking, and listening. Constructs also apply to professions: medicine, teaching, accountancy, nursing, and the like. Every test item has the same three components:
1. Instructions to the test taker,
2. Conditions for performance, and
3. A scoring rule.
A test item is the basic unit of observation in any test. The most fundamental distinction for the test item is whether the test taker chooses an answer (selected-response: SR) or creates an answer (constructed-response: CR). The SR format is often known as multiple-choice. The CR format has many other names, including open-ended, performance, authentic, and completion. This SR-CR distinction is the basis for the organization of chapters in this book. The response to any SR or CR item is scorable. Items can be scored dichotomously (one for right and zero for wrong) or polytomously, using a rating scale or some graded series of responses. More refined distinctions among item formats are presented in chapter 4.
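The two scoring rules just described can be sketched in a few lines of code. This is a minimal illustration only, assuming invented function names and an invented 0-4 rubric; it is not drawn from the book itself.

```python
def score_dichotomous(response: str, key: str) -> int:
    """Dichotomous rule: 1 for the keyed (right) answer, 0 for anything else."""
    return 1 if response == key else 0

def score_polytomous(rating: int, max_points: int = 4) -> int:
    """Polytomous rule: a graded score on a rating scale, clamped to 0..max_points."""
    return max(0, min(rating, max_points))

# An SR item scored right/wrong, and a CR essay rated on a hypothetical 0-4 rubric.
sr_score = score_dichotomous("B", key="B")  # keyed answer chosen -> 1
cr_score = score_polytomous(3)              # rater awards 3 of 4 points -> 3
```

The point of the sketch is simply that a dichotomous rule yields two possible score points, while a polytomous rule yields an ordered series of score points, which is why CR items typically require scoring guides and trained raters.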
Thorndike (1967) advised item and test developers that the more effort we put into building better test items, the better the test is likely to be. Phrased in terms of validity: the greater the effort expended to improve the quality of test items in the item bank, the greater the degree of validity we are likely to attain. Because item development is a major step in test development, validity can be greatly affected by a sound, comprehensive effort to develop and validate test items.
Toward that end, we should develop each test item to represent a single type of content and a single type of cognitive behavior as accurately as is humanly possible. An item that measures multiple types of content and cognitive behavior goes well beyond our ability to understand the meaning of a test taker's response to it.
Defining the Test
A test is a measuring device intended to describe numerically the degree or amount of a construct under uniform, standardized conditions. Standardization is a very important idea when considering a test, and the most important feature of a test is the validity of its test score interpretation and use. "Measurement procedures tend to control irrelevant sources of variability by standardizing the tasks to be performed, the conditions under which they are performed, and the criteria used to interpret the results" (Kane, 2006b, p. 17).
In educational achievement testing, most tests contain a single test item or a set of test items intended to measure a domain of knowledge or skills or a domain of tasks representing an ability. The single test item might be a writing prompt or a complex mathematics problem. Responses to a single test item or a collection of test items are scorable using complex scoring guides and highly trained raters. The use of scoring rules helps to create a test score that is based on the test taker's responses to these test items. In this book, we are less concerned with tests and more concerned with developing highly effective items and then assembling validity evidence for the valid interpretation and use of each item response. Readers are directed to the Handbook of Test Development (Downing & Haladyna, 2006) for comprehensive discussions of issues and steps in the test development process. The fourth edition of Educational Measurement (Brennan, 2006) also provides current treatments of many important issues in test development and validation.
What Do Tests and Test Items Measure?
In this section, two issues we face in the measurement of any cognitive ability are presented and discussed. The first is the dilemma posed when we fail to operationally define the construct we want to measure. The second is the distinction between achievement and intelligence.
A construct is something definable that we want to measure. Constructs have characteristics that help define them. Another good way to make a construct clear is to list examples and non-examples. In educational and psychological testing, the most important constructs we measure include reading, writing, speaking, listening, mathematical problem-solving, scientific problem-solving, and critical thinking as applied in literature analysis and in social studies. Some constructs are subject-matter-based, for example language arts, mathematics, science, social studies, physical education, and English language proficiency. Professional competence is another type of construct that we often test for certification and licensure. Medicine, nursing, dentistry, accountancy, architecture, pharmacy, and teaching are all constructs of professional competence.
Operational Definitions and Constructs
Operational definitions are commonly agreed on by those responsible for, and most highly qualified in, measuring the construct. In other words, we have a consensus of highly qualified subject-matter experts (SMEs). In The Conduct of Inquiry, Kaplan stated:
To each construct there corresponds a set of operations involved in its scientific use. To know these operations is to understand the construct as fully as science requires; without knowing them, we do not know what the scientific meaning of the construct is, not even whether it has scientific meaning. (Kaplan, 1963, p. 40)
With an operational definition, we have no surplus meaning or confusion about the construct. We can be very precise in the measurement of an operationally defined construct. We can eliminate or reduce random or systematic error when measuring any operationally defined construct. Instances of operationally defined constructs include time, volume, distance, height, speed, and weight. Each can be measured with great precision because the definition of each of these constructs is specific enough. Test development for any construct that is operationally defined is usually very easy.
However, many constructs in education and psychology are not amenable to operational definition. Validity theorists advise that the alternative strategy is one of defining and validating constructs. By doing so, we recognize that the construct is too complex to define operationally (Cronbach & Meehl, 1955; Kane, 2006b; Kaplan, 1963; Messick, 1989). As previously noted, constructs include reading and writing. Also, each profession or specialty in life is a construct. For example, baseball ability, financial analysis, quilt-making, and dentistry are constructs that have usefulness in society. Each construct is very complex. Each construct requires the use of knowledge and skills in complex ways. Often we can conceive of each construct as a domain of tasks to be performed.
For every construct, we can identify some aspects that can be operationally defined. For instance, in writing, we have spelling, punctuation, and grammatical usage, each of which is operationally defined and easily measured. In mathematics, computation can be operationally defined. In most professions, we can identify sets of tasks that are either performed or not performed. Each of these tasks is operationally defined. However, these examples of operational definition within a construct represent the minority of tasks that comprise the construct. We are still limited to construct measurement and the problems it brings due to the construct's complexity and the need for expert judgment to evaluate performance.
Because constructs are complex and abstractly defined, we employ a strategy known as construct validation. This investigative process is discussed later in this chapter and used throughout this book. The investigation involves many important steps, and it leads to a conclusion about validity.
Achievement and Intelligence
The context for this book is the measurement of achievement that is the goal of instruction or training. Most testing programs are designed for elementary, secondary, college, and graduate school education. Another large area of testing involves certification in professions, such as medicine, dentistry, accountancy, and the like.
Achievement is usually thought of as planned changes in the cognitive behavior of students that result from instruction or training, although certainly achievement is possible due to factors outside instruction or training. All achievement can be defined in terms of content. This content can be represented in two ways: the first is as a domain of knowledge and skills; the second is as a cognitive ability for which there is a domain of tasks to be performed. Chapter 3 refines the distinctions between these two types of content. However, introducing these distinctions in the realm of achievement is important as we consider item development and validation, because they bear on validity.
Knowledge is a fundamental type of learning that includes facts, concepts, principles, and procedures that can be memorized or understood. Most student learning consists of knowledge. Knowledge is often organized as a domain consisting of an organized set of instructional objectives/content standards.
A skill is a learned, observable, performed act. A skill is easily recognize...