Automatic Item Generation

Theory and Practice

eBook - ePub
  • 256 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android

About this book

Automatic item generation (AIG) represents a relatively new and unique research area where specific cognitive and psychometric theories are applied to test construction practices for the purpose of producing test items using technology. The purpose of this book is to bring researchers and practitioners up to date on the growing body of research on AIG by organizing in one volume what is currently known about this research area. Part I begins with an overview of the concepts and topics necessary for understanding AIG by focusing on both its history and current applications. Part II presents two theoretical frameworks and practical applications of these frameworks in item generation. Part III summarizes the psychological and substantive characteristics of generated items. Part IV concludes with a discussion of the statistical models that can be used to estimate the item characteristics of generated items, features one future application of AIG, describes the current technologies used for AIG, and highlights the unresolved issues that must be addressed as AIG continues to mature as a research area.

Comprehensive – The book provides a comprehensive analysis of both the theoretical concepts that define automatic item generation and the practical considerations required to implement these concepts.

Varied Applications – Readers are provided with novel applications in diverse content areas (e.g., science and reading comprehension) that range across all educational levels – elementary through university.

Information

Publisher: Routledge
Year: 2012
Print ISBN: 9780415897501

PART I: Initial Considerations for Automatic Item Generation

1. Automatic Item Generation: An Introduction

Mark J. Gierl and Thomas M. Haladyna
A major motivation for this book is to improve the assessment of student learning, regardless of whether the context is K-12 education, higher education, professional education, or training. The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) described assessment in this way:
Any systematic method of obtaining information from tests and other sources, used to draw inferences about characteristics of people, objects, or programs. (p. 112)
A test or exam is one of the most important sources of information for the assessment of student learning. Our book is devoted to improving the assessment of student learning through the design, development, and validation of superior-quality test items.
Although automatic item generation (AIG) has a relatively short history, it holds much promise for many exciting, innovative, and valuable technologies for item development and validation. The chapters in this book provide a coordinated and comprehensive account of the current state of this emerging science. Two concepts of value to readers of this volume are the content and cognitive demand of achievement constructs. Every test item has a content designation and an intended cognitive demand. The content of an achievement construct is generally considered to exist in one of two types of domains. The first type of domain consists of using knowledge, skills, and strategies in complex ways. A review of national, state, and school district content standards for reading, writing, speaking, and listening, and mathematical and scientific problem solving provides good examples of this first type of domain. The second type of domain focuses on a single cognitive ability. Abilities are complex mental structures that grow slowly (Lohman, 1993; Messick, 1984; Sternberg, 1998); a domain of this type, therefore, represents one of these cognitive abilities. For credential testing in the professions, the domain consists of tasks performed in that profession (Raymond & Neustel, 2006). Kane (2006a, 2006b) refers to these tasks as existing in a target domain. Test items are intended to model the tasks found in the target domain. References to knowledge, skills, and ability denote either of these two types of constructs as they exist in modern measurement theory and practice.
Cognitive demand refers to the mental complexity involved in performing a task. The task might be a test item where the learner selects among choices or creates a response to an item, question, or command. One critic referred to the classification of cognitive demand as a conceptual “swamp,” due to the profusion of terms used to describe various types of higher-level thinking (Lewis & Smith, 1993). More recently, Haladyna and Rodriguez (in press) listed 25 different terms signifying higher-level thinking. For our purposes, no taxonomy of higher-level thinking has been validated or widely accepted on scientific grounds, and none is advocated here. However, several contributors to this book describe useful methods for uncovering ways to measure complex cognitive behaviors via AIG.
Our introductory chapter has two main sections. The first section presents a context for change in educational measurement that features AIG. The second section provides a brief summary of the chapters in this book and highlights their interrelationships.
A Context for Automatic Item Generation
The Greek philosopher Heraclitus (c. 535–475 BC) provided some foresight into the state of 21st-century educational measurement when he claimed that the only constant was change. The evolution of educational measurement, driven by interdisciplinary forces stemming from the fusion of the cognitive sciences, statistical theories of test scores, professional education and certification, educational psychology, operations research, educational technology, and computing science, is occurring rapidly. These interdisciplinary forces are also creating exciting new opportunities for both theoretical and practical changes. Although many different examples could be cited, the state of change is most clearly apparent in the areas of computer-based testing, test design, and cognitive diagnostic assessment. These three examples are noteworthy as they relate to the topics described in this book: changes in computerized testing, test design, and diagnostic testing will directly affect the principles and practices that guide the design and development of test items.
Example #1: Computer-Based Testing
Computer-based testing, our first example, is dramatically changing educational measurement research and practice because current test administration procedures are merging with the growing popularity of digital media and the explosion in internet use to create the foundation for new types of tests and testing resources. As a historical development, this transition from paper- to computer-based testing has been occurring for some time. Considerable groundwork for this transition can be traced to the early research, development, and implementation efforts focused on computerizing and adaptively administering the Armed Services Vocational Aptitude Battery, beginning in the 1960s (see Sands, Waters, & McBride, 1997). A computer-adaptive test is a paperless test administered with a computer, using a testing model that implements a process of selecting and administering items, scoring the examinee’s responses, and updating the examinee’s ability estimate after each item is administered. This process of selecting new items based on the examinee’s responses to the previously administered items continues until a stopping rule is satisfied, that is, until there is considerable confidence in the accuracy of the score. The pioneers and early proponents of computer-adaptive testing were motivated by the potential benefits of this testing approach, which included shortened tests without a loss of measurement precision, enhanced score reliability particularly for low- and high-ability examinees, improved test security, testing on-demand, and immediate test scoring and reporting. The introduction and rapid expansion of the internet has enabled many recent innovations in computerized testing. Examples include computer-adaptive multistage testing (Luecht, 1998; Luecht & Nungester, 1998; see also Luecht, this volume), linear on-the-fly testing (Folk & Smith, 2002), testlet-based computer adaptive testing (Wainer & Kiely, 1987; Wainer & Lewis, 1990), and computerized mastery testing (Lewis & Sheehan, 1990). Many educational tests that were once given in paper format are now administered by computer via the internet. Many popular and well-known tests can be cited as examples, including the Graduate Management Admission Test, the Graduate Record Examination, the Test of English as a Foreign Language, the Medical Council of Canada Qualifying Exam Part I, and the American Institute of Certified Public Accountants Uniform CPA Examination. Education Week’s 2009 Technology Counts also reported that almost half the U.S. states now administer some form of internet-based computerized educational test.
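To make the adaptive loop described above concrete, here is a minimal sketch in Python. It assumes a Rasch (one-parameter logistic) model, maximum-information item selection, a single Newton-Raphson ability update per item, and a standard-error stopping rule; the simulated bank, the thresholds, and the bound on the ability estimate are all illustrative assumptions, not any specific testing program’s operational procedure.

```python
import math
import random

def p_correct(theta, b):
    """Rasch probability that an examinee of ability theta answers an
    item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def adaptive_test(bank, respond, max_items=40, se_target=0.30):
    """Administer items adaptively until the ability estimate is precise
    enough (standard error below se_target) or the item budget is spent."""
    theta = 0.0                                  # provisional ability estimate
    administered, responses = [], []
    while len(administered) < max_items:
        # Maximum information under the Rasch model: pick the unused item
        # whose difficulty is closest to the current ability estimate.
        item = min((b for b in bank if b not in administered),
                   key=lambda b: abs(b - theta))
        administered.append(item)
        responses.append(respond(item))          # 1 = correct, 0 = incorrect
        # One Newton-Raphson step on the log-likelihood to update theta.
        info = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b))
                   for b in administered)
        score = sum(u - p_correct(theta, b)
                    for u, b in zip(responses, administered))
        theta = max(-4.0, min(4.0, theta + score / info))  # keep estimate bounded
        if 1.0 / math.sqrt(info) < se_target:    # stopping rule: SE small enough
            break
    return theta

# Usage: simulate a 200-item bank and an examinee with true ability 1.0.
random.seed(0)
bank = [random.gauss(0.0, 1.0) for _ in range(200)]
true_theta = 1.0
estimate = adaptive_test(bank,
                         lambda b: int(random.random() < p_correct(true_theta, b)))
print(f"estimated ability: {estimate:.2f}")
```

Operational programs layer exposure control and content balancing on top of this basic select-score-update cycle, but the loop itself is the core of every computer-adaptive design.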
Internet-based computerized tests offer many advantages to students and educators, as compared to more traditional paper-based tests. For instance, computers enable the development of innovative item types and alternative item formats (Sireci & Zenisky, 2006; Zenisky & Sireci, 2002); items on computer-based tests can be scored immediately, thereby providing examinees with instant feedback (Drasgow & Mattern, 2006); computers permit continuous testing and testing on-demand (van der Linden & Glas, 2010). But possibly the most important advantage of computer-based testing is that it allows testing agencies to measure more complex performances by integrating test items and digital media to substantially improve the measurement of complex thinking (Bartram, 2006; Zenisky & Sireci, 2002).
The advent of computer-based, internet testing has also raised new challenges, particularly in the area of item development. Large numbers of items are needed to develop the item banks necessary for computerized testing because items are continuously administered and, therefore, exposed. As a result, these item banks need frequent replenishing in order to minimize item exposure and maintain test security. Unfortunately, traditional item development requires content experts to use test specifications and item-writing guides to author each item. This process is very expensive. Rudner (2010) estimated that the cost of developing one operational item using the current approach in a high-stakes testing program can range from $1,500 to $2,000 per item. It is not hard to see that, at this price, the cost for developing a large item bank becomes prohibitive. Breithaupt, Ariel, and Hare (2010) recently claimed that a high-stakes 40-item computer adaptive test with two administrations per year would require, at minimum, a bank containing 2,000 items. Combined with Rudner’s per-item estimate, this requirement translates into a cost ranging from $3,000,000 to $4,000,000 for the item bank alone. Part of this cost stems from the need to hire subject-matter experts to develop test items. When large numbers of items are required, more subject-matter experts are needed. Another part of this cost is rooted in quality-control outcomes. Because the cognitive item structure is seldom validated and the determinants of item difficulty are poorly understood, all new test items must be field tested prior to operational use so that their psychometric properties can be documented. Of the items that are field tested, many do not perform as intended and, therefore, must be either revised or discarded. This outcome further contributes to the cost of item development. Haladyna (1994) stated, for example, that as many as 40% of expertly created items fail to perform as intended during field testing, leading to large numbers of items being either revised or discarded. In short, agencies that adopt computer-based testing are faced with the daunting task of creating thousands of new and expensive items for their testing programs. To help address this important task, the principles and practices that guide AIG are presented in this book as an alternative method for producing operational test items.
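The arithmetic behind that estimate is straightforward; a quick back-of-the-envelope check using only the figures cited above:

```python
# Item bank cost, combining Breithaupt et al.'s (2010) 2,000-item bank
# requirement with Rudner's (2010) per-item development cost range.
bank_size = 2_000
cost_low, cost_high = 1_500, 2_000            # USD per operational item
print(f"${bank_size * cost_low:,} to ${bank_size * cost_high:,}")
# -> $3,000,000 to $4,000,000
```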
Example #2: Test Design
Although the procedures and practices for test design and development of items for traditional paper-and-pencil testing are well established (see Downing & Haladyna, 2006; Haladyna, 2004; Schmeiser & Welch, 2006), advances in computer technology are fostering new approaches for test design (Drasgow, Luecht, & Bennett, 2006; Leighton & Gierl, 2007a; Mislevy, 2006). Prominent new test design approaches that differ from more traditional approaches are emerging, including the cognitive design system (Embretson, 1998), evidence-centered design (Mislevy, Steinberg, & Almond, 2003; Mislevy & Riconscente, 2006), and assessment engineering (Luecht, 2006, 2007, 2011). Although the new approaches to test design differ in important ways from one another, these approaches are united by a view that the science of educational assessment will prevail to guide test design, development, administration, scoring, and reporting practices. We highlight the key features in one of these approaches, assessment engineering (AE) (Luecht, Chapter 5, this volume). AE is an innovative approach to measurement practice where engineering-based principles and technology-enhanced processes are used to direct the design and development of tests as well as the analysis, scoring, and reporting of test results. This design approach begins by defining the construct of interest using specific, empirically derived construct maps and cognitively based evidence models. These maps and models outline the knowledge and skills that examinees need to master in order to perform tasks or solve problems in the domain of interest. Next, task models are created to produce replicable assessment resources. A task model specifies a class of tasks by outlining the shared knowledge and skills required to solve any type of task in that class. Templates are then created to produce test items with predictable difficulty that measure the content for a specific task model. Finally, a statistical model is used for examinee response d...
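The task-model-to-template layer described above is where AIG enters: a template fixes the structure of an item class and exposes variable slots whose values generate concrete items. The following Python sketch illustrates the idea; the speed-problem template, the variable values, and the misconception-based distractor rules are invented for illustration and are not Luecht’s AE formalism.

```python
from itertools import product

# Hypothetical item template with two variable slots.
TEMPLATE = ("A train travels {distance} km in {hours} hours. "
            "What is its average speed in km/h?")
VARIABLES = {"distance": [120, 180, 240], "hours": [2, 3, 4]}

def generate_items():
    """Instantiate the template once per combination of variable values,
    deriving the key and misconception-based distractors from the slots."""
    items = []
    for distance, hours in product(VARIABLES["distance"], VARIABLES["hours"]):
        key = distance // hours                 # all value pairs divide evenly here
        distractors = {distance * hours,        # multiplied instead of divided
                       distance - hours,        # subtracted instead of divided
                       key + 10}                # plausible near-miss
        items.append({
            "stem": TEMPLATE.format(distance=distance, hours=hours),
            "options": sorted((distractors - {key}) | {key}),
            "key": key,
        })
    return items

for item in generate_items()[:2]:
    print(item["stem"], "->", item["key"])
```

Because every generated item shares the task model’s knowledge and skill requirements, and the variable values are constrained by design, the items in a family can be expected to have predictable difficulty, which is precisely the property AE templates are meant to deliver.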

Table of contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. Contents
  5. List of Figures and Tables
  6. Acknowledgments
  7. PART I: Initial Considerations for Automatic Item Generation
  8. PART II: Connecting Theory and Practice in Automatic Item Generation
  9. PART III: Psychological Foundations for Automatic Item Generation
  10. PART IV: Technical Developments in Automatic Item Generation
  11. Author Index
  12. Subject Index
