eBook - ePub

Technology and Testing

Name: Technology and Testing
ISBN: 9781317975885

Improving Educational and Psychological Measurement

Fritz Drasgow

356 pages
English

About this book

From early answer sheets filled in with number 2 pencils, to tests administered by mainframe computers, to assessments wholly constructed by computers, it is clear that technology is changing the field of educational and psychological measurement. The numerous and rapid advances have an immediate impact on test creators, assessment professionals, and those who implement and analyze assessments. This comprehensive new volume brings together leading experts on the issues posed by technological applications in testing, with chapters on game-based assessment, testing with simulations, video assessment, computerized test development, large-scale test delivery, model choice, validity, and error issues.

Including an overview of existing literature and ground-breaking research, each chapter addresses the technological, practical, and ethical considerations of this rapidly changing area. Ideal for researchers and professionals in testing and assessment, Technology and Testing provides a critical and in-depth look at one of the most pressing topics in educational testing today.

The Open Access version of this book, available at http://www.taylorfrancis.com, has been made available under a Creative Commons Attribution-Non Commercial-No Derivatives 4.0 license.


Publisher

Routledge

Year

2015

Print ISBN

9780415717151

eBook ISBN

9781317975885

Topic

Education

Subtopic

Education General

1 Managing Ongoing Changes to the Test

Agile Strategies for Continuous Innovation

Cynthia G. Parshall and Robin A. Guille

Introduction

When an exam program is delivered via computer, a number of new measurement approaches are possible. These changes include a wide range of novel item types, such as the hot spot item or an item with an audio clip. Beyond these relatively modest innovations lie extensive possibilities, up to and including computerized simulations. The term innovative item type is frequently used as an overarching designation for any item format featuring these types of changes (Parshall, Spray, Kalohn, & Davey, 2002). The primary benefit that these new item types offer is the potential to improve measurement. When they are thoughtfully designed and developed, novel assessment formats can increase coverage of the test construct, and they can increase measurement of important cognitive processes (Parshall & Harmes, 2009). A further advantage provided by some innovations is the opportunity to expand the response space and collect a wider range of candidate behaviors (e.g., DiCerbo, 2004; Rupp et al., 2012).

However, the potential benefits offered by innovative item types are not guaranteed. Furthermore, successfully adding even a single new item type to an exam may require substantial effort. When an exam program elects to add several innovations, the costs, complexities, and risks may be even higher. Part of the challenge in adding innovative item types is that so much about them is new to testing organization staff and stakeholders. And while the standard processes and procedures serve an exam program well in the development of traditional item types, they fail to meet the needs that arise when designing new item types. A more flexible approach is needed in these cases, ideally one that provides for “experimental innovation” (Sims, 2011), in which solutions are built up over time as learning occurs. Looking to the future, a likely additional challenge is that the measurement field and all aspects of technology are going to continue to advance. Testing organizations may need to begin thinking of innovation and change as an ongoing, continuous element that needs to be addressed.

The research and development team at the American Board of Internal Medicine (ABIM) sought a strategic approach that would help them manage the task of continuous change in their exam programs. The methods presented in this chapter support their goal of a strategic and sustainable process. The heart of the process is an Agile implementation philosophy (Beck et al., 2001) coupled with a semistructured rollout plan.

These approaches, individually and in combination, are presented in this chapter as useful strategies for managing ongoing assessment innovation. They are also illustrated through a case study based on one of ABIM’s recent innovations. It is hoped that these methods will also be useful to other organizations that anticipate the need to strategically manage continuous innovations.

Background on Innovative Item Types

The primary reason for including innovative item types on an assessment is to improve the quality of measurement (Parshall & Harmes, 2009). The ideal innovative item type would increase construct representation while avoiding construct-irrelevant variance (Huff & Sireci, 2001; Sireci & Zenisky, 2006). Potential benefits of innovative item types include greater fidelity to professional practice (Lipner, 2013), the opportunity to increase measurement of higher-level cognitive skills (Wendt, Kenny, & Marks, 2007), the ability to measure broader aspects of the content domain (Strain-Seymour, Way, & Dolan, 2009), and the possibility of scoring examinees’ processes as part of the response as well as their products (Behrens & DiCerbo, 2012).

The term “innovative item types” has been used most often to describe these alternative assessment methods, though in the field of educational testing, the term “technology-enhanced items” has also become common (e.g., Zenisky & Sireci, 2013). Both phrases are broadly inclusive terms that have been used to encompass a very wide range of potential item types and other assessment structures. In general, any item format beyond the traditional, text-based, multiple-choice item type may be considered to be an innovative item type, though the most complex computerized assessment structures are more typically referred to as case-based simulations (Lipner et al., 2010). Item formats that are possible but rarely used in paper-based testing are often included in the category of innovative item types, because the computer platform may mean they are easier to deliver (e.g., an item with a full-color image or an audio clip) or to score (e.g., a short-answer item, a drag-and-drop matching item).
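To illustrate why a drag-and-drop matching item is straightforward to score once responses are captured by computer, the following minimal Python sketch compares an examinee's placements against an answer key. The function name `score_matching`, the drug-class item, and the two scoring rules shown (count of correct pairs and an all-or-nothing score) are hypothetical choices for illustration only; an operational program would define its own scoring rules.

```python
from typing import Dict, Tuple


def score_matching(response: Dict[str, str], key: Dict[str, str]) -> Tuple[int, int]:
    """Score a hypothetical drag-and-drop matching item.

    Returns (pairs_correct, dichotomous_score). The dichotomous score is 1
    only if every draggable element was placed on its keyed target.
    """
    # Count each source element whose placed target matches the key.
    pairs_correct = sum(1 for source, target in key.items()
                        if response.get(source) == target)
    dichotomous = int(pairs_correct == len(key))
    return pairs_correct, dichotomous


# Hypothetical item: match each drug to its drug class.
key = {
    "metoprolol": "beta blocker",
    "lisinopril": "ACE inhibitor",
    "amlodipine": "calcium channel blocker",
}
response = {
    "metoprolol": "beta blocker",
    "lisinopril": "calcium channel blocker",
    "amlodipine": "ACE inhibitor",
}
print(score_matching(response, key))  # (1, 0)
```

Unplaced elements simply count as incorrect because `response.get` returns `None` for them.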

The range of innovative item types that could be created is so great that various compendia and taxonomies have been produced in an effort to help define the field. For example, Sireci and Zenisky (2006) present a large number of item formats, including extended multiple choice, multiple selection, specifying relationships, ordering information, select and classify, inserting text, corrections and substitutions, completion, graphical modeling, formulating hypotheses, computer-based essays, and problem-solving vignettes. Multiple categorization schemas for innovative item types have also been proposed (e.g., Scalise & Gifford, 2006; Strain-Seymour, Way, & Dolan, 2009; Zenisky & Sireci, 2002). In Parshall, Harmes, Davey, and Pashley’s (2010) taxonomy, for instance, seven dimensions are used to classify innovative item types: assessment structure, response action, media inclusion, interactivity, complexity, fidelity, and scoring method.
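One way to make a multidimensional taxonomy concrete is to record each candidate item type as a structured profile with one field per dimension. The sketch below is a hypothetical Python encoding of the seven dimensions named above. The field values are free-text labels invented for illustration, not the taxonomy's official category levels. The example profiles a multiple-choice item with an embedded video clip, the kind of innovation discussed in the case study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ItemTypeProfile:
    """An item type described along the seven dimensions of the
    Parshall, Harmes, Davey, and Pashley (2010) taxonomy.

    The string values are illustrative free-text labels (an assumption),
    not the taxonomy's own category levels.
    """
    assessment_structure: str
    response_action: str
    media_inclusion: str
    interactivity: str
    complexity: str
    fidelity: str
    scoring_method: str


# Hypothetical profile of a multiple-choice item with an embedded video clip.
video_mc = ItemTypeProfile(
    assessment_structure="single discrete item",
    response_action="select one option",
    media_inclusion="video clip",
    interactivity="none beyond playing the clip",
    complexity="low",
    fidelity="moderate",
    scoring_method="dichotomous, keyed response",
)

# Print the profile one dimension per line.
for f in fields(video_mc):
    print(f"{f.name}: {getattr(video_mc, f.name)}")
```

Keeping profiles as immutable records makes it easy to compare candidate item types dimension by dimension when several innovations compete for development resources.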

The extensive lists of innovative item types provided in compendia and taxonomies typically include a fair number that have never been used operationally. In some cases, an item type was developed as part of a testing organization’s preliminary research into new item types. In such cases, even the incomplete development of an alternative item type might have been a valuable learning experience for the organization. In other cases, intractable problems (e.g., the lack of a workable scoring solution) were uncovered late in the development process, and the novel item type had to be abandoned.

For the first decade or more of operational computer-based tests (CBTs), if an exam program wanted to implement any nontraditional item types, custom software development was required. In fact, all the early CBTs required custom software development, even to deliver the traditional multiple-choice item type, since there were no existing CBT applications. Nevertheless, expanding beyond multiple-choice items required further effort, and most exam programs continued to deliver tests using that sole item type. Only a handful of exam programs pursued customized item type development (e.g., Bejar, 1991; Clauser, Margolis, Clyman, & Ross, 1997; O’Neill & Folk, 1996; Sands, Waters, & McBride, 1997). It was an expensive and time-consuming process, as extensive work was needed to support the underlying psychometrics, as well as the software development, and the effort did not always result in an item type that could be successfully used.

Over time, wide-scale changes in the CBT field occurred. These changes included the development of commercial CBT software, such as item banks and test-delivery applications. Testing organizations are now able to contract with a commercial firm for applications such as these rather than undertaking proprietary software development. In a related development, measurement-oriented interoperability specifications such as the Question and Test Interoperability standard (QTI; IMS, 2012) were established. The QTI specification represents test forms, sections, and items in a standardized XML syntax. This syntax can be used to exchange test content between software products that are otherwise unaware of each other’s internal data structures. As a result of these technological developments, all testing organizations have become much less isolated. There is much greater integration and communication across software systems, as well as more standardization of the elements included in different software applications.
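As a rough illustration of how a standardized XML syntax lets one product read content produced by another, the sketch below parses a minimal multiple-choice item written in the style of QTI 2.1 (`assessmentItem`, `responseDeclaration`, `choiceInteraction`, `simpleChoice`) using only Python's standard library. The item content and the helper `read_choice_item` are hypothetical, and the markup is abbreviated (for example, outcome declarations and response processing are omitted). Consult the IMS QTI specification for the full, normative schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by QTI 2.1 item documents.
QTI_NS = {"qti": "http://www.imsglobal.org/xsd/imsqti_v2p1"}

# A minimal, hypothetical single-answer multiple-choice item (abbreviated markup).
ITEM_XML = """<assessmentItem xmlns="http://www.imsglobal.org/xsd/imsqti_v2p1"
    identifier="item001" title="Sample item" adaptive="false" timeDependent="false">
  <responseDeclaration identifier="RESPONSE" cardinality="single" baseType="identifier">
    <correctResponse><value>B</value></correctResponse>
  </responseDeclaration>
  <itemBody>
    <choiceInteraction responseIdentifier="RESPONSE" shuffle="false" maxChoices="1">
      <prompt>Which organ produces insulin?</prompt>
      <simpleChoice identifier="A">Liver</simpleChoice>
      <simpleChoice identifier="B">Pancreas</simpleChoice>
      <simpleChoice identifier="C">Spleen</simpleChoice>
    </choiceInteraction>
  </itemBody>
</assessmentItem>"""


def read_choice_item(xml_text: str) -> dict:
    """Extract the identifier, prompt, options, and key from a QTI-style choice item."""
    root = ET.fromstring(xml_text)
    prompt = root.find(".//qti:prompt", QTI_NS).text
    options = {c.get("identifier"): c.text
               for c in root.iterfind(".//qti:simpleChoice", QTI_NS)}
    key = root.find(".//qti:correctResponse/qti:value", QTI_NS).text
    return {"id": root.get("identifier"), "prompt": prompt,
            "options": options, "key": key}


item = read_choice_item(ITEM_XML)
print(item["prompt"], "->", item["options"][item["key"]])
```

The parser depends only on the shared element names and namespace, not on how the authoring tool stores items internally, which is the kind of decoupling the specification is meant to provide.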

Under these newer, more integrated software conditions, the development of customized assessment innovations is relatively streamlined in comparison to the past. In some cases, the IT department at a testing organization may develop a plug-in for an item type feature that will then work within the larger set of CBT software for delivery and scoring. In other cases, a testing organization may work with a third-party vendor that specializes in CBT item/test software development to have a more elaborate innovation custom developed (e.g., Cadle, Parshall, & Baker, 2013). These technological changes have undoubtedly made the development of customized item types more achievable, though substantial challenges, including potentially high costs, remain.

One area of interest requiring customization is the development of multistep, integrated tasks or scenarios. Behrens and DiCerbo (2012) refer to this approach as the shift from an item paradigm to an activity paradigm. One goal often present when these task-based assessments are considered is the opportunity to focus on the examinee’s process as well as the end product (e.g., Carr, 2013; DiCerbo, 2004; Mislevy et al., this volume; Rupp et al., 2012). In some cases, though the task may be designed to be process oriented, the outcome is still product oriented (Zenisky & Sireci, 2013). The response formats in these cases often include traditional approaches such as the multiple-choice and essay item types (e.g., Steinhauer & Van Groos, 2013). Other response formats use more complex approaches (e.g., Cadle, Parshall, & Baker, 2013; Carr, 2013; Steinhauer & Van Groos, 2013). When researchers and developers are interested in the examinee’s process, they may also seek ways to score attributes of the examinee’s response set, either in addition to or instead of the response’s correctness (Behrens & DiCerbo, 2012). Examples of assessments that can score attributes of the examinee’s response include the interactive tasks in the National Assessment of Educational Progress (NAEP) Science Assessment; student responses to these tasks can be evaluated to determine whether they were efficient and systematic (Carr, 2013). In addition, children’s task persistence has been investigated in a game-based assessment (DiCerbo, 2004), while users’ effectiveness and efficiency in responding to computer application tasks have also been considered (Rupp et al., 2012).

As some of these examples suggest, a “digital ocean of data” (DiCerbo, 2004) may be available for analysis. Potential data sources can include computer log files (Rupp et al., 2012), the user’s clickstream, resource use pattern, timing, and chat dialogue (Scalise, 2013). Determining which elements to attend to in these cases can be a challenging problem (Rupp et al., 2012). Luecht and Clauser (2002) describe this as the need to identify the “universe of important actions.”
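To make the idea of mining log data more concrete, the sketch below derives a few simple process indicators from a time-stamped event log: total time on task, number of actions, and how often the same resource was reopened. The event format, the action labels (such as `open_resource`), and the choice of indicators are all hypothetical. Deciding which actions belong to the "universe of important actions" is exactly the design problem described above, and the code does not solve it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    t: float      # seconds since the task started
    action: str   # hypothetical action label, e.g. "open_resource"
    target: str   # what the action was applied to


def process_features(log: List[Event]) -> dict:
    """Summarize one examinee's event log into simple, illustrative process indicators."""
    if not log:
        return {"total_time": 0.0, "n_actions": 0, "repeat_resource_opens": 0}
    ordered = sorted(log, key=lambda e: e.t)
    opens = Counter(e.target for e in ordered if e.action == "open_resource")
    return {
        "total_time": ordered[-1].t - ordered[0].t,
        "n_actions": len(ordered),
        # Each reopening of an already-viewed resource counts once; treating this
        # as a sign of a less systematic search is an assumption, not a validated rule.
        "repeat_resource_opens": sum(n - 1 for n in opens.values()),
    }


log = [
    Event(0.0, "start_task", "case1"),
    Event(4.5, "open_resource", "lab_results"),
    Event(9.0, "open_resource", "imaging"),
    Event(15.2, "open_resource", "lab_results"),
    Event(30.0, "submit", "answer"),
]
print(process_features(log))
```

Whether indicators like these support valid inferences about efficiency or persistence would have to be established empirically for each task.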

Scoring these types of assessments is often challenging. Use of a much larger examinee response space and evaluation of multiple attributes naturally suggest a need for new analysis methods (Behrens et al., 2012; Gorin & Mislevy, 2013; Olsen, Smith, & Goodwin, 2009; Way, 2013; Williamson, Xi, & Breyer, 2012). As new analysis methods are developed for novel types of assessments, investigations are also needed into the types of response analysis and feedback that item writers find most useful in their task of item review and revision (Becker & Soni, 2013).

At the same time as interest in this new wave of customized innovations has been growing, several modestly innovative item types have been incorporated into many popular CBT applications. Depending on the specific CBT application, these built-in innovative item types can include the multiple-response item (also referred to as the multiple-answer–multiple-choice item); items with graphics, audio, or video clips; the hot spot; the short-answer item type; and the drag-and-drop item. Several of these item types have potential utility across a fairly large number of content areas and have in fact been used on a considerable number of operational exams.

In some cases, the availability of these built-in, or off-the-shelf, item types within an application can mean that their inclusion on an exam is fairly easy. However, it is still not unusual for software support of these built-in item types to be incomplete across the full set of applications needed to deliver an exam (from item banking through test delivery and on to scoring and reporting). And because exam programs are so dependent on standardized measurement software and delivery vendors, whether the exam includes off-the-shelf or customized innovative item types, it is essential that all these elements interface seamlessly with each other.

The future of measurement is likely to include more novel item types and customized tasks. Testing organizations are increasingly likely to need strategies to help them manage the process of continuous innovation.

Strategies for Continuous Innovation

The recommended process for an exam program to follow when initially considering innovative item types is to begin with the test construct and to identify any current measurement limitations in the exam that innovations could help address (e.g., Parshall & Harmes, 2008, 2009; Strain-Seymour, Way, & Dolan, 2009). Through this analysis, a list of desirable new item types is often developed; this list might include both item types that are provided within a CBT vendor’s software and one or more that require custom development. At the same time, other exam innovations may also be on the table (e.g., some form of adaptive testing). In a short while, these possible improvements to the exam may be in competition with each other, and staff may be overwhelmed by the decisions needed and the work required.

In addition to potential software development challenges, every exam program has multiple stakeholders, and these stakeholder groups may have very different or even conflicting opinions regarding the value of a potential innovation. New materials for communicating with these stakeholder groups will be needed, just as new materials will be needed to support the work of the item writers and staff. Furthermore, new procedures for a host of test-development activities are often important in the development and delivery of an innovative item type.

At ABIM, this set of challenges led the research and development team to seek out a flexible yet consistent approach for the overall development of a broad set of potential innovations. The goal was to use the flexible and iterative nature of Agile software development methods while also including a standardized framework to ensure that the full assessment context would always be considered. ABIM anticipates that these methods (the use of Agile principles and the innovation rollout plan) will be useful for many years into the future. These methods can support the current set of planned innovations and should also be robust enough to remain helpful in years to come, despite the ongoing changes in medicine, technology, and measurement that will occur.

The strategies we propose for managing ongoing change are illustrated throughout this chapter via one specific innovation ABIM recently undertook. This case study involves the inclusion of patient–physician interaction video clips within standard multiple-choice items.

Case Study—Introduction

ABIM certifies physicians in the specialty of internal medicine and, additionally, in the 18 subspecialty areas within internal medicine, such as cardiovascular disease and medical oncology. Its multiple-choice examinations largely measure medical knowledge, which is but one of six competencies assessed by the certification process. In order to best manage the research and development of innovations, ABIM formed a cross-departmental innovations team, with content, psychometric, and computing backgrounds.

Many of the innovations considered by this cross-departmental research team seek to improve the multiple-choice examinations by enhancing fidelity to practice, both in enhanced look and feel of case presentation and in improved alignment of the thinking required to a...

Table of contents

Cover
Title
Copyright
Contents
List of Contributors
Foreword
Preface
1. Managing Ongoing Changes to the Test: Agile Strategies for Continuous Innovation
2. Psychometrics and Game-Based Assessment
3. Issues in Simulation-Based Assessment
4. Actor or Avatar? Considerations in Selecting Appropriate Formats for Assessment Content
Commentary on Chapters 1–4: Using Technology to Enhance Assessments
5. Using Technology-Enhanced Processes to Generate Test Items in Multiple Languages
6. Automated Test Assembly
7. Validity and Automated Scoring
Commentary on Chapters 5–7: Moving From Art to Science
8. Computer-Based Test Delivery Models, Data, and Operational Implementation Issues
9. Mobile Psychological Assessment
10. Increasing the Accessibility of Assessments Through Technology
11. Testing Technology and Its Effects on Test Security
Commentary on Chapters 8–11: Technology and Test Administration: The Search for Validity
12. From Standardization to Personalization: The Comparability of Scores Based on Different Testing Conditions, Modes, and Devices
13. Diagnostic Assessment: Methods for the Reliable Measurement of Multidimensional Abilities
14. Item Response Models for CBT
15. Using Prizes to Facilitate Change in Educational Assessment
Commentary on Chapters 12–15: Future Directions: Challenge and Opportunity
Index
