Integrating Timing Considerations to Improve Testing Practices
eBook - ePub

  1. 188 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Integrating Timing Considerations to Improve Testing Practices synthesizes a wealth of theory and research on time issues in assessment into actionable advice for test development, administration, and scoring. One of the major advantages of computer-based testing is the capability to passively record test-taking metadata—including how examinees use time and how time affects testing outcomes. This has opened many questions for testing administrators. Is there a trade-off between speed and accuracy in test taking? What considerations should influence equitable decisions about extended-time accommodations? How can test administrators use timing data to balance the costs and resulting validity of tests administered at commercial testing centers?

In this comprehensive volume, experts in the field discuss the impact of timing considerations, constraints, and policies on valid score interpretations; administrative accommodations, test construction, and examinees' experiences and behaviors; and how to implement the findings into practice. These 12 chapters provide invaluable resources for testing professionals to better understand the inextricable links between effective time allocation and the purposes of high-stakes testing.

The Open Access version of this book, available at http://www.taylorfrancis.com, has been made available under a Creative Commons Attribution-Non Commercial-No Derivatives 4.0 license.

Integrating Timing Considerations to Improve Testing Practices, by Melissa J. Margolis and Richard A. Feinberg, is available in PDF and ePUB format and is filed under Education & Education General.

Information

Publisher: Routledge
Year: 2020
Print ISBN: 9781138479760
eBook ISBN: 9781351064767
Edition: 1

Chapter 1

A History of Test Speededness

Tracing the Evolution of Theory and Practice
Daniel P. Jurich

There are many practical reasons for administering tests with time limits, most of which relate to the logistics and efficiency of test administration (Bandalos, 2018, p. 59; Morrison, 1960; Rindler, 1979). For example, time limits help to control costs for test developers who often must pay expenses associated with the testing space as well as staff costs for necessary personnel (e.g., test proctors). However, time limits can also serve essential measurement-related functions. Perhaps most importantly, they help to standardize the testing conditions and improve the ability to compare performance across examinees. Concrete evidence of timed standardized testing dates back at least to the Chinese Civil Service examinations administered in the 15th century. At that time, candidates were given one night and one day to complete poems and essays that were used to evaluate their style and penmanship (Martin, 1870). In the United States, the Army led early applications of timed structured cognitive and noncognitive testing through exams such as the Army Alpha and Beta. Beginning in 1917, these tests were used to evaluate World War I recruits on a variety of cognitive skills such as arithmetic reasoning and verbal aptitude (Gregory, 2004; Schnipke & Scrams, 2002). Since these beginnings, standardized examinations with time limits have become ubiquitous within modern society.
Although time limits in standardized testing are usually implemented for reasons unrelated to measurement, time constraints can have a substantial impact on the validity of scores. Accurate measurement is predicated on the assumption that test scores represent an examinee’s true proficiency with respect to the intended constructs. When the speed with which an examinee completes a test is not of interest, a restrictive time limit that prevents examinees from exhibiting their true proficiency can have negative consequences by introducing construct-irrelevant variance into examinee performance. Even when purposefully measuring speed, an inadequately timed assessment can yield questionable or even invalid results if the degree to which speed affects scores differs from what the construct definition implies. The potential for speed to threaten the validity of scores is referred to in the literature as test speededness.
This chapter presents a historical overview of the testing literature that traces the theoretical and operational evolution of test speededness. As will be shown, the definition of speededness has evolved throughout the history of measurement and remains a debated topic to this day. The current Standards for Educational and Psychological Testing provide a framework for conceptualizing test speededness as the “extent to which test takers’ scores depend on the rate at which work is performed as well as on the correctness of the responses” (AERA, APA, & NCME, 2014, p. 223). In other words, speededness occurs when the allotted testing time influences examinee performance such that both speed and the construct of interest contribute to score variation. Several comprehensive literature reviews have summarized different aspects of the relationship between timing and testing (e.g., Lu & Sireci, 2007; Morrison, 1960; Schnipke & Scrams, 2002). The overview here focuses on how the concept of speededness evolved and how that evolution has influenced the methods that practitioners have used, and are now using, to evaluate speededness. By describing how the field arrived at its current philosophies and exploring the issues that remain unaddressed, this brief historical review is intended to serve as a foundation for the subsequent chapters of this book.

The Early Years: Speed and Ability as Interchangeable Measures

As the scientific study of testing burgeoned after World War I, initial theories posited that speed would not influence response quality independent of the intended construct (Spearman, 1927). Though practitioners recognized that speed and proficiency were conceptually distinct, the prevailing theory presumed that the high correlation between the two traits made them indistinguishable from a measurement perspective (Davidson & Carroll, 1945). In other words, timing could not introduce construct-irrelevant variance because speed was interchangeable with the construct of interest. Some context about the testing era helps to explain the logic of this theory. It is axiomatic that numeric scores, such as number correct, will decrease when examinees lack sufficient time to consider all items. However, test scores in this era were predominantly used to rank-order examinees. Although total scores can differ substantially under different time limits, rank order would stay comparable if speed and proficiency correlated near perfectly (see Ruch & Koerth, 1923).
There was also an empirical basis for considering speed and proficiency interchangeable. To elaborate on this work, we must distinguish between speed tests and power tests, concepts formalized by Gulliksen (1950) but used colloquially before his work. A pure speed test is one that is intended to evaluate how quickly an examinee can complete a set of test items within a fixed period of time. As such, speed tests are designed to have strict time limits and to include items of such ease that examinees can respond to all items correctly. Scores on speed tests then reflect the number of items responded to within the time limit and provide an indication of the speed and accuracy with which an examinee processes information. In contrast, pure power tests have no time limits and contain items of varying difficulty to capture the range of proficiency on the construct(s) of interest; scores on these tests reflect the number of items examinees answer correctly out of all items and are used to evaluate ability apart from the speed with which questions are answered. The distinction between pure speed and power tests is primarily theoretical. Many educational examinations function as a mixture of both power and speed tests, intending to primarily measure the construct of interest (i.e., power), but also containing a speed component resulting from time limits that are imposed to address practical constraints (Lu & Sireci, 2007; Chapter 3, this volume). Although theoretical in nature, the concepts of speed and power tests served as a foundation for the methodological developments throughout the evolution of speededness.
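The speed/power distinction lends itself to a short sketch. The Python snippet below (the response-record format is a hypothetical illustration, not from the chapter) contrasts the two scoring rules: a pure speed test counts items reached within the limit, while a pure power test counts correct responses across all items.

```python
# Each hypothetical response record: (answered_within_limit, correct).

def speed_score(responses):
    # Pure speed test: items are easy enough that reached items are
    # assumed correct, so the score counts items answered in time.
    return sum(1 for answered, _ in responses if answered)

def power_score(responses):
    # Pure power test: no time limit, so the score counts correct
    # responses across all items, regardless of when they were reached.
    return sum(1 for _, correct in responses if correct)

# Four items: two answered correctly in time, one answered correctly
# only after the limit, one answered incorrectly.
responses = [(True, True), (True, True), (False, True), (False, False)]
```

Under a strict limit this examinee scores 2; untimed, 3. The mixed speed-and-power case most educational tests occupy falls between these two pure scoring rules.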
Restating Spearman’s theory in these terms, rank order should be consistent whether an examination is administered as a speed or a power test. The belief that speed served as a proxy for cognitive ability partially stemmed from research in the 1920s and 1930s that investigated the relationship between scores from tests taken under both speed and power conditions. This research generally involved having examinees take a timed examination with a pencil; when the time limit was reached, they finished the test using a different colored pencil or pen so that scores under the speed and power conditions could be distinguished (e.g., Paterson & Tinker, 1930; Peak & Boring, 1926; Ruch & Koerth, 1923). The empirical evidence indicated that scores under the two conditions were highly correlated. For example, Ruch and Koerth (1923) administered the aforementioned Army Alpha examination to 122 examinees under two timed conditions and a power condition, using multicolored pencils to capture response markings under the different conditions. Examinees were first given the standard amount of time suggested by the testing manual to respond to questions using a black pencil (single time). After the first time limit expired, examinees were provided the same amount of time to continue or revise answers using a blue pencil (double time), and after that time limit expired they switched to a red pencil to complete or change responses during an untimed period (untimed). Results indicated that rank ordering remained consistent (single-time and double-time total scores correlated at 0.966; single-time and untimed total scores correlated at 0.945) and therefore seemed to support the interchangeability of speed and ability.
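The rank-order logic behind these studies can be illustrated with a small simulation (all distributions and parameter values here are assumptions chosen for illustration). If speed tracks proficiency almost perfectly, as Spearman's theory held, then timed and untimed totals differ in magnitude but leave the ordering of examinees essentially unchanged:

```python
import random

def rank(values):
    # Rank positions: 0 for the lowest value, n-1 for the highest.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for pos, idx in enumerate(order):
        ranks[idx] = pos
    return ranks

def spearman(x, y):
    # Spearman's rho via the classic formula (no tied values expected
    # here because the simulated scores are continuous).
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

random.seed(1)
n = 200
ability = [random.gauss(0, 1) for _ in range(n)]
# Spearman-era assumption: speed is nearly the same trait as ability.
speed = [a + random.gauss(0, 0.1) for a in ability]

# Untimed (power) score reflects ability alone; the timed score also
# rewards how quickly an examinee works.
untimed_score = ability
timed_score = [a + s for a, s in zip(ability, speed)]

rho = spearman(timed_score, untimed_score)
```

With speed and ability correlating near perfectly, rho stays close to 1 even though every examinee's timed total is depressed relative to the untimed total, which is exactly why rank-order comparisons of the era seemed to vindicate the interchangeability view.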

Distinctions between Speed and Power

Taken at face value, Spearman’s philosophy implies that time limits could be applied capriciously without consequence to validity (Morrison, 1960). As the study of mental testing matured, and likely motivated by the implication of Spearman’s theory for practice, empirical research began to contradict the interchangeability of time and proficiency (Baxter, 1941; Davidson & Carroll, 1945). Davidson and Carroll provided a strong theoretical and empirical critique of this accepted practice. The authors expressed strong beliefs that scores from tests administered under time limits—particularly restrictive limits—reflected a mixture of examinees’ knowledge and rate. This led the authors to claim, “the indiscriminate use of time-limit scores is one of the more unfortunate characteristics of current psychological testing …” (p. 411). Davidson and Carroll first criticized the established method of correlating scores from timed and untimed administrations of the same examination because the untimed score reflects a combination of the timed component, responses to the unreached items, and any answer changes made by the examinee. As the timed scores represent a part of the total untimed score, this method spuriously inflates correlations. The problems with this approach were exacerbated when the timed condition allowed examinees to reach the vast majority of the items. In this situation, the timed scores would almost fully reflect the final untimed scores (and the two necessarily would be highly correlated).
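Davidson and Carroll's part-whole critique can be demonstrated numerically. In the toy simulation below (distributions and parameters are assumptions for illustration), the points earned after the time limit are generated independently of the timed score, yet timed and untimed totals still correlate strongly, simply because the timed score is a component of the untimed total:

```python
import random
import statistics

def pearson(x, y):
    # Pearson correlation from first principles (stdlib only).
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

random.seed(0)
n = 500
# Timed score: points earned within the time limit.
timed = [random.gauss(50, 10) for _ in range(n)]
# Points added after the limit, deliberately generated independently
# of the timed score (no true relationship between the two).
extra = [abs(random.gauss(10, 5)) for _ in range(n)]
untimed = [t + e for t, e in zip(timed, extra)]

# Inflated purely because `timed` is part of `untimed`.
r = pearson(timed, untimed)
```

Even with no true relationship between the timed score and the post-limit gains, r comes out high, and the fewer items left unreached under the timed condition, the closer it is forced toward 1.0. This is the spurious inflation Davidson and Carroll identified in the timed-versus-untimed correlation studies.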
The authors followed their methodological critique with an empirical study aimed at establishing a distinction between speed and knowledge. Utilizing various sections from a revised Army Alpha and several other examinations measuring a number of different constructs, the authors captured responses from examinees under timed and untimed conditions. They also recorded the time it took each examinee to finish the exam after the time limit expired. A factor analysis found that scores from the untimed administration and completion speed loaded on separate orthogonal factors representing power and speed, respectively. Moreover, scores from the timed administration loaded on both the power factor and the speed factor, indicating that timed...

Table of contents

  1. Cover
  2. Half Title
  3. Series Page
  4. Title Page
  5. Copyright Page
  6. Contents
  7. Foreword
  8. Acknowledgments
  9. Contributors
  10. 1. A History of Test Speededness: Tracing the Evolution of Theory and Practice
  11. 2. The Impact of Time Limits and Timing Information on Validity
  12. 3. Timing Considerations in Test Development and Administration
  13. 4. Extended Time Testing Accommodations for Students with Disabilities: Impact on Score Meaning and Construct Representation
  14. 5. Relationship between Testing Time and Testing Outcomes
  15. 6. How Examinees Use Time: Examples from a Medical Licensing Examination
  16. 7. Timing Considerations for Performance Assessments
  17. 8. Impact of Technology, Digital Devices, and Test Timing on Score Comparability
  18. 9. Using Response Time for Measuring Cognitive Ability Illustrated with Medical Diagnostic Reasoning Tasks
  19. 10. Response Times in Cognitive Tests: Interpretation and Importance
  20. 11. A Cessation of Measurement: Identifying Test Taker Disengagement Using Response Time
  21. 12. Concurrent Use of Response Time and Response Accuracy for Detecting Examinees with Item Preknowledge
  22. Index