
- 190 pages
- English
About this book
Validity is a clear, substantive introduction to the two most fundamental aspects of defensible testing practice: understanding test score meaning and justifying test score use. Driven by evidence-based and consensus-grounded measurement theory, principles, and terminology, this book addresses the most common questions of applied validation, the quality of test information, and the usefulness of test results. Concise yet comprehensive, this volume's integrated framework is ideal for graduate courses on assessment, testing, psychometrics, and research methods as well as for credentialing organizations, licensure and certification entities, education agencies, and test publishers.
Validity, by Gregory J. Cizek
Information
Topic: Education
Subtopic: Education General
1 Introduction
Validity has long been one of the major deities in the pantheon of the psychometrician. (Ebel, 1961, p. 640)
Any treatment of the topic of validity must first acknowledge the central importance of the concept to any measurement process developed or used in the social sciences. As many authoritative sources have asserted, and as is affirmed here, validity is the most important and essential characteristic of test scores. According to the Standards for Educational and Psychological Testing, “validity is ... the most fundamental consideration in developing tests and evaluating tests” (2014, p. 11). Moreover, the importance of validity is ongoing. Validity is not pursued at only a single point in time: concern for validity should pervade the entire testing process, from when a test is first conceptualized to when test scores are reported.
As a beginning point for appreciating the importance of validity, this chapter first reviews some key prerequisites that are necessary for its understanding. These ideas are broadly applicable and underlie the compelling need for attention to validity across highly diverse testing applications. To aid readers who may not already be familiar with some background concepts that are essential for understanding validity and fully engaging with the content of this book, key terms such as test, inference, construct, and assessment are first defined and illustrated.
Of course, assertions about the preeminence of validity raise two questions that this chapter will also address: (1) “What is validity?” and (2) “Why is validity important?” Along those lines, the second aim of this chapter is to present a definition of validity that will serve as a reference point. The concept of validity will be presented in both technical and practical terms.
Finally, with a definition of validity in place, the ways in which validity is related to other aspects of defensible testing practice will be examined. Accordingly, the third aim of this chapter is to briefly introduce a substantial reconceptualization of validity, nested in an overall framework for defensible testing. This reconceptualization will be more completely elaborated and illustrated in subsequent chapters, but is foreshadowed in this chapter to give the reader a sense of what lies ahead.
Foundational Measurement Concepts Underlying Validity
In this section, foundational concepts necessary for fully understanding validity are presented, including test, inference, construct, and assessment in the social sciences. Examples of each of these concepts from diverse areas of the social sciences are provided.
What Is a Test?
In the social sciences, a straightforward and broadly generalizable definition is that a test is a sample of information about some intended characteristic of persons that is gathered under specified, systematic conditions. However, the simplicity of this definition belies the fact that “test” is a frequently misunderstood concept on at least two counts.
Tests as Samples
First, it is important to realize that a test is a sample, and only a sample, of a test taker’s knowledge, skill, ability, interest, or other attribute which cannot be directly observed and about which information is desired. It is often incorrectly concluded that a test score represents a highly definitive, concrete, or conclusive piece of information about a test taker. The fact that a test is only a sample of a test taker’s responses suggests otherwise. Although we might want to get as much information as possible about some characteristic, it is typically impossible or impractical to observe everything about a test taker. Indeed, the sample of information collected by a test may be very small. Tests typically capture only a small portion of what could be observed, so it is essential that the sample is one that is carefully structured.
To illustrate this first principle of testing, it is useful to consider some extremes. It is perhaps obvious that it would not usually be of greatest interest (or a basis for awarding a medical license) whether a medical student could correctly respond to the following specific multiple-choice question: “Which medication should not be given to a child at risk of Reye’s Syndrome?” Instead, a medical board responsible for such a test would typically wish to extrapolate from the examinee’s correct response to this question (“aspirin”), and several others like it, to a larger domain of knowledge about contraindications for various drugs as one component for making a licensure decision. Likewise, it would not usually be of interest that a third-grade student was name-calling on the playground during the first recess period last Tuesday. Instead, an educator trying to understand the extent of an elementary school’s bullying climate would typically wish to make several systematic observations across various students, grade levels, days, and contexts. A basketball scout would not want to offer a contract to a potential player having seen the player attempt a single free throw. And a voter’s response to “Do you favor or oppose more metered, on-street parking?” would not usually be very helpful to a pollster in determining the person’s political philosophy.
It should be clear that in each of these situations, making a judgment about a medical student’s competence from a single question on aspirin, drawing conclusions about a school’s bullying climate from a single recess observation, forming judgments about a player’s potential from a single free-throw attempt, or forming impressions of a voter’s political orientation based on his or her position on a single issue is likely to be both highly undependable and inaccurate. That is why the medical licensure examination samples more broadly, including several questions about various drugs that are most likely to be encountered in practice; it is why the educator performs a number of observations across various contexts where bullying is likely to be experienced in elementary schools; it is why sports teams view many examples of a potential player’s performances; and it is why the pollster’s focus groups would touch on a range of political and policy topics that commonly reflect differential political attitudes.
In summary, as regards the first essential characteristic of a test, it should be recognized that even one question on drugs, a single recess observation, a single free-throw attempt, or a single interview about on-street parking is a test. They all qualify as tests because each one represents the collection of a sample (albeit a very small one) of the test taker’s knowledge, skill, attitude, and so on. However, it is probably also obvious that a test, though only a sample, provides more accurate and dependable information about a test taker’s knowledge, skill, or attitude when it is more carefully and comprehensively constructed.
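The point that very small samples yield undependable information can be illustrated numerically. The following is only a minimal sketch under assumptions that are not part of the text: it supposes a simple binomial model in which every question is equally difficult and answered independently, and it uses a hypothetical examinee who would answer 70% of all possible questions correctly. The function name `standard_error` is likewise illustrative.

```python
import math


def standard_error(true_proportion: float, n_items: int) -> float:
    """Standard error of an observed proportion-correct score.

    Assumes a simple binomial model (independent, equally difficult items),
    which is an illustrative simplification, not a claim from the text.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    if not 0.0 <= true_proportion <= 1.0:
        raise ValueError("true_proportion must be between 0 and 1")
    return math.sqrt(true_proportion * (1.0 - true_proportion) / n_items)


# Hypothetical examinee: would answer 70% of the whole domain correctly.
TRUE_PROPORTION = 0.7

# Compare a one-question "test" with progressively larger samples.
for n_items in (1, 10, 100):
    se = standard_error(TRUE_PROPORTION, n_items)
    print(f"{n_items:>3} item(s): standard error of the score = {se:.3f}")
```

Under these assumptions, the uncertainty in the observed score shrinks in proportion to the square root of the number of questions sampled, which is one way to see why a single question, observation, or free throw supports only a very weak conclusion.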
Tests: Agnostic as to Format
The second aspect of what constitutes a test is what a test is not. Referring to the definition of test provided earlier, it can be seen that its meaning is untethered to any specific format. Sometimes, but again incorrectly, it is believed that “test” connotes a collection of multiple-choice questions, bubble sheets, number 2 pencils, and a stopwatch. Although multiple-choice questions administered in a standardized way, under timed conditions, and scored by optical scanners might qualify as a test, that configuration is only one of many possibilities. The type of test just described may be used more often in large-scale education contexts for gauging student achievement; however, this configuration of a test may be rarely or never used in most other social science settings.
What defines a test has little or nothing to do with the format or type of questions or tasks presented to test takers. And, as will be shown, although a degree of standardization is useful for some purposes in testing, it is essential to recognize that the characteristics of how test takers provide responses (e.g., orally, on bubble sheets, as performances), how test taker responses are scored (by humans, by scanners, by automated scoring algorithms, etc.), and other features of the setting, timing, and aids used during testing may be largely irrelevant to something qualifying as a test. So what makes a test, a test?
Recalling the definition provided earlier, because a test, broadly conceived, is any systematic sample of a person’s knowledge, skill, attitude, ability, or other characteristic collected under specified conditions, there is a vast number of configurations that would qualify as a test. The issue of care and comprehensiveness in sampling has been addressed; we now turn to the conditions that must be in place so that the sampling yields dependable and accurate information.
Standardized Tests
It was mentioned earlier that some degree of standardization is useful, where standardization simply refers to the prescriptive administration conditions that a test developer has indicated should be followed. A test developer will typically conduct research to determine, and then carefully specify, the conditions that must be in place for the results of testing to have the meaning intended. The collection of prescribed administration conditions for a given test (ranging from a few informal guidelines to many highly prescriptive and detailed procedures and prohibitions) is what makes a test “standardized.” The medical licensure examination described previously would be called standardized to the extent that certain content coverage was mandated and/or specific time limits were in place. The observations of bullying would be called standardized to the extent that they were collected at prescribed periods during the school day, in specific contexts. The scouting of professional athletes would be called a standardized test to the extent that players were required to attempt the same number and types of basketball shots using regulation equipment and perform the same physical demonstrations. The pollster’s interviews would be called standardized tests to the extent that the same topics were addressed, the same questions asked, and a common checklist was used for noting responses.
It should be noted that, although the second characteristic of a test is the presence of specified administration conditions, there may also be deviations from the standard administration conditions that the test developer deems to be allowable because they do not alter what the test attempts to measure or how scores on the test can be interpreted. These allowable deviations are often referred to as accommodations: changes in various aspects of testing that can support the validity of scores obtained from a test.
The Standards provide some general guidance on accommodations and distinguish them from changes that do alter the construct being assessed (called test modifications) and undermine the intended interpretations of scores on a test (see AERA, APA, & NCME, 2014, pp. 59–62). A list of categories of some common aspects of testing that a test developer might prescribe is provided below; a more elaborated list with several examples of each category is shown in Table 1.1.
Table 1.1 Aspects of testing a test developer might prescribe/allow to promote validity
| General category | Specific examples |
| --- | --- |
| Mode of presentation | Paper or computer-based |
| | Text, audio, video presentation |
| | Font type, size, or other print or screen characteristics (e.g., display size, resolution) |
| | Test instructions or questions read aloud to test taker |
| | Braille, ASL, or alternative language presentation of test directions or materials |
| Mode of response | Written, key-entered, bubbled, oral, performance |
| | Real-time response vs. recorded response |
| | Use of a “scribe” to record responses |
| Test scheduling | Fixed date vs. on-demand test scheduling |
| | Defined testing “windows” during which test may be taken at any point in a range of dates |
| | Time of day (e.g., morning, afternoon) |
| | Fixed vs. flexible order in which sections of a test must be taken |
| Test setting | Seating configurations (e.g., “sit a seat apart”), specifications for computer screen orientation, spacing, dividers, etc. |
| | Group or individual administration |
| | Allowable setting variations (e.g., distraction-free setting, quiet room) |
| | Prescribed lighting, temperature, ventilation, seating, work surfaces, etc. |
| | Prohibited test setting materials (e.g., charts, posters, maps, or other materials on walls, doors, desks, etc.) |
| Test timing | Specified time limits vs. allowable extended time |
| | Allowable breaks, frequency and duration of breaks between test... |
Table of contents
- Cover
- Half Title
- Title Page
- Copyright Page
- Dedication
- Copyright Acknowledgment
- Table of Contents
- Preface
- 1 Introduction
- 2 Validity: The Consensus
- 3 Validity and the Controversy of Consequences
- 4 A Comprehensive Framework for Defensible Testing Part I: Validating the Intended Meaning of Test Scores
- 5 A Comprehensive Framework for Defensible Testing Part II: Justifying the Intended Uses of Test Scores
- 6 How Much Is Enough?
- 7 Conclusions and Future Directions
- References
- Index