Section III: Standard Setting Methods

Chapter 10
Variations on a Theme: The Modified Angoff, Extended Angoff, and Yes/No Standard Setting Methods
BARBARA S. PLAKE AND GREGORY J. CIZEK
Perhaps the most familiar of the methods for setting performance standards bears the name of the person who first suggested the outlines of an innovative criterion-referenced approach to establishing cut scores: William Angoff. Scholars involved in standard setting research, facilitators of standard setting studies, and even many panelists themselves are likely to at least recognize the phrase Angoff method.
Although the frequency with which the Angoff method is used in education contexts has waned since the introduction of the Bookmark standard setting approach (see Lewis, Mitzel, Mercado, & Schulz, Chapter 12 of this volume), it likely remains, overall, the most popular standard setting method in use today. In 1988, Mills and Melican reported that "the Angoff method appears to be the most widely used. The method is not difficult to explain and data collection and analysis are simpler than for other methods in this category" (p. 272). In a 1986 review, Berk concluded that "the Angoff method appears to offer the best balance between technical adequacy and practicability" (p. 147). More recent appraisals suggest that it is the most often used method in licensure and certification testing and that it is still commonly used in educational testing contexts (Meara, Hambleton, & Sireci, 2001; Plake, 1998; Sireci & Biskin, 1992).
Perhaps one reason for its enduring popularity is that the procedure first suggested by Angoff (1971) has been successfully refined and adapted in numerous ways. The original Angoff method and variations such as the Modified Angoff method, Extended Angoff method, and Yes/No Method are the focus of this chapter.
Origins
It is perhaps an interesting historical note that the standard setting method that came to be known as the Angoff method was not the primary focus of the original work in which it was first described. The method first appeared in a chapter Angoff (1971) wrote on scaling, norming, and equating for the measurement reference book Educational Measurement (Thorndike, 1971), in which he detailed "the devices that aid in giving test scores the kind of meaning they need in order to be useful as instruments of measurement" (p. 508). Clearly, what Angoff had in mind were score scales, equated scores, transformed scores, and the like; his chapter made essentially no reference to setting performance standards, a topic that today has itself received book-length treatment in volumes such as this one and others (see Cizek, 2001; Cizek & Bunch, 2007). In fact, the nearly 100-page chapter devotes only two paragraphs to standard setting. Angoff proposed the method in a single paragraph:
A systematic procedure for deciding on the minimum raw scores for passing and honors might be developed as follows: keeping the hypothetical "minimally acceptable person" in mind, one could go through the test item by item and decide whether such a person could answer correctly each item under consideration. If a score of one is given for each item answered correctly by the hypothetical person and a score of zero is given for each item answered incorrectly by that person, the sum of the item scores will equal the raw score earned by the "minimally acceptable person." A similar procedure could be followed for the hypothetical "lowest honors person." (1971, pp. 514-515)
Three aspects of this description are noteworthy and, in retrospect, can be seen as influencing the practice of standard setting in profound ways for years to come. First, it is perhaps obvious, but should be noted explicitly, that Angoff's description of a "minimally acceptable person" was not a reference to the acceptability of an examinee as a person, but to the qualifications of the examinee with respect to the characteristic measured by the test and the level of that characteristic deemed acceptable for some purpose. In the years since Angoff described his method, the terms borderline examinee, minimally competent examinee, and minimally qualified candidate have been substituted when the Angoff procedure is used. Those constructions notwithstanding, this fundamental idea put forth by Angoff, the conceptualization of a minimally competent or borderline examinee, remains a key referent for the Angoff and similar standard setting methods. Indeed, in the conduct of an actual standard setting procedure, it is common for a considerable portion of the training time to be devoted to helping participants acquire and refine this essential conceptualization.
A second noteworthy aspect is that the Angoff method was rooted in the notion that participants could be asked to make judgments about individual test items for purposes of determining a performance standard. The term test-centered model was used by Jaeger (1989) to describe the Angoff and other approaches that rely primarily on judgments about test content, as opposed to direct judgments about examinees (called examinee-centered models by Jaeger). With few exceptions, all modern criterion-referenced standard setting approaches are primarily test-centered.
The third noteworthy aspect of Angoff's original formulation is that it could be adapted to contexts in which more than one cut score was needed. That is, it could be applied to situations in which only dichotomous (i.e., pass/fail) classifications were needed, but it could also be applied to situations in which more than two categories were required. This can be seen in the context of Angoff's original description, where two cut scores were derived to create three categories: Failing, Acceptable/Passing, and Honors. Further, although the method was originally conceived to be applied to tests in which the multiple-choice question (MCQ) format was used exclusively, the method has also been successfully applied to tests composed of constructed-response (CR) items, and to tests with a mixture of both MCQ and CR formats.
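To make the multiple-category case concrete, the brief sketch below shows how two cut scores partition raw scores into Angoff's three categories. The cut values and the 60-item test length are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: two cut scores on an imagined 60-item test
# create three performance categories, as in Angoff's original example.
PASSING_CUT = 42  # hypothetical cut score for Acceptable/Passing
HONORS_CUT = 54   # hypothetical cut score for Honors

def classify(raw_score: int) -> str:
    """Map a raw score to one of the three performance categories."""
    if raw_score >= HONORS_CUT:
        return "Honors"
    if raw_score >= PASSING_CUT:
        return "Acceptable/Passing"
    return "Failing"

print(classify(38), classify(45), classify(57))
# -> Failing Acceptable/Passing Honors
```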
Other features that have become fairly commonplace in modern standard setting were included in the second of the two paragraphs in which Angoff's method was described. For one, Angoff's proposal permitted the calculation of criterion-referenced cut scores by summarizing the independent judgments of a group of standard setting panelists prior to the administration of a test. Additionally, he proposed a potential, albeit rudimentary, strategy for validation of the resulting cut scores:
With a number of judges independently making these judgments it would be possible to decide by consensus on the nature of the scaled score conversions without actually administering the test. If desired, the results of this consensus could later be compared with the number and percentage of examinees who actually earned passing and honors grades. (1971, p. 515)
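The comparison Angoff sketches can be made concrete with a few lines of code. In this minimal sketch, the scores and cut values are invented solely to show the check he describes: computing the percentage of actual examinees who would earn passing and honors classifications under the recommended cuts.

```python
# Minimal sketch of Angoff's rudimentary validation check: compare the
# consensus cut scores against the percentages of actual examinees who
# would pass or earn honors. All values here are invented.
observed_scores = [35, 41, 44, 47, 50, 52, 55, 58, 40, 46]  # hypothetical
passing_cut, honors_cut = 42, 54                            # hypothetical

n = len(observed_scores)
pct_passing = 100 * sum(s >= passing_cut for s in observed_scores) / n
pct_honors = 100 * sum(s >= honors_cut for s in observed_scores) / n
print(f"{pct_passing:.0f}% at or above passing; {pct_honors:.0f}% at honors")
# -> 70% at or above passing; 20% at honors
```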
As described by Angoff, the task presented to participants is to make dichotomous judgments regarding whether the minimally competent examinee could answer each item correctly (thereby assigning a value of 1 to each such item) or not (resulting in a value of zero being assigned to those items). This would most appropriately be called the Basic or Unmodified Angoff method, and it is the foundation for what has subsequently been developed into the Yes/No Method (Impara & Plake, 1997), described in greater detail later in this chapter. In practice, however, the phrase original or unmodified Angoff method refers to an alternative to the basic approach that Angoff described in a footnote to one of the two paragraphs. The alternative involved asking participants to make a finer judgment than simply assigning zeros and ones to each item in a test form. According to Angoff's footnote:
A slight variation of this procedure is to ask each judge to state the probability that the "minimally acceptable person" would answer each item correctly. In effect, judges would think of a number of minimally acceptable persons, instead of only one such person, and would estimate the proportion of minimally acceptable persons who would answer each item correctly. The sum of these probabilities would then represent the minimally acceptable score. (1971, p. 515, emphasis added)
That refinement, asking participants to provide probability judgments with respect to borderline examinees' chances of answering items correctly, has been highly influential in that it incorporated and highlighted the probabilistic nature of standard setting judgments; at the same time, as will be described, it has also been a source of modest controversy.
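In notation (ours, not Angoff's), let $p_{ij}$ denote panelist $j$'s estimate of the probability that a minimally acceptable examinee answers item $i$ correctly. For a test of $n$ items and a panel of $J$ panelists,

$$
\hat{c}_j = \sum_{i=1}^{n} p_{ij}, \qquad
\hat{c} = \frac{1}{J} \sum_{j=1}^{J} \hat{c}_j = \frac{1}{J} \sum_{j=1}^{J} \sum_{i=1}^{n} p_{ij},
$$

where $\hat{c}_j$ is panelist $j$'s implied cut score and $\hat{c}$ is the panel-level value (averaging across panelists is a common convention, though other aggregation rules exist). Under the basic, dichotomous formulation, $p_{ij} \in \{0, 1\}$; under the probability refinement, $p_{ij} \in [0, 1]$.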
At present, the most popular manifestation of the Angoff method is likely what has come to be called the traditional or modified Angoff approach. In actuality, there are numerous ways in which the basic Angoff method has been modified. By far the most common modification involves requiring participants to make more than one set of judgments about each item, with those multiple judgments occurring in "rounds" between which the participants are provided with one or more pieces of additional information to aid them in making more accurate, consistent estimates of borderline examinee performance.
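As an illustration of what happens between rounds, the sketch below computes two common kinds of feedback from Round 1 ratings: the panel's mean estimate for each item and each panelist's implied cut score. The panelist labels, ratings, and five-item test are invented, and the feedback shown is one common choice rather than a prescribed procedure.

```python
# Illustrative between-round feedback for a Modified Angoff study.
# All ratings are invented; a real study would involve many more items.
ratings = {  # panelist -> Round 1 probability estimates for five items
    "P1": [0.60, 0.75, 0.40, 0.85, 0.55],
    "P2": [0.70, 0.70, 0.50, 0.80, 0.60],
    "P3": [0.45, 0.80, 0.35, 0.90, 0.50],
}

n_items = len(next(iter(ratings.values())))

# Panel mean estimate per item (normative feedback on each item).
item_means = [round(sum(r[i] for r in ratings.values()) / len(ratings), 2)
              for i in range(n_items)]

# Each panelist's implied cut score: the sum of his or her estimates.
panelist_cuts = {p: round(sum(r), 2) for p, r in ratings.items()}

print("Item means:", item_means)       # [0.58, 0.75, 0.42, 0.85, 0.55]
print("Implied cuts:", panelist_cuts)  # {'P1': 3.15, 'P2': 3.3, 'P3': 3.0}
```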
The balance of this chapter describes each of the contemporary adaptations of the original Angoff approach, including the traditional Angoff method applied to MCQs, the Yes/No variation with MCQs, and variations of the Angoff method for use with polytomously scored (i.e., CR) items or tasks. The chapter concludes with common limitations of the Angoff method and recommendations for the future.
Traditional Angoff Method with MCQs
The purpose of this section is to describe a traditional Angoff (1971) standard setting procedure with items of a multiple-choice format, involving a panel of subject matter experts (SMEs) as judges, and using multiple rounds of ratings by the panelists with some information (i.e., feedback) provided to the panelists between rounds. This is also often called a Modified Angoff standard setting method because having multiple rounds with feedback in between is a modification of the original Angoff method, which involved only a single round of ratings and no provision of feedback. In this section, several elements common to a Modified Angoff process will be presented, including: information about the composition of the panel; generating probability estimates; the role of Performance Level Descriptors (PLDs); the steps in undertaking the method; the rounds of ratings; the types of feedback provided to the panelists between rounds; and the methods typically used to compute the cut score(s).
Composition of the Panel
As in any standard setting study, the composition of a panel using the Angoff method varies based on the purpose of the test and the uses of the results. In some cases, such as in licensure and certification testing programs, the panel is composed exclusively of subject matter experts (SMEs). In other instances, a mix of SMEs and other stakeholders is included in the panel, as is the case with the National Assessment of Educational Progress (NAEP) standard setting studies (see Loomis & Bourque, 2001; Loomis, Chapter 6 of this volume, for more information about issues to consider when deciding on the composition of the panel for a standard setting study). Because it is the panelists who provide the data (ratings) for making cut score recommendations using the Angoff standard setting method (and most other judgmental standard setting methods), the representativeness of the panel is a crucial element bearing on the validity of the cut scores generated from the standard setting study.
Generating Probability Estimates
The Modified Angoff method involves having panelists make item-level estimates of how certain target examinees will perform on multiple-choice questions. In particular, panelists are instructed to estimate, for each item in the test, the probability that a randomly selected, hypothetical, minimally competent candidate (MCC) will answer the item correctly. Because these estimates are probability values, they can range from a low of 0 to a high of 1.
These probability judgments can be difficult for participants to make, however. To aid in completing these estimates, panelists in an Angoff standard setting study are often instructed to conceptualize 100 MCCs and then estimate the proportion (or number) of them that would get the item right. In essence, the probability estimation is shifted to an estimate of the proportion (or p-value) that would result from administering the item to a sample of 100 MCCs. Notice that this estimate concerns the probability or proportion of MCCs who would answer the item correctly, not the proportion who should answer it correctly. The focus on would instead of should takes into account the many factors that might influence how such candidates perform on the test questions, including their ability and the difficulty of the item, but also other factors such as anxiety over test performance in a high-stakes environment, administrative conditions, and simple errors in responding.
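The arithmetic behind this reframing is simple; the sketch below (with an invented judgment of 65 out of 100) just makes explicit that a count of imagined MCCs answering correctly is, in effect, a probability estimate.

```python
# Illustrative only: a panelist who imagines 100 minimally competent
# candidates (MCCs) and judges that 65 of them would answer the item
# correctly has, in effect, supplied a probability estimate of 0.65.
mcc_correct_out_of_100 = 65  # hypothetical panelist judgment
probability_estimate = mcc_correct_out_of_100 / 100

assert 0.0 <= probability_estimate <= 1.0  # probabilities lie in [0, 1]
print(probability_estimate)  # 0.65
```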
Because panelists are asked to make estimates of item performance for a specific subgroup of the examinee population, it is critical that the panelists have a conceptual understanding of the knowledge, skills, and abilities (KSAs) of the MCCs. Often the SMEs who form the panel have first-hand knowledge of members of this subgroup of the examinee population, as when panel members have direct interactions with the examinee population through their educational or work experience. For setting cut scores on tests in K-12 educational contexts, the panel is typically composed of grade level and subject matter teachers or educational leaders; in licensure programs the panel is often composed of SMEs who teach or supervise entry-level professionals in their field. In some instances, policy and business leaders or representatives of the public are also members of the panel; special attention is needed in these instances to ensure that these panelists have a...