What We Know About Grading

What Works, What Doesn't, and What's Next

Thomas R. Guskey, Susan M. Brookhart

eBook - ePub, 236 pages, English

About This Book

Grading is one of the most hotly debated topics in education, and grading practices themselves are largely based on tradition, instinct, or personal history or philosophy. But to be effective, grading policies and practices must be based on trustworthy research evidence.

Enter this book: a review of 100-plus years of grading research that presents the broadest and most comprehensive summary of research on grading and reporting available to date, with clear takeaways for learning and teaching. Edited by Thomas R. Guskey and Susan M. Brookhart, this indispensable guide features thoughtful, thorough dives into the research from a distinguished team of scholars, geared to a broad range of stakeholders, including teachers, school leaders, policymakers, and researchers. Each chapter addresses a different area of grading research and describes how the major findings in that area might be leveraged to improve grading policy and practice. Ultimately, Guskey and Brookhart identify four themes emerging from the research that can guide these efforts:
- Start with clear learning goals,
- Focus on the feedback function of grades,
- Limit the number of grade categories, and
- Provide multiple grades that reflect product, process, and progress criteria.

By distilling the vast body of research evidence into meaningful, actionable findings and strategies, this book is the jump-start all stakeholders need to build a better understanding of what works—and where to go from here.

Information

Publisher: ASCD
Year: 2019
ISBN: 9781416627654

Chapter 1

Reliability in Grading and Grading Scales

Few people today would question the premise that students' grades should reflect the quality of their work and not depend on whether their teachers are "hard" or "easy" graders. But how much subjectivity on the part of teachers is involved in the grading process, and what do we know about its influence? The earliest research on grading dates to the 1800s and was concerned with this very issue. These early studies questioned the reliability of teachers' grading.

Why Is This Area of Research Important?

Reading the research on grading gives present-day educators cause for consternation. On the one hand, early studies of grading reliability were clearly motivated by researchers' dissatisfaction with, and sometimes disdain for, teachers' unreliable practices. Our reaction to this, of course, is indignation: That's not right! On the other hand, the extent of the unreliability in grading identified in these early studies was huge. Grades for the same work varied dramatically from teacher to teacher, resulting in highly divergent conclusions about students, their learning, and their future studies. That's not right, either.
In this chapter, we describe these early studies of grade reliability as well as one contemporary study that replicated an early study. We gently critique some of the underlying bias in these studies, and then offer some practical suggestions for applying the studies' results to grading practices today. Despite their biases and flaws, these early studies do offer several clear implications for practice.

What Significant Studies Have Been Conducted in This Area?

In our review of the research, we found 16 individual studies of grading reliability from the early 20th century, plus two early reviews of grading studies by Kelly (1914) and Rugg (1918). These are described in Figure 1.1. We reference these early reviews because they include dozens of early studies in addition to the published studies we were able to locate. Some of the studies Kelly and Rugg reviewed were unpublished reports from school districts or universities that are unavailable to us a century later. In addition, we found an early statistical treatise on the subject by Edgeworth (1888) that we describe first because it set the stage for the research that followed.

Figure 1.1. Early Studies of the Reliability of Grades
Studies: Ashbaugh (1924)
Participants: University education students
Main Findings
  • When the same 7th grade arithmetic paper was graded on three occasions, the mean remained constant, but the scores got closer together.
  • Inconsistencies among graders increased over time.
  • After discussion, graders devised a point scheme for each problem and grading variability decreased.
* * *
Studies: Bolton (1927)
Participants: 6th grade arithmetic teachers
Main Findings
  • Average deviation was 5 points out of 100 on 24 papers.
  • Lowest-quality work presented the greatest level of variation.
* * *
Studies: Brimi (2011)
Participants: English teachers
Main Findings
  • Range of scores was 46 points out of 100 and covered all five letter-grade levels.
* * *
Studies: Eells (1930)
Participants: Teachers in a college measurement course
Main Findings
  • Elementary teachers graded three geography and two history questions inconsistently over time.
  • Estimated reliability was low.
  • Most agreement was found on one very poor paper.
* * *
Studies: Healy (1935)
Participants: 6th grade written compositions from 50 different teachers
Main Findings
  • Format and usage errors were weighed more heavily in grades than the quality of ideas.
* * *
Studies: Hulten (1925)
Participants: English teachers
Main Findings
  • Teachers were inconsistent over time when grading five compositions.
  • 20 percent changed from pass to fail or vice versa on the second marking.
* * *
Studies: Jacoby (1910)
Participants: College astronomy professors
Main Findings
  • There was little disagreement on grades for five high-quality exams.
* * *
Studies: Lauterbach (1928)
Participants: Teachers grading handwritten and typed papers
Main Findings
  • Student work quality was a source of grade variability.
  • In absolute terms, there was much variation by teacher for each paper.
  • In relative terms, teachers' marks reliably ranked students.
* * *
Studies: Shriner (1930)
Participants: High school English and algebra teachers
Main Findings
  • Teachers' grading was reliable.
  • There was greater teacher disagreement in grades for the poorer papers.
* * *
Studies: Silberstein (1922)
Participants: Teachers grading one English paper that was originally passed in high school but failed by the New York Regents
Main Findings
  • When teachers regraded the same paper, they changed their grade.
  • Scores on individual questions on the exam varied greatly, explaining the overall grading disagreement (except on one question about syntax, where grades were more uniform).
* * *
Studies: Sims (1933)
Participants: Reanalysis of four studies of grading arithmetic, algebra, high school English, and psychology exams
Main Findings
  • There were two kinds of variability in teachers' grades: (1) differences in students' work quality, and (2) "differences in the standards of grading found among school systems and among teachers within a system" (p. 637).
  • Teachers disagreed significantly on grades.
  • Changing from a 100-point scale to grades reduced disagreements.
* * *
Studies: Starch (1913)
Participants: College freshman English instructors
Main Findings
  • Teacher disagreement was significant, especially for the two poorest papers.
  • Four sources of variation were found and probable error reported for each: (1) differences among the standards of different schools (no influence), (2) differences among the standards of different teachers (some influence), (3) differences in the relative values placed by different teachers upon various elements in a paper, including content and form (larger influence), and (4) differences due to the pure inability to distinguish between closely allied degrees of merit (larger influence).
* * *
Studies: Starch (1915)
Participants: 6th and 7th grade teachers
Main Findings
  • Average teacher variability of 4.2 (out of 100) was reduced to 2.8 by forcing a normal distribution using a five-category scale (poor, inferior, medium, superior, and excellent).
* * *
Studies: Starch & Elliott (1912)
Participants: High school English teachers
Main Findings
  • Teacher disagreement in assigning grades was large (a range of 30–40 out of 100 points).
  • Teachers disagreed on rank order of papers.
* * *
Studies: Starch & Elliott (1913a)
Participants: High school mathematics teachers
Main Findings
  • Teacher disagreement on a mathematics exam was larger than it was on the English papers in Starch and Elliott (1912).
  • Teachers disagreed on the grade for one item's answer about as much as they did on the composite grade for the whole exam.
* * *
Studies: Starch & Elliott (1913b)
Participants: High school history teachers
Main Findings
  • Teacher disagreement on one history exam was larger than for the English or math exams in prior Starch and Elliott studies (1912, 1913a).
  • Study concluded that variability isn't due to subject, but "the examiner and method of examination" (p. 680).
Source: From "A Century of Grading Research: Meaning and Value in the Most Common Educational Measure," by S. M. Brookhart, T. R. Guskey, A. J. Bowers, J. H. McMillan, J. K. Smith, L. F. Smith, et al., 2016, Review of Educational Research, 86(4), pp. 803–848. Copyright 2016 by American Educational Research Association. Adapted with permission.

The earliest investigation we could find is a statistical study published by the Journal of the Royal Statistical Society in the United Kingdom and rarely cited in the U.S. grading literature. And it's a doozy—the study begins with a table of contents outlining 26 separate points the author wants to make! Professor F. Y. Edgeworth (1888), author of the study, made an important contribution to both statistics and grading research by applying normal curve theory—he called it the "Theory of Errors" (p. 600)—to the case of grading examinations. Normal curve theory was fairly new at the time. Mathematician Carl Friedrich Gauss introduced the theory in the early 1800s and pointed out its usefulness for estimating the size of error in any measure. Edgeworth deserves a lot of credit for realizing this advance in statistics could help us with practical problems in education.
Unlike some of the researchers who followed him, Edgeworth was motivated not to criticize teachers and professors, but rather to make things fairer for students. He explained that when students' performance is poorly measured, bad decisions result, including mistakes in identifying students for "honours" upon graduation (by which Edgeworth meant "'successful candidates' in an open competition for the Army or the India or Home Civil Service" [p. 603]). Thus, unreliable grades had real consequences for students.
Edgeworth described the plight of those whose achievement was good enough for these important future opportunities but whose grades did not confirm it: "There are some of the pass men as good as some of the honour men; but, like the unsung brave 'who lived before Agamemnon,' they are huddled unknown amongst the ignominious throng, for want, not of talent, or learning, or industry, or judgment, but luck" (p. 616). Edgeworth considered this part of an argument for improving grading reliability. We love Edgeworth's poetic and righteous indignation.
Normal curve theory allowed Edgeworth to measure the amount of error in grades due to chance, which in itself was a contribution to research. But Edgeworth went beyond that to tease out different sources of grading error: (1) chance, (2) personal differences among graders regarding the whole exam and individual items on the exam, and (3) "taking [the examinee's] answers as representative of his proficiency" (p. 614). He did this by using both hypothetical and real data to calculate the probable amount of error in examination grades, under different conditions and for different exams.
The idea that multiple factors led to unreliable grades was a huge step forward. It gave educators a window into things they could do about the problem. We can't really do much about the fact that measures tend to vary by chance. We can, however, take steps to help graders develop a shared view of what knowledge and skills the items and tasks on an exam are supposed to measure. We also can take steps to make sure the items and tasks on an exam really are representative of what we would now call desired learning outcomes.
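
To make this reasoning concrete, here is a minimal simulation sketch. The variance figures, the Python framing, and the three-part split are our own illustrative assumptions, not Edgeworth's data or calculations: each observed grade is treated as a true score plus independent draws for chance, grader standards, and item sampling, and the combined spread (with its probable error, about 0.6745 standard deviations for a normal distribution) follows from adding the variances.

```python
import math
import random

# Illustrative sketch only: the figures below are invented assumptions, not
# Edgeworth's data. It shows how independent error sources combine in a grade
# and how a "probable error" (about 0.6745 standard deviations for a normal
# distribution) can be attached to it.

random.seed(1)

CHANCE_SD = 3.0    # assumed chance error in a single grading (points out of 100)
GRADER_SD = 4.0    # assumed differences in graders' personal standards
SAMPLING_SD = 5.0  # assumed error from how well the exam samples proficiency

def observed_grade(true_score: float) -> float:
    """One observed grade = true proficiency + three independent error draws."""
    return (true_score
            + random.gauss(0, CHANCE_SD)
            + random.gauss(0, GRADER_SD)
            + random.gauss(0, SAMPLING_SD))

true_score = 75.0
grades = [observed_grade(true_score) for _ in range(100_000)]

mean = sum(grades) / len(grades)
sd = math.sqrt(sum((g - mean) ** 2 for g in grades) / len(grades))

# Independent variances add, so the combined spread is the root sum of squares.
expected_sd = math.sqrt(CHANCE_SD**2 + GRADER_SD**2 + SAMPLING_SD**2)

print(f"simulated spread of observed grades: {sd:.2f} (theory: {expected_sd:.2f})")
print(f"probable error: about ±{0.6745 * sd:.2f} points around the true score")
```

Because the sources are independent, their variances add, which is precisely why isolating and shrinking any one of them (for example, by clarifying graders' standards) reduces the overall error in the grade.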

What Questions Have Been Addressed in This Research? What Have the Results of Those Studies Revealed?

As we have noted, the most valuable early studies of grading reliability investigated sources of variation in grading. The least valuable of these studies simply investigated whether variability in grading existed at all, found that it did (of course it did), and proclaimed it a bad thing. More valuable studies investigated whether grading variability was affected by the quality of the student work or its format (e.g., by asking whether teachers find it easier to agree on grades for good papers or poor ones). Other studies investigated whether changing the grading scale would make grading more reliable. In the early 20th century, the most prevalent grading scale was the 0 to 100 percentage scale, which proved to be exceedingly unreliable. Teachers were much more consistent when using grading scales with fewer categories, especially those with five categories or fewer.
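
The effect of the grading scale itself can be illustrated with a small simulation. This is only a hedged sketch with assumed numbers and conventional A-F cut points, not a replication of any study in Figure 1.1: two simulated graders score the same papers with a few points of disagreement, and we compare how often they agree exactly on the 0 to 100 scale versus on a five-category scale.

```python
import random

# Illustrative sketch only, with assumed numbers; it does not replicate any
# study cited in this chapter. Two simulated graders score the same papers
# with a few points of disagreement, and we compare exact agreement on a
# 0-100 percentage scale with agreement on a five-category scale.

random.seed(2)

def to_category(score: float) -> str:
    """Collapse a 0-100 score into five categories (conventional cut points)."""
    if score >= 90: return "A"
    if score >= 80: return "B"
    if score >= 70: return "C"
    if score >= 60: return "D"
    return "F"

N_PAPERS = 10_000
GRADER_NOISE_SD = 4.0  # assumed spread of each grader's judgment, in points

agree_points = 0
agree_categories = 0
for _ in range(N_PAPERS):
    quality = random.uniform(50, 100)  # the paper's "true" quality
    g1 = round(min(100.0, max(0.0, random.gauss(quality, GRADER_NOISE_SD))))
    g2 = round(min(100.0, max(0.0, random.gauss(quality, GRADER_NOISE_SD))))
    agree_points += (g1 == g2)
    agree_categories += (to_category(g1) == to_category(g2))

print(f"exact agreement, 100-point scale:  {agree_points / N_PAPERS:.1%}")
print(f"exact agreement, 5-category scale: {agree_categories / N_PAPERS:.1%}")
```

With only a few points of grader noise, exact matches on the percentage scale remain rare, while the five-category scale absorbs most of the disagreement except for papers near a cut point, consistent with the early finding that fewer categories produce more consistent grades.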
The main finding from these early studies was that great variation existed in the grades teachers assign to students' work (Ashbaugh, 1924; Brimi, 2011; Eells, 1930; Healy, 1935; Hulten, 1925; Lauterbach, 1928; Silberstein, 1922; Sims, 1933; Starch, 1913, 1915; Starch & Elliott, 1912, 1913a, 1913b). This finding agrees with the two early reviews of grading studies by Kelly (1914) and Rugg (1918). Not every early study, however, was quite so pessimistic. Studies by Jacoby (1910), Bolton (1927), and Shriner (1930) argued that grading was not as unreliable as commonly believed at the time.
Early researchers attributed the inconsistency in teachers' grades to one or more of the following sources:
  • The criteria for evaluating the work...
