Chapter 1
The Culture of Grading
People are doing a lot of rethinking about education these days. The pundits agree that something is wrong with K–12 education, and everyone has a solution: a longer school day, a longer school year, more testing, less recess. Columnists, talk show hosts, and politicians on both sides lament that we've lost our edge. Competition is global, and according to the tests, we are not keeping up.
Why does global competitiveness matter? The intersection of globalization and technology has created an international competition for jobs and even college admissions. We can now easily compete, connect, and collaborate with people around the world (Friedman & Mandelbaum, 2012). "In today's interconnected world, our students are not competing with students from the state or city next door, but with students from Singapore, Shanghai, and Stockholm" (Stewart, 2012, p. 3). To be average is no longer good enough.
How are we doing? Some say there is a crisis in U.S. education. Others say the crisis is overblown. But some facts are indisputable. On international tests, our students are performing poorly compared with students from other countries. Three international tests compare math, science, and reading performance: the Trends in International Mathematics and Science Study (TIMSS), the Progress in International Reading Literacy Study (PIRLS), and the Programme for International Student Assessment (PISA). In 2011, U.S. 8th graders came in 7th place in math and 9th place in science on the TIMSS. In that same year, U.S. 4th graders ranked 6th out of 53 countries in reading on the PIRLS. PISA is the most widely used international test, measuring performance in 65 countries. It is also the most challenging, in that its goal is to measure not merely content knowledge but students' ability to apply knowledge to solve real-life problems. On the 2012 PISA, U.S. students scored well below other developed countries: 23rd in science, 30th in math, and 20th in reading (U.S. Department of Education, 2012a, 2012b, 2012c).
Even if we discount standardized test scores as an indicator of how our students are doing, we know this: In the United States, we have not only a skills gap (jobs that can't be filled due to a lack of skilled labor) but also a learning gap (an unacceptable high school and college dropout rate, as well as college students who need remediation). The skills gap is evident in the lack of workers with the specific skills needed for some of today's jobs, jobs that did not exist only a few years ago. In early 2012, in spite of the recession, more than three million jobs in the United States were vacant because applicants lacked the math, reading comprehension, or technical skills that companies required (Friedman & Mandelbaum, 2012). The learning gap is twofold. First, 25 percent of our students will not graduate from high school. The U.S. high school graduation rate of 74.7 percent ranks 12th among 28 developed countries (Education Week, 2013; Stewart, 2012). A high school diploma qualifies graduates for only a few low-wage jobs, and adults without one face dismal job prospects. Second, of the U.S. students who do make it to college, one-third must take at least one remedial course in reading, writing, or math. Only 54 percent of those entering college in the United States will complete a degree, a rate near the bottom when compared with other countries' rates of college completion. Slots at elite universities are increasingly filled by better-prepared students from other countries.
People bemoan the sad state of U.S. competitiveness and insist that education is both the cause of and the fix for our woes. Yet no one seems to have a definitive answer about what in education needs to be fixed. No one has the answer because there isn't just one answer. It's a series of related, overlapping problems in curriculum, instruction, and assessment.
Although K–12 educational reform is not a cure-all for the ills of the United States, reforming one educational practice, grading, has the potential to drive related changes in other practices. The culture of grading, with all the baggage it carries, has perpetuated a system that obstructs many other educational reforms. What is the relationship between grading reform and overall educational reform? Are grades a reflection of a dysfunctional system or a driver of the system? Hard to say. Grades are supposed to reflect what students know and how well the teacher has taught. But they often don't.
We now know that something is wrong with grades. Every day we see the mismatch: on one hand, stellar performance on standardized tests from B and C students (thus labeled "underachievers"), and on the other, poor performance on standardized tests from straight-A students. We know that many students leave high school with high grade point averages yet struggle academically in college.
Let us reflect for a moment on the "what ifs": perhaps the answer lies in reforming a traditional educational practice that has not changed in decades. What if grading practices were part of a bigger picture? What if by changing the way we use grades we could ignite authentic high-level learning? What if student empowerment could make learning more dynamic and change the outcome? What if our beliefs about grading were misguided?
If we dare to question our beliefs about grading, more "what if" questions emerge.
- What if an A student was a compliant one rather than a learned one?
- What if the premise that high grades were a predictor of success in life was faulty?
- What if grades, as the marker of success in school, were a flawed, or worse yet, meaningless tool?
- What if parents, by directing their children to focus on grades, inadvertently created an addiction to form over substance?
The challenge of reforming grading practices is a difficult one. The "what ifs" reveal a practice that is deeply ingrained not only in education but in our culture. Grading is a language, a schema: we grade presidents and we grade meat. For grading reform to happen, we must acknowledge and accept how our beliefs have influenced grading practices.
A Brief History of Education
How did we get here? Three historical forces converged to create and perpetuate the traditional grading practices that are common today: the roots of education in moral development, the use of education to sort and rank students, and the prevalence of behaviorism in school practices.
Teacher as Moral Educator
In a young and often chaotic colonial America, moral stability was necessary for the survival of society. The original establishment of schools was primarily for the purpose of moral education, and schools were viewed as an important social agency to promote virtue, character, and good habits. From the earliest days of our country, the goal of mass literacy was driven by the need to read the Bible and thus save one's soul. Contrary to today's practice of secular education, schools were the servant of religion, and moral education in the schools was a logical outgrowth of religion. "In the eyes of Puritans religious and moral education were inextricably intertwined" (McClellan, 1999, p. 2). Learning was valued not as an end in itself but as "an instrument for clarifying the ways of God to man and thus rendering certain the conditions of eternal salvation" (Thayer, 1965, p. 12). Teachers worked hard to promote in students the virtues of self-restraint, industry, honesty, punctuality, and orderliness. Discipline in school was viewed as a way to model full obedience to God.
A basic fear of the fragility of human virtue pervaded our society: the fear that without constraints and vigilance, our youth would fall prey to unsavory temptations. This fear was grounded in the 17th-century conception of original sin, that man was predisposed to choose evil over good (Thayer, 1965). The fears of our founding fathers were not much different from the general concerns for our youth today, and the roots of moral education are evident in today's common educational practices. We reward the modern version of virtue and punish the lack of it. We reward responsibility, effort, hard work, neatness, and homework completion. We penalize tardiness, sloppiness, late work, and cheating. For this noble goal of instilling morality in students, grades have been a most convenient tool. Unfortunately, this use of grades has led to a school culture that often places more value on compliance and work than on learning.
Schools as a Mechanism for Sorting and Ranking
Early in the 20th century, compulsory attendance laws changed the practice of K–12 education in the United States. Elementary schools grew in popularity and large numbers of students started attending high school. From 1870 to 1910 the number of high schools in the United States grew from 500 to 10,000, and the total number of students in public elementary and high schools grew from 6,871,000 to 17,813,000 (Kirschenbaum, Simon, & Napier, 1971). While elementary schools continued to report student learning with narratives, the sheer number of students at the secondary level made such descriptive reports burdensome. Secondary schools, eager for a more efficient alternative, began examining techniques used in colleges.
In the late 1700s, Yale was probably the first college to rank student performance into four categories, a practice that evolved into the use of a four-point scale (a precursor of the four-point grade point average). In 1877, Harvard began classifying students using percentages, a system later replaced by one that sorted students into five groups, the lowest of which failed the class (Durm, 1993). In 1897, Mount Holyoke College adopted a system that combined descriptive adjectives with percentages and letters:
A = Excellent, equivalent to 95–100 percent
B = Good, equivalent to 85–94 percent
C = Fair, equivalent to 76–84 percent
D = Passed (barely), equivalent to 75 percent
E = Failed (below 75 percent) (Durm, 1993, p. 3)
This system of grading, with variations from school to school, evolved to become the standard for sorting and ranking college students and was soon adopted by secondary schools. Letter grades were an easy, efficient method not only for telling students how they were doing but also for ability-grouping students for instruction. As the number of high school students applying to college increased, colleges started using high school grades to screen applicants.
In 1912, some powerful research emerged about the lack of consistency in percentage grades. When English exams from two students were scored by 142 different teachers, the scores on one exam ranged from 64 to 98 percent and scores on the other exam ranged from 50 to 97 percent. The same experiment with geometry papers showed even more discrepancy, with the grades ranging from 28 to 95 percent (Starch & Elliott, 1912, 1913). This research was viewed as so damaging to the practice of using percentages that educators began moving away from the 100-point scale to the five categories of A, B, C, D, and F. Fewer categories seemed more "fair."
Around this time, a new method became popular: grading on the curve. The bell curve, technically called the normal distribution, can be traced back to the work of statisticians and mathematicians as early as the 18th century. It gained wide use in the 20th century after many physical and psychological phenomena (such as height) were shown to follow a normal distribution. The bell curve entered education when the IQ scores of a random group of children were shown to fall into a bell-shaped curve (Curreton, 1971; Jensen, 1980). Grading on the curve was believed to be appropriate because, at that time, the distribution of students' intelligence test scores approximated a normal probability curve. Since innate intelligence and school achievement were thought to be directly related, such a procedure seemed both fair and equitable (Guskey, 1996). The assumption that individual aptitude was fixed and varied among students led to the logical conclusion that achievement should also follow a bell curve. Regardless of its validity, the bell curve became popular as a way to produce a "fair" distribution of grades. For the purpose of sorting and ranking students, the bell curve was ideal.
Upon closer investigation, however, the logic was shown to be faulty. The normal curve was not considered statistically valid unless the group was large, random, and untreated (Kelly, 2009). "The normal bell-shaped curve describes the distribution of randomly occurring events when nothing intervenes" (Guskey, 2011, p. 18). But teachers intervene: they teach with the goal of having all students learn. "If the distribution of student learning after teaching resembles a normal bell-shaped curve, that, too, shows the degree to which our intervention failed. It made no difference" (Guskey, 2011, p. 18). More recent research has also shown that the relationship between aptitude/intelligence and school achievement is dependent upon the appropriateness of instructional conditions (Hanushek, 2004; Hershberg, 2005). "When the instructional quality is high and well matched to students' learning needs, the magnitude of the relationship between aptitude/intelligence and school achievement diminishes drastically and approaches zero" (Guskey, 2011, p. 18). In spite of this fact, the bell curve is still mistakenly revered today as evidence of rigor.
Behaviorism as a Tool for Compliance
If morality and sorting and ranking were what we wanted, behaviorism was the way to reach the goal.
You may find the roots of behaviorism and its counterpart in education, behavior modification, to have germinated from some unusual sources: a salivating dog, a ringing bell, and, more recently, some chocolate covered candies. (Freiberg, 1999, p. 5)
Behaviorism, a major contribution to the field of psychology, dates back to the late 19th century. At that time, Edward Thorndike formulated the Law of Effect: behavior that leads to a positive consequence will be repeated (Kohn, 1999). B. F. Skinner's theory of operant conditioning later showed that human behavior could be shaped through positive reinforcement. In classrooms, for example, chocolate candies were placed on students' desks to reinforce good behavior (Freiberg, 1999).
Pavlov learned that he could get a dog to salivate simply by associating the sound of a ringing bell with food. Although Pavlov won the Nobel Prize in 1904 for his study of digestion, his conditioning research was later applied to a new psychological theory of human behavior (Freiberg, 1999). As behaviorism grew in popularity, it led to a new approach to classroom management. Behavior modification emerged in the 1960s and rapidly became the dominant philosophy of classroom management. One of the more popular programs was Lee Canter's Assertive Discipline.
Canter's Assertive Discipline model was a structure of rewards and consequences that were used to control student behavior. The most familiar consequence for bad behavior was that of writing the student's name on the board, followed by check marks behind the name for each additional infraction. The name on the board was meant as a warning, the check mark as a consequence. "If you do break this rule again, or any other rule during the day, I'll put a check next to your name... this means that you have chosen to sit for five minutes in the time-out area" (Canter & Canter, 1992, p. 103). A popular reward for good behavior was Marbles in a Jar.
When the class is doing what you want, you take a marble and drop it in a jar. The sound of the marble dropping into the jar immediately lets the students know they are doing what you want and that you recognize their efforts. Each marble can be worth, for example, 30 seconds to one minute of free choice at the end of the day. (Canter, 1976, p. 141)
The widespread use of behavior modification for classroom management generalized to other school practices such as detentions for misbehavior, awards for perfect attendance, and even the use of bells (particularly ironic in light of Pavlov). Behavior management became the dominant paradigm in schools for controlling learning (or so we thought) as well as classroom behavior.
Today the idea that behavior can be controlled by rewards and punishment is so embedded in the day-to-day practices of schools that one rarely even notices it (Kohn, 1999). Given that "behaviorism permeates virtually every aspect of American education" (Kohn, 1999, p. 143), it is no surprise that grades have become a major tool for rewarding good behavior. So although grades originally evolved for the purpose of sorting and ranking, they turned out to be quite handy for rewarding virtue and punishing vice. They were an all-purpose tool, used to reward not only achievement but also behaviors such as compliance and responsibility.
Beliefs About Grading
From the historical forces of morality, sorting, and behaviorism, a culture of grading evolved that mixed moralistic views of human nature, the Puritan work ethic, and the use of reward and punishment to shape behavior. The culture is evident in a set of beliefs about grading, the quasi-superstitions that drive educational practice. Those beliefs developed from the most honorable motives. As educators we have been concerned not only with intellectual growth but also with moral development and the preparation of children for adulthood. We've used grades for more than academics because we believe our job is more than academics: our goals have always included shaping children into better people. But our well-meaning beliefs and their unintended consequences deserve closer examination.
Belief #1: Good Teachers Give Bad Grades
As we accepted that the role of the school was to sort and rank, we came to believe that in a rigorous educational system, success was scarce. Scarcity of high grades equaled rigor, and only a few should be "winners." (A student once said to her professor, "Well, if everyone got an A, then it doesn't mean anything.") From a practical standpoint, we also realized that if there were too many high grades...