eBook - ePub

Teacher Evaluation That Makes a Difference

Name: Teacher Evaluation That Makes a Difference
Author: Robert J. Marzano, Michael D. Toth

A New Model for Teacher Growth and Student Achievement

Robert J. Marzano

Michael D. Toth

192 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

In this essential new book, best-selling author and researcher Robert J. Marzano and teacher-effectiveness expert Michael D. Toth lay out a framework for the "next generation" of teacher evaluation: a model focused primarily on helping educators develop and improve their practice. By taking into account multiple accurate, data-rich measures of teacher performance and student growth, the model ensures that all teachers receive fair, meaningful, and reliable evaluations. The book includes

* Standards, rubrics, and suggested rating methodologies;
* A detailed, five-phase plan for implementing the model;
* Guidelines for calibrating evaluation criteria according to teachers' experience levels;
* A step-by-step guide to creating a coaching program for teachers who require intervention; and
* Recommendations for using technology platforms to enable teacher collaboration.

Teacher evaluation has too often focused on affixing ratings to teachers rather than helping them grow into mastery. The practical, field-tested model proposed in Teacher Evaluation That Makes a Difference has everything your school or district needs to provide teachers—and, by extension, their students—with the support necessary for success.

Chapter 1

The Changing Landscape of Teacher Evaluation

Both the rhetoric and substance of teacher evaluation have changed dramatically over the last few years, due, in part, to a number of commentaries that have made strong claims regarding the inadequacies of traditional teacher evaluation systems. For example, Toch and Rothman (2008) said of traditional evaluation practices that they are "superficial, capricious, and often don't even directly address the quality of instruction, much less measure students' learning" (p. 1). Similarly, Weisberg, Sexton, Mulhern, and Keeling (2009) explained that teacher evaluation systems have traditionally failed to provide accurate and credible information about the effectiveness of individual teachers' instructional performance. A 2012 report from the Bill & Melinda Gates Foundation entitled Gathering Feedback for Teaching summarized the failings of teacher evaluation systems in the following way:

The nation's collective failure to invest in high-quality professional feedback to teachers is inconsistent with decades of research reporting large disparities in student learning gains in different teachers' classrooms (even within the same schools). The quality of instruction matters. And our schools pay too little attention to it. (p. 3)

Examples of similar sentiments abound in current discussions of teacher evaluation reform (e.g., Kelley, 2012; Strong, 2011).

Evidence for the Need for Change

Claims like those cited above have credible evidence supporting them. One can make a case that evidence impugning teacher evaluation started to accrue in the 1980s as a result of a study conducted by the RAND group entitled Teacher Evaluation: A Study of Effective Practices (Wise, Darling-Hammond, McLaughlin, & Bernstein, 1984). Along with their general finding that teacher evaluation systems were not specific enough to increase teachers' pedagogical skills, the researchers noted that teachers were the biggest critics of their current, narrative evaluation systems and the strongest proponents of a more specific and rigorous approach: "In their view, narrative evaluation provided insufficient information about the standards and criteria against which teachers were evaluated and resulted in inconsistent ratings among schools" (Wise et al., 1984, p. 16). Since this study first appeared, evidence of the inadequacies of teacher evaluation systems and commentary on that evidence have been mounting in the research and theoretical literature (e.g., Glatthorn, 1984; McGreal, 1983; Glickman, 1985; Danielson, 1996).

Without question, two reports, both of which we cited previously, catapulted the inadequacies of teacher evaluation into the limelight: Rush to Judgment (Toch & Rothman, 2008) and The Widget Effect (Weisberg et al., 2009). Rush to Judgment detailed a study that found that 87 percent of the 600 schools in the Chicago school system did not give a single unsatisfactory rating to their teachers, even though over 10 percent of those schools had been classified as failing educationally. In total, only 0.3 percent of all teachers in the system were rated as "unsatisfactory." By contrast, 93 percent of the city's 25,000 teachers received "excellent" or "superior" ratings.

The Widget Effect derives its name from the fact that teacher evaluation systems have traditionally not discriminated between effective and ineffective teachers:

The Widget Effect describes the tendency of school districts to assume classroom effectiveness is the same from teacher to teacher…. In its denial of individual [teacher] strengths and weaknesses, it is deeply disrespectful to teachers; in its indifference to instructional effectiveness, it gambles with the lives of students. (Weisberg et al., 2009, p. 4)

The authors of The Widget Effect found that, in a district with 34,889 tenured teachers, only 0.4 percent received the lowest rating, whereas 68.75 percent received the highest rating. These findings and others were publicized in the popular 2010 movie Waiting for ‘Superman.’ This movie, along with a veritable flood of commentaries on local and national news shows, brought the issue of teacher evaluation into sharp relief.

By the end of the first decade of the new century, the inadequacies of teacher evaluation systems were well known and a matter of public discussion. This enhanced level of public awareness, along with federal legislation, placed educator evaluation in the spotlight.

The Federal Impetus for Evaluation Reform

On July 24, 2009, President Barack Obama and Secretary of Education Arne Duncan announced the $4.35 billion education initiative Race to the Top (RTT). Designed to spur nationwide education reform in K–12 schools, the grant program was a major component of the American Recovery and Reinvestment Act of 2009. The program offered states significant funding if they were willing to overhaul their teacher evaluation systems. To compete, states had to agree to implement new systems that would weight student learning gains as part of teachers' yearly evaluation scores and had to implement performance-based standards for teachers and principals. The U.S. Department of Education's A Blueprint for Reform (2010) stated: "We will elevate the teaching profession to focus on recognizing, encouraging, and rewarding excellence. We are calling on states and districts to develop and implement systems of teacher and principal evaluation and support, and to identify effective and highly effective teachers on the basis of student growth and other factors" (p. 4). The report went on to explain: "Grantees must be able to differentiate among teachers and principals on the basis of their students' growth and other measures, and must use this information to differentiate, as applicable, credentialing, professional development, and retention and advancement decisions, and to reward highly effective teachers and principals in high-need schools" (p. 16).

In addition to stimulating the discussion about teacher evaluation, RTT legislation generated substantive and concrete change. A Center for American Progress report released in March 2012 noted that "Overall, we found that although a lot of work remains to be done, RTT has sparked significant school reform efforts and shows that significant policy changes are possible" (Boser, 2012, p. 3). The author went on to say:

We suffer under no illusion that a single competitive grant program will sustain a total revamping of the nation's education system. Nor do we believe that a program like RTT will be implemented exactly as it was imagined—one of the goals of the program was to figure out what works when it comes to education reform. Yet two things have become abundantly clear. There's a lot that still needs to be done when it comes to Race to the Top, and many states still have some of the hardest work in front of them. But it's also clear that a program like Race to the Top holds a great deal of promise and can spark school reform efforts and show that important substantive changes to our education system can be successful. (p. 5)

Currently, the two major changes being implemented in teacher evaluation are directly traceable to RTT legislation: (1) use of measures of student growth as indicators of teacher effectiveness, and (2) more rigor in measuring the pedagogical skills of teachers. Both of these initiatives come with complex issues in tow.

Issues with Measuring Student Growth

As we have seen, including measures of students' growth in teacher evaluation systems is not only a popular idea, but an explicit part of RTT legislation. There is an intuitive appeal to using such measures and some literature supporting this practice. For example, a report from the Manhattan Institute for Policy Research (Winters, 2012) noted:

On this last point, modern statistical tools present a promising avenue for reform. These measures, used in tandem with traditional subjective measures of teacher quality, could help administrators make better-informed decisions about which teachers should receive tenure and which should be denied it. Statistical evaluations can also be used to identify experienced teachers who are performing poorly, with an objectivity that reduces the risk of a teacher being persecuted by an administrator. (p. 2)

The report further explained that growth measures "can be a useful piece of a comprehensive evaluation system. Claims that it is unreliable should be rejected. [Value-added measures], when combined with other evaluation methods and well-designed policies, can and should be part of a reformed system that improves teacher quality and thus gives America's public school pupils a better start in life" (p. 7). Similar conclusions were reported in a study by the National Bureau of Economic Research (Chetty, Friedman, & Rockoff, 2011):

Students assigned to … teachers [with high value-added scores] are more likely to attend college, attend higher-ranked colleges, earn higher salaries, live in higher [socioeconomic status] neighborhoods, and save more for retirement. They are also less likely to have children as teenagers. Teachers have large impacts in all grades from 4 to 8. On average, a one standard deviation improvement in teacher [value-added scores] in a single grade raises earnings by about 1% at age 28. (p. 2)

The term commonly used to describe measures of student growth is value-added measure (VAM). In layman's terms, a VAM is a measure of how much a student has learned since some designated point in time (e.g., the beginning of the school year). State-level tests are typically used to compute VAM scores for each student, and the average VAM score for a teacher's class is used as a measure of the teacher's impact on students. An assumption underlying the use of VAMs is that teachers whose students have higher VAM scores are doing a better job than teachers whose students have lower scores. As intuitively logical as this might seem, many researchers and theorists strongly object to using VAMs as a component of teacher evaluation. For example, Darling-Hammond, Amrein-Beardsley, Haertel, and Rothstein (2012) articulated a comprehensive critique of the assumptions underlying the use of VAMs. They began by noting:

Using VAMs for individual teacher evaluation is based on the belief that measured achievement gains for a specific teacher's students reflect that teacher's "effectiveness." This attribution, however, assumes that student learning is measured well by a given test, is influenced by the teacher alone, and is independent from the growth of classmates and other aspects of the classroom context. None of these assumptions is well supported by current evidence. (p. 8)

The authors then listed three criticisms of VAMs that they claimed rendered them inappropriate as high-stakes measures of teacher effectiveness:

Criticism #1: VAMs of teacher effectiveness are inconsistent. Research indicates that a teacher's VAM score can change rather dramatically from year to year. For example, Darling-Hammond and colleagues cited a study by Newton, Darling-Hammond, Haertel, and Thomas (2010) that examined VAM data from five school districts. The researchers found that of the teachers who scored in the bottom 20 percent of rankings one year, only 20 to 30 percent scored in the bottom 20 percent the next year while 25 to 45 percent moved to the top part of the distribution. These changes might have little or nothing to do with an increase or decrease in teacher competence but a great deal to do with differences in students from year to year.

Criticism #2: VAM scores differ significantly when different methods are used to compute them and when different tests are used. Equations used to compute VAMs can take a variety of forms, which we discuss in greater detail in Chapter 2. For now, let's simply say that equations used to compute VAMs can differ in the variables they use to predict student achievement and in the weights given to those variables. For example, one type of VAM equation might rely heavily on measures of student achievement in prior years, whereas another type might not. Darling-Hammond and colleagues cited studies indicating that different equations can produce rather dramatically different teacher rankings: "For example, when researchers used a different model to recalculate the value-added scores for teachers published in the Los Angeles Times in 2011, they found that from 40% to 55% of teachers would get noticeably different scores" (p. 9). In other words, teacher rankings can change based on the type of VAM equation used.

Additionally, tests that purportedly measure the same content can produce different VAM scores (Bill & Melinda Gates Foundation, 2011; Lockwood et al., 2007). If, for example, two different tests of mathematics achievement are used within a district, teacher rankings based on these two different measures could vary considerably. Darling-Hammond and colleagues noted that "[t]his raises concerns about measurement error and … the effects of emphasizing ‘teaching to the test’ at the expense of other kinds of learning, especially given the narrowness of most tests in the United States" (p. 9).

Criticism #3: Ratings based on VAMs can't disentangle the many influences on student progress. Darling-Hammond and colleagues concluded that teacher effectiveness "is not a stable enough construct to be uniquely identified even under ideal conditions" (p. 11). For example, a teacher might be very effective with one group of students but not with another. To illustrate, the authors cited the example of an 8th grade science teacher with low VAM scores who exchanged classes with a 6th grade science teacher who had high VAM scores, under the assumption that the 6th grade teacher would be able to produce better learning with the 8th grade teacher's students. Instead, the 8th grade teacher started to receive high VAM scores with the 6th grade students and the 6th grade teacher started to receive low VAM scores with the 8th grade students. Darling-Hammond and colleagues noted: "This example of two teachers whose value-added ratings flip-flopped when they exchanged assignments is an example of a phenomenon found in other studies that document a larger association between the class taught and value-added ratings than the individual teacher effect itself" (p. 12).
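To make the layman's description of a VAM concrete, the following is a minimal sketch in Python of the simplest possible version: each student's growth is the difference between an end-of-year and a beginning-of-year score, and a teacher's score is the class average of those differences. This is an illustration only, not the method used by any state or district; operational VAM equations typically include additional predictors and weights. The function name and all score values are hypothetical.

```python
from statistics import mean


def class_gain_score(scores):
    """Return the average (post - pre) gain for one teacher's class.

    `scores` is a list of (pre_test, post_test) pairs, one per student.
    This is a deliberately simplified gain score, not an operational VAM.
    """
    return mean(post - pre for pre, post in scores)


# Hypothetical beginning-of-year and end-of-year scale scores.
teacher_a = [(410, 450), (380, 415), (500, 520)]
teacher_b = [(420, 430), (390, 410), (510, 540)]

gain_a = class_gain_score(teacher_a)
gain_b = class_gain_score(teacher_b)

print(f"Teacher A average gain: {gain_a:.1f}")
print(f"Teacher B average gain: {gain_b:.1f}")
```

Even in this toy version, the three criticisms are visible: the result depends entirely on which students happen to be in the list, on which test produced the scores, and on the choice to use a raw difference rather than some other equation.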

Issues with Measuring Teachers' Pedagogical Skills

In Chapter 3, we consider effective techniques for measuring teacher pedagogical skill. Here, we briefly introduce the topic and place it in the context of research on teacher effectiveness.

Over the years, the research has been consistent regarding the powerful effects teachers can have on their students' achievement. Many large-scale studies have provided evidence to this end. Three have been particularly influential. The first study, conducted in the mid-1990s, involved five subject areas (mathematics, reading, language arts, social studies, and science) and some 60,000 students across grades 3 through 5 (Wright, Horn, & Sanders, 1997). The authors' overall conclusion was as follows:

The results of this study well document that the most important factor affecting student learning is the teacher. In addition, the results show wide variation in effectiveness among teachers. The immediate and clear implication of this finding is that seemingly more can be done to improve education by improving the effectiveness of teachers than by any other single factor. Effective teachers appear to be effective with students of all achievement levels regardless of the levels of heterogeneity in their classes [emphasis in original]. If the teacher is ineffective, students under that teacher's tutelage will achieve inadequate progress academically, regardless of how similar or different they are regarding their academic achievement. (Wright et al., 1997, p. 63)

The second study, conducted in the early 2000s (Nye, Konstantopoulos, & Hedges, 2004), involved 79 elementary schools in 42 school districts in Tennessee. It is noteworthy in that it also involved random assignment of students to classes and controlled for factors such as students' previous achievement, socioeconomic status, ethnicity, and gender, as well as class size and whether or not an aide was present in class. The study authors reported:

These findings would suggest that the difference in achievement gains between having a 25th percentile teacher (a not so effective teacher) and a 75th percentile teacher (an effective teacher) is over one-third of a standard deviation (0.35) in reading and almost half a standard deviation (0.48) in mathematics. Similarly, the difference in achievement gains between having a 50th percentile teacher (an average teacher) and a 90th percentile teacher (a very effective teacher) is about one-third of a standard deviation (0.33) in reading and somewhat smaller than half a standard deviation (0.46) in mathematics…. These effects are certainly large enough effects to have policy significance. (Nye et al., 2004, p. 253)
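One way to read these percentile comparisons is to convert each gap into an implied spread of teacher effects. The sketch below does this under an assumption that is ours and is not stated in the passage quoted above: that teacher effects are normally distributed. Under that assumption, the gap between two teacher percentiles equals the difference in their z-scores multiplied by the standard deviation of teacher effects, so dividing the reported gap by the z-score difference recovers that standard deviation. The function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def implied_teacher_sd(gap_in_student_sd, low_pct, high_pct):
    """SD of teacher effects implied by a gap between two teacher percentiles.

    ASSUMPTION (ours): teacher effects follow a normal distribution.
    """
    z_gap = STD_NORMAL.inv_cdf(high_pct) - STD_NORMAL.inv_cdf(low_pct)
    return gap_in_student_sd / z_gap


# Gaps quoted from Nye et al. (2004), in student-level SD units.
reading_25_75 = implied_teacher_sd(0.35, 0.25, 0.75)
reading_50_90 = implied_teacher_sd(0.33, 0.50, 0.90)
math_25_75 = implied_teacher_sd(0.48, 0.25, 0.75)
math_50_90 = implied_teacher_sd(0.46, 0.50, 0.90)

print(f"Reading: {reading_25_75:.2f} and {reading_50_90:.2f}")
print(f"Mathematics: {math_25_75:.2f} and {math_50_90:.2f}")
```

Under the normality assumption, both reading comparisons imply a teacher-effect standard deviation of about 0.26 and both mathematics comparisons imply about 0.36, which suggests the two percentile comparisons in each subject describe the same underlying spread.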

The third study was designed to determine whether teacher effects in the elementary grades persist and, if so, to what extent they persist over multiple years (Konstantopoulos & Chung, 2011). After examining data from over 2,500 students across multiple grades, the authors concluded:

In sum, the results of this study are robust and consistently show that teachers matter in early grades. The effects of teachers persist through the sixth grade for all achievement tests. In addition, the cumulative teacher effects were substantial and highlighted the importance of having effective teachers for multiple years in elementary grades. (Konstantopoulos & Chung, 2011, p. 384)

For well over a decade, the research has consistently demonstrated that an individual classroom teacher can have a powerful, positive effect on the learning of his or her students. To dramatize the research findings over the years, Strong (2011) cited the extensive research and commentary of the economist Eric Hanushek (Hanushek, 1971, 1992, 1996, 1997, 2003, 2010; Hanushek, Kain, & Rivkin, 2004; Hanushek & Rivkin, 2006; Hanushek, Rivkin, Rothstein, & Podgursky, 2004). Basing his conclusions on Hanushek's work, Strong noted "the economic value of having a higher-quality teacher, such that a teacher who is significantly above aver...