Washback refers to the influence of language testing on teaching and learning. This volume, at the important intersection of language testing and teaching practices, presents theoretical, methodological, and practical guidance for current and future washback studies.

In the field of language testing, researchers' major interest has traditionally been focused on identifying issues and solving problems inherent in tests in order to increase their reliability and validity. However, the washback effect goes well beyond the test itself to include factors such as curriculum, teacher and learner behaviors inside and outside the classroom, their perceptions of the test, and how test scores are used. Only recently have researchers started to empirically investigate the phenomenon of washback. This volume of such research serves two essential purposes by:

* providing an overview of the complexity of washback and the various contextual factors entangled within testing, teaching, and learning; and
* presenting empirical studies from around the world that offer insights into the effects of washback in specific educational contexts and models of research on which future studies can be based.

The extensive use of test scores for various educational and social purposes in society nowadays makes the washback effect a high-interest phenomenon in the day-to-day educational activities of teachers, researchers, program coordinators/directors, policymakers, and others in the field of education. Washback in Language Testing: Research Contexts and Methods is a valuable resource for those who are interested in applying findings to actual teaching and learning situations or who conduct washback research in their own contexts, including educational and psychological testing experts, specialists in alternative assessment in all fields, and policy- and decision-makers in educational and testing organizations.

PART I

CONCEPTS AND METHODOLOGY OF WASHBACK

CHAPTER 1

Washback or Backwash: A Review of the Impact of Testing on Teaching and Learning

Liying Cheng
Andy Curtis
Queen’s University

Washback or backwash, a term now commonly used in applied linguistics, refers to the influence of testing on teaching and learning (Alderson & Wall, 1993), and has become an increasingly prevalent and prominent phenomenon in education: “what is assessed becomes what is valued, which becomes what is taught” (McEwen, 1995a, p. 42). There seem to be at least two major types or areas of washback or backwash studies: those relating to traditional, multiple-choice, large-scale tests, which are perceived to have had mainly negative influences on the quality of teaching and learning (Madaus & Kellaghan, 1992; Nolen, Haladyna, & Haas, 1992; Shepard, 1990), and those studies where a specific test or examination has been modified and improved upon (e.g., performance-based assessment), in order to exert a positive influence on teaching and learning (Linn & Herman, 1997; Sanders & Horn, 1995). Studies of the second type have shown, however, positive, negative, or no influence on teaching and learning. Furthermore, many of those studies have turned to focus on understanding the mechanism of how washback or backwash is used to change teaching and learning (Cheng, 1998a; Wall, 1999).

WASHBACK: THE DEFINITION AND ORIGIN

Although washback is a term commonly used in applied linguistics today, it is rarely found in dictionaries. However, the word backwash can be found in certain dictionaries and is defined as “the unwelcome repercussions of some social action” by the New Webster’s Comprehensive Dictionary, and “unpleasant after-effects of an event or situation” by the Collins Cobuild Dictionary. The negative connotations of these two definitions are interesting, as they inadvertently touch on some of the negative responses and reactions to the relationships between teaching and testing, which we explore in more detail shortly.

Washback (Alderson & Wall, 1993) or backwash (Biggs, 1995, 1996) here refers to the influence of testing on teaching and learning. The concept is rooted in the notion that tests or examinations can and should drive teaching, and hence learning, and is also referred to as measurement-driven instruction (Popham, 1987). In order to achieve this goal, a “match” or an overlap between the content and format of the test or the examination and the content and format of the curriculum (or “curriculum surrogate” such as the textbook) is encouraged. This is referred to as curriculum alignment by Shepard (1990, 1991b, 1992, 1993). Although the idea of alignment (matching the test and the curriculum) has been described by some as “unethical,” and threatening the validity of the test (Haladyna, Nolen, & Haas, 1991, p. 4; Widen, O’Shea, & Pye, 1997), such alignment is evident in a number of countries, for example, Hong Kong (see Cheng, 1998a; Stecher, Barron, Chun, Krop, & Ross, 2000). This alignment, in which a new or revised examination is introduced into the education system with the aim of improving teaching and learning, is referred to as systemic validity by Frederiksen and Collins (1989), consequential validity by Messick (1989, 1992, 1994, 1996), and test impact by Bachman and Palmer (1996) and Baker (1991).

Wall (1997) distinguished between test impact and test washback in terms of the scope of the effects. According to Wall, impact refers to “. . . any of the effects that a test may have on individuals, policies or practices, within the classroom, the school, the educational system or society as a whole” (see Stecher, Chun, & Barron, chap. 4, this volume), whereas washback (or backwash) is defined as “the effects of tests on teaching and learning” (Wall, 1997, p. 291).

Although different terms are preferred by different researchers, they all refer to different facets of the same phenomenon: the influence of testing on teaching and learning. The authors of this chapter have chosen to use the term washback, as it is the most commonly used term in the field of applied linguistics.

The study of washback has resulted in recent developments in language testing, and measurement-driven reform of instruction in general education. Research in language testing has centered on whether and how we assess the specific characteristics of a given group of test takers and whether and how we can incorporate such information into the ways in which we design language tests. One of the most important theoretical developments in language testing in the past 30 years has been the realization that a language test score represents a complex of multiple influences. Language test scores cannot be interpreted simplistically as an indicator of the particular language ability we think we are measuring. The scores are also affected by the characteristics and contents of the test tasks, the characteristics of the test takers, the strategies test takers employ in attempting to complete the test tasks, as well as the inferences we draw from the test results. These factors undoubtedly interact with each other.

Nearly 20 years ago, Alderson (1986) identified washback as a distinct (and at that time emerging) area within language testing, to which we needed to turn our attention. Alderson (1986) discussed the “potentially powerful influence of tests” (p. 104) and argued for innovations in the language curriculum through innovations in language testing (also see Wall, 1996, 1997, 2000). At around the same time, Davies (1985) was asking whether tests should necessarily follow the curriculum, and suggested that perhaps tests ought to lead and influence the curriculum. Morrow (1986) extended the use of washback to include the notion of washback validity, which describes the relationship between testing, and teaching and learning (p. 6). Morrow also claimed that “. . . in essence, an examination of washback validity would take testing researchers into the classroom in order to observe the effects of their tests in action” (p. 6). This has important implications for test validity.

Looking back, we can see that examinations have often been used as a means of control, and have been with us for a long time: a thousand years or more, if we include their use in Imperial China to select the highest officials of the land (Arnove, Altbach, & Kelly, 1992; Hu, 1984; Lai, 1970). Those examinations were probably the first civil service examinations ever developed. To avoid corruption, all essays in the Imperial Examination were marked anonymously, and the Emperor personally supervised the final stage of the examination. Although the goal of the examination was to select civil servants, its washback effect was to establish and control an educational program, as prospective mandarins set out to prepare themselves for the examination that would not only decide their personal fate but also influence the future of the Empire (Spolsky, 1995a, 1995b).

The use of examinations to select for education and employment has also existed for a long time. Examinations were seen by some societies as ways to encourage the development of talent, to upgrade the performance of schools and colleges, and to counter, to some degree, nepotism, favoritism, and even outright corruption in the allocation of scarce opportunities (Bray & Steward, 1998; Eckstein & Noah, 1992). If the initial spread of examinations can be traced back to such motives, the very same reasons appear to be as powerful today as they ever were. Linn (2000) classified the use of tests and assessments as key elements in relation to five waves of educational reform over the past 50 years: their tracking and selecting role in the 1950s; their program accountability role in the 1960s; minimum competency testing in the 1970s; school and district accountability in the 1980s; and the standards-based accountability systems in the 1990s (p. 4). Furthermore, it is clear that tests and assessments are continuing to play a crucial and critical role in education into the new millennium.

In spite of this long and well-established place in educational history, the use of tests has, constantly, been subject to criticism. Nevertheless, tests continue to occupy a leading place in the educational policies and practices of a great many countries (see Baker, 1991; Calder, 1997; Cannell, 1987; Cheng, 1997, 1998a; Heyneman, 1987; Heyneman & Ransom, 1990; James, 2000; Kellaghan & Greaney, 1992; Li, 1990; Macintosh, 1986; Runte, 1998; Shohamy, 1993a; Shohamy, Donitsa-Schmidt, & Ferman, 1996; Widen et al., 1997; Yang, 1999; and chapters in Part II of this volume). These researchers, and others, have, over many years, documented the impact of testing on school and classroom practices, and on the personal and professional lives and experiences of principals, teachers, students, and other educational stakeholders.

Aware of the power of tests, policymakers in many parts of the world continue to use them to manipulate their local educational systems, to control curricula and to impose (or promote) new textbooks and new teaching methods. Tests and assessments are “the darling of the policy-makers” (Madaus, 1985a, 1985b) despite the fact that they have been the focus of controversy for as long as they have existed. One reason for their longevity in the face of such criticism is that tests are viewed as the primary tools through which changes in the educational system can be introduced without having to change other educational components such as teacher training or curricula. Shohamy (1992) originally noted that “this phenomenon [washback] is the result of the strong authority of external testing and the major impact it has on the lives of test takers” (p. 513). Later Shohamy et al. (1996; see also Stiggins & Faires-Conklin, 1992) expanded on this position thus:

the power and authority of tests enable policy-makers to use them as effective tools for controlling educational systems and prescribing the behavior of those who are affected by their results—administrators, teachers and students. School-wide exams are used by principals and administrators to enforce learning, while in classrooms, tests and quizzes are used by teachers to impose discipline and to motivate learning. (p. 299)

One example of these beliefs about the legislative power and authority of tests was seen in 1994 in Canada, where a consortium of provincial ministers of education instituted a system of national achievement testing in the areas of reading, language arts, and science (Council of Ministers of Education, Canada, 1994). Most of the provinces now require students to pass centrally set school-leaving examinations as a condition of school graduation (Anderson, Muir, Bateson, Blackmore, & Rogers, 1990; Lock, 2001; Runte, 1998; Widen, O’Shea, & Pye, 1997).

Petrie (1987) concluded that “it would not be too much of an exaggeration to say that evaluation and testing have become the engine for implementing educational policy” (p. 175). The extent to which this is true depends on the different contexts, as shown by those explored in this volume, but a number of recurring themes do emerge. Examinations of various kinds have been used for a very long time for many different purposes in many different places. There is a set of relationships, planned and unplanned, positive and negative, between teaching and testing. These two facts mean that, although washback has only been identified relatively recently, it is likely that washback effects have been occurring for an equally long time. It is also likely that these teaching–testing relationships will become closer and more complex in the future. It is therefore essential that the education community work together to understand and evaluate the effects of the use of testing on all of the interconnected aspects of teaching and learning within different education systems.

WASHBACK: POSITIVE, NEGATIVE, NEITHER OR BOTH?

Movement in a particular direction is an inherent part of the use of the washback metaphor to describe teaching–testing relationships. For example, Pearson (1988) stated that “public examinations influence the attitudes, behaviors, and motivation of teachers, learners and parents, and, because examinations often come at the end of a course, this influence is seen working in a backward direction—hence the term ‘washback’ ” (p. 98). However, like Davies (1985), Pearson believed that the direction in which washback actually works must be forward (i.e., testing leading teaching and learning).

The potentially bidirectional nature of washback has been recognized by, for example, Messick (1996), who defined washback as the “extent to which a test influences language teachers and learners to do things they would not necessarily otherwise do that promote or inhibit [emphasis added] language learning” (p. 241, as cited in Alderson & Wall, 1993, p. 117). Wall and Alderson also noted that “tests can be powerful determiners, both positively and negatively, [emphasis added] of what happens in classrooms” (Alderson & Wall, 1993, p. 117; Wall & Alderson, 1993, p. 41).

Messick (1996) went on to comment that some proponents have even maintained that a test’s validity should be appraised by the degree to which it manifests positive or negative washback, which is similar to Frederiksen and Collins’ (1989) notion of systemic validity.

Underpinning the notion of direction is the issue of what it is that is being directed. Biggs (1995) used the term backwash (p. 12) to refer to the fact that testing drives not only the curriculum, but also the teaching methods and students’ approaches to learning (Crooks, 1988; Frederiksen, 1984; Frederiksen & Collins, 1989). However, Spolsky (1994) believed that “backwash is better applied only to accidental side-effects of examinations, and not to those effects intended when the first purpose of the examination is control of the curriculum” (p. 55). In an empirical study of an intended public examination change on classroom teaching in Hong Kong, Cheng (1997, 1998a) combined movement and motive, defining washback as “an intended direction and function of curriculum change, by means of a change of public examinations, on aspects of teaching and learning” (Cheng, 1997, p. 36). As Cheng’s study showed, when a public examination is used as a vehicle for an intended curriculum change, unintended and accidental side effects can also occur, that is, both negative and positive influence, as such change involves elaborate and extensive webs of interwoven causes and effects.

Whether the effect of testing is deemed to be positive or negative should also depend on who it is that actually conducts the investigation within a particular education context, as well as where (the school or university contexts), when (the time and duration of using such assessment practices), why (the rationale), and how (the different approaches used by different participants within the context).

If the potentially bidirectional nature of washback is accepted, and movement in a positive direction is accepted as the aim, the question then becomes methodological, that is, how to bring about this positive movement. After considering several definitions of washback, Bailey (1996) concluded that more empirical research needed to be carried out in order to document its exact nature and mechanisms, while also identifying “concerns about what constitutes both positive and negative washback, as well as about how to promote the former and inhibit the latter” (p. 259).

According to Messick (1996), “for optimal positive washback there should be little, if any, difference between activities involved in learning the language and activities involved in preparing for the test” (pp. 241–242). However, the lack of simple, one-to-one relationships in such complex systems was highlighted by Messick (1996): “A poor test may be associated with positive effects and a good test with negative effects because of other things that are done or not done in the education system” (p. 242). In terms of complexity and validity, Alderson and Wall (1993) argued that washback is “likely to be a complex phenomenon which cannot be related directly to a test’s validity” (p. 116). The washback effect should, therefore, refer to the effects of the test itself on aspects of teaching and learning.

The fact that there are so many other forces operating within any education context, which also contribute to or ensure the washback effect on teaching and learning, has been demonstrated in several washback studies (e.g., Anderson et al., 1990; Cheng, 1998b, 1999; Herman, 1992; Madaus, 1988; Smith, 1991a, 1991b; Wall, 2000; Watanabe, 1996a; Widen et al., 1997). The key issue here is how those forces within a particular educational context can be teased out to understand the effects of testing in that environment, and how confident we can be in formulating hypotheses and drawing conclusions about the nature and the scope of the effects within broader educational contexts.

Negative Washback

Tests in general, and perhaps language tests in particular, are often criticized for their negative influence on teaching—so-called “negative washback”—which has long been identified as a potential problem. For example, nearly 50 years ago, Vernon (1956) claimed that teachers tended to ignore subjects and activities that did not contribute directly to passing the exam, and that examinations “distort the curriculum” (p. 166). Wiseman (1961) believed that paid coaching classes, which were intended for preparing students for exams, were not a good use of the time, because students were practicing exam techniques rather than language learning activities (p. 159), and Davies (1968) believed that testing devices had become teaching devices; that teaching and learning was effectively being directed to past examination papers, making the educational experience narrow and uninteresting (p. 125).

More recently, Alderson and Wall (1993) referred to negative washback as the undesirable effect on teaching and learning of a particular test deemed to be “poor” (p. 5). Alderson and Wall’s poor here means “something that the teacher or learner does not wish to teach or learn.” The tests may well fail to reflect the learning principles or the course objectives to which they are supposedly related. In reality, teachers and learners may end up teaching and learning toward the test, regardless of whether or not they support the test or fully understand its rationale or aims.

In general education, Fish (1988) found that teachers reacted negatively to pressure created by public displays of classroom scores, and also found that relatively inexperienced teachers felt greater anxiety and accountability pressure than experienced teachers, showing the influence of factors such as age and experience. Noble and Smith (1994a) also found that high-stakes testing could affect teachers directly and negatively (p. 3), and that “teaching test-taking skills and drilling on multiple-choice worksheets is likely to boost the scores but unlikely to promote general understanding” (1994b, p. 6). From an extensive qualitative study of the role of external testing in elementary schools in the United States, Smith (1991b) listed a number of damaging effects, as the “testing progr...
