Part I
The power of tests
Traditional testing focuses on the quality of tests, following accepted models and procedures for maximizing the accuracy of tests. Yet, very little attention is given to how tests are used, their importance in the lives of test takers and their place in society. Traditional testing does not pay much attention to the testing experience, or to the meanings and feelings that tests evoke in test takers. However, by listening to the voices of test takers it is possible to obtain evidence of the power of tests and the detrimental decisions they lead to for test takers. The use of tests as disciplinary tools by those in authority causes test takers to fear tests but at the same time to obey their rules.
What are the special features that tests possess that legitimize their power and influence? How did an instrument intended to democratize and provide equal opportunities turn into a threatening device? What features of tests tempt those in authority to use them for policy making? How did tests emerge as tools used for policy making? These are some of the topics that will be addressed in Part I of this book in an attempt to establish the dimensions of the power of tests.
1
‘Use-oriented’ testing
Traditional testing
Traditional testing is a scientific field, with precise boundaries and criteria. It consists of a well-defined and systematic body of knowledge. Its main focus and purpose is the creation of quality tests that can accurately measure the knowledge of those tested. Results obtained from tests are used for comparing scores of test takers, classifying test takers into appropriate proficiency levels, assigning grades and accepting or rejecting test takers. Tests, therefore, need to be of high quality and follow the careful rules of the science of psychometrics.
Testing is therefore a professional field with strict rules and applications as to what constitutes appropriate practice. High-quality tests are expected to provide the users with precise answers as to the knowledge being measured. The field employs a variety of techniques for developing high-quality test items and tasks, most often of the objective mode. Its body of knowledge includes topics such as methods for computing different types of reliability (i.e. how accurate test scores are), obtaining evidence of validity (i.e. the extent to which tests measure what they are expected to measure) and procedures for examining the quality of items and tasks (i.e. the extent to which test items and tasks measure the content being tested).
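As an illustration of what computing one type of reliability involves, the sketch below calculates Cronbach's alpha, a widely used internal-consistency coefficient. It is offered only as an example of the kind of procedure referred to above; the function name and the score matrix are invented for illustration, and items are assumed to be scored 0 (wrong) or 1 (right).

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Internal-consistency reliability for a matrix of item scores.

    scores: one row per test taker, one column per item (hypothetical data).
    Assumes at least two items and that total scores are not all identical.
    """
    n_items = len(scores[0])
    # Variance of each item across test takers
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the total scores
    total_var = pvariance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Invented example: four test takers, three items
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(sample))
```

The coefficient says how consistently the items rank test takers. It says nothing about how the scores are subsequently used.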
Traditional testing has relied mostly on objective-type items, as these minimize statistical unreliability. It rarely deviates from this as the model seems safe. While other testing methods, such as summaries, reports, and role plays, are becoming widely used, they are often accompanied by cautionary advice regarding their accuracy (Nitko, 1996). Even the uses of now popular procedures such as portfolios, self-assessment and peer assessment require that these procedures be subject to criteria judgements typical of objective testing. Only if, and when, such procedures demonstrate that they possess the ‘traditional’ psychometric properties can they be accepted as legitimate ‘members’ of the ‘traditional testing club’.
In traditional testing the focus is primarily on the test; the test taker is important only as a means for examining the quality of the test. The only time the test taker is mentioned is in discussing the difficulty, discrimination and other indices so that ‘good’ test takers get items right while the ‘bad’ ones get them wrong. ‘Good’ and ‘bad’ are generally defined by the performance on the test being examined. Rarely are there any investigations or discussions as to the sources, causes and reasons that make test items good or bad, easy or difficult. Is it that the teaching that preceded the test was ineffective, that the material tested was too difficult, that the test taker was absent from class when the material had been taught, or that the test items required cognitive processing that the test taker did not possess? The general rule is that the test takers need to match their performances to the tests rather than the tests to the test takers.
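The difficulty and discrimination indices mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: items are assumed to be scored 0 or 1, the function name and the `group_fraction` parameter are hypothetical, and the upper and lower groups are formed from total scores on the same test, which is precisely the circularity noted in the paragraph.

```python
def item_indices(responses, group_fraction=0.27):
    """Return a (difficulty, discrimination) pair for each item.

    responses: one row per test taker, one 0/1 score per item (hypothetical data).
    group_fraction: share of test takers placed in the upper and in the lower
    group; 0.27 is a conventional choice, assumed here for illustration.
    """
    n_takers = len(responses)
    n_items = len(responses[0])
    # 'Good' and 'bad' test takers are defined by total score on this same test
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_takers * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for j in range(n_items):
        # Difficulty: proportion of all test takers answering the item correctly
        difficulty = sum(row[j] for row in responses) / n_takers
        # Discrimination: proportion correct in upper group minus lower group
        p_upper = sum(row[j] for row in upper) / k
        p_lower = sum(row[j] for row in lower) / k
        results.append((difficulty, p_upper - p_lower))
    return results


# Invented example: four test takers, three items, halves as groups
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_indices(sample, group_fraction=0.5))
```

Nothing in the calculation records why an item was answered wrongly; the indices describe the item, not the test taker.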
Traditional testing, then, is not interested in test use. Once the test is designed and developed, its items written and administered, its format piloted, items and statistics computed, reliability calculated and evidence of validity obtained, the role of the tester is complete. The task ends when psychometrically sound results are satisfactorily achieved. This is the point at which the test is delivered to those who contracted it and is ready to be used with ‘real life’ people, the test takers.
Thus, traditional testing is not interested in the motives for introducing tests, in the intentions and rationale for using tests or in examining whether those intentions were fulfilled. It is not interested in the steps taken in preparation for tests or in how test takers feel about tests. It is especially not interested in the consequences of tests and their effects on those who failed or succeeded in them. It also overlooks how tests affect knowledge, learning patterns and habits. Traditional testing views tests as isolated events, detached from people, society, motives, intentions, uses, impacts, effects and consequences.
‘Use-oriented’ testing
‘Use-oriented’ testing views testing as embedded in educational, social and political contexts. It addresses issues related to the rationale for giving tests and the effects that tests have on test takers, education and society. It is concerned with what happens to the test takers who take the tests, the knowledge that is created by tests, the teachers who prepare for the tests, the materials and methods used for tests, the decisions to introduce tests, the uses of the results of tests, the parents whose children are subject to the tests, the ethicality and fairness of the tests, and the long- and short-term consequences that tests have on education and society.
In examining the use of tests, attention is given to the reasons and intentions for introducing tests, to the test takers who take them, to the teachers who teach for them, to the students who practise for them, to the knowledge created by tests, to the ethical and fair use of tests, to the educational systems where tests are used, and to the effects and consequences that tests have on education and society.
There is also a realization that while testers are busy creating ‘the perfect’ tests, these tests are often used for purposes other than those for which they were intended. This is especially noted with regard to commercial enterprises, government agencies and organizations that use tests in ways which some would consider to be unethical. There is therefore a growing awareness of the need to examine tests from broader and more expanded perspectives consisting of various dimensions of the use of tests.
While for many years professionalism in testing meant the development of high-quality tests that pass accuracy criteria, some testers are realizing that it is not enough, as tests cannot be viewed as neutral instruments. There is therefore a growing concern about the power of tests and their uses in society.
In the field of testing, issues about the use of tests - i.e. intentions, effects and consequences - were considered to be external to traditional testing, but there has recently been a renewed interest in this topic. Messick (1981, 1989, 1994, 1996), for example, claims that tests embody values that too often are unrecognized and unexamined, as they are connected to psychological, social and political variables that have effects on curriculum, ethicality, social classes, bureaucracy, politics and knowledge. He therefore emphasizes the need to study aspects related to the consequences of tests and notes that such aspects should be considered as part of a broader definition of validity, as they involve questions of values and consequences of score interpretation and test use:
The consequential aspect of construct validity includes evidence and rationale for evaluating the intended and unintended consequences of score interpretation and use in both the short and long term, especially those associated with bias in scoring and interpretation, with unfairness in test use, and with positive or negative washback effects on teaching and learning.
(Messick, 1994: 251)
Gipps (1994) interprets this phenomenon by claiming that testing is experiencing a shift from a purely technical perspective to a test-use perspective.
In the field of language testing as well, testers have begun to show a growing interest in the roles that language tests play in society. Spolsky (1998) argues that rather than putting all the effort into building more and more reliable measures of less and less important elements of language proficiency, testers should support the study of the meaning and use of the inaccurate measures they already have. Testers, he notes, should accept the inevitable uncertainty of tests and turn their attention to the ways in which tests are used.
As a result, language testers have begun to address various issues of test use, focusing on topics of test ethicality, test bias, the effect and impact of tests on teaching and learning, and the use of tests. Additional topics are the extent to which language tests define linguistic knowledge, determine membership, classify people and stipulate criteria for success and failure of individual test takers. These are some of the topics that will be addressed and discussed in the following chapters.
2
Voices of test takers
It is probably not possible to find a person in the modern world who has not gone through a testing experience at least once in his or her lifetime. It is difficult to find a person who does not have a testing story that relates to how a single test affected and changed his or her life, for good or for bad. The experiences of taking tests are remembered by test takers for many years after the events have taken place. It is through the voices of test takers who report on the testing experiences and their consequences that the features of the use of tests can be identified.
Yet, in the testing literature test takers are often kept silent; their personal experiences are not heard or shared. It seems that the testing profession - those who produce tests - is not interested in such accounts. However, as will be noted in this chapter, listening to the voices of test takers can provide testers with a new and unique perspective and a deep insight into tests and their meanings.
Personal accounts
The excerpts given below represent a sample of some personal accounts of what test takers say and feel about tests.
A death wish
In the following poem the Irish novelist J. McGahern (1977, quoted in Madaus, 1990) describes the impact that tests have on a child who is about to take a test:
Please God may I not fail
Please God may I get over sixty per cent
Please God may I get a high place
Please God may all those likely to beat
me get killed in road accidents and
may they die roaring.
This poem shows how fearful test takers are of failing tests and how detrimental the experience is for them. They clearly feel a lack of control, as succeeding on a test is like ‘an act of God’. They pray to pass the test in the same way as they pray to be saved from a terrible disaster or an awful danger. Further, doing well on tests implies that every peer is in competition; friends turn into enemies and rivals in the high-stake race where the only survival strategy is the elimination of the competitors. The poem also shows how central tests are in one’s life and the high price that test takers are willing to pay in order to succeed.
Causing deterioration in one’s life
On a memorable night in a bar named Dingo in Arnhem, the Netherlands, during a conference on language testing, my friend and colleague Tim McNamara and I found ourselves deeply engaged in a conversation with a drug junkie. When we asked what had brought him to this low point in his life, he told us a long story about what started it all. He recalled the traumatic event of taking a standardized test in 7th grade and failing it badly. His failure was such a disappointment for his father, a university literature professor, that from that point on his father started rejecting him. This eventually led to a series of events that turned our conversation partner into an outcast in his family, leading him to leave home and gradually reach the point where he is now. Needless to say we felt responsible, a face-to-face encounter with one of ‘our own’ victims ...
Whether the story told is an accurate account of the events or whether the test was used as a pretext is not the main point of the story. It is important, though, that the story reflects the perception of a person about a single event, a high-stake test, that had a detrimental effect on his life. The specific event - failure on a standardized test - is perceived as connected to, and responsible for, a number of additional events, contexts and consequences. In this case a failure on a test evoked rejection by his family, low self-esteem and self-worth, and a general negative attitude towards life to the point of criminal behaviour. It points to a phenomenon whereby the results of a test get out of control. It shows how a single event is so central that it becomes accountable and is perceived as responsible for future behaviour and events.
Forcing into a profession
This is a story of a person who was wondering about her life possibilities and therefore decided to examine her opportunities and talents by taking a number of tests in different areas. Little did she know that the exceptionally high scores she received on one of the tests would lead representatives of a secret agency to wait at her doorstep and attempt to recruit her to the Canadian secret service. They claimed that her high scores on the test convinced them, without any doubt, that she was an unusually appropriate candidate with exceptional talents for a specific job in the service. They would offer her high sums of money if she would join. She described how difficult, almost impossible, it had been for her to convince the agents that she had no interest whatsoever in ‘joining the forces’ and in taking such a job. She was also surprised that they could deduce her exceptional talents and capabilities based on her performance on a single test. She eventually was able to get out of the offer while learning a lesson about the risks of ‘taking tests, just for fun’.
In this example, again, the performance on one single test has the potential of affecting other events for better or for worse. It provides further evidence of the blind trust that users of tests have in test scores, believing that tests can provide valid predictions and indications of all kinds of performances, especially for high-status and responsible jobs. It can be compared to the trust in fortune tellers possessing supernatural powers of predicting the future. The example further shows how powerful the owners of the testing information are in relation to the powerlessness of test takers.
Stigmatizing people as failures
On June 12th, 1991 an article reporting on the administration of a national test in reading comprehension appeared in the Israeli newspaper Haaretz. The article described some of the consequences of administering a high-stake test in one school. Specifically, there were a number of interviews with children who were not allowed to participate in the test which took place in their school. The reason given by the principal was the concern that their participation would lower the average score of the school. Instead, the article reports, they were sent to the gym to ...