Neil J. Dorans and Linda L. Cook1
Introduction
Fairness, a concept familiar to most readers, can mean different things to different people. The concept of fairness has a long history, with a definition that has evolved over time. Legal prescriptions and proscriptions of fairness have also changed with time. When Justice Potter Stewart first joined the Supreme Court in 1958, he said, “fairness is what justice really is” (National Affairs: The Young Justice, 1958). Stewart linked fairness with justice. But what is fairness? It is probably easier to detect unfairness when we see it than it is to define fairness.
From a historical perspective using a 21st century vantage point, several practices that were accepted during the early part of the 20th century would be judged unfair by present standards. At the start of the 20th century, colonialism was rampant, a byproduct of the imperialism of the late 19th century. In addition, at the start of the 20th century, racism was legally sanctioned in many parts of the United States,2 and women were not allowed to vote in national elections or hold elective national office. The end of World War II and the decades immediately following it saw the beginning of extensive decolonization, the adoption of the United Nations Universal Declaration of Human Rights, and the codification of laws and practices in the United States that led to equal rights and equal protection for U.S. citizens.
Fairness touches many aspects of human existence. Young children recognize fair and unfair play. Laws and regulations exist to ensure fair play in sports, fair trade in economics, fair allocation of resources, and fair access to education, housing, and employment.
This volume examines fairness in the context of educational assessment and measurement. Our focus is primarily on educational assessment in the United States, and the volume is written from that perspective. We begin by journeying back in time to the first third of the 20th century.
A Testing Example from the Early 20th Century
Eugenics was a term attributed to the British polymath Sir Francis Galton, who, among other things, made important contributions to psychometrics and statistics. Galton (1883), who introduced the concepts of standard deviation and correlation to the field of statistics, concluded in Inquiries into Human Faculty and Its Development that superior social position was due to a superior genetic makeup, essentially a causal inference based on analyses of observational data from a variety of biographical sources.
The social movement of eugenics played a significant role in the history and culture of the United States and other countries in the early 20th century. Eugenics was widely accepted. It was supported by the influential and respected, including British statesman Winston Churchill, President Theodore Roosevelt, Margaret Sanger, proponent of birth control rights for women, and playwright George Bernard Shaw, among many others (Kevles, 1985). Applied eugenic practices included genetic screening, forced birth control, compulsory sterilization, forced abortions, marriage restrictions, and segregation. The most infamous example of applied eugenics was engineered by Adolf Hitler, who cited eugenic theories as a justification for Aryan superiority and the genocide of those he considered to be defectives and racially inferior.
Toward the end of World War I, tests were developed to systematically and objectively evaluate those recruited by the U.S. military. These tests were devised by the Committee on the Classification of Personnel in the Army, established in 1917. Its membership included the psychologists and early psychometricians E. L. Thorndike, Lewis Terman, Robert Yerkes, L. L. Thurstone, and Truman Kelley. By the end of 1918, the Army had tested over 1.7 million men using the “Alpha” and “Beta” Army tests.
The Army Alpha test measured verbal ability, numerical ability, ability to follow directions, and knowledge of information, and was administered in English. Soldiers who were illiterate or who were not sufficiently proficient in English took the Army Beta test, which was more complex to administer and score than the Army Alpha test. The Army Beta test used demonstration charts and pantomime to convey instructions to the persons being tested. The performance tasks on the Army Beta test used geometrical designs, mutilated or incomplete pictures (e.g., a table with a leg missing or a baby carriage with no handle), and other types of test questions whose construction and response evaluation rested on different principles than those used on the Army Alpha test. Consequently, scores on the two tests did not measure the same thing and fair comparisons of the scores could not be made. For sample items from both tests, see http://official-asvab.com/armysamples_coun.htm. Yoakum and Yerkes (1920) gave a detailed description of both instruments.
Carl Brigham (1923) wrote a book based on the Army Alpha and Beta test data, A Study of American Intelligence. Several of the tables in the book report results based on a “combination scale” on which Alpha and Beta scores and Stanford-Binet scores were all expressed. Based on these results, he concluded that Blacks, Jews, Mediterraneans, and Alpines were inherently intellectually inferior to Nordics.3
Table 33 of Brigham (1923) contains estimates of the proportions of the three types of white “blood” in each European country. According to Table 35 of Brigham (1923), from 1840 to 1890, immigrants of Nordic blood accounted for at least 40% of the immigrants to the United States. Between 1890 and 1920, the Alpine race supplanted the Nordic race. For those in the eugenics movement who were concerned about dilution of the gene pool, the shift away from Nordics to other groups was a cause for alarm.
By today’s standards, Brigham’s book would be considered racist. In the 1920s, it was widely, though not universally, accepted as an accurate representation (Cole and Zieky, 2001). During that time, Congress passed the Immigration Act of 1924, a federal law that limited the annual number of immigrants who could be admitted from any country to 2% of the number of people from that country who were already living in the United States in 1890. The law restricted the flow of Southern and Eastern Europeans and prohibited the immigration of Middle Easterners, East Asians, and Indians. The purpose of the law was to preserve the homogeneity of the American people.
Brigham went on to develop the Scholastic Aptitude Test (SAT) for the College Board in 1926. Based on his analyses of early SAT data, he concluded that test scores may not be a function of unitary dimensions, and that they were influenced by cultural factors that were not rooted in genetics, such as familiarity with the language of the test. These analyses are summarized in the book A Study of Error (Brigham, 1932).
Prior to publication of that book, Brigham (1930) wrote an article in Psychological Review in which he recanted his earlier work. The abstract of that article states:
In the light of recent investigations showing that test scores may not represent unitary things, the author criticizes attempts to establish racial differences and national differences with existing tests, in which mixture of verbal, quantitative, and spatial intelligence factors and dependence on vernacular destroy the significance of the scores. The author includes his own comparative racial study in this criticism.
(p. 158)
One technical concern related to Brigham’s research was the comparability of scores achieved on the Alpha and Beta versions of the Army test. A special sample of military personnel was tested with both, and these data were used to put the Alpha and Beta on a common seven-point scale (A, B, C+, C, C−, D, D−). Because these two tests were quite different in terms of format and questions asked, and measured different constructs, scores from these tests could not be treated as if they were interchangeable. When interpreting the results of his research, Brigham (1923) had treated scale alignments of the Army Alpha and Beta tests as if they produced interchangeable scores. By 1930, he realized that was a mistake. He doubted whether the subcomponents of the Alpha test measured a unitary construct and acknowledged the effects of culture, particularly knowledge of the language of the test.
The Emergence of the Civil Rights Movement after World War II
World War II changed much in American society, including race relations and the role of women in the workforce. During World War II, President Franklin Roosevelt issued an executive order in June 1941 in response to complaints about discrimination at home against Black Americans, who constituted about 10% of the population. This order directed that Black workers be accepted into job-training programs in defense plants, and forbade discrimination by defense contractors. Still, the military remained segregated until July 1948 when President Harry Truman issued an executive order ordering full integration of the armed services. Full integration was not achieved until the end of the Korean War.
Integration also occurred in the national pastime, baseball, after World War II. On April 15, 1947, Jackie Robinson broke the color line that had kept gifted Black athletes from pursuing their living in baseball’s major leagues. Robinson, a college graduate and military veteran, had been groomed for that role by Brooklyn Dodgers general manager Branch Rickey. He encountered widespread racism, including legally sanctioned segregation in the South, and vicious abuse, including death threats, simply because of his race. Robinson maintained his composure and succeeded in breaking the color line, a major symbolic step away from segregation.
During World War II, Robinson was arrested for failing to go to the back of an unsegregated Army bus, as was required. He was court-martialed and eventually acquitted. The mistreatment he experienced prepared him for the abuse that he would experience integrating baseball.
A quiet seamstress, Rosa Parks, refused to go to the back of a bus in Montgomery, Alabama, in 1955. Her stoic defiance of the law landed her in jail and is considered a pivotal moment in American history. In the first of his trilogy, America in the King Years, Taylor Branch (1988) reports that the Montgomery boycott, organized by Martin Luther King in response to Rosa Parks’ arrest, marked Martin Luther King’s emergence as a leader of the Civil Rights movement. The nonviolent protest practiced by King and his associates often met with resistance, and in some cases deadly force. Despite the blood that was shed by some of its members, the Civil Rights movement persisted and served as a catalyst for change in America in the 1960s. In 1964, King received the Nobel Peace Prize in recognition of his leadership. Those interested in the Civil Rights movement in the 1950s and 1960s should consult the three-volume work by Branch (1988, 1998, 2006).
The Zenith of the Civil Rights Movement and its Aftermath
The Civil Rights movement reached its peak in the mid-1960s, with the passage of the Civil Rights Act of 1964, the Voting Rights Act of 1965, and the Civil Rights Act of 1968, which was passed shortly after King’s assassination in Memphis, Tennessee. As documented in the fourth volume of Robert Caro’s The Years of Lyndon Johnson (Caro, 2012), President Lyndon Johnson’s commitment to civil rights, his empathy for the poor, and his political acumen helped convert the momentum created by the Civil Rights movement into law.
While these laws made discrimination on the basis of color, creed, and gender illegal and removed barriers to voting and access to housing, they did not address the long-standing historical consequences of legal discrimination and slavery. In 1961, President John Kennedy issued an executive order mandating that projects financed with federal funds take what was called affirmative action to ensure that hiring and employment practices are free of racial bias. Affirmative action was synonymous with anti-discrimination. The meaning of affirmative action changed with executive orders issued in 1965 by President Johnson that attempted to redress the consequences of past discrimination. These efforts were later expanded by President Richard Nixon, during his first term as president, with the Philadelphia Plan, which required government contractors to hire minorities.
These attempts to remedy past discrimination met with much opposition, as noted by Hartigan and Wigdor (1989), who summarize the arguments for and against the practice of preferential treatment circa 1985. That volume examines a since-abandoned experimental practice by the U.S. Employment Service of the Department of Labor that represented an extreme form of affirmative action, namely the use of within-group percentiles by race as measures of proficiency on the General Aptitude Test Battery. This practice, which began during the early years of President Ronald Reagan’s administration, was halted at the request of the U.S. Justice Department in 1986 on the grounds that it was an unlawful violation of an applicant’s right to be free from racial discrimination, a right guaranteed under the Civil Rights Act of 1964.
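To make the idea of within-group percentiles concrete, the following is a minimal sketch in Python, not a reconstruction of the procedure actually used by the U.S. Employment Service. The group labels ("X", "Y"), the scores, the function names, and the mid-rank definition of a percentile rank are all assumptions chosen for illustration. The sketch shows that the same raw score can receive different percentile ranks depending on whether it is compared with the pooled sample or only with the test taker's own group.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting ties as half (mid-rank)."""
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def pooled_percentiles(records):
    """Rank every score against the full sample, ignoring group membership."""
    all_scores = [score for _, score in records]
    return [percentile_rank(score, all_scores) for _, score in records]


def within_group_percentiles(records):
    """Rank every score only against scores from the same group."""
    by_group = {}
    for group, score in records:
        by_group.setdefault(group, []).append(score)
    return [percentile_rank(score, by_group[group]) for group, score in records]


# Hypothetical data: (group label, raw score). Invented for illustration only.
records = [("X", 10), ("X", 20), ("X", 30), ("Y", 20), ("Y", 30), ("Y", 40)]

pooled = pooled_percentiles(records)
within = within_group_percentiles(records)

for (group, score), p, w in zip(records, pooled, within):
    print(f"group={group} score={score} pooled={p:.1f} within={w:.1f}")
```

In this invented sample, a raw score of 30 has the same pooled percentile rank in both groups, but its within-group rank is higher in group X than in group Y. That is the sense in which scores reported as within-group percentiles are no longer comparable across groups on the raw-score scale.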
Within-group norming is still used today, although the norming is not conducted by racial group. In 1997, the state of Texas, after other forms of affirmative action were successfully challenged, passed a rule that guaranteed admission to any public university for students whose high school GPA placed them in the top 10% of their high school graduating class. To date, this rule has not been successfully challenged. As discussed in the Zwick and Dorans chapter in this volume, the National Merit Scholarship Program uses within-group norming by state to identify semifinalists for its scholarship competition.
To summarize, in the 1920s, there was widespread use of intelligence tests that were developed during World War I. Many users of these tests believed that the test results were valid and buttressed eugenic claims, as illustrated in Chapters 4 and 5 of Kevles (1985). By the early 1980s, concerns about the legacy of over two centuries of racial discrimination had led to within-group norming. This use of test scores in itself violated a law that grew out of the Civil Rights movement of the mid-20th century.
The Civil Rights movement was in the vanguard of other rights movements, such as women’s rights, the rights of Spanish-speaking and Asian minorities, and the rights of individuals with disabilities. In time, test takers as a group were given the right to see their scored exams and question the answer key. These other rights movements followed the path that was forged by the travail of trailblazers of the Civil Rights movement.
As indicated in this chapter, the definition of fairness varies over time. Segregation gave way to integration, and affirmative action was instituted to address the consequences of that formal discrimination, only to be challenged as discriminatory itself. We have also shown how shifts in attitudes about testing reflect shifts in how society perceives differences in test scores, their antecedents, and the consequences of their use.
This brief selective summary of the interplay between testing and society has implications for the testing of today and tomorrow.4 Comparisons are often made of test take...