This celebrated primer presents an introduction to all of the key ingredients in understanding computerized adaptive testing technology, test development, statistics, and mental test theory. Based on years of research, this accessible book educates the novice and serves as a compendium of state-of-the-art information for professionals interested in computerized testing in the areas of education, psychology, and other related social sciences. A hypothetical test taken as a prelude to employment is used as a common example throughout to highlight this book's most important features and problems.

Changes in the new edition include:
* a completely rewritten chapter 2 on the system considerations needed for modern computerized adaptive testing;
* a revised chapter 4 to include the latest in methodology surrounding online calibration and in the modeling of testlets; and
* a new chapter 10 with helpful information on how test items are really selected, usage patterns, how usage patterns influence the number of new items required, and tools for managing item pools.

1	Introduction and History

Howard Wainer

PROLOGUE

As we approach the end of the twentieth century we see the influence of computers all around us. In the 1970s computers worked behind the scenes to balance books, write paychecks, prepare weather reports, and do any number of tasks whose characteristics usually included odious repetitive operations. In the 1980s there was a change. Computers came out of the basement. The bank’s computer began to deal with the customer firsthand, without the human intervention of bank employees. On most desks was a personal computer that processed both words and data, and could be connected with others through telephone networks, which themselves were run by computers. Tasks that computers now do are starting to get more complex. Machine intelligence, inference engines, and expert systems are terms that are increasingly in vogue.

The use of computers within the context of mental testing has paralleled this development. In the 1970s large testing programs used computers to score tests and process score reports. In the 1980s we have begun to see computers administer exams. The increasingly broad availability of high-powered computing has made possible the administration of types of exam questions that were previously impractical. Moreover, exams could be individualized to suit the person taking them. Of course the development of procedures that adapt to the proficiency of the examinee required the solution of many difficult statistical and psychometric problems. These problems have presented challenges that have only now been solved sufficiently well for practical large-scale application. This volume is a description of how to build, maintain, and use a computerized adaptive testing system (a CAT).

Aristotle, in his Metaphysics, pointed out, “We understand best those things we see grow from their very beginnings.” We agree. Thus, our description of what we believe is the future of testing begins with a brief glimpse into its past.

THE FIRST FOUR MILLENNIA OF MENTAL TESTING

The use of mental tests appears to be almost as ancient as western civilization. The Bible (Judges 12:4–6) provides an early reference in western culture. It describes a short verbal test that the Gileadites used to uncover the fleeing Ephraimites hiding in their midst. The test was one item long. Candidates had to pronounce the word shibboleth; Ephraimites apparently pronounced the initial sh as s. Although the consequences of this test were quite severe (the banks of the Jordan were strewn with the bodies of the 42,000 who failed), there is no record of any validity study.

Some rudimentary proficiency testing that took place in China around 2200 B.C. predated the biblical program by almost a thousand years. The emperor of China is said to have examined his officials every third year. This set a precedent for periodic exams in China that was to persist for a very long time. In 1115 B.C., at the beginning of the Chan dynasty, formal testing procedures were instituted for candidates for office. Job sample tests were used, with proficiency required in archery, arithmetic, horsemanship, music, writing, and skill in the rites and ceremonies of public and social life.

The Chinese discovered that a relatively small sample of an individual’s performance, measured under carefully controlled conditions, could yield an accurate picture of that individual’s ability to perform under much broader conditions for a longer period of time. The procedures developed by the Chinese (Têng, 1943) are quite similar to the canons of good testing practice used today. For example, they required objectivity: candidates’ names were concealed to ensure anonymity, and they sometimes went so far as to have the answers redrafted by another individual to hide the handwriting. Tests were often read by two independent examiners, with a third brought in to adjudicate differences. Test conditions were as uniform as could be managed: proctors watched over the exams given in special examination halls that were large permanent structures consisting of hundreds of small cells. Sometimes candidates died during the course of the exams.

This testing program was augmented and modified through the years and has been praised by many western scholars. Voltaire and Quesnay advocated its use in France, where it was adopted in 1791 only to be (temporarily) abolished by Napoleon. It was cited by British reformers as their model for the system set up in 1833 to select trainees for the Indian civil service—the precursor to the British civil service. The success of the British system influenced Senator Charles Sumner and Representative Thomas Jenckes in developing the examination system they introduced into Congress in 1868. There was a careful description of the British and Chinese system in Jenckes’ report “Civil Service in the United States,” which laid the foundation for the establishment of the Civil Service Act passed in January 1883.

Universities lagged far behind in their efforts to install examination systems. The first appears to be the formal exams begun at the University of Bologna in 1219. This was exclusively an oral exam. This structure was also described by Robert de Sorbon, the chaplain of Louis IX, as being used in that court. It was adopted for use in 1257 in the community of scholars that evolved into the Sorbonne. Written tests within universities seem to have their genesis much later with the sixteenth century Jesuits. The first pioneering effort at the development of formal test standards came from this order. In 1599, after several preliminary drafts, eleven rules for the conduct of exams were published. These rules (see McGucken, 1932) are almost indistinguishable from those used today.

Oral exams spread quickly and by the mid-seventeenth century were a standard part of an Oxford education. Written exams were also used and by the middle of the nineteenth century were widely applied in the United States and Western Europe. By the beginning of the twentieth century, serious research efforts had begun on the use and usefulness of various testing procedures. These were done in the United States by Cattell, Farrand (later president of Cornell), Jastrow, Thorndike, Wissler, and Witmer (who founded the first psychological clinic) and in Europe, where Kraepelin (one of Wundt’s first students) and Ebbinghaus did important work that eventually led to Binet’s intelligence test and Terman’s use of it to study “Genius and Stupidity” in his dissertation.

The flurry of activity in testing at the beginning of the twentieth century spanned a broader range of disciplines than just psychology. One of the most crucial contributions was from statistics, when Spearman provided the rudiments of psychometrics. He invented reliability coefficients and much of the ancillary statistical machinery that allowed their estimation and interpretation.

Tests of all descriptions began to appear to measure performance on such diverse tasks as verbal analogies (devised by Burt, 1911), shoving various shapes through holes (Woodworth, 1910), solving mazes (Porteus, 1915), and drawing a man (Goodenough, 1926). A major change in test administration was occurring at this same time, when there was a shift in practice from individualized to mass administration. This had positive and negative aspects. It allowed much more efficient testing and provided the possibility of a homogeneous testing environment. But it also increased the possibility of examinees not following the directions properly or for some other reason not performing up to their ability.

As the group-administered test was evolving, the multiple-choice format became increasingly widespread. E. L. Thorndike, at Columbia, and L. L. Thurstone, at Chicago, arranged test material so that items could be scored with a key. Otis, working with Terman at Stanford, was the first to develop an intelligence test that could be scored completely objectively. Prior to the formal publication of Otis’ test, the United States entered World War I; nevertheless Otis’ test became the prototype of the Army Alpha, the instrument that inaugurated large-scale mental testing.

THE ORIGINS OF MENTAL TESTING IN THE U.S. MILITARY

Robert M. Yerkes, president of the American Psychological Association, took the lead in involving psychologists in the war effort. One major contribution was the implementation of a program for the psychological examination of recruits. Yerkes formed a committee for this purpose which met in May of 1917 at the Vineland Training School. His committee included: W. V. Bingham, H. H. Goddard, T. H. Haines, L. M. Terman, F. L. Wells, and G. M. Whipple. This group debated the relative merits of very brief individual tests versus longer group tests. For reasons of objectivity, uniformity and reliability, they decided to develop a group test of intelligence.

The criteria they adopted (from DuBois, 1970, p. 62) for the development of the new group test were:

1. Adaptability for group use.

2. Correlation with measures of intelligence known to be valid.

3. Measurement of a wide range of ability.

4. Objectivity of scoring, preferably by stencils.

5. Rapidity of scoring.

6. Possibility of many alternate forms so as to discourage coaching.

7. Unfavorableness to malingering.

8. Unfavorableness to cheating.

9. Independence of school training.

10. Minimum of writing in making responses.

11. Material intrinsically interesting.

12. Economy of time.

In just 7 working days they constructed ten subtests with enough items for ten different forms. They then prepared one form for printing and experimental administration. The pilot testing was done with fewer than 500 subjects. These subjects were broadly sampled, coming from such diverse sources as a school for the retarded, a psychopathic hospital, a reformatory, some aviation recruits, some men in an officers’ training camp, 60 high school students and 114 Marines at a Navy yard. They also administered either the Stanford-Binet intelligence test or an abbreviated form of it. The researchers found that their test correlated .9 with the Stanford-Binet and .8 with the abbreviated Binet.

The items and instructions were then edited, time limits revised, and scoring formulas developed to maximize the correlation of the total score with the Binet. Items within each subtest were ordered by difficulty and four alternate forms were prepared for mass administration.

By August, statistical workers under Thorndike’s direction had analyzed the results of the revised test after it had been administered to 3,129 soldiers and 372 inmates of institutions for mental defectives. The results prompted Thorndike to call this the “best group test ever devised.” It yielded good distributions of scores, correlated about .7 with schooling and .5 with ratings by superior officers. This test was dubbed Examination a.

In December of the same year, Examination a was revised once again. It became the famous Army Alpha. This version had only eight subtests; two of the original ten were dropped because of low correlation w...

Cover
Half Title
Title Page
Copyright Page
Table of Contents
Foreword to the First Edition
Foreword to the Second Edition
Preface to the First Edition
Preface to the Second Edition
1 Introduction and History
2 System Design and Operation
3 Item Pools
4 Item Response Theory, Item Calibration, and Proficiency Estimation
5 Testing Algorithms
6 Scaling and Equating
7 Reliability and Measurement Precision
8 Validity
9 Future Challenges
10 Caveats, Pitfalls, and Unexpected Consequences of Implementing Large-Scale Computerized Testing
References
Abbreviations and Acronyms Used
Author Index
Subject Index

Computerized Adaptive Testing by Howard Wainer, Neil J. Dorans, Ronald Flaugher, Bert F. Green, and Robert J. Mislevy is available in PDF and/or ePUB format.
