Large-scale assessments (LSAs) play a growing role in education policy decisions, accountability, and education planning worldwide. This book focuses on central issues that are key components of successful planning, development and implementation of LSAs. The book's main distinction is its focus on practice-based, cutting-edge research. This is achieved by having chapters co-authored by world-class researchers in collaboration with measurement practitioners. The result is a how-to book whose language is accessible to practitioners and graduate students as well as academics.

No other book so thoroughly covers current issues in the field of large-scale assessment. An introductory chapter is followed by sixteen chapters that each focus on a specific issue. The content is prescriptive and didactic in nature but based on the most recent scientific research. It includes successful experiences, exemplary practices, training modules, interesting breakthroughs or alternatives, and promising innovations regarding large-scale assessments. Finally, it covers meaningful topics that are currently taking center stage such as motivating students, background questionnaires, comparability of different linguistic versions of assessments, and cognitive modeling of learning and assessment.

Publisher: Routledge
Year: 2012
Print ISBN: 9780415894562
eBook ISBN: 9781136578342
Topic: Education
Subtopic: General education

Introduction

Marielle Simon, Kadriye Ercikan, and Michel Rousseau

Large-scale assessments (LSAs) play an important role in education policy decisions, accountability, and education planning worldwide. They are also the object of a very active area of research on their planning, development, and implementation and on the dissemination of their results. The Standards for Educational and Psychological Testing (1999) defines assessment as “Any systematic method of obtaining information from tests and other sources, used to draw inferences about characteristics of people, objects, or programs” (p. 172). LSAs are standardized assessments conducted on a regional, national, or international scale involving large student populations. In this book, the focus is primarily on assessments of learning and achievement. In this chapter we provide a brief historical background to LSAs and a description of the unique contribution of each chapter, concluding with a word on the future of LSAs.

Brief History of LSA

Of all the historical developments that have helped shape LSA, at least three stand out. One is the influence of the scientific paradigm in the 1900s, which focused on psychology and on the measurement of human behavior, sensory processes, and mental abilities (Thorndike & Thorndike-Christ, 2010). The era of mental testing started as early as 1905, with the works of influential researchers such as Charles Spearman, Alfred Binet, Lewis Terman, and Arthur Otis. As a result of World Wars I and II, there was an even greater surge in measurement and testing for military recruitment purposes. Another major development that paralleled the mental measurement movement was an interest in comparative education, which can be traced back to the 1800s with philosophers such as Marc-Antoine Jullien and, later, in universal public education with educational reformers such as Horace Mann, Joseph Kay, and Matthew Arnold, who traveled abroad to develop the concept (Cowen & Kazamias, 2009). A third influential development was student assessment in education. As late as the early 1900s, only the best students were admitted into public school systems, and testing was used to select the brightest minds (Cowen & Kazamias, 2009). By the mid-1900s, however, the civil rights movement in the United States led to access for all, which was soon followed by a period of accountability intended to determine whether government-funded programs were achieving their goals (Cowen & Kazamias, 2009; Thorndike & Thorndike-Christ, 2010, p. 6). The accountability movement evolved from the program level in the 1960s, to the school and district levels in the 1980s, and to standards-based accountability systems in the 1990s (Linn, 2000) and into the 2000s.

Some of the earliest initiatives of LSAs that are still ongoing date back to 1958 with the birth of what was to later become the International Association for the Evaluation of Educational Achievement (IEA). During the period of 1959–1962, the IEA implemented the Pilot Twelve Country Study of samples of 13-year-old students’ achievement in five subjects: mathematics, reading comprehension, geography, science, and nonverbal ability (Lafontaine & Simon, 2008). This led to the launch, in 1964, of the First International Mathematics Study (FIMS), which also tested samples of 13-year-olds and preuniversity students from 12 countries. Similar efforts were deployed in the 1970s with the First International Science Study (FISS). The second rounds of the IEA-led LSAs (SIMS and SISS) occurred in the 1980s in up to 20 countries. In the 1990s, the IEA administered two parallel LSAs: the Trends in International Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS). Between 1995 and 2011, TIMSS was administered five times and PIRLS three times. Over 60 countries participated in TIMSS 2011 and 55 in PIRLS 2011.

In 2000, another major player, the Organisation for Economic Co-operation and Development (OECD), administered the Programme for International Student Assessment (PISA). That year 32 countries participated in PISA, which measured 15-year-old students’ literacy in reading, mathematics, and science. In 2009, PISA was administered to samples of students from 70 countries (OECD, 2010). Although the purpose of both IEA and OECD was to provide the participating educational jurisdictions (countries, states, provinces, districts) with comparative data, IEA favored a decentralized, cooperative management model of decision making and implementation, whereas OECD members formed an official coalition of participants that planned long-term objectives for cyclical testing (Lafontaine & Simon, 2008).

One of the main impacts of these two major organizations was their contribution to a culture of standardized assessments to support educational public policy and reform within participating educational jurisdictions. This culture of data-driven public policy and reform was particularly evident in the United States, as Crundwell (2005) explains:

The publication of the 1983 National Commission of Excellence in Education document, A Nation at Risk, brought to the forefront the issue of educational accountability in the United States (U.S.).… More recently, the passing of the Elementary and Secondary Educational Act (ESEA) has placed increased emphasis on the role of the U.S. federal government in education. ESEA, better known as No Child Left Behind (NCLB), places increased emphasis on standards-based accountability and mandates large scale assessment from grade 3 to grade 8 to ensure continued progress towards academic proficiency and standards (MASP, 2004). Within the NCLB legislation, the performance of a school or district on the assessment can result in either incentives or severe sanctions for schools that meet or fail to meet predetermined levels of proficiency. Schools that fail to meet the predetermined threshold of proficiency may lose funding or be taken over by the federal government.

This quotation also reflects distinct shifts in LSAs’ goals and use over the years. Since the 1960s, census testing has been conducted regionally (i.e., by school districts and school boards) under mandated conditions (e.g., Title 13) or on a voluntary basis. Whereas the LSAs administered in the 1980s and 1990s randomly sampled groups of students to provide data for monitoring overall student achievement, in the 2000s state or province-wide LSAs were being used to test all students for high-stakes accountability purposes from institutional and governmental perspectives but with low stakes for students. Later, in the 2000s, LSAs were also used for gatekeeping purposes, with higher stakes for the students (Nagy, 2000). Increasingly, students’ LSA scores were being partially or fully integrated in their course final grades for promotion or graduation purposes.

Throughout the implementation of LSAs, whether at the international, national, or state/provincial levels, a number of conceptual, technical, and pragmatic issues became the subject of intense debate, discussion, and research. One of the main concerns during the Pilot Twelve Country Study was test translation. Over the years, a number of concerns related to validity and reliability of results arose. These included, for instance, opportunity to learn, test accommodations, and bias. Other issues were related to student and item sampling, test equating, standard setting, communication of results, and security of test items. A great deal of scientific literature on these topics allowed researchers and practitioners to solve many of the problems and concerns involved in designing and implementing LSAs.

Interestingly, in speculating about the future of LSAs, Bennett (1997, p. 3) claimed that despite major advancement in that field over the years, in practice, they had still undergone very little change:

Although there has been much recent intellectual ferment and experimentation in educational assessment, the practice of large-scale testing is much the same today as it was twenty years ago. Most large-scale tests still serve only institutional purposes, are administered to big groups in single sittings on a few dates per year, make little use of new technology, and are premised on a psychological model that probably owes more to the behaviorism of the first half of this century than to the cognitive science of the current half.

Nearly 15 years later, it appears that today’s LSAs, particularly those conducted within educational jurisdictions, tend to lean toward a common and universal format that closely follows the measurement model, with its emphasis on reliability, on the use of a combination of multiple-choice and short open-response items, and on speedy, cost-effective, and objective scoring methods (Crundwell, 2005). The cognitive and constructive models of learning, which favor the development of “big ideas” in reading, mathematics, and sciences, thus continue to take a back seat (Lane, 2004; Suurtamm, Lawson, & Koch, 2008). Since the publication of Knowing What Students Know (Pellegrino, Chudowsky, & Glaser, 2001), there has been greater emphasis on assessments to capture complex thinking, to be more closely aligned with learning and instruction, and to provide information that can guide student learning. Even though there have been great improvements with respect to these goals (Ercikan, 2006), there is evidence that LSAs may not be sensitive to instruction and may even contribute to narrowing it to the teaching of superficial contents (Wiliam, 2007). Thus an often-heard principle associated with contemporary assessment is that decisions should be based on the results of more than one assessment.

Key Issues Related to LSAs

The previous summary brings to light a number of issues that were the subject of intense discussion, debate, and research across several decades. Despite intellectual advancements, many of these problems continue to plague today’s LSAs and are constantly revisited by leading authors and experienced practitioners who are currently involved in these respective areas. This book features many of these issues in four sections: (a) assessment design, development, and delivery; (b) assessment of diverse populations; (c) scoring, score reporting, and the use of scores; and (d) psychometric modeling and statistical analysis.

Part I: Assessment Design, Development, and Delivery

This first section deals with the processes of planning, reviewing existing options, and organizing all aspects of LSAs prior to their administration. It features four chapters specifically dealing with such issues as investigating conceptual frameworks and designs for the measurement of student cognition and learning; developing background questionnaires to collect contextual data from students, teachers, school principals, and parents; motivating students to be fully engaged in LSAs; and adopting computer-based or computer-adaptive testing as a delivery option for LSAs.

The first chapter in this section, by Leighton, offers a thorough discussion of some of the latest research and practice for designing and developing LSAs of student knowledge and skills based on cognitive-psychological and learning principles. The author uses examples from operational testing programs to demonstrate key assessment design and development issues.

In their chapter, Childs and Broomes explore ways to improve background questionnaires typically associated with LSAs. These questionnaires are often administered to students, teachers, principals, and parents. They constitute an essential component of LSAs to produce contextual information that can serve to interpret students’ achievement results. Resulting data are the object of primary and secondary descriptive or inferential analyses whose findings often contribute to the advancement of learning theories. Using examples from LSAs administered in Canada, this chapter outlines the types of information that can be collected using background questionnaires. The authors offer practical guidance for designing background questionnaires and writing questionnaire items.

The chapter on student motivation—with Van Barneveld as the lead author and developed in collaboration with Pharand, Ruberto, and Haggarty—stresses the importance of having students participate fully in responding to LSA tests and background questionnaires. It contains a summary of research evidence regarding the relationships between examinee motivation, examinee behaviors, and key elements of the assessment context. Personal perspectives provided by these authors—one a superintendent, one a teacher, and the other a student—highlight the variation and complexity of individual motivation orientations related to LSAs. Finally, guidelines are offered for motivating individuals to engage in the various LSA activities.

A fourth chapter by Luecht offers a number of assessment design, development, and implementation considerations when LSA is computer-based or computer-adaptive. His chapter provides information regarding venues, platforms, interfaces, item types, and delivery designs that are relevant for testing in various sectors, such as medicine, health sciences, accounting, and more specifically education. The chapter also refers to computer-based testing (CBT) options for end-of-grade and end-of-course assessments. Given CBT’s complex designs, related security issues, and financial costs, it is not surprising that common regional, national, and international large-scale assessments have not yet implemented CBT. Although the reader is warned about the complicated nature of CBT, sufficient details, explanations, and corresponding examples are offered in the chapter for graduate students and practitioners with minimal knowledge of advanced measurement principles—such as item design and item banking and psychometric theory such as item response theory (IRT)—to apply these to the context of LSA.

Part II: Assessing Diverse Populations

This section focuses primarily on the assessment of diverse populations such as English language learners (ELLs), students in minority language contexts, and students with special needs, such as those with cognitive disabilities. It features three chapters. The chapter by Solano-Flores and Gustafson examines the limitations of current practices in the assessment of ELLs. According to these authors, there is a need to adopt a probabilistic view of language and to address (a) the dynamic nature and heterogeneity of language groups and (b) the multiplicity of categories of English proficiency. The capacity of assessment systems to deal with randomness in language and linguistic groups and to model language variation as a source of measurement error is raised as critical to improving assessment systems for ELLs.

In their chapter, Ercikan, Simon, and Oliveri examine issues, challenges, and potential solutions regarding comparability of scores resulting from a minimum of two linguistic versions of LSAs. The chapter is intended to inform educational jurisdictions where resources for establishing and verifying comparability of multiple language versions of tests are not always available and where student samples are small. Some educational jurisdictions around the world, in particular those serving small numbers of students, may not have linguistic, curricular content area, or sophisticated measurement expertise to ensure linguistic comparability of results. Moreover, some linguistic minority groups may have such small sample sizes that they prohibit using the recommended statistical and psychometric analyses for establishing comparability. The chapter offers options for these limited contexts.

With in-depth knowledge, experience, and expertise in LSA accommodation, Sáez, Jamgochian, and Tindal offer a chapter on ways to improve accommodations strategies (a) to assist persistently low-performing students or students with significant cognitive disabilities and (b) to increase accessibility for all students.

Part III: Scoring, Score Reporting, and Use of Scores

This section covers four key aspects related to LSA scores. The main outcome of LSAs is performance results of students and inferences based on these results. The first chapter in this section reviews different issues in scoring student results; the second chapter focuses on performance-level descriptors and standard setting. Among the many methods of reporting LSA scores, web-based reporting is becoming one of the most popular. Such a trend calls for the development of models, which are featured in the third chapter in this section. Finally, given the widespread use of LSA results for making inferences about school quality and teaching effectiveness using value-added models (VAMs), the fourth chapter in this section is dedicated to discussing VAMs.

Scoring involves the interpretation of examinees’ responses or performances and transforming them into numeric or alphanumeric codes representing levels of completion or correctness. Scoring is often based on scoring rubrics, and the chapter by Oliveri, Gunderson-Bryden, and Ercikan offers a discussion of two important topics regarding the scori...


Improving Large-Scale Assessment in Education

Theory, Issues, and Practice
