Part I
Foundations of the field
1 Methodological issues and new trends in educational effectiveness research
Pam Sammons
Introduction
Over the last thirty years or more school effectiveness research (SER) has emerged as a fast-growing and dynamic field of study with a growing international profile (Scheerens and Bosker, 1997; Mortimore, 1998; Teddlie and Reynolds, 2000; Muijs, 2006; Sammons, 2006a and 2006b, 2007; Van Damme et al., 2006; Townsend, 2007; Creemers and Kyriakides, 2008; Creemers et al., 2010) that explicitly focuses on studying the variation between schools, departments and teachers in their effects on students' outcomes and the school and classroom processes that seem to support better outcomes. In addition to the study of school and, more recently, of teacher or class level effects and attempts to delineate the characteristics of effective schools and effective teaching (Muijs and Reynolds, 2010; Ko and Sammons, 2011), the field is now paying greater attention to contextual influences including the role of local authorities and school districts, for example (Tymms et al., 2008; Reynolds et al., 2011) and the role of comparative studies in different international contexts (Reynolds et al., 2002; Teddlie et al., 2006; Van de Grift et al., 2007). Moreover, SER-type studies of institutional effects are also addressing other areas of education including pre-school settings and nurseries (Sammons et al., 2008; Melhuish et al., 2008; Sylva et al., 2010), and colleges and higher education settings that serve students who are beyond the compulsory school leaving age. Given these developments the term educational effectiveness research (EER) is becoming increasingly used (Muijs, 2006; Creemers et al., 2010; Reynolds, 2010; Teddlie, 2010). This is a more appropriate description because it recognizes the broader remit of recent research (Creemers and Kyriakides, 2008) and a wider focus of enquiry than just the study of the effects of individual schools.
For example, there is an increasing interest in the way EER can promote school improvement (Stringfield et al., 2008a) using the evidence base from High Reliability Schools research and can be used to evaluate educational initiatives, for example in England the introduction of Federations (Lindsay et al., 2007).
More than a decade ago I posed the question 'Has school effectiveness come of age?', noting the turbulent debates both methodological and philosophical that had accompanied this growth and the high policy profile accorded SER in some countries (Sammons, 1999). In 1999 I suggested that, at the turn of the millennium, SER could be seen as still in its adolescent phase, affected by heated controversy over its political and philosophical underpinnings, given the policy emphasis on raising standards and increasing accountability adopted by many education systems, as well as in relation to the role of local and national context in shaping school and teacher performance.
A few years later a special issue of the journal School Effectiveness and School Improvement was devoted to the topic 'Critique and Response to twenty years of SER' edited by Townsend (2001). Interestingly, at this time the focus of criticism remained largely on political and philosophical grounds rather than on methodological issues although there was evidence of a strong anti-quantitative stance. Yet the methodology of EER remains crucially important in efforts to enhance the knowledge base and the practical application of findings (Luyten and Sammons, 2009). After a lull, critiques of the field and its methodology have resurfaced recently (Gorard, 2010). Such arguments have been addressed by Muijs et al. (2011), who have comprehensively demonstrated flaws in the statistical critiques and knowledge of the field of EER.
More than a decade on from my comments on SER as still in its adolescent phase, the EER field can be seen to have matured and evolved as the various contributions to this volume demonstrate. This chapter seeks to provide a brief review of the current state of EER, its achievements and limitations, and to suggest some fruitful directions for future research. First, the chapter examines methodological and measurement issues before discussing the need for a greater emphasis on theory development in EER and the role of new approaches to model development. The third section explores the potential value of adopting a pragmatic philosophical approach and the role of mixed methods designs in EER that combine and integrate qualitative and quantitative approaches. The concluding section examines possible directions for future studies.
1. Methodological and measurement issues
Methodological debates were particularly evident in the early development of the SER field. For example, the seminal 15000 Hours (Rutter et al., 1979) study of London secondary schools was criticized severely by statisticians, amongst others, due to certain features of the methodology, including the small sample size of schools and inability to take account of the clustering of the student sample. Such criticisms stimulated significant advances in subsequent SER designs and approaches to analysis. Most notably, authors such as Goldstein (1995) and Bryk and Raudenbush (1992) led the development of hierarchical regression approaches using multilevel modelling, which recognizes the impact of clustering in educational datasets, and drew attention to the need for longitudinal samples with individual student level data to compare school performance. Improvements in the size, scale and statistical approaches used in EER followed during the late 1980s and 1990s (e.g. work by Hill and Rowe, 1998, that demonstrated not only that teacher effects tend to be larger than school effects, but also that in combination they could account for a substantial proportion of the variance in student outcomes). More recent methodological advances have been discussed by Creemers et al. (2010).
However, despite the methodological limitations of pioneering earlier studies such as 15000 Hours, many of the key findings have been supported by later, more sophisticated multilevel research. For example, another study of secondary school effectiveness also conducted in inner London but almost two decades later (Forging Links: Effective Schools and Effective Departments, Sammons et al., 1997) supported and extended the original conclusions of 15000 Hours using a much larger sample of schools and multilevel approaches with longitudinal data for three successive student cohorts. In addition, a replication and extension of the Forging Links research conducted in Ireland (Smyth, 1999) also supported and extended its findings. As well as examining the size of school effects on students' academic outcomes, both these studies addressed three features of theoretical and practical importance, namely the size, stability and consistency of school effects, and drew attention to the importance of departmental differences in academic effectiveness. Sammons et al. (1997) concluded that there is considerable internal variation in school effectiveness and that school effectiveness is best seen as a relative, retrospective concept dependent both on the choice of appropriate outcome measures and the timescale and methods of analysis used, including the adequacy of the intake predictors available for inclusion in appropriate multilevel models. The topic of within school variation (WSV) remains relatively unexplored, however, and is identified as an important focus of future enquiries (Reynolds, 2008).
The creation of the International Congress for School Effectiveness and Improvement (ICSEI) in 1990 helped to promote international collaborations and the development and wider dissemination of SER approaches. The first International Handbook of School Effectiveness Research (Teddlie and Reynolds, 2000) examined the methodology and scientific properties of SER, and summarized the research knowledge base, achievements and limitations and some important issues for future development. The next (Townsend, 2007) sought to link the school effectiveness and improvement fields more closely, gave greater attention to context issues, the classroom and international comparative perspectives, but paid little attention to methodological issues. However, despite wider recognition of the need for appropriate research designs and statistical techniques by those engaged in SER, it was only in 2005 that the MORE (Methodology of Research in Educational Effectiveness) group was established as part of ICSEI to stimulate further methodological advances in the EER field. Similarly, although the Educational Effectiveness SIG (Special Interest Group) at EARLI has only been in existence for a couple of years it too has sought to encourage more rigorous and innovative approaches and promote further international collaborative work, including the development of relevant instruments to measure school and classroom processes of theoretical and practical interest that could be used in a range of contexts (Teddlie et al., 2006). It is recommended that closer links be promoted between such groups to further encourage and enhance future EER studies. 
The increased emphasis on methodological issues evident during the last decade is illustrated by the production of a new volume on Methodological Advances in Educational Effectiveness Research (Creemers et al., 2010) that seeks to document the current state of the art in EER and the challenges it faces, to set out the contribution of different methodological orientations to the development of EER, and to provide a conceptual map for further methodological advancement in EER studies.
There have been a number of important methodological achievements in EER, particularly related to the use of multilevel models and large-scale longitudinal research that recognizes the complexity and hierarchical structure of most educational systems. Gorard (2010) has recently criticized EER methodology based on the use of contextualised value-added indicators, arguing that simplicity is important in educational research. Yet education is a feature of complex social systems that demonstrate a hierarchical structure in many aspects of life (because of clustering effects linked to neighbourhoods, schools or classes) and complexity cannot be avoided, as Goldstein (1998: 2) argued: 'in order to describe the complex reality that constitutes educational systems we require modelling tools that involve a comparable level of complexity'. Goldstein and Noden (2004) and Plewis and Fielding (2003) have drawn attention to the value of multilevel statistical modelling in a range of fields because of its ability to answer questions at a much greater level of detail and complexity. Muijs et al. (2011) similarly argue the case for the use of appropriate multilevel models and other appropriate statistical techniques in order to investigate institutional influences.
Many in the EER field have warned against the use of single measures of school performance for accountability or research purposes (e.g. Mortimore et al., 1988, 1989; Nuttall et al., 1989; Sammons et al., 1993, 1995; Goldstein, 1995, 1998; Teddlie and Reynolds, 2000). In standard SER designs, the residual estimate of an individual school's effectiveness is always based on its relative position in comparison with other schools, taking into account differences in student intake; this is often known as a contextual value-added (CVA) indicator. There is a need to consider the confidence limits (CL) associated with individual school, departmental or class level residuals derived from multilevel value-added analyses. Doing so prevents fine (rank ordered) distinctions being made between most schools, and thus the production of ranked 'league tables' is regarded as statistically invalid (Goldstein, 1997). A consequence of using multilevel value-added approaches is that the extent to which a school is identified as more or less effective is largely determined by the performance of the other schools to which it is compared and the adequacy of intake controls to ensure more appropriate 'like with like' comparisons of relative performance levels. Of course a similar argument about the adequacy of model fit can be made with respect to the interpretation of estimates and effect sizes for individual predictors included in any multivariate analyses, but an advantage of multilevel analysis over traditional regression approaches is that it allows more precise estimates to be obtained because of the control for variance attributable to higher levels such as the school or teacher level (Goldstein, 1995; Elliot and Sammons, 2004). The use of CLs for school residual estimates means that it is most appropriate to identify groups of schools whose performance is either significantly better or poorer than other schools in a given sample for a given outcome (Sammons, 1996).
This has important implications for educational policy makers and practitioners in education systems that seek to promote greater educational accountability by publishing school performance data.
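The grouping of schools by confidence limits described above can be illustrated with a minimal sketch. It assumes that each school's value-added residual and its standard error have already been estimated by a multilevel model. The function name `classify_schools`, the 95 per cent multiplier of 1.96 and the example figures are all illustrative assumptions, not values from any study cited here.

```python
def classify_schools(residuals, z=1.96):
    """Group schools by whether the confidence interval of their
    value-added residual excludes zero (the sample average).

    residuals: dict mapping school name -> (estimate, standard_error)
    z: multiplier for the interval width (1.96 assumes a 95% interval)
    """
    groups = {}
    for school, (estimate, std_error) in residuals.items():
        lower = estimate - z * std_error
        upper = estimate + z * std_error
        if lower > 0:
            groups[school] = "significantly above expectation"
        elif upper < 0:
            groups[school] = "significantly below expectation"
        else:
            # Interval spans zero: no fine distinction can be justified.
            groups[school] = "not distinguishable from expectation"
    return groups


# Hypothetical residuals (estimate, standard error) for four schools.
example = {
    "School A": (3.0, 1.0),
    "School B": (0.5, 1.2),
    "School C": (-2.8, 1.1),
    "School D": (1.5, 1.0),
}

for school, label in classify_schools(example).items():
    print(school, "->", label)
```

In this made-up example School D has a higher point estimate than School B, yet both intervals include zero, so neither can be separated from the sample average. This is the sense in which rank ordering most schools is not supported by the estimates.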
In England, CVA indicators have been made available and welcomed by most schools as fairer than raw results for evaluating their performance. However, it is unfortunate that raw results are still used as the main means of judging school performance, a practice that continues to penalize schools serving disadvantaged communities (Sammons, 2008). Moreover, any one measure is both outcome and time specific and to evaluate schools appropriately more attention needs to be paid to WSV (Sammons et al., 1997; Reynolds, 2008; Muijs et al., 2011).
Luyten and Sammons (2010) provide a brief summary of multilevel approaches and their application to datasets that are hierarchically structured. In those cases two or more levels can be distinguished, with the units at the lower levels nested within the higher level units; typically this can be a dataset of students nested within classrooms and schools. The hierarchical structure may be extended further, if one takes into account the nesting of schools within geographical units (such as local communities, regions or nations). They argue that the advantages of multilevel analysis include its flexibility and capability to deal with unbalanced data and data with incomplete records on the outcome measures. In addition to the analysis of longitudinal data the multilevel approach may also be useful for analysis of data with two or more distinct outcome measures per individual (multivariate multilevel modelling).
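As an illustration of the nesting described above, the simplest two-level case (students within schools) is commonly written as a random-intercept model. The notation below is a conventional textbook form and an assumption made here for illustration; it is not taken from the studies cited. The outcome of student i in school j is y_ij, x_ij is an intake measure such as prior attainment, u_j is the school level residual and e_ij the student level residual.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_{ij} \sim N(0, \sigma_e^2)
```

In a value-added design of this kind, the estimate of u_j is the school residual that is interpreted as an indicator of relative effectiveness.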
The ability to take account of the role of clustering in educational data and identify variance at different levels in hierarchical structures also has the advantage that it provides more efficient and accurate estimates of the effects of predictor variables and their associated standard errors. It also allows the estimation of overall size of effects at higher levels (e.g. school or classroom/teacher) in terms of the proportion of unexplained variance attributable to each level using the intra-class correlation. Comparison of the null model (partitioning the variance at different levels in an empty model with no predictors) with various more complex models allows the researcher to show the percentage of total variance and the percentage of student or of teacher or school level variance explained (accounted for) by different sets of predictors.
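The arithmetic behind the intra-class correlation and the percentage of variance explained can be sketched as follows. The variance components are invented for illustration and the function names are hypothetical; in practice the components would be taken from the fitted null and value-added models.

```python
def intraclass_correlation(school_var, student_var):
    """Proportion of unexplained variance lying between schools."""
    return school_var / (school_var + student_var)


def percent_explained(null_var, model_var):
    """Percentage reduction in a variance component relative to the null model."""
    return 100.0 * (null_var - model_var) / null_var


# Invented variance components for a two-level (students within schools) design.
null_school, null_student = 20.0, 80.0      # empty model, no predictors
model_school, model_student = 8.0, 48.0     # after adding intake controls

icc_null = intraclass_correlation(null_school, null_student)
icc_model = intraclass_correlation(model_school, model_student)

school_explained = percent_explained(null_school, model_school)
student_explained = percent_explained(null_student, model_student)
total_explained = percent_explained(
    null_school + null_student, model_school + model_student
)

print(f"ICC (null model): {icc_null:.3f}")
print(f"ICC (with intake controls): {icc_model:.3f}")
print(f"School level variance explained: {school_explained:.1f}%")
print(f"Student level variance explained: {student_explained:.1f}%")
print(f"Total variance explained: {total_explained:.1f}%")
```

With these invented figures a fifth of the raw variance lies between schools, and intake controls account for a larger share of the school level variance than of the student level variance. This pattern is one reason raw comparisons can overstate differences between schools.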
Of course, good control for student prior attainment and background characteristics remains essential for value-added analyses of possible school or teacher effects, because poorly specified models may lead to over-estimates of institutional differences in student outcomes through failure to control sufficiently for pre-existing student intake differences (Elliot and Sammons, 2004; Van de Grift, 2009). In addition, there is a need to explore more complex models that test random variation at higher levels to study differential effects and possible cross-level interactions. Work by Opdenakker and Van Damme (2006, 2007) illustrates the use of more complex models to study the relationships between measures of school type, school context, group composition, school practices and school effects on student outcomes including multilevel growth curve modelling.
Multilevel meta-analysis
Further refinements in multilevel approaches include multilevel meta-analysis that has the potential to provide better estimates of the size and variation in educational effectiveness for a range of outcomes, phases of education and contexts (Hox and De Leeuw, 2003). Meta-analysis uses statistical ...