Assessing student learning outcomes internationally: insights and frontiers
Hamish Coates
As higher education systems and institutions expand, more energy is being invested in ensuring that sufficient learning has been achieved to warrant the award of a qualification. Many commonly used assessment approaches do not scale well, and there remains a pressing need for reform. This paper distils insights from international investigations of student learning outcomes assessment, using this analysis to chart frontiers for innovation. It sets out principles for guiding change in the field, evaluates progress via a review of signature assessment initiatives, reviews likely facilitators and blockers and, through these analyses, derives a strategy for spurring development.
Building a strategy for change
The assessment of students' learning outcomes is a pressing change frontier for higher education, seemingly in most countries and institutions. Almost everyone involved appears to contend that this area could be improved, though there is marked divergence of opinion regarding the nature and extent of change. While assessment is a topic that energises or enervates people, it is helpful to stand back and draw from research in the field to form a more considered view of the areas which appear most important to progress.
Finding ways to progress this field is important, for the assessment of learning is of substantial and growing significance to institutional research and higher education. Assessment provides essential assurance to stakeholders that people have attained expected competencies, and that they are ready for employment or further study. Assessment marks the character of an institution and its education programmes. Assessment shapes education and how people learn in powerful ways. Much assessment is expensive, making it an important focus for review. Assessment is highly relevant to individuals, defining life chances and directions.
Given such significance, it is surprising that much assessment in higher education has not changed materially for a long time, and that economically and technically unsustainable practice is rife. It is possible that current practice reflects the pinnacle of assessment, but given the lack of substantial advance over recent decades, or even centuries, this seems unlikely. Rather, given the enormous changes reshaping core facets of higher education, and pressures and prospects surrounding assessment, it is more likely that the 'transformational moment' has yet to come.
Hence, this paper is structured to clarify a strategy for change. It sets out helpful principles for guiding change in this field, presents an evaluation of progress via a review of signature assessment initiatives, reviews likely facilitators and blockers and, through these analyses, derives a strategy for spurring development. It goes beyond detailed technical, political or educational analysis to provide joined-up perspectives for advancing the field. Strategic planning techniques are used to identify and position options for development. Of course, 'assessment' is a word that covers much territory, and stratagems that work well for one purpose or context may not deliver in another, which, as discussed below, always makes assessment design an exercise in trade-offs.
As the title conveys, the paper examines developments which have international and cross-institutional relevance, regardless of their initial scope of application. The paper draws from several international, national and institutional projects (Coates 2014; Melguizo and Coates in press) to explore opportunities for research and innovation. As such, the paper brings together insights from substantial technical and operational research, expert advice and wide-scale stakeholder consultation.
Principles for reform
A two-dimensional framework is proposed as a mechanism for advancing principles for reforming the field of learning outcomes assessment. These dimensions are described below, and the framework's value is teased out via a number of illustrative change areas. One dimension divides change into aspects which are substantive, technical or practical in nature. The other dimension is hierarchical, partitioning consideration instead by the level or zone at which change might occur.
Substantive considerations (policy, disciplinary and conceptual) are the most significant forces shaping learning outcomes assessment. Assessment is of little use unless it is relevant to students, to policy-makers, to institutional leaders and managers, to academics, or to the general public. Establishing such relevance is tricky, as it involves not just identifying but also then defining what counts, and of course stakeholder interests play a role in this. Power plays a key role, manifest through the formal or informal authority of individuals or institutions. The oligopolistic character of many established higher education systems has limited the extent to which change has been driven by research and technological development, though appetite for research-driven change appears to be growing as higher education markets become more competitive.
It is imperative that assessment is technically cogent. This means that assessment resources and approaches should be valid, measuring and reporting what is intended. Assessment should also be reliable, providing consistent measurement of the target focus area. There are a host of methods for assessing and reporting these technical properties, which are, of course, the focus of active scientific debate. At a minimum, it might be expected that explicit consideration has been given to measurement matters; ideally, a set of statistics should be provided, as with professionally validated assessment instruments.
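To make the technical principle concrete, one illustrative statistic (offered here as a standard example from measurement theory, not one prescribed by the sources reviewed in this paper) is Cronbach's alpha, a common summary of internal-consistency reliability:

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{i}^{2}}{\sigma_{X}^{2}}\right)

where k is the number of assessment items, \sigma_{i}^{2} is the variance of scores on item i, and \sigma_{X}^{2} is the variance of total test scores. Publishing such statistics alongside validity evidence is the kind of explicit measurement reporting the principle envisages.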
Substantive relevance and technical integrity are not sufficient to spur change in assessment. Practice is critical, in that it must be feasible to collect, analyse and report data. Though institutional budgets are getting tighter, many entrenched assessment methods have high fixed costs and limited economies of scale, so it is vital that more viable options are explored. Even important changes in assessment might be costly or slow to deliver, and may waste students' time and hinder learning experiences and outcomes. Indeed, such practical constraints are often claimed as impediments to progress. Technology carries the potential to make a huge difference, as does evolving thinking about the nature of the evidence, and hence the processes, required to assure student achievement.
The second dimension of the framework distinguishes the level at which change in assessment might unfold. The OECD (2015, 15) notes of this dimension that it 'distinguishes between the actors in education systems: individual learners and teachers, instructional settings and learning environments, educational service providers, and the education system as a whole'. The level at which information is reported is not the same as the level at which it is collected. Data are often collected at a lower level, then aggregated, and often also combined with other data, for reporting. Similarly, the interpretation level might be different again, and will likely vary with the interests and concerns of stakeholders. Many current institution rankings, for instance, aggregate information on individual researcher performance and report it at the institution level, after which the information is interpreted in all sorts of ways, including in relation to fields of education. For current purposes, it is proposed that assessment change is required both for those involved in education, such as students and teachers, and by broader communities, including the general public, business and industry, and people associated with planning education policy and strategy.
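As a minimal sketch of the distinction between collection and reporting levels (the institutions, scores and field names below are purely hypothetical), student-level evidence can be aggregated into an institution-level report:

from statistics import mean
from collections import defaultdict

# Hypothetical student-level assessment records: (institution, score)
records = [
    ("Institution A", 71), ("Institution A", 64),
    ("Institution B", 58), ("Institution B", 83),
]

# Collect at the student level, grouped by institution
by_institution = defaultdict(list)
for institution, score in records:
    by_institution[institution].append(score)

# Aggregate for institution-level reporting
report = {institution: mean(scores) for institution, scores in by_institution.items()}
print(report)  # {'Institution A': 67.5, 'Institution B': 70.5}

Interpretation, as noted above, may then occur at yet other levels, for instance by comparing such aggregates across fields of education.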
A series of framing ideas can be evoked from this two-dimensional framework. Substantively, it is important for assessment to be relevant or authentic to students and teachers, which often means that a diversity of assessment practice is required. At the same time, stakeholders more removed from everyday practice seek evidence which is more general in nature. Hence, a substantive principle which might be derived is that future reform should ensure that assessment is locally relevant and externally generalisable. A technical principle is that reform should advance transparency regarding the validity and reliability of assessment. Even the most well-designed and validated assessments are meaningless unless they are feasible to implement. Hence, a further principle for reform is that assessment must make efficient use of money and time. In terms of practice, emphasis might be placed on delivering feasible and efficient assessment to large student cohorts given tight resource constraints, whereas those more removed from the process may give more regard to the technical veracity of the evidence produced. Stereotypical remarks made by employer groups can suggest a lack of confidence in institutions' everyday assessment of students' knowledge and skills.
Such principles could be nuanced differently or elaborated more exhaustively, but these formulations are sufficient to tease out the main points at play. None of the three principles is particularly surprising or controversial, though each can provoke substantial complexity and be difficult to implement. Part of the trouble arises from the conundrums provoked by attempts to harmonise, or jointly optimise, the principles in unison. Further trouble flows from negotiating the dialectic between internal and external interests. Broader considerations flow from complexities associated with generalising the assessment of complex higher-order skills across national and cultural contexts. Resolving these issues offers a chance to unlock substantial progress in the assessment of student learning outcomes. Hence, the principles provide a useful normative rubric against which to evaluate current progress and change dynamics, and to forecast insights and frontiers for reform. With these ideas in hand, the next two sections take stock of change efforts and characteristics to inform the findings regarding specific areas for reform.
Evaluating progress in assessment
The lack of modernisation in assessment is not the result of a lack of imagination or effort. In the last few decades, many endeavours have sought to unblock the development of assessment. It is helpful to take evaluative stock of the field to showcase recent work and ground the analyses that follow. Clearly, taking critical stock of a field as large and diverse as higher education assessment is a useful though challenging task: there are an enormous number of actors and initiatives, each at varying stages of maturity and diffusion. Rather than attempt an exhaustive review, it is feasible to examine a series of signature case studies which have sought to shift policy and practice.
One broad line of development has involved specifying qualification-level outcomes. Examples include the European Qualifications Framework (EC 2015a), the United Kingdom's Qualifications and Credit Framework (Ofqual 2015), the Australian Qualifications Framework (AQFC 2015) and the United States Degree Qualifications Profile (Lumina Foundation 2015). As such titles convey, this work is developed and owned by systems, and such initiatives have served as important policy instruments for moving beyond an anarchic plethora of qualifications, generating conversations about coherence and articulating the general outcomes graduates should expect from a qualification (Chakroun 2010). These system-wide structures can suffer from unhelpful collisions with fruitfully divergent local practice, but their inherent constraint is that they go no further than articulating very general graduate outcomes (Allais, Young, and Raffe 2009; Wheelahan 2009). They offer little beyond broad guidelines for improving the assessment of student learning.
Going one step deeper, another line of work has sought to specify learning outcomes at the discipline level. The Tuning Process (González and Wagenaar 2008) is a prominent example which has been initiated in many education systems and across many diverse disciplines. Broadly, Tuning involves supporting collaboration among academics with the aim of generating convergence and common understanding of generic and discipline-specific learning outcomes. Canada adapted this work in an innovative way, focusing the collaborations around sector-oriented discipline clusters rather than education fields (Lennon et al. 2014), while in Australia a more policy-based and regulatory-focused approach was deployed (ALTC 2010). Such collaboration travels several steps further than qualification frameworks by engaging and building academic capacity within disciplinary contexts. Like the qualification frameworks, however, the work usually stops short of advancing assessment resources, tending instead to advance case studies or best-practice guidelines. Hence, while such resources may arise in particular fields, there are no shared assessment materials or data.
A slightly deeper line of development involves shared rubrics to compare assessment tasks or student performance. Moderation in assessment can play out in many ways (Coates 2010), as indeed has been the case in recent higher education initiatives. The moderation of resources has ranged from rudimentary forms of peer review through to slightly more extensive forms of exchange. Mechanisms have also been developed to help moderate student performance. In the USA, for instance, the Association of American Colleges and Universities (AAC&U) (Rhodes and Finley 2013) has developed VALUE rubrics to help faculty assess various general skills. This has been progressed in more recent cross-institutional moderation work (AAC&U and SHEEO 2015). The UK's external examiner system (QAA 2014) is a further example. Several such schemes have been launched in Australia, including a Quality Verification System and a Learning and Teaching Standards Project, both of which involve peer review and moderation across disciplines (Marshall, Henry, and Ramburuth 2013). This work travels more widely than qualification- or discipline-level specifications, for it involves the collation and sharing of evidence on student performance, often in ways that engage faculty in useful assurance and development activities. Such moderation work is limited, however, in being applied in isolation from other assessment activities and materials.
Collaborative assessments build on the developments discussed so far to advance more coherent and expansive approaches to shared assessment. As with other developments addressed here, such work plays out in myriad ways. For instance, medical progress testing in the Netherlands (Schuwirth and van der Vleuten 2012) involves the formation of shared assessment materials and their longitudinal administration. Other assessment collaborations have focused on the development of shared tasks and analytical or reporting activities: for instance, the Australian Medical Assessment Collaboration (AMAC) (Edwards et al. 2012) and the German initiative titled Modelling and Measuring Competencies in Higher Education (KoKoHs) (Zlatkin-Troitschanskaia, Kuhn, and Toepper 2014). In 2015, the Higher Education Funding Council for England (HEFCE) funded a suite of mostly collaborative projects to assess learning gains in higher education (HEFCE 2015), and the European Commission funded a large-scale collaboration titled Measuring and Comparing Achievements of Learning Outcomes in Higher Education in Europe (EC 2015b). Such work is impressive as it tends to involve the most extensive forms of outcome specification, task production, assessment administration, analysis and reporting, while at the same time developing faculty capacity. Work plays out in different ways, however, shaped by pertinent collegial, professional and academic factors. This can mean, for instance, that extensive work is done that leads to little if any benchmarking or transparent disclosure.
Standardised assessment is easily the most extensive form of development, and would appear to be growing in scope and scale. Licensing examinations are the most long-standing and pervasive forms of assessment, though their use is culturally contingent; they tend to be far more common in the United States than in Europe, for example. Other related kinds of national effort are evident in certain countries: for instance, in Brazil (Melguizo in press), Colombia (Shavelson et al. in press) and the USA (Shavelson 2007; ETS 2014). A series of international graduate outcomes tests have also been trialled in recent years, such as the OECD's Assessment of Higher Education Learning Outcomes (AHELO) (Coates and Richardson 2012), the International Association for the Evaluation of Educational Achievement ...