1. Motivation for This Book
The global demand for English proficiency in the workforce and academia has grown steadily in recent years, and estimates suggested that by 2020 over two billion people would be using English regularly at some level of proficiency (Howson, 2013). Given the increasingly large number of English learners around the world, there is a growing demand for assessments of English proficiency, in particular assessments of English speaking proficiency. These assessments serve a variety of purposes, ranging from low-stakes uses, such as providing information to instructors and students about a student's learning progress, to high-stakes uses, such as informing hiring decisions in a global company that requires the use of English on the job. For any such assessment to be a valid indicator of speaking ability, it should include tasks that elicit actual spoken responses from test takers rather than only selected-response tasks, such as those requiring test takers to recognize conventions of standard spoken English.
However, including such constructed-response tasks in language proficiency assessments poses challenges for scoring, especially in large-scale assessments. Human rating, traditionally the only means of evaluating such constructed spoken responses, can be costly, time-consuming, and subject to factors that may negatively affect the validity of the scores, such as rater fatigue and rater bias (Engelhard, 2002). To address these challenges, an increasing number of speaking tests now make use of automated speech scoring technology, either as the sole source of scores or in combination with human raters.
Automated speech processing technology has improved substantially in recent years. Advances such as the use of deep neural network algorithms for training automatic speech recognition models have brought impressive gains in the performance and reliability of systems built on this technology. Although these systems are still far from perfect, they can be used productively in a wide range of products and services; the impact of automated speech processing technology can be seen, for example, in the commercial success of digital assistants (such as Amazon Alexa and Google Home) and the widespread use of telephone-based automated customer support systems. Similarly, the performance of automated speech scoring systems has also improved substantially, and they are now being considered for a much wider range of assessments than when they were first introduced approximately 30 years ago.
Despite the growing use of automated speech scoring technology, it remains a relatively new field, and knowledge about how automated speech scoring systems work and what strengths and weaknesses they may have is still limited. Scientists, engineers, and other experts in speech processing technology who develop the core components of automated speech scoring systems may not fully understand the reliability, validity, and fairness issues arising from how the technology is used in practice (for example, the need for a transparent description of how a score is produced from construct-relevant measures in order to encourage a positive washback effect). Conversely, stakeholders without speech processing expertise, such as language learners, language instructors, test developers, and score users (university program administrators, corporate hiring managers, etc.), may not understand the technical components underlying an automated speech scoring system and may therefore be unable to appropriately interpret and apply its output.
This book aims to bridge this gap by serving as a comprehensive handbook on automated speech scoring that can guide stakeholders involved in all stages of the design and use of this technology. To that end, it provides a comprehensive overview of how automated speech scoring systems work and how they can be applied to assess English speaking proficiency, covering topics such as the main components of an automated speech scoring system, the aspects of spoken language proficiency that such systems can assess, and related psychometric considerations. The presentation is situated in the framework of research and development on automated speech scoring conducted at Educational Testing Service since the early 2000s.