Background
Learning a second, additional, or foreign language (the process of which we henceforth call second language acquisition [SLA] or L2 learning) is a human endeavor that has been, and always will be, taking place because humans are social beings who interact, forge relationships, and want to understand one another in the present, past, and future. As described by Gass and Mackey (2012, p. 3), individuals vary in how they learn second languages, have various capabilities to learn, and have various levels and types of ultimate (end-state) learning outcomes. A difficulty, Gass and Mackey stressed, is in trying to account for individual and contextually bound differences in L2 learning and L2 learning outcomes. That is where SLA researchers step in; they endeavor to measure SLA in order to document, describe, explain, and predict L2 learning and its outcomes. SLA researchers work to "consider as many relevant factors as possible" to understand "what it means to know a second language" (Gass & Mackey, 2012, p. 3). Doing so can shed light on large, philosophical issues, such as "the nature of the human mind and intelligence" (Doughty & Long, 2003, p. 5, emphasis original). Or, field-specifically, SLA research can help develop, optimize, and substantiate L2 learning theories. On a practical level, SLA research can reveal important information that can be applied by pedagogues who develop classroom methods, materials, and tests.
Critical to understanding SLA research is knowing that language itself is a latent trait that must be somehow measured to investigate specific SLA theory. Language is holistically unobservable except through manifest observations: that is, through instances of people's language performances, captured through various observational techniques. To study or research L2 learning achievement (through empirical, cross-sectional, or experimental research designs) or trajectories (through longitudinal designs) has, therefore, at its heart the measurement of second languages at their various stages and within their developmental contexts; contexts that comprise methodological, social, cultural, and physical elements, that likewise interface with human cognition and emotions, and that are time bound.
Without a doubt, SLA researchers can (and do) study important SLA factors without directly measuring the L2: Extremely significant qualitative, descriptive, and empirical studies have investigated the circumstances, socio-cultural affordances, and practices that strongly influence, shape, or hinder SLA. Such research is particularly valuable for uncovering important information about nascent SLA concepts. For example, Norton and De Costa (2018) stressed the urgent need for SLA researchers to conduct ethnographic studies to understand how race is implicated in teachers' experiences of language-teacher legitimacy (p. 95), and how educational policy impacts language teachers' practices and their identity constructions (p. 100). Such field-expanding research may not necessarily involve direct measurement of language, but rather may involve nuanced qualitative investigations into complex social and transnational constructs, such as agency and legitimacy, that shape SLA. Language testing researchers, such as Shohamy (2013), have entered into such inquiry by positioning language tests as policy elements that delegitimize and marginalize L2 learners, turning tests from information providers about L2 constructs to covariates that can strongly affect the SLA process. Such qualitative research has extreme value and requires transparency, but perhaps not strict empirical replicability (see Markee, 2017; Polio, 2012), as its findings can contribute uniquely to the field of SLA and keep SLA (and, we think, language testing) from being myopic (see Firth & Wagner, 2007, p. 801). As explained by Norris and Ortega (2008), different theories within SLA conceptualize L2 learning from different perspectives, and concomitantly, "measurement practices differ systematically according to the varying theoretical premises" (p. 718).
Thus, we stress that SLA and language testing overlap, but not entirely. However, as Shohamy (2013) has shown, even in research areas where SLA and language testing seem not to overlap, they can.
Within the incredibly broad and interdisciplinary field of SLA, one specific line of research is to employ experimental methodologies and statistical techniques to uncover and understand empirically replicable L2 outcomes, patterns, and their influencers. This line of research has a durable quantitative side, and most often necessitates robust and valid measurements of the L2, typically using tests, as well as necessitating robust and valid measurements of the constructs hypothesized to influence the L2's development. As explained by Saito (2004), understanding "variability in second/foreign language (L2) performance has long been a central issue in SLA" (p. 30). And this is where SLA and language testing strongly converge. Language testing expertise is useful in these SLA pursuits because defining and measuring the constructs of language ability are at the core of language testing.
Foundations that Bridge SLA and L2 Testing
Links between SLA research and L2 testing research have existed since the beginning of the field of SLA. The two areas have developed in tandem, and have emerged over the years as two parallel-running, reciprocal disciplines (Shohamy, 2000). We do not venture to write that SLA and language testing are distinct and separate disciplines (see Davies, 2012; Douglas, 2001, who seemed to suggest so), but rather we opine and observe, alongside many others who have done so in the past (e.g., Bachman, 1988; Bachman & Cohen, 1998; Gu, 2014; Huhta et al., 2014), that SLA and language testing are strongly interfaced, that is, synthetically connected by shared goals and methods. As noted by the Douglas Fir Group, or DFG (2016), and commentary on the DFG work, transdisciplinary work, such as that done by SLA and language testing researchers, can "achieve a wider and more holistic understanding than is possible from one vantage point alone" (Hult, 2019, p. 136). "True collaboration between language testers or measurement specialists and measurement-informed SLA researchers" (Norris & Ortega, 2008, p. 749) is a golden ticket to expand the broader fields of SLA and applied linguistics, as described by Kunnan and Lakshmanan (2006, pp. 91–92), because language testing researchers have built skills to accurately and validly measure L2 development and its mediating constructs, while SLA researchers have researched new constructs and contexts, and developed new hypotheses, to probe deeper and further into the mechanisms of SLA (Shohamy, 2000). As noted by Alderson et al. (2017, p. 380), SLA researchers and language assessment practitioners/researchers have long benefited from their collaborations, and together they better explore constructs that constitute or contribute to L2 development.
For example, interactional competence (see Galaczi & Taylor, Chapter 32, this volume) and willingness-to-communicate (MacIntyre & Ayers-Glassey, Chapter 18, this volume) are constructs within SLA and within language testing that seemingly influence each other. These constructs are latent traits that can be measured through tests, which record instances of them. Repeated testing of the constructs, respectively, from the same individuals, strung together, side-by-side and over time, can measure the constructs' parallel or non-parallel change or development within the measurement time frame and test-administration context(s). The longitudinal data can inform SLA growth theories, such as whether, how much, when, and why (and specifically for whom) willingness-to-communicate affects the development of interactional competence, and vice versa. A statistical model, such as a cross-lagged panel model or a longitudinal structural equation model, could be used to test such theories, if the model's mathematical assumptions concerning the data are met. Foundationally, the measurements of the constructs must be robust and sensitive enough to document differences in the constructs' developments both within and between cases over time (Spolsky, 2000, p. 547), a measurement consideration that researchers with their psychometric hats on can investigate. In sum, it is these shared constructs, and the desire to measure them well, that bind SLA and language testing.
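The cross-lagged logic described above can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not an analysis drawn from any study cited here: the scores are simulated, the variable names (wtc for willingness-to-communicate, ic for interactional competence), the two-wave design, and every generating coefficient are hypothetical, and the paths are estimated with two ordinary least squares regressions on observed scores rather than with a full structural equation model, so measurement error is ignored.

```python
import random


def ols_two_predictors(y, x1, x2):
    """Least-squares fit of y = a + b1*x1 + b2*x2; returns (a, b1, b2)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Centered sums of squares and cross-products.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def simulate_two_waves(n=2000, seed=1):
    """Simulate standardized-like scores at two time points (hypothetical model)."""
    rng = random.Random(seed)
    wtc1, ic1, wtc2, ic2 = [], [], [], []
    for _ in range(n):
        w1 = rng.gauss(0, 1)
        i1 = 0.3 * w1 + rng.gauss(0, 1)
        # Assumed generating model: earlier WTC feeds later IC (0.4),
        # but earlier IC does not feed later WTC (0.0).
        w2 = 0.6 * w1 + 0.0 * i1 + rng.gauss(0, 0.5)
        i2 = 0.5 * i1 + 0.4 * w1 + rng.gauss(0, 0.5)
        wtc1.append(w1)
        ic1.append(i1)
        wtc2.append(w2)
        ic2.append(i2)
    return wtc1, ic1, wtc2, ic2


def cross_lagged_paths(wtc1, ic1, wtc2, ic2):
    """Estimate autoregressive (stability) and cross-lagged paths."""
    _, ic_stability, wtc_to_ic = ols_two_predictors(ic2, ic1, wtc1)
    _, wtc_stability, ic_to_wtc = ols_two_predictors(wtc2, wtc1, ic1)
    return {
        "ic1_to_ic2": ic_stability,
        "wtc1_to_wtc2": wtc_stability,
        "wtc1_to_ic2": wtc_to_ic,
        "ic1_to_wtc2": ic_to_wtc,
    }


if __name__ == "__main__":
    for name, value in cross_lagged_paths(*simulate_two_waves()).items():
        print(f"{name}: {value:.2f}")
```

Each cross-lagged path is the effect of one construct at the first wave on the other construct at the second wave, controlling for the outcome's own earlier level. In an applied study, a researcher would more likely fit such a model in dedicated structural equation modeling software, with more than two waves and with latent variables that account for the unreliability of the test scores.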
An important point to keep in mind, as science has shown, is that no single measure of a latent trait is ever a perfectly accurate and reliable measure of that trait. The general call is to test more than once (as more estimates will better represent the underlying construct), and as many times as would be practical and beneficial. However, the number of times to test should be driven by information that is educationally, cognitively, socially, and ethically construed. Different kinds of tests can be viewed as tapping into latent traits from different angles, thus providing different information about the traits. Some tests or assessments may measure various aspects of a latent trait better than others. Theorists can debate whether various measurements of a trait should produce converging information, or if measures that provide diverging information may be revealing novel areas of the trait, such as a new sub-construct that will need further definition and exploration. SLA and assessment specialists must speculate together how much measurement unreliability is acceptable in measuring their shared constructs (see Swain, 1993), because it could be the case that when a measurement is not perfect, it can still produce scores that are useful or valuable for specific types of theorizing, or for particular real-world, decision-making purposes. At the interface of SLA and language testing, a well-i...