1
Perspectives on the Validity of Classroom Assessments
Michael T. Kane and Saskia Wools
DOI: 10.4324/9780429507533-2
This chapter examines how some general principles of validity theory might apply to classroom assessment. In particular, we consider two perspectives on the evaluation of classroom assessments, a functional perspective and a measurement perspective, and examine how they play out in this setting. We suggest that the functional perspective does and should play a larger role in classroom assessment than the measurement perspective.
For all assessments, validity is an important concern (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014). The concept of validity has been developed mainly in the context of summative high-stakes testing, but we will discuss validity for classroom assessment and emphasize the evidence needed for the validation of assessments in this context.
We define validity in terms of the plausibility and appropriateness of the interpretations and uses of assessment results, and therefore validity depends on the requirements inherent in these interpretations and uses. A systematic and effective approach to validation involves three activities: the development of a clear sense of the proposed interpretation and uses of the assessment results; the development (or identification) of an assessment that would be expected to support the intended interpretation and uses; and an evaluation of how well the assessment supports the interpretation and uses.
Cronbach (1988) described two perspectives on the validity of assessments, a measurement perspective and a functional perspective, and we make use of both of these perspectives in evaluating the validity of classroom assessments. The measurement perspective focuses on the accuracy and precision of scores as measures of some construct, and the functional perspective focuses on how well the assessment serves its intended purposes. The measurement perspective and the functional perspective are both relevant to the validation of all assessments, but they focus on different evaluative criteria. We will argue that for classroom assessment, the functional perspective is of central concern, and the measurement perspective plays a supporting role.
We define classroom assessment broadly as involving the collection of information from a variety of sources, with the intention of promoting effective teaching and learning. Classroom assessments take a variety of forms, such as teacher observations of the students in various contexts, interactions with students, quizzes, tests, assignments, and projects. As a result, classroom assessments differ considerably in their levels of standardization and formality, but together they provide very rich sources of information on student performance, skills, and achievement. Classroom assessments also serve a variety of purposes (e.g., monitoring student progress, diagnosing gaps and problems in learning, motivating students, and informing parents and others about student performance and progress). The main users of these assessments are teachers and students.
The validity of classroom assessments will depend mainly on how well they support the intended uses of the assessment results by teachers and students. Although all potential uses of classroom assessments might be informative to discuss, in this chapter we will focus on the use of the results by teachers for providing feedback to students, evaluating student competencies on particular tasks and over content domains, and diagnosing students’ strengths and weaknesses.
When validity is studied in the context of large-scale high-stakes tests, the technical, or psychometric, characteristics of the tests play a central role. In these high-stakes contexts, those characteristics include, for example, standardization, consistency, and fairness (Cronbach, 1988). Since the results from these standardized tests are used for high-stakes decisions that extend well beyond the context in which the assessment took place, standardization and empirical evidence for consistency over contexts serve an important function in supporting trust in the processes being employed and in the trustworthiness of the results (Porter, 2003).
In a classroom, assessment-based decisions generally involve less far-reaching inferences. Rather, the results are interpreted and used locally. The results need to be practical and useful in fulfilling the main goal of classroom assessment: promoting effective teaching and learning. These decisions are generally less high-stakes than those based on standardized test results, but this does not imply that technical characteristics become irrelevant. An inaccurate conclusion about a student’s ability might not be catastrophic, but it is not likely to be helpful in planning future instruction, and therefore in supporting learning. For classroom assessments, a functional perspective that focuses on how well the assessment promotes learning by improving the quality of instruction is the central concern, and measurement characteristics are of concern mainly in terms of their impact on the effectiveness of the assessment in supporting teaching and learning.
The bottom line in validating classroom assessments (as in all assessments) is to identify the qualities that the assessment results need to have, given their particular interpretations and uses in the context at hand, and then to examine whether the assessment results meet these requirements.
The next section outlines an argument-based approach to validation, and the following section describes the functional and measurement perspectives on validation. The two perspectives are complementary in that each focuses on characteristics that are necessary for an effective assessment, but the relative importance of the two perspectives in evaluating an assessment will vary depending on the goals and contexts of the assessment. In the third section, we describe some uses of classroom assessments and examine how these assessments might be evaluated in terms of interpretations and uses and the two perspectives. We conclude that the functional perspective should be primary in classroom assessment, with the measurement perspective playing a supporting role in this context.
Argument-Based Approach to Validation
As indicated earlier, the validity of assessment interpretations and uses depends on the plausibility of the interpretation and the appropriateness of the uses. A natural approach to validation is to specify the interpretation and use, develop (or identify) an assessment program that would be expected to meet the specified requirements, and then evaluate how well the interpretations and uses are justified. Validation is most often associated with the last of these three steps, but in fact it depends critically on all three steps.
The argument-based approach to validation (Cronbach, 1988; Crooks, Kane, & Cohen, 1996; House, 1980; Kane, 2006, 2013; Shepard, 1993) provides a general framework for specifying and validating interpretations and uses of assessment results. If we are going to make claims and base decisions on assessment results, these claims and decisions should be well founded (AERA et al., 2014; Messick, 1989).
A relatively simple and effective way to specify proposed interpretation and uses of the assessment results is to develop an interpretation/use argument (IUA) that lays out the reasoning leading from observed assessment performances to the claims being made. The general idea is to identify the inferences and assumptions inherent in the interpretations and uses of the assessment results.
The argument-based approach is contingent in the sense that the structure of the validity argument and the conclusions reached about validity depend on the structure and content of the IUA. For modest interpretations that do not go much beyond the observed performances, the IUA will be modest, including few inferences and assumptions; for ambitious interpretations (involving broad generalizations, constructs, or predictions), the IUA will require strong inferences and supporting assumptions. If the IUA is found wanting, because it lacks coherence and completeness or because the evidence does not support some of its inferences and assumptions, the interpretation and use would not be accepted as valid. If the IUA is coherent and complete, and its inferences and assumptions are adequately supported, the proposed interpretation and uses can be considered valid. The inferences based on classroom assessments tend to be local and limited, and therefore do not require strong assumptions.
Interpretation/Use Arguments (IUAs)
The IUA provides an explicit statement of the sequence or network of inferences and supporting assumptions that gets us from the observed performances to the claims based on these performances. The inferences are supported by warrants, which are general rules for making claims of a certain kind based on certain kinds of data. Warrants are based on assumptions and generally require backing, or support. For example, in drawing conclusions about a student’s level of competence in a domain on the basis of a sample of performances, we rely on a warrant that says that such generalizations are reasonable, and this warrant can be backed by evidence indicating that the sample is large enough and representative enough to support the generalization. The IUA would consist of a sequence or network of such inferences leading from the assessment results to the conclusions and decisions based on these performances.
The IUA provides a general framework for drawing inferences based on assessment results, and thereby for interpreting and using the assessment results for individual students. Although they may not be explicitly mentioned in discussing the results, the warrants for various inferences are integral parts of the IUA. Assuming that the warrants employed in the IUA are supported by appropriate evidence, the IUA provides justification for claims and decisions based on assessment results.
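The structure described here, a chain of inferences in which each inference rests on a warrant and each warrant may or may not have backing, can be sketched as a small data structure. The following is only an illustration under our own assumptions: the class names, field names, and example claims are hypothetical and are not part of any established validation tool. Its purpose is to show how laying out the IUA explicitly makes it easy to see which warrants have no recorded backing.

```python
from dataclasses import dataclass, field


@dataclass
class Inference:
    """One step in a hypothetical IUA: a claim, its warrant, and any backing."""
    claim: str
    warrant: str
    # Evidence offered in support of the warrant (may be empty).
    backing: list[str] = field(default_factory=list)

    def is_backed(self) -> bool:
        return len(self.backing) > 0


@dataclass
class InterpretationUseArgument:
    """A sequence of inferences from observed performances to claims."""
    inferences: list[Inference]

    def unbacked(self) -> list[Inference]:
        """Return the inferences whose warrants have no recorded backing."""
        return [inf for inf in self.inferences if not inf.is_backed()]


# Illustrative example only; the claims and evidence are invented.
iua = InterpretationUseArgument(inferences=[
    Inference(
        claim="Observed performances are scored appropriately",
        warrant="The scoring rule reflects the skill of interest",
        backing=["Teacher review of the scoring rule"],
    ),
    Inference(
        claim="The student is competent across the domain",
        warrant="A sample of performances generalizes to the domain",
        backing=[],  # no evidence yet on sample size or representativeness
    ),
])

for inf in iua.unbacked():
    print("Needs backing:", inf.claim)
```

A warrant without recorded backing is not necessarily unsound; the listing simply marks where a reviewer might look first.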
Validity Arguments
The validity argument provides an overall appraisal of the IUA, and thereby of the proposed interpretation and uses of the assessment results. It depends on the scope and content of the IUA, which specifies the inferences and assumptions that need to be evaluated. A simple interpretation in terms of skill in performing a particular kind of task (e.g., solving two-digit addition problems presented horizontally, such as “23 + 46 = . . .”) would focus on the adequacy of sampling of this type of task as a basis for deciding whether students can solve this kind of problem. Assessments of more broadly defined domains of skill would typically require more evidence and more kinds of evidence.
The validity argument starts with a critical review of the IUA, with particular attention given to identifying the most questionable inferences and assumptions. Many assumptions may be accepted without much discussion. Some assumptions may be evaluated in terms of the appropriateness of the procedures used (e.g., the relevance of observed performances to the skill of interest, the size of the sample of observations). Some assumptions (e.g., that the students were motivated to perform well) may be based on experience and/or observations made during the assessment.
In order to make a strong case for an interpretation or use of assessment results, the validity argument has to provide backing for the IUA as a whole, and particularly for its most questionable inferences and assumptions. Serious doubts about any inference or assumption can raise questions about the IUA as a whole. Therefore, the IUA needs to be understood in enough detail so that the inferences and assumptions on which it depends can be identified and evaluated. A validity argument is never definitive because we cannot exhaustively evaluate all of the IUA, and therefore the most doubtful parts of the argument should get the most attention. As Cronbach (1980) suggested, “The job of validation is not to support an interpretation, but to find out what might be wrong with it. A proposition deserves some degree of trust only when it has survived serious attempts to falsify it” (p. 103). The question is whether the interpretation and use of the assessment results makes sense, given all of the evidence.
Note that it is not necessary to be concerned about assumptions that are not included in the IUA. For example, if the proposed interpretation and use assumes that the attribute being assessed would not vary much over extended periods of time, we would be concerned about the extent to which the performances are stable over time. But if the characteristics being assessed are expected to vary (e.g., due to learning), stability would not be required, and it might even constitute evidence against the validity (the instructional sensitivity) of the assessment.
The basic idea guiding the argument-based approach is that we should be clear about the reasoning that is to take us from observed student performances to conclusions about the student, and that we should critically evaluate this reasoning and its embedded assumptions.
Perspectives on Assessment
Assessments can be evaluated from multiple perspectives, and it is generally helpful to consider the evaluative criteria associated with different perspectives (Cronbach, 1988; Dorans, 2012; Holland, 1994). Different perspectives focus on different aspects of interpretation and use, and therefore on different criteria for evaluating validity. The perspectives are not mutually exclusive, and any that are relevant in a particular case deserve attention.
Addressing concerns about the assessments’ interpretation and use from multiple perspectives may seem like a major burden, but it is not particularly burdensome if the evaluation is approached reasonably; in fact, it may facilitate the process of validation. It has long been recognized that validation requires that the assessment results be evaluated by identifying potential challenges (e.g., sources of bias, construct-irrelevant variance, construct underrepresentation) and evaluating their impact (Cronbach, 1988), and the different perspectives can be a fruitful source of legitimate challenges to proposed interpretations and uses.
We will consider two perspectives on classroom assessment, the functional perspective and the measurement perspective. As noted earlier, the functional perspective focuses on how well the assessments support the attainment of various goals in some contexts, while a measurement perspective focuses on the assessment as a measurement instrument (i.e., in terms of precision and accuracy of the results). Assessment uses need to achieve the purpose for which they are intended, and they need to be defensible as measurements. Both perspectives can be accommodated in an argument-based approach to validation that supports the claims inherent in the intended interpretations and uses of assessment results, and that addresses challenges to these interpretations or uses.
The Functional and Measurement Perspectives
The functional perspective (Cronbach, 1988) views assessments primarily as tools that can be helpful in realizing desired outcomes, and therefore it focuses on how well the intended outcomes are achieved and on the extent to which undesirable outcomes are avoided. From a functional perspective, an assessment is evaluated mainly in terms of its consequences, intended and unintended.
Cronbach (1988) begins his discussion of the functional perspective by contrasting it with more descriptive concerns about the accuracy of interpretations:
(p. 5)
The functional perspective is concerned with the functional worth, or utility, of the assessment in achieving the goals that it is intended to help achieve. An assessment is implemented to achieve some purpose, and it is evaluated in terms of its functional worth in achieving this purpose.
The measurement perspective views assessments primarily as measurement instruments, and as a result it focuses on certain technical criteria, particularly the generalizability (or reliability) of scores and their accuracy as estimates of the attribute of interest. It emphasizes standardization and objectivity (Porter, 2003) and generally relies on statistical m...