1 Variability in Evaluation Practice
Best practices have become the most sought after form of knowledge.
Not just effective practices, or decent practices, or better practices, but best.1
A focus on best practices, toolkits, practice guidelines, and the like arises fairly naturally in the context of concerns about variation in professional practice. The terms variation and variability, as commonly used, signify unevenness, deviation, or divergence from norms or standards, discrepancy, or inconsistency.
Every profession is concerned about variability both for reasons of quality assurance and from the standpoint of ensuring that practitioners consistently deliver on the fundamental purpose of the practice regardless of the particular circumstances in which they work.
There are two general views on the nature of variability in professional practice:2 One holds that variation is a challenge to the rational basis of practice and could be eliminated if practitioners had clear guidelines, including protocols and rules for decision making. This view is endorsed by researchers, as well as some practitioners, who argue that decisions made in practice are too often based on habit and intuition and lack a firm grounding in empirical evidence. (One contemporary example of this idea is evident in the book Moneyball, the story of Billy Beane, who pioneered the use of sabermetrics, objective knowledge about baseball players' performance based on their statistics, versus the tradition of relying on the intuitions of a team's scouts to evaluate players.)3 Concerns about restoring the rational basis of practice are also informed, in part, by nearly fifty years of empirical studies examining whether predictions made by expert clinicians are superior to those made by simple statistical rules or algorithms; about 60% of the studies have shown significantly better accuracy for the statistical rules.4 Finally, support for this way of thinking also comes from those who argue that practice ought to be primarily technically based; that is, it should consist of the application of scientifically validated knowledge.5
A second view holds that variability is inherent in multiple dimensions of the environment where a practice is performed (think of variation in both resources and the composition of patient populations in hospitals in rural versus metropolitan areas, for example) and thus is always an aspect of normal practice. In this view, by definition, practice involves flexibility and constant adjustments and modifications. Generally, those who hold this view subscribe to a judgment-based view of practice as composed of actions informed by the situated judgments of practitioners.6 Rather than encouraging the development of practices that are protocol-driven and rule following, advocates of this view of variability support the idea of developing practical wisdom. They also challenge the idea that the intuition practitioners often employ is an irrational, unscientific process that cannot be improved or sharpened.7 However, this perspective on variability as a normal dimension of practice does not necessarily mean that addressing and managing variation in practice is unproblematic.
Responses to practice variation encompass a range of different actions. Evidence-based approaches to practice promote the use of protocols, practice guidelines, and in some cases rules (consider rules regarding nursing care for patients with dementia, for example) for how practitioners should provide services.8 Another response involves developing performance measures for practitioners based on protocols. In order to identify behaviors or actions considered outside practice norms, a practitioner's performance is measured and compared to standards or targets; in other words, professional practice is audited and the results of the audit are fed back to the practitioner to change behavior.9 Still other responses involve developing lists of best practices and toolkits that, while perhaps not intended to achieve complete standardization of practice, aim to help practitioners operate on some common ground with shared understandings of concepts, methods, ethical guidelines, and so on.10
As suggested by the broad description of the field in the Prologue, variation in evaluation practice is common. No doubt, heightened awareness of this state of affairs and the motivation to address it have been fueled by the evidence-based movement that has developed across the professions of nursing, social work, teaching, counseling, and clinical medicine. At the heart of this movement is the idea that practitioners ought to use models and techniques that have been shown to be effective based on scientific research. (However, whether what are often touted as best practices are actually backed by scientific evidence is another matter.) Exactly where and when the idea of best practices that originated in the business world migrated to the field of evaluation is not clear, yet the term is no longer a buzzword confined to business enterprises. The literature is full of best practice approaches for evaluating just about everything, including leadership development programs, faculty performance, think tanks, public health interventions, and teacher education programs, to name but a few targets.11 Moreover, there is a growing sense in the field that although evaluators operate in a world marked by complex contextual conditions, that world "is not so fluid that meaningful patterns cannot be appreciated and used as a basis for action."12 Hence, in recent years we have witnessed efforts to develop practice guidelines for matching methods to specific evaluation circumstances, as well as guidelines for choosing appropriate means for determining program value in different contexts.13
Multiple sources of variability in evaluation practice will be discussed throughout this book. Here, I focus on four primary sources: how evaluation is defined, what methods an evaluator ought to employ, how the professional evaluator relates to and interacts with parties to an evaluation, and how the purpose of the practice is understood.
Defining "Evaluation"
There is no universally agreed upon definition of evaluation, although there are two primary points of view. The first emphasizes that evaluation is an activity concerned with judging value; the second views evaluation as a form of applied research.
In a precise sense, the sense one would find in dictionary definitions, evaluation refers to the cognitive activity of determining and judging the value of some object, which could be an activity, event, performance, process, product, policy, practice, program, or person. Evaluation is a matter of asking and answering questions about the value of that object (its quality, merit, worth, or significance).14 The four-step logic involved in doing an evaluation defined in this way is as follows:
1. Select criteria of merit (i.e., those aspects on which the thing being evaluated must do well to be judged good).
2. Set standards of performance on those criteria (i.e., comparative or absolute levels that must be exceeded to warrant the application of the word "good").
3. Gather data pertaining to the performance of the thing being evaluated on the criteria relative to the standards.
4. Integrate the results into a final value judgment.15
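The four-step logic above can be illustrated with a small sketch. Everything here, the criteria, weights, standards, and the rating scale, is a hypothetical example invented for illustration; it is not a published evaluation methodology, and real evaluative synthesis involves far more judgment than arithmetic.

```python
# Illustrative sketch of the four-step evaluative logic.
# All criteria, weights, and standards below are hypothetical.

def evaluate(scores, criteria):
    """Steps 1-4: criteria, standards, gathered data, synthesis."""
    # Steps 1-3: each criterion carries a weight and a minimum
    # standard; `scores` holds the gathered performance data.
    judgments = {
        name: scores[name] >= standard
        for name, (weight, standard) in criteria.items()
    }
    # Step 4: synthesize into an overall value judgment, here a
    # weighted average gated by the per-criterion standards.
    total_weight = sum(w for w, _ in criteria.values())
    overall = sum(w * scores[n] for n, (w, _) in criteria.items()) / total_weight
    verdict = "good" if all(judgments.values()) and overall >= 3.0 else "not good"
    return overall, judgments, verdict

# Hypothetical program rated on a 1-5 scale.
criteria = {
    "reach":         (0.3, 3.0),   # (weight, minimum standard)
    "effectiveness": (0.5, 3.5),
    "cost":          (0.2, 2.5),
}
scores = {"reach": 4.0, "effectiveness": 3.6, "cost": 3.0}
overall, judgments, verdict = evaluate(scores, criteria)
```

Note that step 4, the synthesis, is the contested part: a weighted average is only one of many possible synthesis rules, and defenders of evaluation-specific methodology (discussed below) would stress that choosing criteria, weights, and standards is itself an evaluative act.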
Defenders of this understanding of evaluation argue that unless one is using this logic and employing evaluation-specific methodologies directly concerned with means of determining value, for example, needs and values assessment and evaluative synthesis methodologies (combining evaluative ratings on multiple dimensions or components to come to overall conclusions), one is literally not doing evaluation.16 In this view, evaluation is a particular kind of critical thinking that follows a specific logic of analysis necessary for appraising value. That logic can be applied to the evaluation of literally anything. A strong statement of this view appears in the Encyclopedia of Evaluation, where we find that it is the "judgment about the value of something . . . that distinguishes evaluation from other types of inquiry, such as basic science research, clinical epidemiology, investigative journalism, or public polling."17
However, many practitioners of evaluation define it differently as a specific type of applied social science research (i.e., evaluation research) concerned with the processes of collecting, analyzing, interpreting, and communicating information about how a program or policy is working and whether or not it is effective.18 These practitioners employ the standard tools of the social scientist (experiments, surveys, interviews, field observations, econometric methods) to monitor program processes and to answer questions of whether a policy or program works and why. A prominent concern in evaluation research is establishing the causal link between a program or policy and intended outcomes. A strong advocate of this way of understanding evaluation summarized it as follows: "Evaluation is social research applied to answering policy-oriented questions. As such, an important criterion for judging evaluations is the extent to which they successfully apply the canons of social science."19
The central issue here is whether this disagreement in definition is a difference that makes a difference, and for whom. From the perspective of many agencies, both domestic and international, that promote and conduct evaluation, it appears that the two perspectives are sometimes combined and broadly interpreted in still other ways. For example, the W.K. Kellogg Foundation defines evaluation not as an event occurring at the completion of a program but as a process "providing ongoing, systematic information that strengthens projects during their life cycle, and, whenever possible, outcome data to assess the extent of change" that "helps decision makers better understand the project; how it is impacting participants, partner agencies and the community; and how it is being influenced/impacted by both internal and external factors."20 The United Nations Development Programme (UNDP) Independent Evaluation Office defines evaluation as "judgment made of the relevance, appropriateness, effectiveness, efficiency, impact and sustainability of development efforts, based on agreed criteria and benchmarks among key partners and stakeholders."21 For the U.S. Government Accountability Office, "a program evaluation is a systematic study using research methods to collect and analyze data to assess how well a program is working and why."22 Finally, the Department for International Development (DFID) of the U.K. government argues that evaluation is a collection of approaches "focuse[d], in particular, on whether planned changes have taken place, how changes have impacted, or not, on different groups of people and investigates the theory behind the change. . . ."23
From another perspective, held by at least some evaluators, it is quite important to take an analytical philosophical approach to answering the question, "What can and should legitimately be called the activity of 'evaluating,' irrespective of the circumstances in which it is conducted, the object being evaluated, and expectations for its use?" It is important because defining what actually counts as "evaluation" is intimately related to establishing one's professional identity as an evaluator as distinct from the identity of others engaged in related pursuits like applied social research, program auditing, organization development, and management consulting. After all, claims to expertise are built around distinctive knowledge, theories, and methods. If evaluators cannot lay claim to and agree on the definition and central purpose of their practice (and their own unique body of knowledge), then the status of evaluation as a distinct kind of professional practice is at risk.24 Other evaluation practitioners appear to be less concerned with this matter of professional identity, or at least do not see that the issue depends on resolving this definitional dispute. Many who regard evaluation as the application of social science methods are content to view evaluation as a compendium of approaches, concepts, and methods serving multiple purposes and taught as a specialization within social science fields such as economics, psychology, sociology, and political science.
The definitional problem surfaces in an additional way in the international development evaluation ...