A Comprehensive Critique of Student Evaluation of Teaching

Critical Perspectives on Validity, Reliability, and Impartiality

Dennis E. Clayson


Book Information

This thought-provoking volume offers a comprehensive analysis of contemporary research and literature on student evaluation of teaching (SET) in higher education.

In evaluating data from fields including education, psychology, engineering, science, and business, this volume critically engages with the assumption that SET is a reliable and valid measure of effective teaching. Clayson navigates a range of cultural, social, and era-related factors, including gender, grades, personality, student honesty, and halo effects, to consider how these may affect the accuracy and impartiality of student evaluations. Ultimately, he posits a "popularity hypothesis," asserting that, above all, SET measures instructor likability. While controversial, the hypothesis powerfully and persuasively draws on extensive and divergent literature to offer new and salient insights regarding the growing and potentially misleading phenomenon of SET.

This topical and transdisciplinary book will be of great interest to researchers, faculty, and administrators in the fields of higher education management, administration, teaching and learning.


Information

Publisher
Routledge
Year
2020
ISBN
9781000281927
Pages
152
Language
English

1 Issues and Debates Surrounding Student Evaluations of Teaching

What Are the Issues?

Louis thought of himself as a good teacher. He also loved research. These two passions were what led him to become a professor. Even as an undergraduate, he found he could explain things to his classmates so they readily understood them. Numerous times, someone he helped would say, “Why didn’t the professor explain it like that?” Upon gaining his doctorate, Louis went to work for a private college, where he enjoyed his interaction with students and felt pride in their achievements.

Yet, in his second decade of teaching, Louis saw his student evaluations fall from the 90th percentile to the 20th percentile. In other words, in the eyes of his students, he went from one of the best instructors to one of the worst. What could have caused such a dramatic reversal? The only change in Louis’s life was a change in schools. He had left the private school where he had taught for almost ten years for a larger public university, which increased his salary and gave him more opportunities for research. Prior to his move, he had twice been nominated by graduating seniors for the college’s faculty award, the highest honor the college gave to faculty. In his last year, he was nominated to give the faculty lecture, another honor, this time from his colleagues. After his first year at the new university, he was shocked to find his student evaluations were lower than he had ever experienced. Years later, his scores remained below average.

Louis became convinced there was something wrong with the evaluation process. His approach in the classroom had not changed. His personality and his ability to explain complicated material had not suddenly been modified. He knew his students were different, but were they that much different? They were all undergraduate students. Nothing he had done had created this change, so what was the evaluation actually measuring? Was he a good teacher or was he not? Where once Louis couldn’t wait to get into the classroom, he now dreaded each new class.
What can we tell Louis? Were the evaluations his students completed almost every term assessing his actual teaching ability? If not, then what were they measuring? Perhaps they were a measure of Louis himself, independent of his teaching abilities. Yet, what he was doing in the classroom had not changed, and he had not changed either. On the other hand, the students were different. On average, the students in the private college had better standardized test scores and came from a higher socioeconomic background, but was that enough to create such a dramatic shift in his evaluations? Were the evaluations just a measure of compatibility? If a good teacher could reach a certain group of students, and not another, did that imply the evaluations were a measure of the students themselves, and only a secondary indicator of the instructor? Would accepting his evaluations at face value not suggest that “good” or “effective” teaching was whatever the students said it was? Nevertheless, shouldn’t a teacher adapt to his or her students?
Louis, in an interview with his dean, suggested his classes were too rigorous for the students at his present school, but the dean assured him this was one of the primary reasons he was hired. “Our students need to be challenged,” she said. Later, Louis smiled at the irony, mentally noting that the connection between rigor and learning had simply been assumed. No one had suggested his students be independently tested to discover if they were actually learning. Further, with her statement, the dean was admitting that she didn’t believe the evaluations and learning were necessarily related. All Louis really knew was his students at the new school did not like him.

Introduction

There have been divisive issues among academicians in the past, but few have been as well researched and long-lasting as the debate over the student evaluation of teaching (SET). Every aspect of the process has been investigated. Even the title of the evaluations became a matter of dispute. The issue revolved around what students were doing, and what they were qualified to do, when they responded to the forms. Some suggested that students were not qualified to evaluate instruction, but that they could “rate” their experience in class by utilizing a ranking scale. In this view, the “evaluation” is not actually performed by students, but by professionals utilizing student input. Others insisted students were qualified to evaluate teaching because they were the ones actually present when the instruction took place (Berk, 2013; Hativa, 2014).

In this historical debate, it is enlightening to see what some of the oldest rating scales were called. Early instruments at the University of Washington were titled a “Survey of Student Opinion of Teaching” (Langen, 1966), while those at Michigan State were simply called the “Teacher Evaluation Sheet” (Dressel, 1961). At Purdue, it was a “Rating Scale for Instruction” (Page, 1974), and at the University of Minnesota, the form was a “Student Opinion Survey” (Doyle, 1975). The titles not only reflect a wide range of views; they also imply there was no consensus about what the process was designed to measure. Is the purpose to create a scale that supposedly measures a wide expanse, summarized by the word “instruction,” or simply to survey “opinions”? Further, are the students attempting to measure an instructor, or what the instructor does? For those who don’t quibble about who is evaluating whom, the terms Student Rating of Instruction (SRI) or Student Evaluation of Instruction (SEI) are used, or the more common term, Student Evaluation of Teaching (SET) (see Baldwin, Ching, & Hsu, 2018). There are also a number of widely used evaluation instruments created by researchers and consultants, including the Individual Development & Educational Assessment (IDEA), created by a research and consulting group of the same name out of Kansas State University, and a form created by Herbert Marsh called the Student Evaluation of Educational Quality (SEEQ). As can be seen from these titles, there is little agreement about what students are actually doing and what is supposedly being measured.
In this book, we will simply refer to the process as the Student Evaluation of Teaching (SET), realizing, as we proceed, that the title may not reflect all the nuances of the research.

History

Even with this lack of consensus, and even as a vigorous debate questioned the validity of the process, the use of the evaluations became, for all practical purposes, universal. Initially, much of the research was positive and justified the popularity of the procedure. However, a dramatic increase in the utilization and impact of the evaluations occurred in the last several decades, ironically just as the research on SET was becoming increasingly negative.
Although investigations of SET date from the 1920s (Gump, 2007; Langen, 1966; Wachtel, 1998), one of the first readily accessible research papers on SET was a report of a survey, and the subsequent development of an evaluation procedure, at the University of Washington in 1944 (Guthrie, 1949). From that survey, procedures for faculty promotion and evaluation were developed. Some of the findings from the developmental process would sound familiar to someone studying the procedure almost 80 years later: no correlation was found between teaching effectiveness and research contribution, and full professors did not rate better than assistant professors. During the 1960s, interest in having students evaluate faculty increased, but many schools did not embrace SET, even though there was a growing consensus that the instruments were “systematic and tangible kinds of evidence for evaluation of teaching performance” (Centra, 1977, p. 19). By 1973, it was reported that 28% of campuses used some sort of student evaluation of instructors. By 1984, the number had increased to 68%, and by the early 1990s, 86% of universities used SET for important faculty decisions (Seldin, 1993). Business schools appeared to be ahead of the curve; by 1994, about 95% of the deans of accredited business schools used the evaluations as a source of teaching information (Crumbley, 1995). Shortly thereafter, a study by the Carnegie Foundation found 98% of universities were using some form of student evaluations (Magner, 1997). At about the same time, another study reported that 99% of business schools utilized evaluations by students, and that deans placed higher importance on these evaluations than on either administrative or peer assessments (Comm & Mathaisel, 1998). A more recent American Association of University Professors (AAUP) poll (Vasey & Carroll, 2016) found that only 4% of instructors reported student evaluations were not required; even for this small group, the evaluations were still recommended. Currently, it would be difficult to find a university that does not utilize some form of the student evaluation of teaching.
Not only did the use of the instruments become normative; on many campuses SET became the most important and, in many cases, the only measure of teaching ability (Wilson, 1998). The instruments were also being used to make important non-instructional decisions. In one survey, almost 90% of accounting professors reported that SET instruments were used to determine tenure decisions, and 70% said the evaluations were utilized to determine merit pay (Crumbley & Reichelt, 2009). Seldin (1999) quotes a California dean as saying, “If I trust one source of data on teaching performance, I trust the students” (p. 15).
As would be expected, the near-universal use of an assessment that could establish reputations, merit pay, promotion, and tenure has been extensively researched. As early as 1990, it was reported that at least 2,000 citations to SET existed (Feldman, 1997; Centra, 2003). One source stated that close to 3,000 articles were published on SET in just the 15 years between 1990 and 2005 (Al-Issa & Sulieman, 2007). Reports on the topic were so voluminous that many researchers began to utilize meta-analysis, in which a case was not a student or a class average, but an entire published article (see Clayson, 2009; Cohen, 1980, 1981; Feldman, 1989; Spooren, Brockx, & Mortelmans, 2013; Stephen, Wright, & Jenkins-Guarnieri, 2011; Uttl, White, & Gonzalez, 2017, as examples). Nevertheless, little agreement has been reached on key points. The defenders of the system were typically found in colleges of education and among those who consulted in the area. Some defended the evaluations almost as if they were religious tenets, and even referred to sources who identified contrary findings in strong and uncharacteristically negative terms (Aleamoni, 1999; Marsh, 1984; Marsh & Hattie, 2002; Marsh & Roche, 2000). These advocates typically had an advantage in the publication process, since pedagogic research is the essential academic work of their profession. Other disciplines generally look upon research on instruction as less prestigious, and those opposed to the evaluation process are more dispersed among academic disciplines and more isolated in their publication outlets. They were, however, equally emphatic. In such an environment, it became relatively easy to select research findings that reinforced a point of view.
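To make the unit of analysis concrete: in a meta-analysis of this kind, each published article, rather than each student or class, contributes one effect size, and the articles are combined into a single summary estimate. A minimal sketch, assuming the generic textbook inverse-variance weighting scheme (not necessarily the method used in any of the studies cited above):

$$
\bar{\theta} \;=\; \frac{\sum_{i=1}^{k} w_i\,\theta_i}{\sum_{i=1}^{k} w_i},
\qquad w_i = \frac{1}{v_i},
$$

where $\theta_i$ is the effect size extracted from article $i$ (for example, a correlation between SET scores and measured learning), $v_i$ is its sampling variance, and $k$ is the number of articles included. Under this scheme, a single large-sample article can outweigh many small ones, which is one reason different meta-analyses of the same literature can reach different conclusions.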
The following summary of the evaluation process is not free of these problems, but it does attempt to present information from a wider assortment of venues than is found in much of the traditional education literature.

Question of Era

There are era-related and cultural matters connected to SET. This is an issue that should influence our understanding of the evaluation process, but one that is rarely addressed. As previously indicated, there has been a change in the consensus about the validity of SET. Much of the current literature is negative, but by the mid-1980s the existing research on the evaluations was positive enough that negative attitudes toward them were referred to as “myths,” “half-truths,” and “witch hunts” (Aleamoni, 1987, 1999; Feldman, 1997; Marsh & Hattie, 2002; Theall & Franklin, 2001), a response that has been perpetuated by some compilers who have attempted to summarize the data (Gravestock & Gregor-Greenleaf, 2008; Hativa, 2014).
There were several reasons for this pre-millennial optimism.
First, a large amount of research had occurred. As a prominent scholar at the time noted, “Probably, students’ evaluations of teaching effectiveness are the most thoroughly studied of all forms of personnel evaluation, and one of the best in terms of being supported by empirical research” (Marsh, 1987, p. 369). As previously noted, at least 2,000 reports of SET existed before 1990 (Feldman, 1997). Much of this research, especially the research published in the top journals, was positive.
Second, most of the research was conducted and published fr...
